
Add PBC capability to merge_split_MEST #372

Merged
23 commits merged into RC_v1.5.x on Sep 28, 2024

Conversation

@w-k-jones (Member) commented Nov 29, 2023

This turned into a general refactor of merge_split_MEST, with the following changes and additions:

  1. Adds PBC capabilities, using the same BallTree distance-search approach as used in feature detection (a sketch of the idea follows the checklist below).
  2. Moves filtering by frame_len to before the minimum spanning tree calculation, increasing the number of merging/splitting cells that are linked, particularly over long tracking periods.
  3. Changes the cell-merging logic to use scipy.sparse.csgraph.connected_components, improving performance.
  4. Ensures all output arrays have int dtype, and starts track IDs from 1 rather than 0.

I'm also going to look into adding a flag for whether a cell started with a split or ended with a merge, and which object it was merged into/split from

  • Have you followed our guidelines in CONTRIBUTING.md?
  • Have you self-reviewed your code and corrected any misspellings?
  • Have you written documentation that is easy to understand?
  • Have you written descriptive commit messages?
  • Have you added NumPy docstrings for newly added functions?
  • Have you formatted your code using black?
  • If you have introduced a new functionality, have you added adequate unit tests?
  • Have all tests passed in your local clone?
  • If you have introduced a new functionality, have you added an example notebook?
  • Have you kept your pull request small and limited so that it is easy to review?
  • Have the newest changes from this branch been merged?
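
For anyone curious about (1), here is a minimal, self-contained sketch of the BallTree-with-periodic-metric idea. It is not the tobac implementation: the periodic dimension, its bounds and the toy coordinates are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import BallTree

# Illustrative only: assume hdim_1 (first column) is periodic on [0, 100).
H1_MIN, H1_MAX = 0.0, 100.0

def periodic_euclidean(a, b):
    """Euclidean distance with wrap-around in the first dimension."""
    delta = np.abs(a - b)
    period = H1_MAX - H1_MIN
    delta[0] = min(delta[0], period - delta[0])  # take the shorter way around the boundary
    return np.sqrt(np.sum(delta ** 2))

# Toy cell end/start positions as (hdim_1, hdim_2)
cell_ends = np.array([[99.0, 10.0], [50.0, 50.0]])
cell_starts = np.array([[1.0, 11.0], [52.0, 48.0]])

# BallTree accepts a user-defined metric, so the same nearest-neighbour query
# used for non-periodic data also works across the periodic boundary.
tree = BallTree(cell_starts, metric=periodic_euclidean)
dist, idx = tree.query(cell_ends, k=1)
print(dist.ravel(), idx.ravel())  # the first pair is ~2.2 apart once wrapped
```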

@w-k-jones added the enhancement label (Addition of new features, or improved functionality of existing features) on Nov 29, 2023
@w-k-jones added this to the Version 1.5.2 milestone on Nov 29, 2023
@w-k-jones (Member, Author) commented:

Keeping as draft until #368 is merged as this contains the same commits

@w-k-jones self-assigned this on Nov 29, 2023
@w-k-jones linked an issue on Nov 29, 2023 that may be closed by this pull request
@kelcyno (Collaborator) commented Nov 29, 2023

Oh, this looks really interesting @w-k-jones - I'll get started on it!

@w-k-jones (Member, Author) commented:

Ok, I ended up going down a bit of a rabbit hole there. I found that because the filtering for frame_len was carried out after the minimum euclidean spanning tree calculation, valid links were being removed: a spatially closer cell at another point in time could be filtered out, which removed the edges between two genuinely neighbouring cells. This had more of an impact the longer the time period being tracked. When I swapped to filtering by frame_len before calculating the MEST, it massively increased the number of linked cells, which then caused the rest of the merge/split routine to run slower. I've been experimenting with scipy.sparse.csgraph.connected_components recently, and this seemed like a really good application for linking the neighbouring cells into connected tracks.
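
For anyone following along, the connected-components step amounts to something like the following toy sketch (the link graph here is made up, not real merge_split output):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Toy example: 5 cells, with merge/split links found between cells
# 0-1, 1-2 and 3-4 (made-up adjacency for illustration).
n_cells = 5
row = np.array([0, 1, 3])
col = np.array([1, 2, 4])
links = csr_matrix((np.ones_like(row, dtype=float), (row, col)),
                   shape=(n_cells, n_cells))

# Each connected component of the link graph becomes one track.
n_tracks, labels = connected_components(links, directed=False)
track_ids = labels + 1  # start track IDs from 1 rather than 0
print(n_tracks)   # 2
print(track_ids)  # [1 1 1 2 2]
```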

@w-k-jones marked this pull request as ready for review on November 29, 2023 21:06
@w-k-jones changed the base branch from RC_v1.5.x to main on November 29, 2023 21:07
@w-k-jones changed the base branch from main to RC_v1.5.x on November 29, 2023 21:07
codecov bot commented Nov 29, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 60.78%. Comparing base (3609a59) to head (412ab97).
Report is 77 commits behind head on RC_v1.5.x.

Additional details and impacted files
@@              Coverage Diff              @@
##           RC_v1.5.x     #372      +/-   ##
=============================================
- Coverage      60.96%   60.78%   -0.18%     
=============================================
  Files             23       23              
  Lines           3548     3522      -26     
=============================================
- Hits            2163     2141      -22     
+ Misses          1385     1381       -4     
Flag: unittests | Coverage: 60.78% <100.00%> (-0.18%) ⬇️


@freemansw1 (Member) commented:

I haven't started reviewing this yet, but I wonder if we shouldn't try to push this to v1.5.3? I'm starting to get antsy about v1.5.2 coming out.

@freemansw1 (Member) left a review:

Thanks for doing this, @w-k-jones; a critical improvement to an important piece of code.

I'm not that familiar with this code, but I've added in some thoughts here in a few places. Probably would be good to get a review in from @kelcyno.

tobac/feature_detection.py (review thread resolved)
tobac/merge_split.py (review thread resolved)
tobac/merge_split.py (review thread resolved)
@kelcyno requested a review from @freemansw1 on February 28, 2024 16:37
@w-k-jones (Member, Author) commented Mar 9, 2024

Reminder to myself: currently the PBC handling requires all of h1_min, h1_max, h2_min and h2_max to be provided, even if PBCs are calculated over only one dimension. I should fix this to use default values for the dimension not being used before this is merged.
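
A hypothetical sketch of what such defaults could look like; the helper name is invented here, and the PBC_flag values follow the convention used elsewhere in tobac ('none', 'hdim_1', 'hdim_2', 'both'):

```python
def resolve_pbc_bounds(PBC_flag, h1_min=None, h1_max=None, h2_min=None, h2_max=None):
    """Hypothetical helper: only require the bounds of dimensions that are
    actually periodic, and fill the unused dimension with placeholder values."""
    if PBC_flag in ("hdim_1", "both") and None in (h1_min, h1_max):
        raise ValueError("h1_min and h1_max are required when hdim_1 is periodic")
    if PBC_flag in ("hdim_2", "both") and None in (h2_min, h2_max):
        raise ValueError("h2_min and h2_max are required when hdim_2 is periodic")
    # Bounds of non-periodic dimensions are never used, so default them to 0.
    return tuple(0 if v is None else v for v in (h1_min, h1_max, h2_min, h2_max))

# e.g. PBCs over hdim_2 only: callers no longer need to pass h1_min/h1_max
print(resolve_pbc_bounds("hdim_2", h2_min=0, h2_max=360))  # (0, 0, 0, 360)
```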

@kelcyno (Collaborator) left a review:

I don't have the ability to test this with PBC data, but with radar data and my typical datasets the updates from @w-k-jones work swimmingly. I'm happy to approve as long as Sean's comments are/have been addressed.

@kelcyno (Collaborator) commented Jul 19, 2024

Oh - per our last Tobac meeting we talked about whether Will's changes needed to be a separate merge/split method - they do not need to be a separate method.

@JuliaKukulies (Member) commented:

> Oh - per our last Tobac meeting we talked about whether Will's changes needed to be a separate merge/split method - they do not need to be a separate method.

@w-k-jones @kelcyno I tested this with some model data over CONUS and found that Will's merge/split method results in a significantly lower number (about a fifth) of unique track IDs (basically the variables feature_parent_track_id and cell_parent_track_id). Not sure if this is a bug, but it is probably not intended behaviour?

Is the assignment of track IDs here really the same as what Kelcy does in the long loop?

https://github.com/w-k-jones/tobac/blob/e9ec119b45becbcf0057cfc11114dd71fe3e33ab/tobac/merge_split.py#L205-L213

@w-k-jones I tested this with the MCS criteria for our DYAMOND project, and because the number of unique track IDs is so different using the modified merge/split function, this also results in a very different number of MCSs when the criteria are applied to the clusters belonging to a track.

@w-k-jones (Member, Author) commented:

@JuliaKukulies I believe the change in the results is due to a bug I fixed in the old version: because the minimum spanning tree was applied before the time filter, cell starts/ends that were close together in x/y coordinates but far apart in time could be identified as neighbours and then subsequently trimmed due to the time gap. This could mean that other cells which were within the time range for merging, but further apart in x/y, would not be linked. I changed the order in which these operations are applied to avoid this issue, which will result in more cell merges and a smaller number of unique tracks. This wasn't found originally as the method was designed for short time periods, but it became apparent during MCSMIP due to the long time period. Could you test running your data over different length time slices (e.g. 1 hour, 2 hours, 3 hours, etc.) and see if the proportion of tracks/cells remains roughly the same for the new version but increases (i.e. more tracks and fewer cells per track) with the old version?
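
To make the ordering issue concrete, here is a toy, graph-level illustration (it abstracts away the real cell start/end bookkeeping, so the distances and frame gaps below are made up):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

frame_len = 5

# Toy candidate links between three cells: spatial distance and frame gap.
# Cell 0 is spatially closest to cell 1, but they are 10 frames apart;
# cell 2 is slightly further away but within the allowed time gap.
spatial = np.array([[0.0, 1.0, 2.0],
                    [0.0, 0.0, 1.5],
                    [0.0, 0.0, 0.0]])
frame_gap = np.array([[0, 10, 1],
                      [0, 0, 1],
                      [0, 0, 0]])

# Old order: build the MST on spatial distance alone. The MST keeps edges
# 0-1 and 1-2 and drops 0-2, so once the 0-1 edge is removed by the
# frame_len filter, cell 0 is left with no valid link at all.
mst_old = minimum_spanning_tree(csr_matrix(spatial)).toarray()

# New order: remove links outside frame_len first, then build the MST,
# so the valid 0-2 link survives.
allowed = np.where(frame_gap <= frame_len, spatial, 0.0)
mst_new = minimum_spanning_tree(csr_matrix(allowed)).toarray()

print(np.nonzero(mst_old))  # edges (0,1) and (1,2)
print(np.nonzero(mst_new))  # edges (0,2) and (1,2)
```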

@JuliaKukulies (Member) commented:

> @JuliaKukulies I believe the change in the results is due to a bug I fixed in the old version: because the minimum spanning tree was applied before the time filter, cell starts/ends that were close together in x/y coordinates but far apart in time could be identified as neighbours and then subsequently trimmed due to the time gap. This could mean that other cells which were within the time range for merging, but further apart in x/y, would not be linked. I changed the order in which these operations are applied to avoid this issue, which will result in more cell merges and a smaller number of unique tracks. This wasn't found originally as the method was designed for short time periods, but it became apparent during MCSMIP due to the long time period. Could you test running your data over different length time slices (e.g. 1 hour, 2 hours, 3 hours, etc.) and see if the proportion of tracks/cells remains roughly the same for the new version but increases (i.e. more tracks and fewer cells per track) with the old version?

@w-k-jones Thanks for this clarification! That makes a lot of sense. I ran tests with different time slices and you are absolutely right. The number of unique tracks and their proportion to cells increases in the old version but the proportion stays about the same in your updated version. The difference in unique tracks is not very high in the first hours and even days, but then becomes quite high when running the merging and splitting module for tracks of a whole month.

@JuliaKukulies (Member) left a review:

With the clarifications from our last discussion and the additional tests I did, I am approving this now because everything seems to work fine and as expected :) Excellent job, @w-k-jones!

…tions of features in merge split, and improve documentation
github-actions bot commented Sep 24, 2024

Linting results by Pylint:

Your code has been rated at 8.73/10 (previous run: 8.73/10, +0.00)
The linting score is an indicator that reflects how well your code version follows Pylint’s coding standards and quality metrics with respect to the RC_v1.5.x branch.
A decrease usually indicates your new code does not fully meet style guidelines or has potential errors.

@w-k-jones (Member, Author) commented:

@freemansw1 I have added the capability to use a vertical coordinate name to specify variable grid spacing, in the same manner as it's implemented in tracking. Let me know if this resolves your comment from the review. I've also clarified the documentation a little, but I think we need to decide in general how we signpost default values.
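
As a side note for readers, "variable grid spacing from a vertical coordinate" just means that vertical distances are taken from the coordinate values rather than from a constant dz. A toy numpy illustration with made-up levels (not a tobac API call):

```python
import numpy as np

# Made-up vertical coordinate values at each model level.
geopotential_height = np.array([0.0, 50.0, 150.0, 400.0, 900.0])
level_a, level_b = 1, 3                       # hypothetical feature positions

dz_fixed = 500.0 * abs(level_b - level_a)     # constant spacing would give 1000.0
dz_coord = abs(geopotential_height[level_b] - geopotential_height[level_a])  # 350.0
print(dz_fixed, dz_coord)
```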

@w-k-jones merged commit 03315ab into tobac-project:RC_v1.5.x on Sep 28, 2024
24 checks passed
Labels: enhancement (Addition of new features, or improved functionality of existing features)
Linked issue: Add PBC support to merge_split_MEST
4 participants