Add PBC capability to merge_split_MEST
#372
Keeping as draft until #368 is merged, as this contains the same commits.
…ew test for frame_len parameter
Oh, this looks really interesting, @w-k-jones. I'll get started on it!
Ok, I ended up going down a bit of a rabbit hole there. I found that because the filtering for |
Codecov Report
All modified and coverable lines are covered by tests ✅

Coverage diff, RC_v1.5.x vs. #372:

            RC_v1.5.x   #372     +/-
Coverage    60.96%      60.78%   -0.18%
Files       23          23
Lines       3548        3522     -26
Hits        2163        2141     -22
Misses      1385        1381     -4
I haven't started reviewing this yet, but I wonder if we shouldn't try to push this to v1.5.3? I'm starting to get antsy about v1.5.2 coming out.
Thanks for doing this, @w-k-jones; a critical improvement to an important piece of code.
I'm not that familiar with this code, but I've added in some thoughts here in a few places. Probably would be good to get a review in from @kelcyno.
Reminder to myself: currently the PBC handling requires all of |
I don't have the ability to test this with PBC data, but with radar data and my typical datasets the updates from @w-k-jones work swimmingly. I'm happy to approve as long as Sean's comments have been addressed.
Oh, and per our last tobac meeting, where we discussed whether Will's changes needed to be a separate merge/split method: it does not need to be a separate method.
@w-k-jones @kelcyno I tested this with some model data over CONUS and found that Will's merge/split method results in a significantly lower number (about a fifth) of unique track IDs (basically the variables
Is the assignment of track IDs here really the same as what Kelcy does in the long loop? @w-k-jones I tested this with the MCS criteria for our DYAMOND project. Because the number of unique track IDs is so different with the modified merge/split function, this also results in a very different number of MCSs when the criteria are applied to the clusters belonging to a track.
@JuliaKukulies I believe the change in the results is due to a bug I fixed in the old version. Because the minimum spanning tree was applied before the time filter, cell starts/ends that were close together in x/y coordinates but far apart in time were identified as neighbours, and then subsequently trimmed because of the time gap. This could mean that other cells which were within the time range for merging, but further apart in x/y, would not be linked. I changed the order in which these operations are applied to avoid this issue, which will result in more cell merges and a smaller number of unique tracks. This wasn't found originally because the method was designed for short time periods, but it became apparent during MCSMIP due to the long time period. Could you test running your data over time slices of different lengths (e.g. 1 hour, 2 hours, 3 hours, etc.) and see whether the proportion of tracks to cells remains roughly the same for the new version but increases (i.e. more tracks and fewer cells per track) with the old version?
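The ordering issue described above can be illustrated with a small, self-contained sketch. This is not the merge_split_MEST implementation: it uses a plain-Python Kruskal minimum spanning tree instead of the library's own routine, treats each cell as a single point (x, y, frame) rather than separate start and end points, and all names and values (cells, max_dist, frame_len = 5) are invented for illustration.

```python
from itertools import combinations
from math import hypot


def kruskal_mst(n_nodes, edges):
    """Return the minimum spanning forest of (weight, i, j) edges."""
    parent = list(range(n_nodes))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    for weight, i, j in sorted(edges):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # only keep edges that do not close a cycle
            parent[root_i] = root_j
            tree.append((weight, i, j))
    return tree


# Hypothetical cells as (x, y, frame). Cell 1 sits between cells 0 and 2
# in space, but is 100 frames away from them in time.
cells = [(0.0, 0.0, 0), (1.0, 0.0, 100), (2.0, 0.0, 1)]
max_dist = 5.0  # assumed spatial search radius
frame_len = 5   # assumed maximum time gap for a merge/split link

all_edges = []
for (i, a), (j, b) in combinations(enumerate(cells), 2):
    dist = hypot(a[0] - b[0], a[1] - b[1])
    if dist <= max_dist:
        all_edges.append((dist, i, j))


def within_frame_len(edge):
    _, i, j = edge
    return abs(cells[i][2] - cells[j][2]) <= frame_len


# Old order: spanning tree first, time filter second.
old_links = [e for e in kruskal_mst(len(cells), all_edges) if within_frame_len(e)]

# New order: time filter first, spanning tree second.
new_links = kruskal_mst(len(cells), [e for e in all_edges if within_frame_len(e)])

# The links form a forest, so each link reduces the track count by one.
old_tracks = len(cells) - len(old_links)
new_tracks = len(cells) - len(new_links)
```

In this toy case the old order picks the two short edges through cell 1, both of which are then discarded by the time filter, leaving three separate tracks. The new order keeps the valid link between cells 0 and 2 and yields two tracks.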
@w-k-jones Thanks for this clarification! That makes a lot of sense. I ran tests with different time slices and you are absolutely right. The number of unique tracks and their proportion to cells increases in the old version, but the proportion stays about the same in your updated version. The difference in unique tracks is not very high in the first hours and even days, but then becomes quite high when running the merging and splitting module for tracks of a whole month.
With the clarifications from our last discussion and the additional tests I did, I am approving this now because everything seems to work fine and as expected :) Excellent job, @w-k-jones!
…tions of features in merge split, and improve documentation
Linting results by Pylint: Your code has been rated at 8.73/10 (previous run: 8.73/10, +0.00)
@freemansw1 I have added the capability to use a vertical coordinate name to specify variable grid spacing, in the same manner as it's implemented in tracking. Let me know if this resolves your comment from the review. I've also clarified the documentation a little, but I think we need to decide in general how we signpost default values.
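As a rough illustration of what variable vertical grid spacing means for distance calculations, here is a minimal sketch. It is not the code in this PR: the function names, argument names, and level heights are all hypothetical, and it assumes that feature positions are given as (z, y, x) grid indices, that horizontal spacing is uniform, and that heights can be linearly interpolated between levels.

```python
from math import sqrt


def index_to_height(z_index, level_heights):
    """Linearly interpolate a (possibly fractional) level index to a height."""
    lower = min(int(z_index), len(level_heights) - 2)
    frac = z_index - lower
    return level_heights[lower] + frac * (
        level_heights[lower + 1] - level_heights[lower]
    )


def physical_distance(p1, p2, dxy, dz=None, level_heights=None):
    """Distance between two (z, y, x) index positions in physical units.

    Pass either a constant vertical spacing (dz) or an array of level
    heights (level_heights) for a variably spaced vertical coordinate.
    """
    if level_heights is not None:
        z1 = index_to_height(p1[0], level_heights)
        z2 = index_to_height(p2[0], level_heights)
    elif dz is not None:
        z1, z2 = p1[0] * dz, p2[0] * dz
    else:
        raise ValueError("Provide either dz or level_heights")
    dy = (p1[1] - p2[1]) * dxy
    dx = (p1[2] - p2[2]) * dxy
    return sqrt((z1 - z2) ** 2 + dy**2 + dx**2)


# Hypothetical, unevenly spaced model levels in metres.
levels = [0.0, 100.0, 300.0, 700.0]
d_variable = physical_distance((0, 0, 0), (2, 0, 0), dxy=1000.0, level_heights=levels)
d_constant = physical_distance((0, 0, 0), (2, 0, 0), dxy=1000.0, dz=100.0)
```

With these made-up levels, two index steps in the vertical correspond to 300 m when the level heights are used, compared with 200 m under a constant 100 m spacing.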
…into merge_split_pbc
This turned into a general refactor of merge_split_MEST, with the following changes and additions:
- BallTree distance search approach as used in feature detection.
- frame_len before calculating the minimum spanning tree, improving the number of merging/splitting cells linked particularly for long tracking time periods.
- track id from 1 rather than 0

I'm also going to look into adding a flag for whether a cell started with a split or ended with a merge, and which object it was merged into/split from.
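For readers unfamiliar with periodic boundary conditions (PBC), the following minimal sketch shows the kind of wrapped distance that a neighbour search over a periodic domain has to use. It is an assumption-laden illustration rather than the code in this PR: pbc_distance and its arguments are hypothetical names, and it assumes (y, x) coordinates that lie within [0, size) along each periodic axis.

```python
from math import hypot


def pbc_distance(p1, p2, size_y, size_x, pbc_y=False, pbc_x=True):
    """Distance between two (y, x) points on a domain that may wrap around.

    Along each periodic axis the shorter of the direct and the wrapped
    separation is used (the minimum image convention).
    """
    dy = abs(p1[0] - p2[0])
    dx = abs(p1[1] - p2[1])
    if pbc_y:
        dy = min(dy, size_y - dy)
    if pbc_x:
        dx = min(dx, size_x - dx)
    return hypot(dy, dx)


# Two points near opposite x edges of a 100 x 100 domain.
wrapped = pbc_distance((5, 1), (5, 99), size_y=100, size_x=100)
direct = pbc_distance((5, 1), (5, 99), size_y=100, size_x=100, pbc_x=False)
```

Here the two points are 98 grid units apart when the domain is treated as bounded, but only 2 units apart once the x axis is treated as periodic, so they become candidates for a merge/split link.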