hdim_1 and hdim_2 double-defined in feature dataframe #216

JuliaKukulies · 2022-12-11T16:17:29Z

Also from discussion #214:

The columns hdim_1 and hdim_2 in the output data frame from feature_detection_multithreshold() are in some cases double-defined, whereby one of the two hdim_1 (hdim_2) columns has the same value as south_north (west_east). This can, for example, be seen in our example notebook for WRF model data, and it seems to be the case only for non-latlon grids, as it does not appear in this example.

I am not sure of the scope of this problem. It means that the Features and Track data frames cannot be saved to an HDF file, because duplicate column names are not allowed. Additionally, I believe that this issue could cause more serious bugs when we suddenly have two different values for hdim_1/hdim_2.

freemansw1 · 2022-12-17T23:13:41Z

Yeah, this is a critical bug to identify and fix. Thanks for documenting the bug, @JuliaKukulies, and thanks for identifying it, @lk337. If nobody else does, I will have some time to dig into this early next week. We should try to get this out quickly for a 1.4.1 bugfix release. Once we identify the issue, we should probably add in a unit test to make sure that it doesn't happen again.
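A regression test for this could be as small as asserting that the output has no repeated column names. The following is a minimal sketch, not an existing tobac test: `duplicated_columns` and `assert_unique_columns` are hypothetical helper names, and the dataframes are made-up stand-ins for real feature or track output.

```python
import pandas as pd


def duplicated_columns(df):
    """Return the column labels that occur more than once in df."""
    return list(df.columns[df.columns.duplicated()].unique())


def assert_unique_columns(df):
    """Raise AssertionError if any column name is repeated (hypothetical helper)."""
    dupes = duplicated_columns(df)
    if dupes:
        raise AssertionError(f"duplicate column names: {dupes}")


# Made-up frames standing in for tracking output.
good = pd.DataFrame([[0, 10.5, 3.0]], columns=["frame", "hdim_1", "hdim_2"])
bad = pd.DataFrame([[10.5, 3.0, 1000.0]], columns=["hdim_1", "hdim_2", "hdim_1"])

assert_unique_columns(good)       # passes silently
found = duplicated_columns(bad)   # labels that appear more than once
```

In a real test, `good` would be replaced by the dataframe returned from the tracking step on data that already contains x and y columns.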

Longer-term, I think it would be good to add a CI step to auto-run the Jupyter notebooks and make sure they don't fail. Maybe target that for v1.6.0.

w-k-jones · 2022-12-18T00:19:46Z

Problem is caused in linking_trackpy here:

tobac/tobac/tracking.py

Line 254 in 49c4c27

features.rename(columns={"hdim_1": "y", "hdim_2": "x"}, inplace=True)

This is done to avoid a bug with trackpy. However, because the features dataframe in the OLR_tracking_model example data already has x and y coordinates, this results in repeated column names.

When the columns are renamed back to hdim_1, hdim_2

tobac/tobac/tracking.py

Line 283 in 49c4c27

features.rename(columns={"y": "hdim_1", "x": "hdim_2"}, inplace=True)

this also renames the original x, y columns.
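The effect of the two renames can be reproduced with pandas alone. This is a small sketch on a made-up dataframe (the values and the `frame` column are assumptions for illustration); only the two `rename` mappings are taken from the lines quoted above.

```python
import pandas as pd

# Made-up features frame: hdim_1/hdim_2 are array indices, while y/x are
# projection coordinates that already exist in the input data.
features = pd.DataFrame(
    {
        "frame": [0, 0],
        "hdim_1": [10.5, 20.0],
        "hdim_2": [3.0, 7.5],
        "y": [1000.0, 2000.0],
        "x": [500.0, 900.0],
    }
)

# First rename (before linking): there are now two "y" and two "x" columns.
renamed = features.rename(columns={"hdim_1": "y", "hdim_2": "x"})
cols_during = list(renamed.columns)

# Second rename (after linking): every "y"/"x" is renamed, including the
# original coordinate columns, so hdim_1 and hdim_2 each appear twice.
restored = renamed.rename(columns={"y": "hdim_1", "x": "hdim_2"})
cols_after = list(restored.columns)
```

After the round trip the original `y` and `x` columns are gone and their values sit under a second `hdim_1` and `hdim_2`.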

A quick fix would be to check if x, y already exist as column names in the features dataframe and temporarily rename them while tracking. The longer-term fix would be solving the original issue with trackpy that caused this workaround.
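The quick fix could look roughly like the sketch below. It is not necessarily what the eventual fix does: `link_with_safe_renames`, the `link_func` argument, and the `__x_original`/`__y_original` placeholder names are all hypothetical, and the linking step is replaced by a stand-in so the example runs without trackpy.

```python
import pandas as pd


def link_with_safe_renames(features, link_func):
    """Run link_func on features with hdim_1/hdim_2 exposed as y/x,
    without clobbering x/y columns that already exist (hypothetical helper)."""
    # Stash any pre-existing x/y columns under placeholder names.
    stash = {c: f"__{c}_original" for c in ("x", "y") if c in features.columns}
    work = features.rename(columns=stash)

    # Rename the index columns to the names the linker expects.
    work = work.rename(columns={"hdim_1": "y", "hdim_2": "x"})
    tracked = link_func(work)

    # Undo both renames, in reverse order.
    tracked = tracked.rename(columns={"y": "hdim_1", "x": "hdim_2"})
    return tracked.rename(columns={v: k for k, v in stash.items()})


# Made-up input; the lambda is a stand-in for the real linking call.
features = pd.DataFrame(
    {"frame": [0], "hdim_1": [10.5], "hdim_2": [3.0], "y": [1000.0], "x": [500.0]}
)
out = link_with_safe_renames(features, lambda df: df.copy())
```

Because `rename` returns a new dataframe here instead of using `inplace=True`, the caller's `features` is left unchanged.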

w-k-jones · 2022-12-20T08:47:54Z

Fixed in #217

JuliaKukulies added the "bug" label (Code that is failing or producing the wrong result) Dec 11, 2022

JuliaKukulies changed the title ~~hdim_1 and hdim_2 double-defined in feature detection~~ hdim_1 and hdim_2 double-defined in feature dataframe Dec 11, 2022

freemansw1 added the "High Priority" label (This issue needs immediate fixing, and may warrant a hotfix release) Dec 17, 2022

freemansw1 added this to the Version 1.4.1 milestone Dec 17, 2022

w-k-jones mentioned this issue Dec 18, 2022 in "Fix duplication of column names in linking_trackpy #217" (merged)

freemansw1 assigned w-k-jones Dec 19, 2022

w-k-jones closed this as completed Dec 20, 2022

w-k-jones mentioned this issue Dec 20, 2022 in "v1.4.1 hotfix #219" (closed)
