Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hdim_1 and hdim_2 double-defined in feature dataframe #216

Closed
JuliaKukulies opened this issue Dec 11, 2022 · 3 comments
Closed

hdim_1 and hdim_2 double-defined in feature dataframe #216

JuliaKukulies opened this issue Dec 11, 2022 · 3 comments
Assignees
Labels
bug Code that is failing or producing the wrong result High Priority This issue needs immediate fixing, and may warrant a hotfix release
Milestone

Comments

@JuliaKukulies
Copy link
Member

JuliaKukulies commented Dec 11, 2022

Also from discussion #214:

The columns hdim_1 and hdim_2 in the output data frame from feature_detection_multithreshold() are some cases double-defined, whereby one of the two hdim_1 (hdim_2) has the same value as south_north west_east. This can, for example, be seen in our example notebook for WRF model data and seems only be the case for non latlon grids as it does not appear in this example.

I am not sure of the scope of this problem. It leads to the fact that the Features and Track data frames cannot be saved to an HDF file because column names are not allowed to be double-defined. Additionally, I believe that this issue could cause more serious bugs when we have suddenly two different values for hdim_1/ hdim_2 ?

@JuliaKukulies JuliaKukulies added the bug Code that is failing or producing the wrong result label Dec 11, 2022
@JuliaKukulies JuliaKukulies changed the title hdim_1 and hdim_2 double-defined in feature detection hdim_1 and hdim_2 double-defined in feature dataframe Dec 11, 2022
@freemansw1 freemansw1 added the High Priority This issue needs immediate fixing, and may warrant a hotfix release label Dec 17, 2022
@freemansw1 freemansw1 added this to the Version 1.4.1 milestone Dec 17, 2022
@freemansw1
Copy link
Member

freemansw1 commented Dec 17, 2022

Yeah, this is a critical bug to identify and fix. Thanks for documenting the bug, @JuliaKukulies, and thanks for identifying it, @lk337. If nobody else does, I will have some time to dig into this early next week. We should try to get this out quickly for a 1.4.1 bugfix release. Once we identify the issue, we should probably add in a unit test to make sure that it doesn't happen again.

Longer-term, I think it would be good to add a CI step to auto-run the jupyter notebooks and make sure they don't fail. Maybe target that for v1.6.0.

@w-k-jones
Copy link
Member

Problem is caused in linking_trackpy here:

features.rename(columns={"hdim_1": "y", "hdim_2": "x"}, inplace=True)

This is done to avoid a bug with trackpy, however because in the OLR_tracking_model example data the features dataframe already has x, y coords, this results in repeating the column names

When the columns are renamed back to hdim1, hdim2

features.rename(columns={"y": "hdim_1", "x": "hdim_2"}, inplace=True)
this also renames the original x, y columns.

A quick fix would be to check if x, y already exist as column names in the features dataframe and temporarily rename them while tracking. The longer term fix would be solving the original issue with trackpy that caused this workaround

@w-k-jones
Copy link
Member

Fixed in #217

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Code that is failing or producing the wrong result High Priority This issue needs immediate fixing, and may warrant a hotfix release
Projects
None yet
Development

No branches or pull requests

3 participants