-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for stack+groupby+apply w/ non-increasing coord #3906
Conversation
OK @max-sixty I've gotten logic that works in the sense that it works for this use case and doesn't break any existing tests. But I could use a look over by somebody more familiar with groupby + multi-indexing, as it's totally possible that this quick fix causes other problems. |
This looks great @spencerahill ! Tests are very clear. The code fix looks good; I'll leave this open for a day in case those who know the code section better have any comments. Thank you! |
xarray/core/groupby.py
Outdated
@@ -370,8 +370,11 @@ def __init__( | |||
group = group.dropna(group_dim) | |||
|
|||
# look through group to find the unique values | |||
sort = bins is None and ( | |||
not isinstance(safe_cast_to_index(group), pd.MultiIndex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One performance opt is to only do safe_cast_to_index(group)
once; though I don't think it's huge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done in 4407f40.
xarray/core/groupby.py
Outdated
unique_values, group_indices = unique_value_groups( | ||
safe_cast_to_index(group), sort=(bins is None) | ||
safe_cast_to_index(group), sort=sort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually not sure why we generally sort; lmk if you happened to have found the reason when testing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that was my first try, to simply remove the sorting, but then there were three test failures in test_dataarray
, see e.g. here.
They all involve grouping on 1D Indexes rather than MultiIndexes, which is what led me to the fix I settled on. But that's as far as my understanding goes!
Great, thanks @max-sixty and happy to do it. |
Great, thanks again @spencerahill ! |
* master: Use divergent colormap if lowest and highest level span 0 (pydata#3913) Bugfix for plotting transposed 2d coords (pydata#3934) Allow plotting bool data (pydata#3766) facetgrid: fix case when vmin == vmax (pydata#3916) add a CI that tests xarray with all optional dependencies but dask (pydata#3919) Add missing_dims argument allowing isel() to ignore missing dimensions (pydata#3923) Only fail if a specific warning occurs (pydata#3930) Fix minor code quality issues (pydata#3626) Fix for stack+groupby+apply w/ non-increasing coord (pydata#3906) reactivate the macos CI (pydata#3920) add pint to the output of show_versions() (pydata#3918)
isort -rc . && black . && mypy . && flake8
whats-new.rst
for all changes andapi.rst
for new APII've added a check within
groupby.__init__
for monotonicity of the coords, which makes my test pass, but it causes three existing tests to fail. Getting the logic right that properly distinguishes among these different cases is escaping me for now, but it does seem like this call tounique_value_groups
withingroupby.__init__
is where the re-sorting is occurring. Feedback welcome. TIA!