Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for stack+groupby+apply w/ non-increasing coord #3906

Merged
merged 5 commits into from
Mar 31, 2020

Conversation

spencerahill
Copy link
Contributor

@spencerahill spencerahill commented Mar 28, 2020

I've added a check within groupby.__init__ for monotonicity of the coords, which makes my test pass, but it causes three existing tests to fail. Getting the logic right that properly distinguishes among these different cases is escaping me for now, but it does seem like this call to unique_value_groups within groupby.__init__ is where the re-sorting is occurring. Feedback welcome. TIA!

@spencerahill
Copy link
Contributor Author

OK @max-sixty I've gotten logic that works in the sense that it works for this use case and doesn't break any existing tests. But I could use a look over by somebody more familiar with groupby + multi-indexing, as it's totally possible that this quick fix causes other problems.

@spencerahill spencerahill changed the title WIP fix for stack+groupby+apply w/ non-increasing coord Fix for stack+groupby+apply w/ non-increasing coord Mar 30, 2020
@max-sixty
Copy link
Collaborator

This looks great @spencerahill ! Tests are very clear. The code fix looks good; I'll leave this open for a day in case those who know the code section better have any comments.

Thank you!

@@ -370,8 +370,11 @@ def __init__(
group = group.dropna(group_dim)

# look through group to find the unique values
sort = bins is None and (
not isinstance(safe_cast_to_index(group), pd.MultiIndex)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One performance opt is to only do safe_cast_to_index(group) once; though I don't think it's huge

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done in 4407f40.

unique_values, group_indices = unique_value_groups(
safe_cast_to_index(group), sort=(bins is None)
safe_cast_to_index(group), sort=sort
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not sure why we generally sort; lmk if you happened to have found the reason when testing...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that was my first try, to simply remove the sorting, but then there were three test failures in test_dataarray, see e.g. here.

They all involve grouping on 1D Indexes rather than MultiIndexes, which is what led me to the fix I settled on. But that's as far as my understanding goes!

@spencerahill
Copy link
Contributor Author

Great, thanks @max-sixty and happy to do it.

@max-sixty max-sixty merged commit b3bafee into pydata:master Mar 31, 2020
@max-sixty
Copy link
Collaborator

Great, thanks again @spencerahill !

@spencerahill spencerahill deleted the stack-groupby-bugfix branch March 31, 2020 18:29
dcherian added a commit to dcherian/xarray that referenced this pull request Apr 5, 2020
* master:
  Use divergent colormap if lowest and highest level span 0 (pydata#3913)
  Bugfix for plotting transposed 2d coords (pydata#3934)
  Allow plotting bool data (pydata#3766)
  facetgrid: fix case when vmin == vmax (pydata#3916)
  add a CI that tests xarray with all optional dependencies but dask (pydata#3919)
  Add missing_dims argument allowing isel() to ignore missing dimensions (pydata#3923)
  Only fail if a specific warning occurs (pydata#3930)
  Fix minor code quality issues (pydata#3626)
  Fix for stack+groupby+apply w/ non-increasing coord (pydata#3906)
  reactivate the macos CI (pydata#3920)
  add pint to the output of show_versions() (pydata#3918)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GroupBy of stacked dim with strings renames underlying dims
2 participants