-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: GroupBy.apply() returns different results if a different GroupBy method is called first #35314
Conversation
…olumns as columns. updated tests that relied on previous behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @TomAugspurger of u would have a look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, if my understanding on that point is correct.
I'm not going to have a chance to look through this closely today. Given that we're in an RC period and this isn't a regression fix, I think we should target this for 1.1.1. |
sgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally lgtm.
pandas/core/groupby/groupby.py
Outdated
# be trimmed by implementing cython funcs for more dtypes | ||
pass | ||
else: | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow, codecov pointed that this is not covered, can you see if you can get a test to lands here (otherwise remove it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not find a way to reach this line from any public function call. AFAICS anything that would raise a NotImplementedError
that comes under the else:
, also trips a DataError
which catches it before this line.
So I've gone with removing it.
@jreback I think this one is ready to be merged. |
lgtm. can you merge master again and ping on green (as merged a few other groupby patches recently). |
thanks @smithto1 (failure is unrelated) |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Behavioural Changes
.apply()
Calls to
self._set_group_selection
have been replaced withwith _group_selection_context(self):
for_agg_general
,_make_wrapper
, andnth
.Previously these calls to
self._set_group_selection
created a bug inGroupBy.apply
where calling another method before.apply
would change the output of.apply
. This bug is now fixed.Tests
One new test is added to check that the output of
.apply
is constant whether another method is called on the same grouper first.Two existing tests were actually dependent on the old buggy-behaviour (i.e. they called GroupBy.sum first and then expected that GroupBy.apply(sum) would exclude the index columns from the results). All of these tests have been amended in a manner that enforces the new consistent output format while preserving the existing test.
Both of the copy-pastable examples in the linked bug-reports are fixed.