BUG: GroupBy.apply() returns different results if a different GroupBy method is called first #35314

Merged Aug 7, 2020 (42 commits)

Member

@smithto1 smithto1 commented Jul 16, 2020

Behavioural Changes
.apply()
Calls to self._set_group_selection have been replaced with a "with _group_selection_context(self):" block in _agg_general, _make_wrapper, and nth.

Previously, these calls to self._set_group_selection left the group selection set on the GroupBy object, so calling another method before .apply changed the output of .apply. This bug is now fixed.
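To illustrate why scoping the selection fixes this, here is a minimal pure-Python sketch of the two patterns. It is not pandas code: ToyGroupBy and all of its methods are hypothetical stand-ins, and the real internals are more involved. The only point is that state set without being reset leaks into later calls, while a context manager restores it on exit.

```python
from contextlib import contextmanager


class ToyGroupBy:
    """Hypothetical stand-in for a grouper; not the pandas implementation."""

    def __init__(self, columns, key):
        self.columns = list(columns)
        self.key = key
        self._selection = None  # cached column selection (the state that leaked)

    def _set_selection(self):
        # Old pattern: narrow the selection and never undo it.
        self._selection = [c for c in self.columns if c != self.key]

    def _reset_selection(self):
        self._selection = None

    @contextmanager
    def _selection_context(self):
        # New pattern: narrow the selection only for the duration of the block.
        self._set_selection()
        try:
            yield self
        finally:
            self._reset_selection()

    def sum_leaky(self):
        self._set_selection()
        return list(self._selection)

    def sum_scoped(self):
        with self._selection_context():
            return list(self._selection)

    def apply(self):
        # apply() works on whatever selection is currently active.
        return list(self._selection or self.columns)


leaky = ToyGroupBy(["a", "b", "c"], key="a")
before = leaky.apply()   # sees all columns
leaky.sum_leaky()        # leaves the narrowed selection behind
after = leaky.apply()    # now sees only the non-key columns

scoped = ToyGroupBy(["a", "b", "c"], key="a")
scoped.sum_scoped()      # selection is reset when the block exits
stable = scoped.apply()  # sees all columns again
```

With the leaky variant, before and after differ; with the scoped variant, apply() sees the same columns whether or not another method ran first.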

Tests
One new test is added to check that the output of .apply is constant whether another method is called on the same grouper first.
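A rough sketch of the kind of check described above, assuming a pandas version that includes this fix. This is not the test added in the PR: the helper name apply_is_stable, the frame, and the column names are made up for illustration.

```python
import pandas as pd


def apply_is_stable(df, key):
    """Return True if apply() gives the same result with and without
    another GroupBy method being called on the same grouper first."""
    # Fresh grouper: apply() with no prior method call.
    expected = df.groupby(key).apply(lambda g: g.max())

    # Same data, but call a different GroupBy method before apply().
    grouped = df.groupby(key)
    grouped.sum()
    result = grouped.apply(lambda g: g.max())

    return result.equals(expected)


# Illustrative data only.
frame = pd.DataFrame({"a": [1, 1, 2], "b": [1, 2, 3], "c": [4, 5, 6]})
stable = apply_is_stable(frame, "a")
```

The comparison checks only that the two results match, so it does not depend on whether apply includes the grouping column in its output.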

Two existing tests depended on the old buggy behaviour (i.e. they called GroupBy.sum first and then expected GroupBy.apply(sum) to exclude the index columns from the results). Both tests have been amended to enforce the new, consistent output format while preserving the original intent of each test.

Both of the copy-pastable examples in the linked bug reports are fixed.

pep8speaks commented Jul 16, 2020

Hello @smithto1! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-07 14:14:48 UTC

Contributor

@jreback jreback left a comment

lgtm. @TomAugspurger, if you would have a look.

Contributor

@TomAugspurger TomAugspurger left a comment

Looks good, if my understanding on that point is correct.

pandas/tests/groupby/test_function.py (outdated, resolved)
@TomAugspurger TomAugspurger added this to the 1.1 milestone Jul 17, 2020
@smithto1 smithto1 requested a review from TomAugspurger July 17, 2020 15:27
@TomAugspurger
Contributor

I'm not going to have a chance to look through this closely today. Given that we're in an RC period and this isn't a regression fix, I think we should target this for 1.1.1.

@simonjayhawkins
Member

I'm not going to have a chance to look through this closely today. Given that we're in an RC period and this isn't a regression fix, I think we should target this for 1.1.1.

sgtm

@TomAugspurger TomAugspurger modified the milestones: 1.1, 1.1.1 Jul 28, 2020
Member

@simonjayhawkins simonjayhawkins left a comment

generally lgtm.

doc/source/whatsnew/v1.1.0.rst (outdated, resolved)
pandas/tests/groupby/test_apply.py (outdated, resolved)
doc/source/whatsnew/v1.1.1.rst (outdated, resolved)
    # be trimmed by implementing cython funcs for more dtypes
    pass
else:
    raise
Contributor

wow, codecov points out that this is not covered. Can you see if you can get a test to land here (otherwise remove it)?

Member Author

I could not find a way to reach this line from any public function call. AFAICS, anything that would raise a NotImplementedError and fall under the else: also trips a DataError, which is caught before this line.

So I've gone with removing it.

@jreback jreback modified the milestones: 1.1.1, 1.2 Aug 3, 2020
Member Author

smithto1 commented Aug 5, 2020

@jreback I think this one is ready to be merged.

Contributor

jreback commented Aug 6, 2020

lgtm. Can you merge master again and ping on green (as we merged a few other groupby patches recently)?

@jreback jreback merged commit dfa546e into pandas-dev:master Aug 7, 2020
Contributor

jreback commented Aug 7, 2020

thanks @smithto1 (failure is unrelated)

6 participants