Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Many groupby tests depend on a bug in DataFrameGroupBy.apply #28549

Closed
christopherzimmerman opened this issue Sep 20, 2019 · 3 comments
Closed
Labels
Bug Groupby Testing pandas testing functions or related to the test suite

Comments

@christopherzimmerman
Copy link
Contributor

christopherzimmerman commented Sep 20, 2019

Discovered while fixing an issue in #28541

A large number of tests relating to DataFrameGroupBy.apply depend on a bug in apply where the grouped column is not removed from the returned DataFrame.

Here are some examples:

There are around ~20 cases of this.


In general, they expect the result of something like this:

df = pd.DataFrame({"key": [1, 1, 1, 2, 2, 2, 3, 3, 3], "value": range(9)})
df.groupby('key').apply(lambda x: x.sum())

To include the "key" column in the result, which should not be the case.

     key  value
key
1      3      3
2      6     12
3      9     21

The underlying issue with apply is a straightforward fix, all that needs to be done is change this return to use the _group_selection_context, but the PR will have to include updates to many tests.

@christopherzimmerman christopherzimmerman changed the title Many tests depend on bug in apply Many groupby tests depend on a bug in DataFrameGroupBy.apply Sep 20, 2019
@WillAyd
Copy link
Member

WillAyd commented Sep 20, 2019

I think worth double checking the comprehensive functionality of as_index, i.e. making sure it works consistently for the reduction functions to put the groupings in the index when True and put in as columns when False

Not sure off the top of my head how that is covered in tests but would be great if you can take a look while reviewing all of this

@jbrockmendel
Copy link
Member

Not sure how much this overlaps, but I'm noticing in _aggregate_frame the for name, data in self: is yielding data DataFrames with more columns than I expect. I think this is explained by _GroupBy.__iter__ using self.obj instead of self._obj_with_exclusions. If we change this, a handful of tests are broken, but most of thsoe are for merge ops

@jreback jreback added this to the Contributions Welcome milestone Jan 1, 2020
@mroeschke mroeschke added the Bug label May 22, 2020
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@rhshadrach
Copy link
Member

Closing as a duplicate of #7155.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Groupby Testing pandas testing functions or related to the test suite
Projects
None yet
6 participants