API: way to exclude the grouped column with apply #7155

Closed
jreback opened this issue May 17, 2014 · 7 comments · Fixed by #52477 or #54950
Labels: Apply (Apply, Aggregate, Transform, Map); Deprecate (Functionality to remove in pandas); Groupby

Comments

@jreback
Contributor

jreback commented May 17, 2014

http://stackoverflow.com/questions/23709811/best-way-to-sum-group-value-counts-in-pandas/23712433?noredirect=1#comment36441762_23712433
any of these look palatable?

df.groupby('c',as_index='exclude').apply(....)
df.groupby('c')['~c'].apply(...)
df.groupby('c',filter='c')
df.groupby('c',filter=True)

rather than doing a negated selection (df.columns-['c'])

df = pd.DataFrame.from_dict({'b1':['aa','ac','ac','ad'],\
                             'b2':['bb','bc','ad','cd'],\
                             'b3':['cc','cd','cc','ae'],\
                             'c' :['a','b','a','b']})

df.groupby('c')[df.columns - ['c']].apply(lambda x: x.unstack().value_counts())
Out[92]: 
c    
a  cc    2
   bb    1
   ad    1
   ac    1
   aa    1
b  cd    2
   ad    1
   ae    1
   ac    1
   bc    1
dtype: int64
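
In later pandas versions, subtracting a list from an Index no longer performs a set difference, so the negated selection above needs a different spelling. A minimal sketch of two existing alternatives, Index.drop and Index.difference, using the same frame (built here with the plain DataFrame constructor for brevity):

```python
import pandas as pd

df = pd.DataFrame({
    "b1": ["aa", "ac", "ac", "ad"],
    "b2": ["bb", "bc", "ad", "cd"],
    "b3": ["cc", "cd", "cc", "ae"],
    "c": ["a", "b", "a", "b"],
})

# Index.drop removes the given label and keeps the original column order.
cols_drop = df.columns.drop("c")

# Index.difference computes a set difference and returns the labels sorted.
cols_diff = df.columns.difference(["c"])

print(list(cols_drop))
print(list(cols_diff))
```

Either result can be used wherever the negated selection is needed; drop is the safer choice when column order matters.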
@jreback jreback added this to the 0.14.1 milestone May 17, 2014
@jreback jreback modified the milestones: 0.15.0, 0.14.1 Jul 1, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@datapythonista datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018
@mroeschke mroeschke removed this from the Someday milestone Oct 13, 2022
@jbrockmendel
Member

@rhshadrach some of what you're working on with selected_obj vs obj_with_exclusions might be related?

@rhshadrach
Member

rhshadrach commented Feb 23, 2023

Throughout most of groupby, the grouping column is not included when computing the result. groupby(...).apply is the one exception I know of (maybe there are others?), and here, we first try it with the groupings and if that fails, we try it without the groupings. This makes apply inconsistent with both itself and other methods.

I think we should always exclude the groupings, and plan to put up a deprecation of this as part of 2.x.
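
To illustrate the inconsistency described above, here is a small sketch. The frame and the record helper are illustrative only, and which columns apply passes to the function depends on the installed pandas version, so no output is claimed for the last line:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})
gb = df.groupby("a")

# Reductions such as sum() operate only on the non-grouping columns.
summed = gb.sum()
print(list(summed.columns))  # ['b']

# Record which columns each group frame exposes to the applied function.
seen = []

def record(group):
    seen.append(list(group.columns))
    return len(group)

with warnings.catch_warnings():
    # Some pandas versions warn here about operating on the grouping columns.
    warnings.simplefilter("ignore")
    gb.apply(record)

# Whether 'a' appears in this list depends on the pandas version.
print(seen[0])
```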

@rhshadrach rhshadrach added the Deprecate and Apply labels and removed the Enhancement label Feb 23, 2023
@rhshadrach rhshadrach self-assigned this Feb 23, 2023
@topper-123
Contributor

topper-123 commented Apr 5, 2023

The example in the OP doesn't work any more (even after adjusting for the df.columns - ['c'] issue):

In [14]: df = pd.DataFrame.from_dict({'b1':['aa','ac','ac','ad'],\
   ...:                              'b2':['bb','bc','ad','cd'],\
   ...:                              'b3':['cc','cd','cc','ae'],\
   ...:                              'c' :['a','b','a','b']})
   ...:
   ...: df.groupby('c')[df.columns.drop('c')].apply(lambda x: x.unstack().value_counts())
TypeError: Series.name must be a hashable type

This must be a bug?

For reference, I got it to work like this:

In [15]: df[df.columns.drop('c')].groupby(df['c']).apply(lambda x: x.unstack().value_counts())
Out[15]:
c
a  cc    2
   aa    1
   ac    1
   bb    1
   ad    1
b  cd    2
   ac    1
   ad    1
   bc    1
   ae    1
dtype: int64

Which is not a nice API and the first example should work IMO.

@topper-123
Contributor

I think it looks reasonable to deprecate keeping the grouping keys in groupby.apply, to keep the method similar to the other groupby methods. I'd say that 99.5% of the time, that is what users would want/expect.

Keeping the deprecation process reasonable can be tricky (avoiding warnings all over is a concern). How would we deprecate this?

@rhshadrach
Member

I have a branch to deprecate that I think is all set to go. It looks quite noisy if you look at our tests, but would only warn the user if:

  • apply with the groupings is successful
  • There is actually a difference between _selected_obj and _obj_with_exclusions

To avoid the warning, users can exclude the groupings, e.g.

df = pd.DataFrame({'a': [1, 1, 2], 'b': [3, 4, 5]})
gb = df.groupby('a')

# Raises warning
gb.apply(len)

# No warning
gb[['b']].apply(len)
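
For later readers: pandas 2.2 added an include_groups keyword to DataFrameGroupBy.apply as another way to stop passing the grouping columns to the function. It is not mentioned in this thread, so treat the following as a sketch that assumes pandas 2.2 or later and falls back to the explicit column selection shown above on older versions:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})
gb = df.groupby("a")

try:
    # Assumes pandas >= 2.2, where apply accepts include_groups.
    sizes = gb.apply(len, include_groups=False)
except TypeError:
    # Older versions forward the keyword to len(), which rejects it,
    # so fall back to selecting the non-grouping columns explicitly.
    sizes = gb[["b"]].apply(len)

print(sizes.to_dict())  # {1: 2, 2: 1}
```

Both branches give the number of rows per group without relying on the grouping column being present in each group frame.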

@rhshadrach
Member

Reopening since #52477 was reverted.

@rhshadrach
Member

xref: #52477 (comment)

As the deprecation was determined to be noisy, I plan to reintroduce the linked PR once 2.1 is released with an option (mode.groupings_in_apply) to allow users to adopt future behavior and silence the warning. It will default to True; setting to False gives future behavior. The option will then be deprecated in 3.x.

@phofl - does this sound reasonable?

PhilippVerpoort added a commit to PhilippVerpoort/posted that referenced this issue Jun 17, 2024
This removes a warning raised by pandas when performing a groupby-apply
call on a dataframe in noslag.aggregate. This warning is new in pandas
2.2. See also here:
pandas-dev/pandas#7155
6 participants