API: way to exclude the grouped column with apply #7155

jreback · 2014-05-17T14:41:20Z

http://stackoverflow.com/questions/23709811/best-way-to-sum-group-value-counts-in-pandas/23712433?noredirect=1#comment36441762_23712433
any of these look palatable?

df.groupby('c',as_index='exclude').apply(....)
df.groupby('c')['~c'].apply(...)
df.groupby('c',filter='c')
df.groupby('c',filter=True)

rather than doing a negated selection (df.columns-['c'])

df = pd.DataFrame.from_dict({'b1':['aa','ac','ac','ad'],\
                             'b2':['bb','bc','ad','cd'],\
                             'b3':['cc','cd','cc','ae'],\
                             'c' :['a','b','a','b']})

df.groupby('c')[df.columns - ['c']].apply(lambda x: x.unstack().value_counts())
Out[92]: 
c    
a  cc    2
   bb    1
   ad    1
   ac    1
   aa    1
b  cd    2
   ad    1
   ae    1
   ac    1
   bc    1
dtype: int64


jbrockmendel · 2023-02-23T16:49:26Z

@rhshadrach some of what you're working on with selected_obj vs obj_with_exclusions might be related?

rhshadrach · 2023-02-23T21:29:47Z

Throughout most of groupby, the grouping column is not included when computing the result. groupby(...).apply is the one exception I know of (maybe there are others?), and here, we first try it with the groupings and if that fails, we try it without the groupings. This makes apply inconsistent with both itself and other methods.

I think we should always exclude the groupings, and I plan to put up a deprecation of the current behavior as part of 2.x.
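To illustrate the point about other groupby methods, here is a minimal sketch (the frame and column names are made up for illustration) showing that a reduction such as sum() computes only over the non-grouping columns, with the grouping column ending up in the index:

```python
import pandas as pd

# Illustrative data; "a" is the grouping column, "b" holds the values.
df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})

# sum() is computed on "b" only; "a" is not summed and becomes the index.
summed = df.groupby("a").sum()

print(summed.columns.tolist())
print(summed.index.name)
```

The result has a single column, b, indexed by a. apply is the method where this has not always held.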

topper-123 · 2023-04-05T11:12:25Z

The example in OP doesn't work any more (even adjusted for the - list issue):

In [14]: df = pd.DataFrame.from_dict({'b1':['aa','ac','ac','ad'],\
   ...:                              'b2':['bb','bc','ad','cd'],\
   ...:                              'b3':['cc','cd','cc','ae'],\
   ...:                              'c' :['a','b','a','b']})
   ...:
   ...: df.groupby('c')[df.columns.drop('c')].apply(lambda x: x.unstack().value_counts())
TypeError: Series.name must be a hashable type

This must be a bug?

For reference, I got it to work like this:

In [15]: df[df.columns.drop('c')].groupby(df['c']).apply(lambda x: x.unstack().value_counts())
Out[15]:
c
a  cc    2
   aa    1
   ac    1
   bb    1
   ad    1
b  cd    2
   ac    1
   ad    1
   bc    1
   ae    1
dtype: int64

Which is not a nice API and the first example should work IMO.
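Until there is a nicer API, the workaround above could be wrapped in a small helper. This is only a sketch: apply_excluding_keys and count_values are hypothetical names, not part of pandas, and the counting function flattens each group with to_numpy() instead of unstack() to sidestep the error shown above.

```python
import pandas as pd


def apply_excluding_keys(df, key, func):
    """Hypothetical helper: apply func to each group with the key column dropped."""
    values = df.drop(columns=key)
    # Group by the key as an external Series so it is never part of the groups.
    return values.groupby(df[key]).apply(func)


def count_values(group):
    # Flatten all remaining columns of the group and count each distinct value.
    return pd.Series(group.to_numpy().ravel()).value_counts()


df = pd.DataFrame({'b1': ['aa', 'ac', 'ac', 'ad'],
                   'b2': ['bb', 'bc', 'ad', 'cd'],
                   'b3': ['cc', 'cd', 'cc', 'ae'],
                   'c': ['a', 'b', 'a', 'b']})

result = apply_excluding_keys(df, 'c', count_values)
print(result)
```

Because the key is passed as a separate Series, the function never sees column c, regardless of how apply treats in-frame grouping columns.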

topper-123 · 2023-04-05T12:32:02Z

I think it looks reasonable to deprecate keeping the grouping keys in groupby.apply, so that the method stays consistent with the other groupby methods. I'd say 99.5% of the time, that is what users would want/expect.

Keeping the deprecation process reasonable can be tricky (avoiding warnings all over is a concern). How would we deprecate this?

rhshadrach · 2023-04-05T20:26:43Z

I have a branch to deprecate that I think is all set to go. It looks quite noisy if you look at our tests, but would only warn the user if:

apply with the groupings is successful
There is actually a difference between _selected_obj and _obj_with_exclusions

To avoid the warning, users can exclude the groupings, e.g.

df = pd.DataFrame({'a': [1, 1, 2], 'b': [3, 4, 5]})
gb = df.groupby('a')

# Raises warning
gb.apply(len)

# No warning
gb[['b']].apply(len)
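One way to check that an explicit selection stays quiet is to record warnings around the call. This is a sketch only: filtering on the word "grouping" in the message is an assumption about how the deprecation is worded, and whether the unselected gb.apply(len) warns depends on the installed pandas version, so it is not asserted here.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': [3, 4, 5]})
gb = df.groupby('a')

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Explicitly select the non-grouping column before applying.
    result = gb[['b']].apply(len)

# Keep only deprecation-style warnings that mention the groupings
# (the message filter is an assumption about the warning text).
grouping_warnings = [
    w for w in caught
    if issubclass(w.category, (DeprecationWarning, FutureWarning))
    and "grouping" in str(w.message)
]

print(result)
print(len(grouping_warnings))
```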

rhshadrach · 2023-06-15T20:49:55Z

Reopening since #52477 was reverted.

rhshadrach · 2023-06-21T02:23:37Z

xref: #52477 (comment)

As the deprecation was determined to be noisy, I plan to reintroduce the linked PR once 2.1 is released, this time with an option (mode.groupings_in_apply) that lets users adopt the future behavior and silence the warning. The option will default to True; setting it to False gives the future behavior. The option will then be deprecated in 3.x.

@phofl - does this sound reasonable?

This removes a warning raised by pandas when performing a groupby-apply call on a dataframe in noslag.aggregate. This warning is new in pandas 2.2. See also here: pandas-dev/pandas#7155

jreback added this to the 0.14.1 milestone May 17, 2014

jreback added Groupby labels May 17, 2014

jreback modified the milestones: 0.15.0, 0.14.1 Jul 1, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018

mroeschke added Enhancement and removed API Design labels Apr 11, 2021

mroeschke removed this from the Someday milestone Oct 13, 2022

rhshadrach added Deprecate (Functionality to remove in pandas) and Apply (Apply, Aggregate, Transform, Map) labels and removed Enhancement label Feb 23, 2023

rhshadrach self-assigned this Feb 23, 2023

This was referenced Apr 1, 2023

REF: Use _python_apply_general internally #52342

Merged

REF: Paths through DataFrameGroupBy.agg #52362

Open

topper-123 mentioned this issue Apr 5, 2023

BUG: error when unstacking in DataFrame.groupby.apply #52444

Closed

3 tasks

rhshadrach mentioned this issue Apr 6, 2023

DEPR: DataFrameGroupBy.apply operating on the group keys #52477

Merged

5 tasks

mroeschke closed this as completed in #52477 Apr 12, 2023

rhshadrach reopened this Jun 15, 2023

rhshadrach mentioned this issue Jun 15, 2023

ENH: add columns attribute to DataFrameGroupBy #53583

Closed

3 tasks

rhshadrach mentioned this issue Jul 15, 2023

Many groupby tests depend on a bug in DataFrameGroupBy.apply #28549

Closed

rhshadrach mentioned this issue Sep 2, 2023

DEPR: DataFrameGroupBy.apply operating on the group keys #54950

Merged

5 tasks

mroeschke closed this as completed in #54950 Sep 7, 2023

msrasmussen mentioned this issue Jan 25, 2024

Fix a couple of GroupBy.apply() deprecations ARM-software/lisa#2183

Merged
