API: way to exclude the grouped column with apply #7155
Comments
@rhshadrach some of what you're working on with selected_obj vs obj_with_exclusions might be related?
Throughout most of groupby, the grouping column is not included when computing the result. I think we should always exclude the groupings, and plan to put up a deprecation of this as part of 2.x.
The example in OP doesn't work any more (even adjusted for the
In [14]: df = pd.DataFrame.from_dict({'b1':['aa','ac','ac','ad'],\
...: 'b2':['bb','bc','ad','cd'],\
...: 'b3':['cc','cd','cc','ae'],\
...: 'c' :['a','b','a','b']})
...:
...: df.groupby('c')[df.columns.drop('c')].apply(lambda x: x.unstack().value_counts())
TypeError: Series.name must be a hashable type
This must be a bug? For reference, I got it to work like this:
In [15]: df[df.columns.drop('c')].groupby(df['c']).apply(lambda x: x.unstack().value_counts())
Out[15]:
c
a cc 2
aa 1
ac 1
bb 1
ad 1
b cd 2
ac 1
ad 1
bc 1
ae 1
dtype: int64
Which is not a nice API and the first example should work IMO.
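For anyone landing here later, the workaround above can be restated as a small standalone script: select the non-key columns first, then group by the key passed as a separate Series, so the function handed to `apply` never receives `c`. This is only a sketch of that one approach under the data from this thread; the helper name `count_values` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "b1": ["aa", "ac", "ac", "ad"],
    "b2": ["bb", "bc", "ad", "cd"],
    "b3": ["cc", "cd", "cc", "ae"],
    "c": ["a", "b", "a", "b"],
})

def count_values(group):
    # `group` only holds b1..b3 here: the key column was dropped
    # before grouping, so it cannot leak into the counts.
    return group.unstack().value_counts()

# Drop the key column from the data, but still group by its values.
value_cols = df.columns.drop("c")
result = df[value_cols].groupby(df["c"]).apply(count_values)
```

The result is indexed by the group key and the counted value, so a single count can be looked up with something like `result.loc[("a", "cc")]`.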
I think it looks reasonable to deprecate keeping the grouping keys in groupby.apply, to keep the method similar to the other groupby methods. I'd say 99.5% of the time that is what users would want/expect. Keeping the deprecation process reasonable can be tricky (avoiding warnings all over is a concern). How would we deprecate this?
I have a branch to deprecate that I think is all set to go. It looks quite noisy if you look at our tests, but would only warn the user if:
To avoid the warning, users can exclude the groupings, e.g.
Reopening since #52477 was reverted.
xref: #52477 (comment) As the deprecation was determined to be noisy, I plan to reintroduce the linked PR once 2.1 is released with an option ( @phofl - does this sound reasonable?
This removes a warning raised by pandas when performing a groupby-apply call on a dataframe in noslag.aggregate. This warning is new in pandas 2.2. See also here: pandas-dev/pandas#7155
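One way to keep the grouping column away from the applied function, and so presumably to stay clear of the warning mentioned above, is to select the wanted columns explicitly after `groupby`. The sketch below assumes the example frame from this thread; `rows_per_group` and `seen_columns` are illustrative names, and this is not necessarily the change made in the referenced commit. As far as I know, pandas 2.2 also added an `include_groups` argument to `DataFrameGroupBy.apply` for the same purpose, which is not used here so that the snippet stays version-agnostic.

```python
import pandas as pd

df = pd.DataFrame({
    "b1": ["aa", "ac", "ac", "ad"],
    "b2": ["bb", "bc", "ad", "cd"],
    "b3": ["cc", "cd", "cc", "ae"],
    "c": ["a", "b", "a", "b"],
})

seen_columns = []  # records which columns each call actually received

def rows_per_group(group):
    seen_columns.append(list(group.columns))
    return len(group)

# Explicit column selection: the function only ever sees b1..b3.
result = df.groupby("c")[["b1", "b2", "b3"]].apply(rows_per_group)
```

Inspecting `seen_columns` afterwards is a cheap way to confirm that `c` never reached the function.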
http://stackoverflow.com/questions/23709811/best-way-to-sum-group-value-counts-in-pandas/23712433?noredirect=1#comment36441762_23712433
any of these look palatable?
df.groupby('c',as_index='exclude').apply(....)
df.groupby('c')['~c'].apply(...)
df.groupby('c',filter='c')
df.groupby('c',filter=True)
rather than doing a negated selection (df.columns-['c'])
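For reference, the negated selection can be spelled with Index methods. As far as I know, set-style subtraction on an Index (`df.columns - ['c']`) is not supported in current pandas, so the sketch below uses `Index.drop` and `Index.difference` instead; the empty frame and the variable names are just for illustration.

```python
import pandas as pd

# Empty frame with the same column labels as the example in this issue.
df = pd.DataFrame(columns=["b1", "b2", "b3", "c"])

# Index.drop removes the given label and keeps the original column order.
kept_in_order = df.columns.drop("c")

# Index.difference is a set operation; by default it tries to sort the result.
kept_sorted = df.columns.difference(["c"])
```

Either result can be passed on as a column selection, e.g. `df[kept_in_order]`.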