Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Unify API of groupby.sum and sum #46635

Closed
phil20686 opened this issue Apr 4, 2022 · 9 comments
Closed

ENH: Unify API of groupby.sum and sum #46635

phil20686 opened this issue Apr 4, 2022 · 9 comments
Labels
API - Consistency Internal Consistency of API/Behavior Duplicate Report Duplicate issue or pull request Enhancement Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Needs Discussion Requires discussion from core team before further action

Comments

@phil20686
Copy link

phil20686 commented Apr 4, 2022

Is your feature request related to a problem?

The level keyword in sum is marked as deprecated, but the suggested replacement with groupby.sum does not support skipna, therefore, some behaviours are impossible to replicate easily.

Describe the solution you'd like

grouby.sum should accept the skipna=False keyword,

API breaking implications

Currently the API is inconsistent, since
df.sum(axis=1, level=0, skipna=False) has no equivalent since df.groupby(axis=1, level=0).sum(skipna=False) is not available.

Describe alternatives you've considered

One can of course do: df.groupby(axis=1, level=0).apply(lambda x : x.sum(axis=1, skipna=False)) but this cannot be the intended use.

Additional context

image

# Your code here, if applicable
@phil20686 phil20686 added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 4, 2022
@samukweku
Copy link
Contributor

@phil20686 kindly add an example to explain better

@simonjayhawkins
Copy link
Member

@phil20686 kindly add an example to explain better

and if you could add some references to support the discussion and other readers that'll be great too. e.g. the deprecation and alternatives, where this was discussed etc.

and update the issue title to be more informative too!

@MarcoGorelli MarcoGorelli added Needs Info Clarification about behavior needed to assess issue and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 5, 2022
@rhshadrach
Copy link
Member

I believe df.sum(axis=1, level=0, skipna=True) should always be equivalent to df.groupby(axis=1, level=0).sum(); can you provide an example where this is not the case?

@rhshadrach rhshadrach added Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Apr 5, 2022
@phil20686
Copy link
Author

phil20686 commented Apr 8, 2022

I meant false sorry, I have added an example. There seems to be no way to eradicate this future warning without changing the behaviour, except resorting to something unpleasant like using apply instead of sum.

@phil20686 phil20686 changed the title ENH: ENH: Unify API of groupby.sum and sum Apr 8, 2022
@rhshadrach
Copy link
Member

@phil20686 - can you add a plaintext example rather than a screenshot; this allows devs to copy and paste rather than having to retype the entire code.

@rhshadrach
Copy link
Member

One can of course do: df.groupby(axis=1, level=0).apply(lambda x : x.sum(axis=1, skipna=False)) but this cannot be the intended use.

This is just about how pandas implements df.sum(level=0); only difference is that pandas uses .agg instead of .apply.

@rhshadrach
Copy link
Member

I'm +0 on adding skipna to groupby. The deprecated level argument does make users supply a replacement that is a bit more verbose, but just as efficient, as pandas is using today. While certainly not great, I also think it's not terrible. If we do go this route, then it shouldn't just be sum of course, but (at least) all ops of DataFrame that have skipna.

It seems to me to be desirable to add this as a fully baked in feature of the groupby code, rather than bolted on using agg/apply, but I'm not sure how difficult that might be.

@rhshadrach rhshadrach added Needs Discussion Requires discussion from core team before further action API - Consistency Internal Consistency of API/Behavior and removed Needs Info Clarification about behavior needed to assess issue labels Apr 8, 2022
@jreback
Copy link
Contributor

jreback commented Apr 9, 2022

i am +1 in unifying signatures between aggregators

this is a footgun that we can easily prevent

being efficient for a non default keyword doesn't matter that much. api consistency is much more important

@simonjayhawkins
Copy link
Member

and if you could add some references to support the discussion and other readers that'll be great too. e.g. the deprecation and alternatives, where this was discussed etc.

the deprecation was discussed in #39983 and implemented in #40869 and introduced in pandas 1.3

the issue here has already been communicated (with a copy and paste-able code sample) in #39983 (comment)

It is too late now that we are on pandas 1.4.x to undo the deprecation until an appropriate less verbose alternative is available.

there is already an issue to enable skipna on groupby reduction ops #15675, so I think safe to close this as a duplicate.

@phil20686 Thanks for the report. please continue to contribute to the discussion at #15675 instead.

@simonjayhawkins simonjayhawkins added the Duplicate Report Duplicate issue or pull request label Apr 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API - Consistency Internal Consistency of API/Behavior Duplicate Report Duplicate issue or pull request Enhancement Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

No branches or pull requests

6 participants