cumsum sums the groupby column #5614
What would be the expected output? Something like this?:
And should it then also be the case for |
@jorisvandenbossche RE the index of cumcount: possibly yes, it should respect as_index... I think it's debatable whether this would ever be desired, though. The main problem is that it's slow (I don't think there is an efficient way to append an index to an index to make a MultiIndex), and this is the default. I had thought I had posted about this somewhere but can't find the issue... I think so, though like I say I think we need to have a discussion about as_index (there are at least three different ways it is used in groupby at the moment)... I had a partially filled-in issue about it from a week or so ago... :s I will look at it again after the weekend and try to post it. It's kind of a mess and some conventions are of dubious value (e.g. that of head). |
Yes, you did :-) Here: #4646 (comment) |
I think we should add some UserWarnings in 0.14 about this kind of behaviour; see #5755. |
@hayd you have anything in the works about this? push to 0.15 otherwise |
I think I do, hope to get in the week. |
ping! |
I think this is closed by #7000, maybe just add a test? |
Weirdly with the above example we don't have A as the index!
In [4]: df = pd.DataFrame([[1, 2, np.nan], [1, np.nan, 9], [3, 4, 9]], columns=['A', 'B', 'C'])
In [5]: g = df.groupby('A')
In [6]: g.cumsum()  # should have A as index
Out[6]:
     B    C
0    2  NaN
1  NaN    9
2    4    9
In [7]: g = df.groupby('A', as_index=False)  # this is correct
In [8]: g.cumsum()
Out[8]:
   A    B    C
0  1    2  NaN
1  2  NaN    9
2  3    4    9 |
Ah wait, this is a feature! Coool! |
hmm...the index should have a named index (as A)...let me fix |
@jreback I'm not so sure, what are you changing? I think this is good as is! |
I think this should be this (happens to be the same as sum in this case)
|
Here's a more realistic example
Here's the current result
|
Following our rules, I would say cumsum is a transformer, and then it is 'correct' to drop the grouper column. At least this is also what transform does:
|
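For anyone reading along, here is a minimal sketch of the transformer comparison, assuming a recent pandas release (behaviour at the time of this thread may have differed). It reuses the frame from the session earlier in the thread; the names `via_cumsum` and `via_transform` are only illustrative. It checks that both the dispatched `cumsum` and an explicit `transform` leave out the grouper column and keep the original row index.

```python
import numpy as np
import pandas as pd

# Same frame as in the session earlier in the thread.
df = pd.DataFrame([[1, 2, np.nan], [1, np.nan, 9], [3, 4, 9]],
                  columns=['A', 'B', 'C'])
g = df.groupby('A')

# Dispatched cumulative sum on the groupby object.
via_cumsum = g.cumsum()

# The same operation spelled as an explicit transform.
via_transform = g.transform(lambda x: x.cumsum())

# Both results are expected to keep one row per input row and to
# leave out the grouper column 'A' (assumption: recent pandas).
print(list(via_cumsum.columns))
print(list(via_transform.columns))
print(via_cumsum.index.equals(df.index))
```

If the two column lists differ on your version, that would be the inconsistency this issue is about.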
@jorisvandenbossche you are right....ok..marked it as some additional tests needed in any event (simply to validate this expectation) |
cumsum shouldn't sum the groupby'd column (in fact the grouped column should be the index, if grouping with as_index).
This comes from cumsum being dispatched. Should be fixed up for 0.14, possibly along with some other whitelisted groupby functions.
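A small sketch that may help when checking this on a given pandas version, assuming a recent release and the default `as_index=True`. It contrasts a reducer (`sum`), which gives one row per group indexed by the grouped column, with `cumsum`, which gives one row per input row. The frame and the names `reduced` and `running` are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative frame; 'A' is the column being grouped on.
df = pd.DataFrame([[1, 2, np.nan], [1, np.nan, 9], [3, 4, 9]],
                  columns=['A', 'B', 'C'])
g = df.groupby('A')

# Reducer: one row per group, with the grouped column as the index.
reduced = g.sum()

# Cumulative sum: one row per original row.
running = g.cumsum()

print(reduced.index.name, len(reduced))
print(len(running), list(running.columns))
```

Seeing 'A' among the columns of `running` would mean the grouped column itself was summed, which is the behaviour reported here.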