BUG: groupby issue with user function with mutation, and duplicate indices #5758

Closed
jreback opened this issue Dec 20, 2013 · 1 comment
Labels: Bug, Duplicate Report, Error Reporting, Groupby

jreback commented Dec 20, 2013

This only happens when the user function mutates its input.

Originally from: http://stackoverflow.com/questions/20691168/pandas-apply-to-data-frame-groupby/20705226#20705226

This could be fixed automatically, or pandas could at least give the user a clearer error message about the duplicate index labels.

In [40]: df = DataFrame(dict(A = ['foo','foo','bar','bar'], B = np.random.randn(4)),index=[1,1,2,2])

In [41]: df
Out[41]: 
     A         B
1  foo  0.971425
1  foo -1.151693
2  bar  1.265031
2  bar -0.219011

[4 rows x 2 columns]

In [42]: def f(x):
   ....:     x['std'] = x['B'].std()
   ....:     return x
   ....: 

Applying the function directly fails:

In [44]: df.groupby('A').apply(f)
ValueError: cannot reindex from a duplicate axis
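
The message above does not say which labels are duplicated. As a rough sketch of what a clearer report could look like, the check below names the offending labels before the apply is attempted. `check_unique_index` is a hypothetical helper written for illustration, not part of pandas, and fixed values replace `np.random.randn(4)` so the example is reproducible.

```python
import pandas as pd


def check_unique_index(df):
    """Hypothetical helper (not a pandas API): fail early with a
    message that names the duplicated index labels."""
    if not df.index.is_unique:
        # duplicated() marks every repeat after the first occurrence
        dups = df.index[df.index.duplicated()].unique().tolist()
        raise ValueError(
            f"index contains duplicate labels {dups}; "
            "consider reset_index() before groupby().apply()"
        )
    return df


df = pd.DataFrame(
    {"A": ["foo", "foo", "bar", "bar"], "B": [1.0, 3.0, 2.0, 6.0]},
    index=[1, 1, 2, 2],  # duplicate labels, as in the report
)

try:
    check_unique_index(df)
    message = None
except ValueError as err:
    message = str(err)
```

Here `message` ends up holding the error text, which lists the duplicated labels rather than only mentioning a "duplicate axis".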

With a unique index everything works, so this should be straightforward to fix:

In [45]: df.reset_index().groupby('A').apply(f).set_index('index')
Out[45]: 
         A         B       std
index                         
1      foo  0.971425  1.501271
1      foo -1.151693  1.501271
2      bar  1.265031  1.049376
2      bar -0.219011  1.049376

[4 rows x 3 columns]
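
For this particular function, one way to sidestep the problem is to avoid mutating the group at all. The sketch below uses `transform` to broadcast each group's standard deviation back onto the original rows. This is a different technique from the `reset_index` workaround above, not the fix discussed in this issue, and it uses fixed values in place of `np.random.randn(4)` so the numbers are reproducible.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["foo", "foo", "bar", "bar"], "B": [1.0, 3.0, 2.0, 6.0]},
    index=[1, 1, 2, 2],  # duplicate index labels, as in the report
)

# transform returns one value per original row, aligned with df,
# so no user function has to modify the group it receives.
group_std = df.groupby("A")["B"].transform("std")

# assign returns a new frame and leaves df itself unchanged.
result = df.assign(std=group_std)
```

The original duplicate index is preserved in `result`, so no `reset_index` / `set_index` round trip is needed.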
@jreback jreback modified the milestones: 0.15.0, 0.14.0 Apr 9, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@TomAugspurger TomAugspurger modified the milestones: Next Major Release, No action Jan 31, 2018
@TomAugspurger

Duplicate of #19437

@TomAugspurger TomAugspurger added the Duplicate Report Duplicate issue or pull request label Jan 31, 2018
@TomAugspurger TomAugspurger marked this as a duplicate of #19437 Jan 31, 2018