BUG/TST: verify that groupby apply with a column aggregation does not return the column #7002

Closed
jreback opened this issue Apr 29, 2014 · 6 comments · Fixed by #34647

Comments

@jreback
Contributor

jreback commented Apr 29, 2014

related #7000

In [1]: df = DataFrame({'foo1': ['one', 'two', 'two', 'three', 'one', 'two'],
   ...:                 'foo2': np.random.randn(6)})

In [2]: df
Out[2]: 
    foo1      foo2
0    one  1.006666
1    two  0.002063
2    two  1.507785
3  three  1.865921
4    one  0.141202
5    two -1.079792

[6 rows x 2 columns]
In [3]: df.groupby('foo1').mean()
Out[3]: 
           foo2
foo1           
one    0.573934
three  1.865921
two    0.143352

[3 rows x 1 columns]

In [4]: df.groupby('foo1').apply(lambda x: x.mean())
Out[4]: 
           foo2
foo1           
one    0.573934
three  1.865921
two    0.143352

[3 rows x 1 columns]

The following should return the foo1 column as well, as the built-in mean() in In [7] does:

In [6]: df.groupby('foo1', as_index=False).apply(lambda x: x.mean())
Out[6]: 
       foo2
0  0.573934
1  1.865921
2  0.143352

[3 rows x 1 columns]

In [7]: df.groupby('foo1',as_index=False).mean()
Out[7]: 
    foo1      foo2
0    one  0.573934
1  three  1.865921
2    two  0.143352

[3 rows x 2 columns]
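
The expectation above can be written as a small check. This is only a sketch under stated assumptions, not the test that was eventually added to pandas: the fixed foo2 values replace the random data so the result is reproducible, and the numeric column is selected explicitly before apply so the lambda never sees the string column, whose handling has varied between pandas versions.

```python
import pandas as pd

# Deterministic stand-in for the random data used in the issue (an assumption).
df = pd.DataFrame({
    "foo1": ["one", "two", "two", "three", "one", "two"],
    "foo2": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Built-in aggregation: the grouping column comes back as a regular column.
expected = df.groupby("foo1", as_index=False).mean()

# apply-based equivalent: select the numeric column explicitly, then move
# the group keys out of the index and back into a column.
via_apply = (
    df.groupby("foo1")[["foo2"]]
    .apply(lambda g: g.mean())
    .reset_index()
)

# Both routes should yield the same frame, including the foo1 column.
pd.testing.assert_frame_equal(via_apply, expected)
print(via_apply)
```

Resetting the index sidesteps the as_index=False code path entirely, so the check says what the result should look like without depending on how a given pandas release treats the grouping column inside apply.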

@jreback jreback added this to the 0.15.0 milestone Apr 29, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@jonathanlb

Extending the example to
df.groupby('foo1').apply(lambda x: x.sum())
produces a DataFrame whose foo1 column holds the concatenated strings oneone, three, and twotwotwo.

Should that behavior be preserved?
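
A minimal sketch of where the concatenation comes from, assuming object-dtype strings: summing such a column applies Python's + to the values, which joins them. To keep the example small it groups the foo1 Series by itself rather than calling apply on the whole DataFrame as in the comment above.

```python
import pandas as pd

# object dtype is forced here (an assumption) so that summing the values
# falls back to plain Python string concatenation.
foo1 = pd.Series(
    ["one", "two", "two", "three", "one", "two"], dtype=object, name="foo1"
)

# Group the strings by their own value and sum each group.
summed = foo1.groupby(foo1).apply(lambda s: s.sum())
print(summed.to_dict())
```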

@OmerJog

OmerJog commented Jan 22, 2019

> Should that behavior be preserved?

Why not?

@drburrow

If nobody is currently working on this, I'd like to look more into it.

@mroeschke mroeschke removed the Testing pandas testing functions or related to the test suite label May 22, 2020
@fangchenli
Member

I guess this could be closed.

df = pd.DataFrame({'foo1': ['one', 'two', 'two', 'three', 'one', 'two'],
                   'foo2': np.random.randn(6)})

df
Out[4]: 
    foo1      foo2
0    one  0.404196
1    two -0.484634
2    two  1.033869
3  three -0.368001
4    one -2.506380
5    two  0.807768

df.groupby('foo1',as_index=False).apply(lambda x: x.mean())
Out[5]: 
    foo1      foo2
0    one -1.051092
1  three -0.368001
2    two  0.452334

df.groupby('foo1',as_index=False).mean()
Out[6]: 
    foo1      foo2
0    one -1.051092
1  three -0.368001
2    two  0.452334

@jreback
Contributor Author

jreback commented Jun 8, 2020

this was recently patched and we should have a test for it

if you'd like to check (otherwise a PR with this test case would be ok)

fangchenli added a commit to fangchenli/pandas that referenced this issue Jun 8, 2020
@fangchenli
Member

Sorry for this many commits... It was my first PR.

fangchenli added a commit to fangchenli/pandas that referenced this issue Jun 8, 2020
@jreback jreback removed this from the Contributions Welcome milestone Jun 8, 2020
@jreback jreback added this to the 1.1 milestone Jun 8, 2020
fangchenli added a commit to fangchenli/pandas that referenced this issue Jun 9, 2020
fangchenli added a commit to fangchenli/pandas that referenced this issue Jun 14, 2020