API: specification of functions in .agg #8593

Closed
1 of 3 tasks
jreback opened this issue Oct 21, 2014 · 3 comments
Labels
Apply (Apply, Aggregate, Transform, Map); Closing Candidate (May be closeable, needs more eyeballs); Enhancement

Comments

@jreback
Contributor

jreback commented Oct 21, 2014

From SO

In [100]: df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'B': [10, 12, 10, 25, 10, 12],
                   'C': [100, 102, 100, 250, 100, 102]})

In [101]: df
Out[101]: 
        A   B    C
0  group1  10  100
1  group1  12  102
2  group2  10  100
3  group2  25  250
4  group3  10  100
5  group3  12  102

In [102]: df.groupby('A').agg(['mean',lambda x: x.iloc[1]])
Out[102]: 
           B               C          
        mean  <lambda>  mean  <lambda>
A                                     
group1  11.0        12   101       102
group2  17.5        25   175       250
group3  11.0        12   101       102

Would be nice to be able to use lambda x: x.nth(1) or maybe 'nth(1)'
In [103]: df.groupby('A').agg(['mean', 'nth(1)'])
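
A possible workaround with the existing API, sketched under the assumption that pandas labels each output column with the callable's `__name__` (which is where `<lambda>` comes from above). The `nth` helper below is hypothetical and not part of pandas; it only wraps `x.iloc[n]` and gives the wrapper a readable name.

```python
import pandas as pd

df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'B': [10, 12, 10, 25, 10, 12],
                   'C': [100, 102, 100, 250, 100, 102]})


def nth(n):
    """Hypothetical helper (not a pandas function): pick the n-th row of each group."""
    def _nth(x):
        # x is the Series of one column within one group
        return x.iloc[n]
    # The column label is taken from the callable's name, so set a readable one.
    _nth.__name__ = f"nth({n})"
    return _nth


result = df.groupby('A').agg(['mean', nth(1)])
print(result)
```

This raises an IndexError for any group with fewer than n + 1 rows, just as the bare lambda does.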

@jreback
Contributor Author

jreback commented May 12, 2015

@shoyer here's my use case. This is a specification engine (similar in concept to pd.Grouper) that lets one easily specify the ordering of output summary values (e.g. no need for a dict-like) and bind metadata to the function (mostly the name) without too much boilerplate. Should be fully back-compat.

S = pd.Summary

# 8593
df.groupby('A').agg([S('mean', name='nth(1)')])

# 10100
dfAggregated = grouped.agg([np.mean, np.std, S(lambda v: v.mean() / v.max(), name='normalized mean')])

# 9052
df.groupby('B').agg({ 'A': [S('mean',name='mean1'),S('median',name='med1')],
                      'C': [S('mean',name='mean2'),S('median',name='med2')],
                    })

# maybe could also do
df.groupby('B').agg([ S('mean', column='A', name='mean1'),
                      S('median', column='A', name='med1'),
                      S('mean', column='C', name='mean2'),
                      S('median', column='C', name='med2'),
                    ])

jreback modified the milestones: Next Major Release, High Level Issue Tracking (Sep 24, 2017)
TomAugspurger removed the "Master Tracker" (High level tracker for similar issues) label (Jul 6, 2018)
TomAugspurger removed this from the High Level Issue Tracking milestone (Jul 6, 2018)
jbrockmendel added the "Apply" (Apply, Aggregate, Transform, Map) label (Dec 1, 2019)
@RylanSchaeffer

I know this isn't the place, but how does one use nth() with a groupby's .agg() method?

jbrockmendel added the "Closing Candidate" (May be closeable, needs more eyeballs) label (Feb 11, 2023)
@jbrockmendel
Member

Discussed on today's dev call. Main comments were: 1) does NamedAgg cover some of this? 2) "No" to doing eval on "nth(1)"; 3) pd.Grouper is one of the harder-to-reason-about parts of the codebase and we don't want more things like it. Closing as no action.
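
For reference on point 1, a minimal sketch of how `pd.NamedAgg` (available since pandas 0.25) might cover the naming and ordering use cases from #9052 above. The sample frame and the output names are made up for illustration; only `NamedAgg` and keyword-style `.agg` are existing pandas API.

```python
import pandas as pd

# Hypothetical data, chosen only so the results are easy to check by hand.
df = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0],
    'B': ['x', 'x', 'y', 'y'],
    'C': [10.0, 20.0, 30.0, 50.0],
})

# Each keyword is an output column name; columns come out in keyword order.
result = df.groupby('B').agg(
    mean1=pd.NamedAgg(column='A', aggfunc='mean'),
    med1=pd.NamedAgg(column='A', aggfunc='median'),
    mean2=pd.NamedAgg(column='C', aggfunc='mean'),
    # A callable works too, and its name no longer leaks into the label.
    nth1=pd.NamedAgg(column='C', aggfunc=lambda s: s.iloc[1]),
)
print(result)
```

This binds a name and a source column to each function without a nested dict, though unlike the proposed `S` wrapper it produces flat column labels rather than a column/function MultiIndex.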
