ENH: NamedAgg support for time windows/Expanding #34685

Closed
ddofer opened this issue Jun 10, 2020 · 3 comments
Labels
Enhancement, Window (rolling, ewma, expanding)

Comments

@ddofer

ddofer commented Jun 10, 2020

Is your feature request related to a problem?

I am aggregating grouped data over time to create features. I had hoped to use the "new" NamedAgg functionality to efficiently calculate and rename the many feature columns.
I find that named aggregation only works with the "default" groupby.agg; it is not supported when used with a rolling or expanding window.
(I note that the function's documentation doesn't mention this. Named aggregation is covered mainly in the general documentation, not at the method level, i.e. the groupby.agg method's documentation doesn't mention or demonstrate this functionality at all.)

By "named aggregations" I refer to the functionality:
animals.groupby("kind").agg(min_height=('height', 'min'))
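
For readers unfamiliar with the feature, here is a minimal, self-contained sketch of named aggregation on a plain groupby. The `animals` data below is made up for illustration and is not from the original report; the tuple form and `pd.NamedAgg` are the two equivalent spellings pandas accepts.

```python
import pandas as pd

# Illustrative data only (an assumption, not taken from the issue).
animals = pd.DataFrame(
    {
        "kind": ["cat", "dog", "cat", "dog"],
        "height": [9.1, 6.0, 9.5, 34.0],
        "weight": [7.9, 7.5, 9.9, 198.0],
    }
)

# Each keyword names an output column; the value says which input
# column to aggregate and with which function.
result = animals.groupby("kind").agg(
    min_height=("height", "min"),                            # tuple form
    max_weight=pd.NamedAgg(column="weight", aggfunc="max"),  # explicit form
)
print(result)
```

The output columns take the keyword names directly, so no separate rename step is needed.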

Describe the solution you'd like

Extend named aggregation support (NamedAgg) to the grouped window aggregations, i.e. expanding.aggregate and rolling.aggregate.

API breaking implications

Should not affect it. Seems 1:1.

Additional context

Example usage/errors:

df_actions.set_index("age").groupby('uuid').rolling(30).agg(unique_actions=('action', 'nunique'), 
                                total_actions=('counter', 'sum'))

TypeError: aggregate() missing 1 required positional argument: 'func'

Without the window, we get the benefit of named aggregation. (The real code has many more columns and transformations, and the columns the features are calculated from are dynamic, so hard-coding a list of column names is not desirable. Additionally, the data is time-sorted.):

df_actions.set_index("age").groupby('uuid').agg(unique_actions=('action', 'nunique'), 
                               total_actions=('counter', 'sum'))

>>>

  | unique_actions | total_actions

0 | 0.0
0 | 0.0
...
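
A possible workaround sketch for the rolling case above, assuming it is acceptable to compute each output column separately: select the column on the grouped object, apply the window, and assemble the results under the desired names. The helper `named_window_agg`, the sample data, and the `duration` column are hypothetical and exist only for this illustration; this is not a pandas API and not necessarily how the feature would be implemented.

```python
import pandas as pd


def named_window_agg(grouped, window, **named):
    """Hypothetical helper: emulate named aggregation over a grouped rolling window.

    Each keyword maps an output column name to a (column, func) pair,
    mirroring the keyword style of groupby.agg.
    """
    columns = {}
    for out_name, (column, func) in named.items():
        # Select one column, roll within each group, then aggregate it.
        columns[out_name] = grouped[column].rolling(window).agg(func)
    # All pieces share the same (group key, original index) MultiIndex.
    return pd.concat(columns, axis=1)


# Made-up numeric data; only the names uuid, age and counter echo the issue.
df_actions = pd.DataFrame(
    {
        "uuid": ["a", "a", "a", "b", "b"],
        "age": [1, 2, 3, 1, 2],
        "counter": [1, 2, 3, 10, 20],
        "duration": [1.0, 3.0, 5.0, 2.0, 4.0],
    }
)

grouped = df_actions.set_index("age").groupby("uuid")
features = named_window_agg(
    grouped,
    2,
    total_actions=("counter", "sum"),
    mean_duration=("duration", "mean"),
)
print(features)
```

The same pattern should carry over to expanding windows by swapping `.rolling(window)` for `.expanding()`. It trades one pass per output column for the convenience of naming results up front, which may matter on large frames.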
ddofer added the Enhancement and Needs Triage labels Jun 10, 2020
@MarcoGorelli
Member

MarcoGorelli commented Jun 10, 2020

Thanks @ddofer

Is this the same as #32803?

@ddofer
Author

ddofer commented Jun 10, 2020

> Thanks @ddofer
>
> Is this the same as #32803?

It definitely overlaps. I'm not sure it's a duplicate, since that issue refers to a different specific usage.

TomAugspurger added the Window (rolling, ewma, expanding) label and removed the Needs Triage label Jun 12, 2020
@mroeschke
Member

I think this is a duplicate of #28333, closing.
