API: pass `ddof` down to Cythonised `std` / `var` in `groupby.agg` #60591

MarcoGorelli · 2024-12-20T14:21:33Z

Say I want to do a groupby and perform various aggregations, e.g. I want to find mean and std of b. Easy:

import pandas as pd

df = pd.DataFrame({'a': [1,1,2], 'b': [4,5,6]})
df.groupby('a').agg({'b': ['mean', 'std']})

What if I want to do the same with ddof=0? If I were computing a single aggregation, I could do:

print(df.groupby('a')['b'].std(ddof=0))

and that uses the Cythonized path.

However, I think the current pandas API doesn't offer a way to pass ddof to 'std' when it is used in .agg. The workaround often suggested on Stack Overflow is (😭):

import numpy as np

print(df.groupby('a').agg({'b': ['mean', lambda x: np.std(x, ddof=0)]}))

but that'll evade the Cythonized path, which is a missed opportunity.
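
One way to stay on the built-in path today, sketched under the assumption that calling one groupby method per statistic is acceptable, is to compute each aggregation separately and concatenate the results. The column labels `mean` and `std` are chosen here for illustration only, and the output is a flat set of columns rather than the MultiIndex that `.agg` with a dict would produce.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})

# Reuse one groupby object for both statistics.
grouped = df.groupby('a')['b']

# Each call is a built-in groupby method, so no Python-level lambda is involved.
result = pd.concat(
    {'mean': grouped.mean(), 'std': grouped.std(ddof=0)},
    axis=1,
)
print(result)
```

This gets verbose once several columns or statistics are involved, which is the gap the issue is about.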

rhshadrach · 2024-12-20T14:37:10Z

I'm supportive of creating a way to do this. We currently have NamedAgg, and could add a kwargs argument there. This would not support the OP's use case of acting on all columns. We could also expand NamedAgg to allow acting on multiple, and possibly all, columns. Or we could introduce a new class, e.g. pd.Op(name, kwargs), specifically for the purpose of acting on all columns.
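
For context, a minimal sketch of how `NamedAgg` is used with only `column` and `aggfunc`, which is the form the comment above takes as the starting point. Without a kwargs slot on the string `'std'`, a non-default `ddof` still has to go through a callable. The output names `b_mean` and `b_std0` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})

result = df.groupby('a').agg(
    b_mean=pd.NamedAgg(column='b', aggfunc='mean'),
    # Assumes no kwargs can be attached to 'std' (per the comment above),
    # so ddof=0 is supplied through a Python callable instead.
    b_std0=pd.NamedAgg(column='b', aggfunc=lambda s: s.std(ddof=0)),
)
print(result)
```

Each `NamedAgg` targets exactly one column, which is why this form does not cover the "all columns" case raised in the issue.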

More generally, there has been a desire for some time to add an expression system to pandas. If we also take the idea of Polars' selectors, then pd.selectors.all().std(ddof=0) becomes possible. If we choose to go this route, then I would be opposed to making other short-term enhancements to support this.

In any case, I think we should strive for consistency of all UDF methods (apply, agg, transform, filter, map) in Series, DataFrame, GroupBy, Window, Resample.

MarcoGorelli added Groupby API Design labels Dec 20, 2024
