PERF: GroupBy.mean orders of magnitude slower for pyarrow dtypes. (2.0.3)
#54207
Closed
2 of 3 tasks
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
It seems that this is not reproducible for everyone, but it happened to me on 2 separate machines as well as in google colab.
If anyone can test and see if it reproduces that would be great.
%timeit df_numpy.groupby("time").mean();
4.02 ms ± 897 µs
%timeit df_arrow.groupby("time").mean();
10.3 s ± 661 ms
Issue Description
groupby.mean is orders of magnitude slower when using arrow data types.
Related: agg is an order of magnitude slower with pyarrow dtypes #54065
Expected Behavior
Arrow groupby should never be slower than converting to numpy, aggregating, and converting back.
Installed Versions