Allow string aggs for dask_cudf.CudfDataFrameGroupBy.aggregate
#10222
Codecov Report
@@               Coverage Diff                @@
##           branch-22.06   #10222      +/-   ##
================================================
+ Coverage       86.28%   86.32%   +0.03%
================================================
  Files             144      144
  Lines           22654    22656       +2
================================================
+ Hits            19548    19558      +10
+ Misses           3106     3098       -8
No comments -- looks good to me!
@gpucibot merge
I noticed that `CudfDataFrameGroupBy.aggregate` doesn't actually support passing aggregations as strings. For example, a call that passes a single aggregation name as a string would actually end up using the upstream `aggregate` implementation. This is because:

- `CudfDataFrameGroupBy.aggregate` does not convert string aggs to a dict before calling `_is_supported` on them
- `_is_supported` only handles list / dict aggs, returning false otherwise

I've resolved this by adding string support to `_is_supported`, and moving the conversion of aggs to the internal `groupby_agg`.

It looks like this is exposing some failures for `first` and `last` groupby aggs: tests that were originally using upstream Dask to compute these aggregations (I assume accidentally, since these aggregations are listed as supported) are now using dask-cuDF and getting the wrong result.
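The string-handling change described above can be sketched in plain Python. This is only an illustration of the idea, not the actual dask-cuDF code: the names `SUPPORTED_AGGS`, `is_supported`, and `normalize_aggs` are hypothetical, and the set of supported aggregations is an assumed example rather than the library's real list.

```python
# Illustrative sketch only; names and the supported set are assumptions,
# not the dask-cuDF implementation.
SUPPORTED_AGGS = {"count", "mean", "sum", "min", "max", "first", "last"}


def is_supported(arg, supported):
    """Return True if every aggregation named in `arg` is in `supported`.

    Accepts a single string, a list/tuple of specs, or a dict mapping
    column names to specs. Anything else (e.g. a callable) is unsupported.
    """
    if isinstance(arg, str):
        return arg in supported
    if isinstance(arg, (list, tuple)):
        return all(is_supported(a, supported) for a in arg)
    if isinstance(arg, dict):
        return all(is_supported(v, supported) for v in arg.values())
    return False


def normalize_aggs(arg, columns):
    """Expand a str, list, or dict spec into a {column: [agg, ...]} dict."""
    if isinstance(arg, str):
        return {col: [arg] for col in columns}
    if isinstance(arg, (list, tuple)):
        return {col: list(arg) for col in columns}
    if isinstance(arg, dict):
        return {
            col: [aggs] if isinstance(aggs, str) else list(aggs)
            for col, aggs in arg.items()
        }
    raise TypeError(f"Unsupported aggregation spec: {type(arg).__name__}")
```

With a check like this, a dispatcher would fall back to the upstream implementation only when `is_supported` returns false, and the conversion to a dict can happen later, inside the routine that actually performs the aggregation.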