Allow string aggs for `dask_cudf.CudfDataFrameGroupBy.aggregate` (#10222)

I noticed that `CudfDataFrameGroupBy.aggregate` doesn't actually support passing aggregations as strings. For example, something like

```python
import cudf
import dask_cudf

gdf = cudf.DataFrame({'id4': 4*list(range(6)), 'id5': 4*list(reversed(range(6))), 'v3': 6*list(range(4))})
gddf = dask_cudf.from_cudf(gdf, npartitions=5)
gddf.groupby("id4").agg("mean")
```

would actually end up using the upstream `aggregate` implementation. This is because:

- `CudfDataFrameGroupBy.aggregate` does not convert string aggs to a dict before calling `_is_supported` on them
- `_is_supported` only handles list / dict aggs, returning false otherwise

I've resolved this by adding string support to `_is_supported`, and moving the conversion of aggs to the internal `groupby_agg`.

It looks like this is exposing some failures for `first` and `last` groupby aggs, as tests that were originally using upstream Dask to compute these aggregations (I assume accidentally, since these aggregations are listed as supported) are now using dask-cuDF and getting the wrong result.

Authors:
- Charles Blackmon-Luca (https://github.com/charlesbluca)

Approvers:
- Bradley Dice (https://github.com/bdice)
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #10222
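For illustration, the dispatch logic described in the commit message could be sketched roughly as below. This is not the code from the commit: `SUPPORTED_AGGS`, `is_supported`, and `normalize_aggs` are hypothetical names, and the set of supported aggregations is an assumption made for the example. The sketch is pure Python so it runs without a GPU.

```python
# Hypothetical sketch; names and the supported set are assumptions,
# not the actual dask_cudf implementation.
SUPPORTED_AGGS = {"count", "mean", "std", "var", "sum", "min", "max", "first", "last"}


def is_supported(arg, supported=SUPPORTED_AGGS):
    """Return True if every requested aggregation is a supported name."""
    if isinstance(arg, str):
        # The case the commit adds: a bare string such as "mean".
        return arg in supported
    if isinstance(arg, (list, tuple)):
        return all(is_supported(a, supported) for a in arg)
    if isinstance(arg, dict):
        return all(is_supported(v, supported) for v in arg.values())
    # Callables and anything else fall back to the upstream implementation.
    return False


def normalize_aggs(arg, columns):
    """Expand a str / list / dict spec into {column: [agg, ...]}."""
    if isinstance(arg, dict):
        return {col: [a] if isinstance(a, str) else list(a) for col, a in arg.items()}
    if isinstance(arg, str):
        arg = [arg]
    return {col: list(arg) for col in columns}


spec = "mean"
if is_supported(spec):
    print(normalize_aggs(spec, ["id5", "v3"]))
```

Checking support on the raw argument first and normalizing afterwards mirrors the ordering described above, where the conversion happens inside the internal aggregation routine rather than before the support check.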