[FEA] Support grouping by a Series in dask_cudf groupby #9020

Closed
shwina opened this issue Aug 11, 2021 · 0 comments · Fixed by #9022
Labels: feature request (New feature or request), Python (Affects Python cuDF API.)

shwina commented Aug 11, 2021

Grouping by an external Series doesn't always work in dask_cudf:

In [34]: df = dask_cudf.from_cudf(cudf.DataFrame({'a': [1, 2, 3, 4, 5]}), npartitions=1)
In [35]: s = dask_cudf.from_cudf(cudf.Series([1, 1, 1, 2, 2], name='id'), npartitions=1)
In [36]: df.groupby([s]).agg(["sum"]).compute() # error

...

ValueError: Metadata inference failed in `eq`.

Original error is below:
------------------------
TypeError("cannot broadcast <class 'str'>")

It does work for very simple aggregations, though. Note that "sum" is not wrapped in a list here:

In [37]: df.groupby([s]).agg("sum").compute()
Out[37]:
    a
id
2   9
1   6
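
For comparison, here is a CPU-only sketch of the same call in plain pandas. This is an assumption on my part about the intended behavior: the issue does not show a pandas run, and pandas stands in for cudf only so the snippet runs without a GPU. It shows the result shape that the list-wrapped aggregation would be expected to produce, with a two-level column label `("a", "sum")`.

```python
import pandas as pd

# Same data as the dask_cudf example, built with pandas instead of cudf.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5]})
s = pd.Series([1, 1, 1, 2, 2], name="id")

# Group by an external Series, with the aggregation wrapped in a list.
result = df.groupby([s]).agg(["sum"])
print(result)

# The list form yields hierarchical columns, unlike agg("sum").
print(list(result.columns))
```

The only difference from the working `agg("sum")` call is the extra column level, so the sums per group are the same in both forms.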
shwina added the feature request and Needs Triage labels on Aug 11, 2021
shwina added the dask-cudf and Python labels and removed the dask-cudf and Needs Triage labels on Aug 11, 2021
rapids-bot bot pushed a commit that referenced this issue Aug 13, 2021
Fixes: #9020 

This PR enables fallback to upstream `dask` when the groupby keys are passed as a list of `Series` objects.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)

URL: #9022
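
The fallback described in the commit message above could be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the code from #9022: `FakeSeries` and `choose_groupby_path` are hypothetical names invented for illustration, and the real change may detect `Series` keys differently.

```python
# Hypothetical stand-ins; none of these names come from dask_cudf.
class FakeSeries:
    """Placeholder for a Series object used as a grouping key."""

    def __init__(self, name):
        self.name = name


def choose_groupby_path(by):
    """Return "upstream" if any grouping key is a Series, else "optimized"."""
    # Normalize a single key to a list so both forms are handled alike.
    keys = by if isinstance(by, (list, tuple)) else [by]
    if any(isinstance(key, FakeSeries) for key in keys):
        return "upstream"  # defer to the generic dask implementation
    return "optimized"  # column-name keys can take the specialized path


print(choose_groupby_path(["a", "b"]))
print(choose_groupby_path([FakeSeries("id")]))
```

The idea is that column-name keys keep the specialized code path, while keys that are `Series` objects are routed to the more general implementation.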