[BUG] Series unique should return an array rather than a dataframe #8175

Open
beckernick opened this issue May 6, 2021 · 3 comments
Labels: bug (Something isn't working), Python (Affects Python cuDF API.)

Comments

@beckernick (Member)

pandas' `Series.unique` returns a NumPy array, while cuDF's returns a `Series`.

```python
import cudf
print(cudf.__version__)

df = cudf.DataFrame({
    "a": [0, 1, 2, 0, 1, 2, 0]
})
type(df['a'].unique()), type(df['a'].to_pandas().unique())
```

Output:

```
0.20.0a+248.g7623f3978b
(cudf.core.series.Series, numpy.ndarray)
```
@beckernick added the labels bug (Something isn't working) and Python (Affects Python cuDF API.) on May 6, 2021
@kkraus14 (Collaborator) commented May 6, 2021

Hmm, this is problematic for us when handling types like strings or decimals, which (CuPy) arrays can't currently represent. @shwina is scoping a `cudf.arrays` module, similar to `pandas.arrays`, which could give us a path forward in the future.
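For reference, pandas itself only returns a NumPy array for NumPy-backed dtypes; for extension dtypes such as `"string"` or `"category"`, `pd.Series.unique` returns an `ExtensionArray` (the kind of object exposed under `pandas.arrays`). The sketch below uses pandas alone (no GPU needed) to show that split; whether a future `cudf.arrays` would mirror it exactly is not settled in this thread.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

# NumPy-backed dtype: unique() returns a plain ndarray
ints = pd.Series([0, 1, 2, 0, 1, 2, 0])
int_uniques = ints.unique()

# Extension dtypes: unique() returns an ExtensionArray, not an ndarray
strs = pd.Series(["a", "b", "a"], dtype="string")
str_uniques = strs.unique()

cats = pd.Series(["x", "y", "x"], dtype="category")
cat_uniques = cats.unique()

for name, result in [("int64", int_uniques), ("string", str_uniques), ("category", cat_uniques)]:
    # Report the concrete return type and whether it is an ExtensionArray
    print(name, type(result).__name__, isinstance(result, ExtensionArray))
```

So "match pandas" already means "ndarray for some dtypes, an `ExtensionArray` for others"; the exact `StringArray` subclass name can vary with the pandas version and string storage backend.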

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

rapids-bot pushed a commit that referenced this issue on Aug 25, 2023
Partially addresses #8175
This PR changes `Series.unique` to return a CuPy array, matching `pd.Series.unique`, which returns a NumPy array.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #13959
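After this change, code that has to run against pandas and against cuDF versions from before and after it may receive a `Series`, a CuPy array, or a NumPy array from `unique()`. Below is a minimal sketch of a hypothetical helper (`unique_to_host` is not part of cuDF or pandas) that normalizes those by duck typing. It assumes cuDF/pandas objects expose `to_numpy()` and that CuPy arrays expose `get()` (which copies device memory to a NumPy array); it is exercised here with pandas and NumPy only.

```python
import numpy as np
import pandas as pd


def unique_to_host(obj):
    """Hypothetical helper: normalize a ``unique()`` result to a NumPy array.

    - Series and pandas ExtensionArrays expose ``to_numpy()``
    - CuPy arrays expose ``get()``, which copies to host memory
    - NumPy arrays and other array-likes go through ``np.asarray``
    """
    # Check to_numpy() first: pandas.Series also has an unrelated .get(key) method.
    if hasattr(obj, "to_numpy"):
        return obj.to_numpy()
    if hasattr(obj, "get"):
        return obj.get()
    return np.asarray(obj)


s = pd.Series([0, 1, 2, 0, 1, 2, 0])
print(unique_to_host(s.unique()))           # [0 1 2]  (ndarray result)
print(unique_to_host(s.drop_duplicates()))  # [0 1 2]  (Series-shaped result)
```

The duck-typed checks avoid importing CuPy or cuDF, so the same helper can be used on CPU-only machines.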
@vyasr (Contributor) commented May 16, 2024

I'm not sure what we should do about this. I don't think extension arrays are on our roadmap any time soon. @galipremsagar WDYT? We could match pandas' return type for dtypes that CuPy supports, but returning a different kind of object depending on the dtype seems even worse than the status quo. Fundamentally, I don't know that we'll ever be able to solve this, since we have other types that CuPy will probably never support (list and struct, for instance).
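To make the trade-off concrete, a policy of "return an array when the dtype allows it, otherwise a Series" might look roughly like this. It is a hypothetical illustration (`unique_dtype_dependent` is not proposed cuDF code), with pandas and NumPy standing in for cuDF and CuPy and plain NumPy numeric dtypes standing in for "types CuPy supports"; it shows that callers then have to branch on the result type.

```python
import numpy as np
import pandas as pd


def unique_dtype_dependent(s: pd.Series):
    """Hypothetical policy: ndarray for NumPy-representable dtypes, Series otherwise."""
    deduped = s.drop_duplicates().reset_index(drop=True)
    # Plain NumPy bool/int/uint/float/complex dtypes stand in for "types CuPy supports"
    if isinstance(s.dtype, np.dtype) and s.dtype.kind in "biufc":
        return deduped.to_numpy()
    # Anything else (strings here; lists/structs in cuDF) stays a Series
    return deduped


nums = unique_dtype_dependent(pd.Series([0, 1, 2, 0, 1, 2, 0]))
strs = unique_dtype_dependent(pd.Series(["a", "b", "a"], dtype="string"))

# Downstream code now has to check the type before using the result
for result in (nums, strs):
    if isinstance(result, np.ndarray):
        print("array:", result.tolist())
    else:
        print("series:", result.tolist())
```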

Projects: Status: Todo
Development: No branches or pull requests
3 participants