Refactor groupby to rely less on storing keys as Index objects #12037

Open
shwina opened this issue Nov 1, 2022 · 0 comments
Labels
0 - Backlog (In queue waiting for assignment) · feature request (New feature or request) · Python (Affects Python cuDF API)

shwina commented Nov 1, 2022

#11792 introduces the ability to group on list columns. In the future, we can expect to group by other types that pandas does not support, such as structs.

In #6932, we made the decision not to support creating an Index with elements of type list.

Unfortunately, our groupby internals rely heavily on being able to store the key columns of a groupby as an Index. In particular, the internal `_Grouping.keys` method is used in many places.

We should rely less on storing keys as Index objects, which will make it much easier to support grouping by lists and structs.
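
To make the direction concrete, here is a minimal pure-Python sketch of what "storing keys as columns" could look like. It is not cuDF code: `KeyColumn`, `Grouping`, `key_columns`, `keys_as_index`, and `NON_INDEXABLE_KINDS` are all hypothetical names made up for illustration, and the toy "index" is just a list of tuples. The idea it shows is that the grouping holds the key columns directly and only builds an Index-like object on request, failing only at that point for list or struct keys.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple

# Assumption for illustration: dtype kinds that cannot back an Index (cf. #6932).
NON_INDEXABLE_KINDS = {"list", "struct"}


@dataclass
class KeyColumn:
    """Toy stand-in for a key column: a name, a dtype kind, and its values."""
    name: str
    kind: str
    values: Sequence[Any]


class Grouping:
    """Hypothetical grouping that stores its keys as columns, not as an Index."""

    def __init__(self, key_columns: Sequence[KeyColumn]) -> None:
        self._key_columns = list(key_columns)

    @property
    def key_columns(self) -> List[KeyColumn]:
        # Internals would consume the columns directly; no Index is needed.
        return self._key_columns

    @property
    def names(self) -> List[str]:
        return [col.name for col in self._key_columns]

    def can_build_index(self) -> bool:
        return all(col.kind not in NON_INDEXABLE_KINDS for col in self._key_columns)

    def keys_as_index(self) -> List[Tuple[Any, ...]]:
        # Build an Index-like view lazily, and only when every key supports it.
        if not self.can_build_index():
            raise TypeError("list/struct keys cannot be represented as an Index")
        return list(zip(*(col.values for col in self._key_columns)))
```

With this shape, a grouping over a list-typed key can still be constructed and inspected through `key_columns` and `names`; only code paths that explicitly ask for `keys_as_index()` would hit the Index restriction.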
