[FEA] Support groupby collect agg on struct columns #8520

Open
ayushdg opened this issue Jun 15, 2021 · 9 comments
Labels
feature request New feature or request Python Affects Python cuDF API.

Comments

@ayushdg
Member

ayushdg commented Jun 15, 2021

Is your feature request related to a problem? Please describe.
I'd like to be able to create a list-of-structs column by applying the collect aggregation to a struct column via groupby.

Describe the solution you'd like

import cudf

df = cudf.DataFrame(
    {
        'a':['aa','aa','cc'],
        'd':[{"b": '1', "c": "one"}, {"b": '2', "c": "two"}, {"b": '3', "c": "one"}]
     }
)

df.groupby('a').collect()

                                                   d
a
aa  [{'b': '1', 'c': 'one'}, {'b': '2', 'c': 'two'}]
cc                          [{'b': '3', 'c': 'one'}]
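
For readers without a GPU at hand, the requested semantics can be sketched in plain Python. This is only a reference model of what "collect" is expected to produce for struct values, not cuDF code; the helper name `groupby_collect` is hypothetical, and structs are modeled as ordinary dicts.

```python
from collections import defaultdict


def groupby_collect(keys, values):
    """Group `values` by the matching entry in `keys`, collecting each group into a list."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    # Sort by key to mirror the sorted group index in the expected output.
    return dict(sorted(groups.items()))


# Same data as the DataFrame in the request, with structs modeled as dicts.
a = ['aa', 'aa', 'cc']
d = [{"b": '1', "c": "one"}, {"b": '2', "c": "two"}, {"b": '3', "c": "one"}]

result = groupby_collect(a, d)
for key, structs in result.items():
    print(key, structs)
```

Note that each collected struct keeps its original field names ('b' and 'c') and the rows within a group keep their input order.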

Describe alternatives you've considered
N/A


@ayushdg added the "feature request" and "Needs Triage" labels on Jun 15, 2021
@quasiben
Member

I thought this was already supported:

In [3]: df = cudf.DataFrame(
   ...:     {
   ...:         'a':['aa','aa','cc'],
   ...:         'd':[{"b": '1', "c": "one"}, {"b": '2', "c": "two"}, {"b": '3', "c": "one"}]
   ...:      }
   ...: )

In [4]: df.groupby('a').collect()
   ...:
Out[4]:
                                                   d
a
aa  [{'0': '1', '1': 'one'}, {'0': '2', '1': 'two'}]
cc                          [{'0': '3', '1': 'one'}]

@ayushdg
Member Author

ayushdg commented Jun 15, 2021

Based on the discussion in #8474, it seems this was never officially supported but happened to work anyway, and that was resolved in #8499.

@quasiben
Member

Are you OK if we close this?

@ayushdg
Member Author

ayushdg commented Jun 15, 2021

@quasiben Sorry for not clarifying further. The resolution in #8499 was to disallow aggregations on struct columns, so this is no longer supported.

@beckernick
Member

beckernick commented Jul 12, 2021

It sounds like this was disallowed because it was erroneously going down a codepath meant for string aggregations (and thus potentially brittle when used with structs/lists).

@ayushdg is the expectation that these operations would behave as before, but by going down a well-tested codepath?

@beckernick added the "Python" label and removed the "Needs Triage" label on Jul 12, 2021
@ayushdg
Member Author

ayushdg commented Jul 12, 2021

> @ayushdg is the expectation that these operations would behave as before, but by going down a well-tested codepath?

Yup, that's the expectation.

@shwina
Contributor

shwina commented Jul 12, 2021

cc: @vyasr (just FYI), as I think we disabled these aggs on struct columns in one of your groupby cleanup PRs

@vyasr
Contributor

vyasr commented Jul 14, 2021

Perhaps you're thinking of the changes in #7731? Prior to that, there were a number of dtypes for which unsupported aggregations weren't being checked.
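
To illustrate what a per-dtype check of this kind could look like, here is a minimal pure-Python sketch. It is not cuDF's implementation: the table `SUPPORTED_AGGS`, the function `validate_agg`, and the "dtype family" strings are all hypothetical, and the numeric and string entries are illustrative only. The empty set for structs mirrors the behavior described in this thread after #8499.

```python
# Hypothetical lookup table: which aggregations each dtype family accepts.
# The entries are illustrative and do not reflect cuDF's actual tables.
SUPPORTED_AGGS = {
    "numeric": {"sum", "min", "max", "mean", "count", "collect"},
    "string": {"min", "max", "count", "collect"},
    "struct": set(),  # assumption: nothing allowed, as described after #8499
}


def validate_agg(dtype_family, agg):
    """Raise TypeError if `agg` is not allowed for columns of `dtype_family`."""
    try:
        allowed = SUPPORTED_AGGS[dtype_family]
    except KeyError:
        raise TypeError(f"unknown dtype family: {dtype_family!r}") from None
    if agg not in allowed:
        raise TypeError(
            f"{agg!r} aggregation is not supported for {dtype_family} columns"
        )


validate_agg("string", "collect")  # allowed, returns silently

try:
    validate_agg("struct", "collect")  # rejected up front
except TypeError as err:
    print(err)
```

Under a scheme like this, supporting the request would mean adding "collect" to the struct entry once a codepath that handles structs properly exists, rather than letting the call fall through to the string path unchecked.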

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
