Support "unflatten" of columns flattened via flatten_nested_columns() #8956
`cudf::flatten_nested_columns()` flattens out `STRUCT` columns into their constituent member columns, and includes each `STRUCT`'s validity information as a `BOOL8` column.

E.g. `STRUCT_1< STRUCT_2< A, B >, C >` is flattened to:
1. Null Vector for `STRUCT_1`
2. Null Vector for `STRUCT_2`
3. Member `STRUCT_2::A`
4. Member `STRUCT_2::B`
5. Member `STRUCT_1::C`

This commit adds an `unflatten_nested_columns()` method to convert back from a flattened representation to the nested columns.
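The flattening order above, and the round trip back, can be sketched with a toy model. This is not the cudf API: `Column`, `FlatColumn`, `flatten`, `unflatten`, `same`, and `make_example` are hypothetical names invented for illustration, leaf data is plain `int`, and validity is stored as 0/1 in place of a real `BOOL8` column. The sketch assumes that unflattening needs the original nested schema (a "blueprint") to know where each flattened column belongs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy model, NOT the cudf API: a column is either a leaf
// holding int values, or a STRUCT holding a validity mask and children.
struct Column {
  bool is_struct = false;
  std::vector<int> values;       // leaf data (empty for a STRUCT)
  std::vector<bool> validity;    // per-row STRUCT validity (empty for a leaf)
  std::vector<Column> children;  // STRUCT members (empty for a leaf)
};

// A flattened column is plain data: either a member's values, or a
// STRUCT's validity stored as 0/1 (standing in for a BOOL8 column).
using FlatColumn = std::vector<int>;

// Depth-first flatten: a STRUCT emits its validity first, then its members.
void flatten(const Column& col, std::vector<FlatColumn>& out) {
  if (!col.is_struct) {
    out.push_back(col.values);
    return;
  }
  out.emplace_back(col.validity.begin(), col.validity.end());
  for (const Column& child : col.children) flatten(child, out);
}

// Rebuild a column shaped like `blueprint`, consuming flattened columns
// starting at index `next`. Only the blueprint's nesting structure is used.
Column unflatten(const Column& blueprint,
                 const std::vector<FlatColumn>& flat,
                 std::size_t& next) {
  Column result;
  result.is_struct = blueprint.is_struct;
  if (!blueprint.is_struct) {
    result.values = flat.at(next++);
    return result;
  }
  const FlatColumn& mask = flat.at(next++);
  for (int bit : mask) result.validity.push_back(bit != 0);
  for (const Column& child : blueprint.children)
    result.children.push_back(unflatten(child, flat, next));
  return result;
}

// Deep equality, used to check the round trip.
bool same(const Column& a, const Column& b) {
  if (a.is_struct != b.is_struct || a.values != b.values ||
      a.validity != b.validity || a.children.size() != b.children.size())
    return false;
  for (std::size_t i = 0; i < a.children.size(); ++i)
    if (!same(a.children[i], b.children[i])) return false;
  return true;
}

// Builds STRUCT_1< STRUCT_2< A, B >, C > with three rows of made-up data.
Column make_example() {
  auto leaf = [](std::vector<int> v) {
    Column c;
    c.values = std::move(v);
    return c;
  };
  Column s2;
  s2.is_struct = true;
  s2.validity  = {true, false, true};
  s2.children  = {leaf({1, 2, 3}), leaf({4, 5, 6})};  // A, B
  Column s1;
  s1.is_struct = true;
  s1.validity  = {true, true, false};
  s1.children  = {s2, leaf({7, 8, 9})};  // STRUCT_2, C
  return s1;
}
```

Flattening the example yields five columns in the order listed above (validity of `STRUCT_1`, validity of `STRUCT_2`, `A`, `B`, `C`), and unflattening them against the original column as the blueprint reproduces the nested structure.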
Codecov Report
@@ Coverage Diff @@
## branch-21.10 #8956 +/- ##
===============================================
Coverage ? 10.73%
===============================================
Files ? 114
Lines ? 18666
Branches ? 0
===============================================
Hits ? 2003
Misses ? 16663
Partials ? 0
Just some minor comments. Overall, I would like to recommend more polishing in style/formatting. Will get back again later to see if there's anything more to suggest :)
1. Minor reformatting.
2. `num_rows` init change.
3. Removed TODO for blueprint null handling.
Took @jrhemstad off the hook. Added @devavret, on whom I'll also be relying for the related
rerun tests
The failure does not seem to be related to this change. From the failure logs:
I'm not sure why this is happening.
The CI error seems to be fixed.
rerun tests
LGTM
1. Minor formatting.
2. Reuse `is_struct()` in a couple of places.
3. Modified test numerical values.
Thanks for the reviews, chaps. I'll check this in after CI tests pass.
@gpucibot merge
This commit adds support for `STRUCT` columns in `groupby`. `groupby` aggregations should now work when any of the grouping columns is a `STRUCT`, including nested `STRUCT`s.

Note: `List` columns are still not supported in `groupby`, even as members of `STRUCT` columns, at any level of nesting. Only `STRUCT`, `STRUCT<STRUCT>`, etc. are currently supported.

Depends on #8956 (i.e. `unflatten_nested_columns()`).

Authors:
- MithunR (https://github.com/mythrocks)

Approvers:
- Jake Hemstad (https://github.com/jrhemstad)

URL: #9024
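To illustrate why a flattened representation is convenient for grouping on `STRUCT` keys, here is a minimal sketch, not the cudf implementation. It assumes a single-level `STRUCT` key already flattened into a validity column followed by its member columns, all holding `int` data. `FlatColumn` and `group_sum` are hypothetical names, and collapsing every null key row into one group is a choice made for this toy, not a statement about cudf's null handling.

```cpp
#include <cstddef>
#include <map>
#include <vector>

using FlatColumn = std::vector<int>;

// Hypothetical helper, NOT part of cudf. `flat_keys[0]` is the STRUCT's
// validity (0 or 1); the remaining columns are its members. Sums `values`
// per distinct key row and returns the sums keyed by the flattened row.
std::map<std::vector<int>, int> group_sum(
    const std::vector<FlatColumn>& flat_keys,
    const std::vector<int>& values) {
  std::map<std::vector<int>, int> sums;
  for (std::size_t row = 0; row < values.size(); ++row) {
    std::vector<int> key;
    if (flat_keys.at(0).at(row) == 0) {
      // Null STRUCT row: ignore member data so all nulls share one group.
      key.push_back(0);
    } else {
      for (const FlatColumn& col : flat_keys) key.push_back(col.at(row));
    }
    sums[key] += values[row];
  }
  return sums;
}
```

Each key in the returned map is itself a flattened row, so the distinct keys can be turned back into nested form afterwards, which is the role the commit message assigns to `unflatten_nested_columns()`.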