COLLECT_LIST support returning empty output columns. #8279
Codecov Report
@@ Coverage Diff @@
## branch-21.06 #8279 +/- ##
===============================================
Coverage ? 82.84%
===============================================
Files ? 105
Lines ? 17865
Branches ? 0
===============================================
Hits ? 14800
Misses ? 3065
Partials ? 0
lgtm
I'm running in circles through the CI links, trying to figure out what the failure is. It doesn't seem obvious. I'll re-kick it in time for the second code review.
Handle aggregations that return empty list columns.
Currently, we have this factory function to make an empty column in
I would like to have a similar function for other types as well, placed alongside the existing function (
So, to construct an empty column of a nested type, just pass in the additional array of child types. That array can be an array of arrays of arrays of ... of cudf types. The first function above recursively calls itself; the second is the base case. With the factory APIs above in place, we can use them in many other places.

PS 1: The proposed functions above are not good enough, as they cannot handle uneven nesting levels. We can use a customized struct instead of

PS 2: Maybe this is out of scope for this PR. We can make it a new PR for the next release instead.
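To make the PS 1 point concrete, here is a minimal sketch of how a tree-shaped type description could drive a recursive empty-column factory and cope with uneven nesting. Everything in it (`toy_type_id`, `toy_type_tree`, `toy_column`, `make_empty_toy_column`, `depth`) is a hypothetical stand-in invented for illustration; none of it is the libcudf API or the signature proposed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT libcudf types.
enum class toy_type_id { INT32, FLOAT64, STRING, LIST, STRUCT };

// A type tree: each node carries its own id plus the types of its children,
// so siblings may nest to different depths,
// e.g. STRUCT<INT32, LIST<LIST<STRING>>>.
struct toy_type_tree {
  toy_type_id id;
  std::vector<toy_type_tree> children;
};

// A toy column: only the type structure and row count are modelled.
struct toy_column {
  toy_type_id id;
  std::size_t size;
  std::vector<std::unique_ptr<toy_column>> children;
};

// Recursive factory. A leaf type (no children) is the base case;
// a nested type recurses once per child type.
std::unique_ptr<toy_column> make_empty_toy_column(toy_type_tree const& type)
{
  // In this toy model a LIST has exactly one child: its element type.
  if (type.id == toy_type_id::LIST) { assert(type.children.size() == 1); }

  auto col  = std::make_unique<toy_column>();
  col->id   = type.id;
  col->size = 0;
  for (auto const& child : type.children) {
    col->children.push_back(make_empty_toy_column(child));
  }
  return col;
}

// Number of nesting levels in a column, counting the column itself.
std::size_t depth(toy_column const& col)
{
  std::size_t deepest_child = 0;
  for (auto const& child : col.children) {
    deepest_child = std::max(deepest_child, depth(*child));
  }
  return deepest_child + 1;
}
```

Because each node owns its own list of children, a struct with one flat member and one doubly nested list member needs no special handling, which is the case an evenly nested array of type ids cannot express.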
I was about to mention that nesting need not be even, until I read this part. I know there have been prior discussions on conveying nested type information. I think the consensus was to populate child columns all the way down, and use exemplars when constructing empties (a la
Thanks, I concur. :/
Reference: #8178
@gpucibot merge
Fixes the group-by portion of #7611.

When COLLECT_LIST() or COLLECT_SET() aggregations are called on a grouped input, if the input column is empty, then one sees the following failure:

The operation should have resulted in an empty LIST column. make_empty_column() does not support LIST types (in part because the data_type parameter does not capture the types of the child columns).

This commit fixes this by constructing the output column from the specified values input, but only for COLLECT_LIST() and COLLECT_SET(); other aggregation types are unchanged.
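
The approach described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the code in this PR: `toy_column`, `toy_agg_kind`, `toy_empty_like`, `toy_make_empty`, and `toy_empty_result` are all hypothetical names, and the toy list column omits details a real list column carries (such as offsets).

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT libcudf types.
enum class toy_type_id { INT32, STRING, LIST };
enum class toy_agg_kind { SUM, COLLECT_LIST, COLLECT_SET };

struct toy_column {
  toy_type_id id;
  std::size_t size;
  std::vector<std::unique_ptr<toy_column>> children;
};

// Copies only the type structure of `exemplar`, yielding a zero-row column.
std::unique_ptr<toy_column> toy_empty_like(toy_column const& exemplar)
{
  auto out  = std::make_unique<toy_column>();
  out->id   = exemplar.id;
  out->size = 0;
  for (auto const& child : exemplar.children) {
    out->children.push_back(toy_empty_like(*child));
  }
  return out;
}

// A flat factory that only sees a type id. It cannot build a LIST,
// because the id alone says nothing about the child types.
std::unique_ptr<toy_column> toy_make_empty(toy_type_id id)
{
  if (id == toy_type_id::LIST) {
    throw std::invalid_argument("LIST needs child type information");
  }
  auto out  = std::make_unique<toy_column>();
  out->id   = id;
  out->size = 0;
  return out;
}

// Result column for an aggregation over an empty grouped input.
std::unique_ptr<toy_column> toy_empty_result(toy_agg_kind kind,
                                             toy_column const& values,
                                             toy_type_id result_type)
{
  if (kind == toy_agg_kind::COLLECT_LIST || kind == toy_agg_kind::COLLECT_SET) {
    // Use the values column as the exemplar for the list's element type.
    auto out  = std::make_unique<toy_column>();
    out->id   = toy_type_id::LIST;
    out->size = 0;
    out->children.push_back(toy_empty_like(values));
    return out;
  }
  // Every other aggregation keeps the flat, type-id-only path.
  return toy_make_empty(result_type);
}
```

The design choice mirrored here is that the collect aggregations take their child type from the input column itself, so no richer type description has to be threaded through the flat factory.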