STRUCT column support for cudf::merge. #8422
Update copyright
rerun tests
Codecov Report
@@ Coverage Diff @@
## branch-21.08 #8422 +/- ##
===============================================
Coverage ? 82.91%
===============================================
Files ? 110
Lines ? 18094
Branches ? 0
===============================================
Hits ? 15002
Misses ? 3092
Partials ? 0
rmm::device_buffer validity =
  lcol.has_nulls() || rcol.has_nulls()
    ? create_null_mask(merged_size, mask_state::UNINITIALIZED, stream, mr)
    : rmm::device_buffer{};
if (lcol.has_nulls() || rcol.has_nulls()) {
  materialize_bitmask(lcol,
                      rcol,
                      static_cast<bitmask_type*>(validity.data()),
                      merged_size,
                      row_order_.data(),
                      stream);
}
I think we should use is_nullable here and let the caller be responsible for introspecting the data. If either of the inputs is nullable, the output should be nullable, regardless of whether there are any invalid elements.
That's definitely not consistent with how cudf works though. We try and drop validity whenever possible.
I think it's a mixed bag whether we do or not, but I think there's a strong argument for letting the user decide. Everywhere else we strongly prefer to avoid data introspection, and null-counting is data introspection. Switching to is_nullable means we are no longer introspecting the validity.
@jrhemstad Pinging Jake.
Approving because there's no point in blocking on this.
Just want to start a general discussion.
I think the meta here is that we should be expecting null count to be already computed more often than not, so at least on the performance side it's theoretically free.
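The trade-off discussed in this thread can be illustrated with a small host-side model. This is only a sketch: `toy_column`, `needs_mask_by_has_nulls`, and `needs_mask_by_nullable` are hypothetical names invented for illustration and are not part of libcudf. The model assumes that "nullable" means a validity mask is allocated, that "has nulls" means at least one element is invalid, and that the null count is computed on first request and cached afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in for a column; NOT the cudf::column_view API.
struct toy_column {
  std::vector<int> data;
  std::optional<std::vector<bool>> validity;             // present => "nullable"
  mutable std::optional<std::size_t> cached_null_count;  // filled on first use

  // True when a validity mask is allocated, even if every element is valid.
  bool nullable() const { return validity.has_value(); }

  // Inspects the mask once (data introspection), then reuses the cached value.
  std::size_t null_count() const {
    if (!cached_null_count) {
      std::size_t n = 0;
      if (validity) {
        for (bool v : *validity) {
          if (!v) { ++n; }
        }
      }
      cached_null_count = n;
    }
    return *cached_null_count;
  }

  bool has_nulls() const { return null_count() > 0; }
};

// Policy in the reviewed snippet: allocate an output mask only if a null exists.
inline bool needs_mask_by_has_nulls(toy_column const& l, toy_column const& r)
{
  return l.has_nulls() || r.has_nulls();
}

// Policy proposed in review: allocate whenever either input carries a mask.
inline bool needs_mask_by_nullable(toy_column const& l, toy_column const& r)
{
  return l.nullable() || r.nullable();
}
```

With a left input that carries an all-valid mask and a right input with no mask, the first policy drops validity from the output while the second keeps it. The first policy has to read the mask unless the count is already cached, which is the performance point raised above.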
Looks good. Just one small recommended change.
@gpucibot merge
Small cleanup post #8422
Authors:
- Conor Hoekstra (https://github.com/codereport)
Approvers:
- Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)
- Mark Harris (https://github.com/harrism)
URL: #8534
Partially addresses #8050
Adds support for merging of struct columns. The struct columns cannot be used as keys in the merge.
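As a rough host-side illustration of why non-key columns can be carried through a merge, consider the sketch below. It assumes the merge first derives a row order from the sorted key columns, recorded as (side, index) pairs, and then gathers every other column (payload values and validity alike) through that order. `side`, `row_ref`, `merge_row_order`, and `gather_by_order` are hypothetical names for this sketch, not the libcudf implementation, which runs on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Which input a merged row came from (hypothetical, for illustration only).
enum class side : bool { LEFT, RIGHT };
using row_ref = std::pair<side, std::size_t>;

// Merge two ascending key columns and record, for each output row,
// which input row supplies it. Ties are resolved in favor of the left input.
std::vector<row_ref> merge_row_order(std::vector<int> const& lkeys,
                                     std::vector<int> const& rkeys)
{
  std::vector<row_ref> order;
  order.reserve(lkeys.size() + rkeys.size());
  std::size_t i = 0;
  std::size_t j = 0;
  while (i < lkeys.size() && j < rkeys.size()) {
    if (rkeys[j] < lkeys[i]) {
      order.emplace_back(side::RIGHT, j++);
    } else {
      order.emplace_back(side::LEFT, i++);
    }
  }
  while (i < lkeys.size()) { order.emplace_back(side::LEFT, i++); }
  while (j < rkeys.size()) { order.emplace_back(side::RIGHT, j++); }
  return order;
}

// Gather any non-key column (values or validity flags) through the row order.
template <typename T>
std::vector<T> gather_by_order(std::vector<T> const& l,
                               std::vector<T> const& r,
                               std::vector<row_ref> const& order)
{
  std::vector<T> out;
  out.reserve(order.size());
  for (auto const& [s, idx] : order) {
    out.push_back(s == side::LEFT ? l[idx] : r[idx]);
  }
  return out;
}
```

In this model a struct column would be merged by applying `gather_by_order` to each child and to the parent's validity flags, so the struct never has to be compared, which is consistent with it not being usable as a key.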