Support collect aggregations in reduction #10353
Signed-off-by: sperlingxx <[email protected]>
cpp/src/reductions/collect_ops.cu
Outdated
auto not_null_pred = [mask = col.null_mask(), offset = col.offset()] __device__(auto i) {
  return bit_is_set(mask, offset + i);
};
Use make_validity_iterator.
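For context, the hand-written predicate tests one validity bit per row, shifted by the column offset, and as far as I understand make_validity_iterator packages that same check. Below is a minimal host-side sketch in plain C++, not cudf code: the names row_is_valid and valid_rows are hypothetical, and the 32-bit, least-significant-bit-first mask layout is an assumption stated for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a validity bitmask word (assumed 32 bits wide).
using bitmask_word = std::uint32_t;

// Returns true when the bit for `row` is set, i.e. the row is non-null.
// Bits are assumed to be stored least significant bit first within a word.
inline bool row_is_valid(bitmask_word const* mask, std::size_t row)
{
  constexpr std::size_t bits_per_word = 32;
  return ((mask[row / bits_per_word] >> (row % bits_per_word)) & 1u) != 0;
}

// Collects the indices of valid rows in [0, size) for a view that starts
// `offset` rows into the underlying mask.
inline std::vector<std::size_t> valid_rows(bitmask_word const* mask,
                                           std::size_t offset,
                                           std::size_t size)
{
  std::vector<std::size_t> out;
  for (std::size_t i = 0; i < size; ++i) {
    if (row_is_valid(mask, offset + i)) { out.push_back(i); }
  }
  return out;
}
```

Forgetting the offset is the usual mistake with a hand-rolled predicate, which is one reason to prefer the library helper.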
Changed.
cpp/src/reductions/collect_ops.cu
Outdated
column* null_purged_col = null_purged_table->release().front().release();
null_purged_col->set_null_mask(rmm::device_buffer{0, stream, mr}, 0);
return std::make_unique<list_scalar>(*null_purged_col, true, stream, mr);
This makes a copy of null_purged_col and then leaks it.
- column* null_purged_col = null_purged_table->release().front().release();
- null_purged_col->set_null_mask(rmm::device_buffer{0, stream, mr}, 0);
- return std::make_unique<list_scalar>(*null_purged_col, true, stream, mr);
+ auto null_purged_col = null_purged_table->release().front();
+ null_purged_col->set_null_mask(rmm::device_buffer{0, stream, mr}, 0);
+ return std::make_unique<list_scalar>(std::move(*null_purged_col), true, stream, mr);
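To illustrate why the suggestion matters, here is a minimal sketch in plain C++ using hypothetical stand-in types (payload for the column, wrapper for the scalar; neither is a cudf type). It only shows the general ownership rule: dereferencing a pointer yields an lvalue and selects the copying constructor, while std::move selects the moving one and leaves the unique_ptr responsible for deleting the moved-from object.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Counts deep copies, purely for illustration.
int payload_copies = 0;

// Hypothetical stand-in for a column that owns its storage.
struct payload {
  std::vector<int> data;
  explicit payload(std::vector<int> d) : data(std::move(d)) {}
  payload(payload const& other) : data(other.data) { ++payload_copies; }
  payload(payload&& other) noexcept : data(std::move(other.data)) {}
};

// Hypothetical stand-in for a scalar that stores a payload by value.
struct wrapper {
  payload value;
  explicit wrapper(payload const& p) : value(p) {}        // deep copy
  explicit wrapper(payload&& p) : value(std::move(p)) {}  // takes the storage
};

// Copying variant: `*owner` is an lvalue, so the copy constructor runs.
// Had `owner` been release()d to a raw pointer first, the original object
// would additionally never be deleted.
std::unique_ptr<wrapper> wrap_by_copy(std::unique_ptr<payload> owner)
{
  return std::make_unique<wrapper>(*owner);
}

// Moving variant, mirroring the suggestion: no deep copy, and the unique_ptr
// still deletes the moved-from object when it goes out of scope.
std::unique_ptr<wrapper> wrap_by_move(std::unique_ptr<payload> owner)
{
  return std::make_unique<wrapper>(std::move(*owner));
}
```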
@jrhemstad Change done. Thank you for the catch!
Codecov Report
@@              Coverage Diff               @@
##           branch-22.04   #10353    +/-   ##
================================================
- Coverage         10.62%   10.50%   -0.12%
================================================
  Files               122      127       +5
  Lines             20961    21200     +239
================================================
  Hits               2228     2228
- Misses            18733    18972     +239
Hi @jrhemstad, could you take another look at this PR? Thank you.
cpp/src/reductions/collect_ops.cu
Outdated
std::unique_ptr<scalar> merge_sets(column_view const& col,
                                   null_equality nulls_equal,
                                   nan_equality nans_equal,
                                   rmm::cuda_stream_view stream,
                                   rmm::mr::device_memory_resource* mr)
{
  CUDF_EXPECTS(col.type().id() == type_id::LIST,
               "input column of merge_lists must be a list column");
It's more descriptive to just make the parameter a lists_column_view, and it eliminates the extra error checking.
- std::unique_ptr<scalar> merge_sets(column_view const& col,
-                                    null_equality nulls_equal,
-                                    nan_equality nans_equal,
-                                    rmm::cuda_stream_view stream,
-                                    rmm::mr::device_memory_resource* mr)
- {
-   CUDF_EXPECTS(col.type().id() == type_id::LIST,
-                "input column of merge_lists must be a list column");
+ std::unique_ptr<scalar> merge_sets(lists_column_view const& col,
+                                    null_equality nulls_equal,
+                                    nan_equality nans_equal,
+                                    rmm::cuda_stream_view stream,
+                                    rmm::mr::device_memory_resource* mr)
+ {
Done.
Signed-off-by: sperlingxx <[email protected]>
@gpucibot merge
Closes #7807
This PR supports the collect aggregation family in the reduction context: collect_list, collect_set, merge_lists, and merge_sets.
The implementations are modeled on the corresponding collect aggregations in the groupby context.
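As a rough illustration of what these four aggregations mean when reducing a whole column to a single list, here is a host-side sketch in plain C++. It is not the cudf implementation: the free functions below are hypothetical, null handling and NaN equality are omitted, and returning the set results in sorted order is an assumption of this sketch rather than a documented guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using list = std::vector<int>;

// collect_list: every row of the column becomes an element of one list.
list collect_list(std::vector<int> const& column) { return column; }

// collect_set: like collect_list, but keeps only distinct values.
list collect_set(std::vector<int> const& column)
{
  list out = column;
  std::sort(out.begin(), out.end());
  out.erase(std::unique(out.begin(), out.end()), out.end());
  return out;
}

// merge_lists: each row is itself a list; concatenate all rows into one list.
list merge_lists(std::vector<list> const& column)
{
  list out;
  for (auto const& row : column) { out.insert(out.end(), row.begin(), row.end()); }
  return out;
}

// merge_sets: concatenate all rows, then keep only distinct values.
list merge_sets(std::vector<list> const& column)
{
  return collect_set(merge_lists(column));
}
```

For example, under these assumptions merging the rows {1, 2} and {2, 3} gives {1, 2, 2, 3} with merge_lists and {1, 2, 3} with merge_sets.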