-
Notifications
You must be signed in to change notification settings - Fork 919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segmented apply_boolean_mask
for LIST
columns
#10773
Segmented apply_boolean_mask
for LIST
columns
#10773
Conversation
This is still a work in progress. Some work remains to be done:
|
Codecov Report
@@ Coverage Diff @@
## branch-22.06 #10773 +/- ##
================================================
+ Coverage 86.37% 86.43% +0.06%
================================================
Files 142 143 +1
Lines 22306 22448 +142
================================================
+ Hits 19266 19403 +137
- Misses 3040 3045 +5
Continue to review full report at Codecov.
|
I have removed the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good, just some minor suggestions.
cpp/src/lists/apply_boolean_mask.cu
Outdated
|
||
auto constexpr offset_data_type = data_type{type_id::INT32}; | ||
|
||
auto filtered_child = [&] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the lambdas in this function are for. Are you trying to exploit RVO from them to avoid needing additional std::move
calls? I'm not sure why this code has this extra level of indirection from apply_boolean_mask
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not RVO, but close. The alternative would have been:
auto output_offsets = [&] { ... } ();
auto filtered_child = [&] { ... } ();
// ...
return cudf::make_lists_column(input.size(),
std::move(output_offsets),
std::move(filtered_child),
input.null_count(),
cudf::detail::copy_bitmask(input.parent(), stream, mr),
stream,
mr);
This would have been what I have already, with more steps. By not immediately invoking the IILE, one avoids having to create-then-std-move those expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I should make those lambdas const
.
Edit: These are now const. Please let me know if you'd prefer we use the lambda as an IILE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you perhaps missed the point of my question. I'm not asking why you didn't immediately invoke the lambdas. I'm asking why you defined them as lambdas at all. Perhaps that's a silly question for some obvious reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Sorry I didn't follow earlier.
It's only for the "packaging", like helper functions. For instance, the temporaries in the construction of offsets
aren't really relevant to the rest of the function. I'm hoping to avoid clutter in the rest of the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. To be honest I'm not convinced that restricting the scopes here really helps all that much relative to the boilerplate that it adds (the lambda declarations, the returns, releasing a unique pointer to control scope), not to mention additional cognitive overhead in reading it to figure out scopes, but I don't mind it much so I'm fine leaving it if you like it that way.
1. Better names for some test cases. 2. Cached common expressions across lambdas. 3. Removed unnecessary @copydoc. 4. Switch references to `selector` to instead use `boolean_mask`. 5. Miscellaneous minor improvements.
@ttnghia, @vyasr, @jrhemstad: Thank you for your reviews. @vyasr: I think I've addressed most of your concerns, save for two:
I'd be happy to change both, if you'd prefer. |
@mythrocks I'm good with the comments. I have a follow-up on the lambdas that we can discuss tomorrow if you'd like, then I think we should be set here. |
@gpucibot merge |
Thanks for the reviews, all. |
Contributes to #10650 Add JNI support for `apply_boolean_mask` Refer to the descriptions of PR #10773 Signed-off-by: Chong Gao <[email protected]> Authors: - Chong Gao (https://github.com/res-life) Approvers: - Liangcai Li (https://github.com/firestarman) - Jason Lowe (https://github.com/jlowe) URL: #10812
Fixes #10650.
This commit introduces an
apply_boolean_mask()
method that interprets a booleanLIST
column as a filter, to select elements from an arbitraryLIST
input column.E.g.
The
input
and theboolean_mask
should both have the same number of rows, and each row should have the same number of elements.Each output row copies the elements from the input where the boolean mask is non-null and true.