-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor joins for conditional semis and antis #14646
Refactor joins for conditional semis and antis #14646
Conversation
…cate a second output when it is not needed in that context
Did some benchmarking with CPU: 12th Gen Intel(R) Core(TM) i9-12900K, 3200 Mhz, 16 Core(s), 24 Logical Processor(s) |
/ok to test |
@vyasr would you please take a look when you get back? |
Please note that this PR addresses part of #10039 |
/ok to test |
@DanialJavady96 Making this ready for review to draw proper attention from reviewers |
Responding to myself -- I think our testing looks okay for now. I don't know of anything that would need to be changed. https://github.com/rapidsai/cudf/blob/branch-24.06/cpp/tests/join/conditional_join_tests.cu |
Co-authored-by: Bradley Dice <[email protected]>
Co-authored-by: Yunsong Wang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to use device_async_resource_ref for MR parameters. Otherwise just a couple of nits.
cpp/src/join/conditional_join.cu
Outdated
@@ -348,14 +443,13 @@ std::unique_ptr<rmm::device_uvector<size_type>> conditional_left_semi_join( | |||
rmm::mr::device_memory_resource* mr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
☑️ todo: Please use rmm::device_async_resource_ref
(not pointer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #15498 for details / examples. It should be a straightforward replacement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DanialJavady96 I can launch another round of CI tests once this and the below (line 366) mr
is migrated to the new async resource ref.
Co-authored-by: Mark Harris <[email protected]>
Co-authored-by: Mark Harris <[email protected]>
Co-authored-by: Bradley Dice <[email protected]>
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, and one question about exiting early / work reduction.
cpp/src/join/conditional_join.cu
Outdated
auto const right_num_rows{right.num_rows()}; | ||
auto const left_num_rows{left.num_rows()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's choose a consistent pattern between this and the changes below. Either use auto const
variables or always use .num_rows()
. It looks like the below was refactored to use the method, but this is still using variables.
if (join_type == join_kind::LEFT_SEMI_JOIN && !found_match) { | ||
add_left_to_cache(outer_row_index, current_idx_shared, warp_id, join_shared_l[warp_id]); | ||
} | ||
found_match = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once found_match
is true for an outer_row_index
, are we allowed to quit evaluating more inner rows for matches? It seems like we should be able to trigger the flush code and then go to the next outer_row_index
. Both SEMI and ANTI joins check that found_match
is false before adding an outer_row_index
, but it seems like they would continue evaluating other inner rows anyway (but they shouldn't have to do so). Does that sound right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good catch! Yeah there is no need at all for it to continue searching that space after it finds a match. I am really curious about the speed improvements this change will make. Let me benchmark before i push up.
Co-authored-by: Bradley Dice <[email protected]>
Co-authored-by: Bradley Dice <[email protected]>
Compared to the benchmarks here, Looks pretty good! Some of the gains are quite significant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last styling issue. Otherwise LGTM
auto left_num_rows{left.num_rows()}; | ||
if (right_num_rows == 0) { | ||
|
||
if (right.num_rows() == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the place that @bdice was referring to https://github.com/rapidsai/cudf/pull/14646/files#r1573011742.
Depending on how you see it, we should consistently use either auto const ...
or .num_rows()
in both functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change the entire file to be consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah... Okay. It turns out i was pushing to the wrong fork. That's why there were was some inconsistencies. My bad.
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution. Great work!
/merge |
Contributes to #10039
Currently
conditional_joins
for both semi and anti joins rely on an implementation that was designed for taking in results from both tables involved in the join. This leads to wasteful allocation that can be optimized for these two cases.Description
Add a new kernel to be used for both semi and anti joins.
Add some new device functions for adding only one array of shared_memory for caching.
Tests pass on my 3080.
Checklist