Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor joins for conditional semis and antis #14646
Refactor joins for conditional semis and antis #14646
Changes from all commits
e9ae4e7
1300bc8
6c1030e
2605529
365e2c6
0411c85
c77ea8c
20f190e
e39c3a0
9bf46a6
dfb5e34
2d1b9ae
4bcb922
6f670f8
000d89d
0c2307f
756bb4c
6dfb7d5
ebe95f5
4267899
ac864f6
a4457f5
26feb6f
cb89def
53f23ab
eeecd8a
24c1c52
91b79a4
6fae49f
7541016
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a follow-up PR, we should adopt the same optimization used in #15288. In that PR, we pre-allocate an vector of indices (with size equal to the left table's size), and then shrink to fit the actual number of elements written. That means we only have to execute one kernel rather than two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the place that @bdice was referring to https://github.com/rapidsai/cudf/pull/14646/files#r1573011742.
Depending on how you see it, we should consistently use either
auto const ...
or.num_rows()
in both functionsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change the entire file to be consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah... Okay. It turns out i was pushing to the wrong fork. That's why there were was some inconsistencies. My bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once
found_match
is true for anouter_row_index
, are we allowed to quit evaluating more inner rows for matches? It seems like we should be able to trigger the flush code and then go to the nextouter_row_index
. Both SEMI and ANTI joins check thatfound_match
is false before adding anouter_row_index
, but it seems like they would continue evaluating other inner rows anyway (but they shouldn't have to do so). Does that sound right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good catch! Yeah there is no need at all for it to continue searching that space after it finds a match. I am really curious about the speed improvements this change will make. Let me benchmark before i push up.