-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace cudf's concurrent_ordered_map with cuco::static_map in semi/anti joins #9666
Replace cudf's concurrent_ordered_map with cuco::static_map in semi/anti joins #9666
Conversation
BenchmarksOld code:
New code:
|
cpp/src/join/semi_join.cu
Outdated
// Note: This equality comparator violates symmetry of equality and is | ||
// therefore relying on the implementation detail of the order in which its | ||
// operator is invoked. If cuco makes no promises about the order of | ||
// invocation this seems a bit unsafe. | ||
row_equality equality_probe{*right_rows_d, *left_rows_d, compare_nulls == null_equality::EQUAL}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if you're going to try and do the same optimization as is done in the other joins of using the row hash value as the key and the index as the payload, you're going to need to add the equivalent of pair_contains
from the multimap
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given our discussions I think we can wait on this as well. The same issue exists in other join types and probably isn't worth addressing until NVIDIA/cuCollections#110.
I intended this to be a draft PR, just marked it as such. I pushed this out so that I could get some feedback and we could iron out the gaps with cuco, but this isn't really ready for the big time yet. We'll probably want to wait on further improvements in cuco. |
This PR requires NVIDIA/cuCollections#118, but once that's merged I think we can move forward with this largely as is. While there are significant improvements that could be made, they are heavily dependent on refactoring cuCollections and I don't think we benefit too much by trying to implement interim stopgap solutions. |
This PR depends on NVIDIA/cuCollections#113, otherwise the default hash allocator won't work here. |
…ly for non-nullable types.
…nullable cases and fix a hashing bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@vyasr Back to the benchmark results, any idea why the new implementation is slower? |
I'm reasonably confident that the performance regression is entirely due to the switch from cudf's concurrent unordered map to the cuco static map, which hasn't benefited from the optimizations you worked on for the multimap. @jrhemstad was fine eating the perf hit for now and postponing optimization because we were trying to get the mixed joins in #9917 up and running ASAP. However, the work in #9917 shows that the new mixed join code is going to have to be a new kernel rather than a direct adaptation of the existing hash join code because of how we deal with shared memory. Therefore, IMO this PR is no longer a prerequisite for getting mixed joins done for semi/anti joins and that work can happen in parallel, i.e. we could start using cuco's static multimap for mixed joins without merging this PR. @jrhemstad in light of that, do you want to hold off on merging this PR until we've had a chance to do the cuco refactoring and optimized |
rerun tests |
Hmm, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
rerun tests |
Discussed offline, we're going to get this merged now and deal with perf later. @gpucibot merge |
Is there an issue to track this? |
Just made one in #9973. |
@gpucibot merge |
The `concurrent_unordered_multimap` is no longer used in libcudf. It has been replaced by `cuco::static_multimap`. The majority of the refactoring was done in PRs #8934 and #9704. A similar effort is in progress for `concurrent_unordered_map` and `cuco::static_map` in #9666 (and may depend on porting some optimizations from libcudf to cuco -- need to look into this before doing a direct replacement). This partially resolves issue #10401. cc: @PointKernel @vyasr Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Vyas Ramasubramani (https://github.com/vyasr) - Jake Hemstad (https://github.com/jrhemstad) URL: #10642
This PR resolves #9586, replacing the hash table used in semi and anti joins with cuco::static_map. It depends on NVIDIA/cuCollections#118. At present the code is slower than the original version, so we'll probably want to make some optimizations in cuco before merging this.