Temporarily reverse semi-anti-join implementation #11310
Rerun tests. |
@ttnghia this workaround works and now semi joins are not hanging anymore. |
Build failing with:
|
Why not just revert #11100? |
// `map.contains` inside the `thrust::copy_if` kernel. However, that led to increasing register
// usage and reducing performance, as reported here: https://github.com/rapidsai/cudf/pull/10511.
auto const flagged =
  cudf::detail::contains(right_keys, left_keys, compare_nulls, nan_equality::ALL_EQUAL, stream);
Does this issue affect other uses of `cudf::detail::contains`? Should we be changing that function's implementation instead of just the semi-join implementation? (I haven't formed an opinion on this question yet, need to read more code first.)
Right question. Changing the implementation will completely fix this, but it requires a new feature from cuco, which is under way: NVIDIA/cuCollections#191
This concerns me as well. It seems to be an issue with lots of duplicate keys in general, and this is just the one instance we have found to be a problem so far.
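To illustrate why many duplicate keys can hurt in general, here is a toy host-side model. It is not cuco's code and makes no claim about how `cuco::static_multimap` is actually implemented; it only assumes an open-addressing multimap with linear probing in which every inserted key takes its own slot. `ToyMultimap` and `count_probes` are hypothetical names made up for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Toy open-addressing multimap with linear probing (illustration only).
// Every inserted key occupies its own slot, even if the key already exists.
struct ToyMultimap {
  std::vector<int> keys;
  std::vector<bool> used;
  std::size_t probes = 0;  // total number of slots inspected by all inserts

  explicit ToyMultimap(std::size_t capacity) : keys(capacity), used(capacity, false) {}

  void insert(int key)
  {
    // Identity "hash": good enough to show the effect.
    std::size_t i = static_cast<std::size_t>(key) % keys.size();
    while (used[i]) {  // walk past every occupied slot
      ++probes;
      i = (i + 1) % keys.size();
    }
    ++probes;  // the empty slot that ends the walk
    used[i] = true;
    keys[i] = key;
  }
};

// Insert `n` keys at a 50% load factor and return the total probe count.
// Assumes n > 0 whenever an insert happens (capacity is 2 * n).
std::size_t count_probes(std::size_t n, bool all_duplicates)
{
  ToyMultimap map(2 * n);
  for (std::size_t k = 0; k < n; ++k) {
    map.insert(all_duplicates ? 7 : static_cast<int>(k));
  }
  return map.probes;
}
```

In this model, `n` distinct keys cost `n` probes in total, while `n` copies of the same key cost `n * (n + 1) / 2`, because each duplicate has to walk past all earlier copies before it finds a free slot.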
Currently, the only use case of `cudf::detail::contains` is in lists operations.
One of the goals of #11100 was to reduce the number of unique functions implementing the same (or similar) logic. Is it possible to change `cudf::detail::contains` and leave `semi_join.cu` untouched?
Sounds reasonable, but I'll address that concern in a separate PR. I would like to keep `cudf::detail::contains` separate from semi-anti-join for 22.08 to prevent any surprising last-minute performance issues.
We can ask Jake for a quick review since the change is basically the same as NVIDIA/cuCollections#175. The issue must affect other use cases of `detail::contains` too; the existing benchmarks just have not exposed it yet.
That's great. If #cuco/191 is merged quickly, I can put up a complete fix for `detail::contains` without a temporary fix.
Working on it now, should be ready very quickly.
Duplicate include, but otherwise looks ok. I understand this is a revert of a portion of a PR, so the code isn't new.
Rerun tests. |
Rerun tests. |
Based on the offline discussion, a temporary workaround to fix the issue could be creating a single map of row hash values and row indices and then using For this PR, we will revert back to the implementation of a single map of row indices with idle payloads. There will be a future cleanup once |
Rerun tests. |
Here is more clarification regarding the concern above, in case you guys are still unclear:
Thus, it is safer to separate *-joins from
|
Another reason why I want to defer calling
while with #11325 (fixing the issue by changing
Yes, the initial benchmark in #11100 showed that the new implementation gained over 10% in performance, but this time it is worse: the new implementation was modified again to fix the performance issue and became slower as a result. I suspect that the performance regression here has the same cause as in #10811, where the input is a structs column with many children.
Rerun tests. |
The initial goal of #11100 was to extract the |
FYI: I pushed an alternative fix (#11325) for the performance issue. It is under testing with Spark-Rapids. That PR should address the concern you guys mentioned above. |
Closing this, as it is covered by a new PR: #11330.
…ins` (#11330)

The current implementation of `cudf::detail::contains` can process input with arbitrary nested types. However, it was reported to have a severe performance issue when the input tables have many duplicate rows (#11299). To fix the issue, #11310 and #11325 were created. Unfortunately, #11310 separates semi-anti-join from `cudf::detail::contains`, causing duplicate implementations. On the other hand, #11325 addresses #11299, but semi-anti-join using it still performs worse than the previous semi-anti-join implementation.

The changes in this PR include the following:
* Fix the performance issue reported in #11299 for the current `cudf::detail::contains` implementation that supports nested types.
* Add a separate code path to `cudf::detail::contains` such that:
  * Input without a lists column (at any nested level) is processed by the same code path as the old implementation of semi-anti-join. This ensures the performance of semi-anti-join remains the same as before.
  * Input with a nested lists column, or with NaNs compared as unequal, is processed by another code path that supports nested types and different NaN behavior. This ensures support for nested types is not dropped.

Closes #11299.

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- Bradley Dice (https://github.com/bdice)
- MithunR (https://github.com/mythrocks)
- Vyas Ramasubramani (https://github.com/vyasr)
- Mike Wilson (https://github.com/hyperbolic2346)
- Alessandro Bellina (https://github.com/abellina)

URL: #11330
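The two code paths described in the commit message above can be pictured as a simple dispatch on the input's type structure and the NaN comparison mode. This is only a sketch of that decision, not libcudf code: `ColumnType`, `TypeKind`, `NanEquality`, `CodePath`, `has_nested_list`, and `choose_path` are all hypothetical names introduced for illustration.

```cpp
#include <vector>

// Hypothetical, simplified type descriptor (libcudf has its own column types).
enum class TypeKind { Int, Float, String, Struct, List };

struct ColumnType {
  TypeKind kind;
  std::vector<ColumnType> children;  // populated for Struct and List
};

enum class NanEquality { AllEqual, Unequal };
enum class CodePath { Fast, Nested };

// True if a list type appears at any nesting level of this column.
bool has_nested_list(ColumnType const& col)
{
  if (col.kind == TypeKind::List) { return true; }
  for (auto const& child : col.children) {
    if (has_nested_list(child)) { return true; }
  }
  return false;
}

// Pick the nested-type path only when it is needed; otherwise use the
// path that behaves like the old semi-anti-join implementation.
CodePath choose_path(std::vector<ColumnType> const& table, NanEquality nans)
{
  if (nans == NanEquality::Unequal) { return CodePath::Nested; }
  for (auto const& col : table) {
    if (has_nested_list(col)) { return CodePath::Nested; }
  }
  return CodePath::Fast;
}
```

The point of splitting this way is that inputs which never needed nested-type support keep paying only for the old, faster path.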
The new implementation of semi-join and anti-join uses `cuco::static_multimap`, which has a performance issue with input tables that have too many duplicate rows. This was not anticipated when refactoring semi-anti-join in #11100. Completely fixing this could involve more work from cuco and more review time. However, it needs to be worked around ASAP to unblock spark-rapids's daily performance benchmark.

This PR temporarily reverts the implementation of semi-anti-join to its old state while waiting for a complete fix (tracked by issue #11313).
Closes #11299.
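
For readers unfamiliar with the operation, a semi join keeps the left rows whose key appears in the right table, and an anti join keeps the ones whose key does not. The host-side sketch below shows that logic with a plain `std::unordered_set` over single integer keys. It is an assumption-laden illustration, not the libcudf implementation, which runs on the GPU, works on whole rows, and also handles nulls and NaNs; `semi_or_anti_join` is a hypothetical name.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Return the indices of left rows whose key is found (semi join) or
// not found (anti join) among the right keys.
std::vector<std::size_t> semi_or_anti_join(std::vector<int> const& left_keys,
                                           std::vector<int> const& right_keys,
                                           bool anti)
{
  // A set stores each distinct key once, so duplicate right rows add no entries.
  std::unordered_set<int> const right_set(right_keys.begin(), right_keys.end());

  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < left_keys.size(); ++i) {
    bool const found = right_set.count(left_keys[i]) > 0;
    if (found != anti) { result.push_back(i); }
  }
  return result;
}
```

Because only membership matters here, a set-like structure is sufficient for the lookup, which is why a table with many duplicate right rows need not be expensive in this formulation.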