Temporarily reverse semi-anti-join implementation #11310

ttnghia · 2022-07-20T16:17:25Z

The new implementation in semi-join and anti-join uses cuco::static_multimap that has performance issue with input tables having too many duplicate rows. This was not anticipated when refactoring semi-anti-join in #11100. Completely fixing this could involve more work from cuco and review time. However, this needs to be worked around ASAP to unblock spark-rapids's daily performance benchmark.

This PR temporarily reverses the implementation of semi-anti-join to its old state while waiting for a complete fix to come up later (tracked by the issue #11313).

Closes #11299.

ttnghia · 2022-07-20T17:08:48Z

Rerun tests.

abellina · 2022-07-20T17:32:28Z

@ttnghia this workaround works and now semi joins are not hanging anymore.

abellina · 2022-07-20T17:41:52Z

Build failing with:

conda.CondaMultiError: HTTP 403 FORBIDDEN for url <https://conda.anaconda.org/nvidia/linux-64/cudatoolkit-11.5.1-hcf5317a_9.tar.bz2>

razajafri · 2022-07-20T18:21:50Z

Why not just revert #11100?

ttnghia · 2022-07-20T18:26:48Z

Why not just revert #11100?

Reply again for other reviewers: #11100 has other changes not just for semi-anti-join. Reversing it entirely will break other code.

bdice · 2022-07-20T18:59:38Z

cpp/src/join/semi_join.cu

-  // `map.contains` inside the `thrust::copy_if` kernel. However, that led to increasing register
-  // usage and reducing performance, as reported here: https://github.com/rapidsai/cudf/pull/10511.
-  auto const flagged =
-    cudf::detail::contains(right_keys, left_keys, compare_nulls, nan_equality::ALL_EQUAL, stream);


Does this issue affect other uses of cudf::detail::contains? Should we be changing that function's implementation instead of just the semi-join implementation? (I haven't formed an opinion on this question yet, need to read more code first.)

Right question. Changing the implementation will completely fix this, but requires new FEA from cuco, which is under way: NVIDIA/cuCollections#191

This concerns me as well. It seems an issue with lots of duplicate keys in general and we just found this instance to be a problem so far.

Currently, the only use case of cudf::detail::contains is in lists operations.

One of the goals of #11100 was to reduce the number of unique functions implementing the same (or similar) logic. Is it possible to change cudf::detail::contains and leave semi_join.cu untouched?

Sound reasonable. But I'll address that concern in another separate PR. I would like to keep cudf::detail::contains separated from semi-anti-join for 22.08 to prevent any last-minute surprising performance issue.

We can ask Jake for a quick review since the change is basically the same as NVIDIA/cuCollections#175. The issue must affect other use cases of detail::contains but just not unveiled by the existing benchmarks yet.

That's great. If we have #cuco/191 merged quickly then I can have a complete fix up for detail::contains without a temp fix.

Working on it now, should be ready very quickly.

hyperbolic2346

Duplicate include, but otherwise looks ok. I understand this is a revert of a portion of a PR, so the code isn't new.

cpp/src/join/semi_join.cu

ttnghia · 2022-07-20T20:37:39Z

Rerun tests.

ttnghia · 2022-07-20T23:33:07Z

Rerun tests.

PointKernel · 2022-07-20T23:52:00Z

Based on the offline discussion, a temporary workaround to fix the issue could be creating a single map of row hash values and row indices and then using cuco::static_map::pair_insert and cuco::static_map::pair_contains to insert/find distinct rows. This solution requires an ambiguous pair_insert API from cuco::static_map and the proper way to clean up the whole cudf::detail::contains logic is to use a set.

For this PR, we will revert back to the implementation of a single map of row indices with idle payloads.

There will be a future cleanup once cuco::static_set is ready.

ttnghia · 2022-07-21T01:18:29Z

Rerun tests.

ttnghia · 2022-07-21T02:36:50Z

Here is more clarification regarding the concern above, in case you guys are still unclear:

After Refactor semi_anti_join #11100 merged, semi-anti-joins became dependent on cudf::detail::contains, which has a totally new implementation from the old implementation of semi-anti-joins, not just copying the code over there. Such new implementation was tested for the first time in Spark and was immediately reported as having performance issue in [BUG] performance regression after semi_anti_join refactor #11299.
cudf::detail::contains now is used not just in semi-anti-joins but also in set operations. It supports nested types while the old semi-anti-joins implementation doesn't.
If I temporarily put the old implementation of *-joins into cudf::detail::contains to fix the performance issue, set operations will crash with nested types. A more comprehensive fix for the performance issue in cudf::detail::contains will require more modification to the new implementation.
cudf burn down for 22.08 is about to start, it is a bit risky to fix the new implementation in cudf::detail::contains and call it from *-joins. If I do that and there is another new (bad) surprise in performance, we will be very hurried to fix it for 22.08 release.

Thus, it is safer to separate *-joins from cudf::detail::contains for 22.08 release:

*-joins will continue to use their old implementation to avoid new surprises.
Addressing performance issue for cudf::detail::contains will be up shortly in a separate PR (also for 22.08 release).
*-joins will switch to use cudf::detail::contains again after 22.08 has been released. This will allow us to have enough time to test.

ttnghia · 2022-07-21T04:09:22Z

Another reason why I want to defer calling cudf::detail::contains in semi-anti-join is that, from the data given from Spark benchmark (used in #11299), the old implementation runs like:

insert: 1.158599ms
contains: 0.456737ms

while with #11325 (fixing the issue by changing cudf::detail::contains) we have it runs like:

insert: 1.829131ms
contains: 1.859050ms

Yes, the initial benchmark in #11100 was showing that the new implementation gains over 10% performance improvement but this time it is worse (because the new implementation is modified to a "newer" implementation to fix the performance issue, but becomes slower). I suspect that the performance regression here is due to the same reason as seen in #10811, when the input is a structs column having many children.

ttnghia · 2022-07-21T13:25:31Z

Rerun tests.

PointKernel · 2022-07-21T14:35:51Z

The initial goal of #11100 was to extract the contains logic out of semi-join thus it can be used widely across libcudf and minimize redundancy. By partially reverting semi-join for a quick fix, not only do we introduce a follow-up work to get rid of this temporary workaround, but also we are slightly off the initial goal. The 1x slow down in some rare use cases is not too significant to accept. The performance regression definitely needs a comprehensive investigation and we also want to avoid overfitting a too-specific problem when choosing the implementation. As for the scope of this PR, I'm inclined to continue to improve contains and only revert semi-join back to the old implementation when we have to (e.g. things still not working when code freeze comes).

ttnghia · 2022-07-21T21:53:54Z

FYI: I pushed an alternative fix (#11325) for the performance issue. It is under testing with Spark-Rapids.

That PR should address the concern you guys mentioned above.

ttnghia · 2022-07-22T18:33:40Z

Close this as it is covered in a new PR: #11330.

…ins` (#11330) The current implementation of `cudf::detail::contains` can process input with arbitrary nested types. However, it was reported to have severe performance issue when the input tables have many duplicate rows (#11299). In order to fix the issue, #11310 and #11325 was created. Unfortunately, #11310 is separating semi-anti-join from `cudf::detail::contains`, causing duplicate implementation. On the other hand, #11325 can address the issue #11299 but semi-anti-join using it still performs worse than the previous semi-anti-join implementation. The changes in this PR include the following: * Fix the performance issue reported in #11299 for the current `cudf::detail::contains` implementation that support nested types. * Add a separate code path into `cudf::detail::contains` such that: * Input without having lists column (at any nested level) will be processed by the code path that is the same as the old implementation of semi-anti-join. This is to make sure the performance of semi-anti-join will remain the same as before. * Input with nested lists column, or NaNs compared as unequal, will be processed by another code path that supports nested types and different NaNs behavior. This will make sure support for nested types will not be dropped. Closes #11299. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Bradley Dice (https://github.com/bdice) - MithunR (https://github.com/mythrocks) - Vyas Ramasubramani (https://github.com/vyasr) - Mike Wilson (https://github.com/hyperbolic2346) - Alessandro Bellina (https://github.com/abellina) URL: #11330

Reverse semi-join to its old implementation

0c89deb

ttnghia added bug Something isn't working 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS non-breaking Non-breaking change labels Jul 20, 2022

ttnghia requested a review from a team as a code owner July 20, 2022 16:17

ttnghia requested a review from hyperbolic2346 July 20, 2022 16:17

ttnghia self-assigned this Jul 20, 2022

ttnghia requested review from jrhemstad and PointKernel July 20, 2022 16:17

ttnghia requested a review from abellina July 20, 2022 16:17

abellina approved these changes Jul 20, 2022

View reviewed changes

razajafri changed the title ~~Temprarily reverse semi-anti-join implementation~~ Temporarily reverse semi-anti-join implementation Jul 20, 2022

bdice reviewed Jul 20, 2022

View reviewed changes

hyperbolic2346 requested changes Jul 20, 2022

View reviewed changes

cpp/src/join/semi_join.cu Show resolved Hide resolved

This comment was marked as off-topic.

Sign in to view

ttnghia mentioned this pull request Jul 22, 2022

Fix performance issue and add a new code path to cudf::detail::contains #11330

Merged

ttnghia added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Jul 22, 2022

ttnghia closed this Jul 22, 2022

ttnghia deleted the reverses_semi_join branch July 22, 2022 23:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Temporarily reverse semi-anti-join implementation #11310

Temporarily reverse semi-anti-join implementation #11310

ttnghia commented Jul 20, 2022 •

edited

Loading

ttnghia commented Jul 20, 2022

abellina commented Jul 20, 2022

abellina commented Jul 20, 2022

razajafri commented Jul 20, 2022

ttnghia commented Jul 20, 2022

bdice Jul 20, 2022

ttnghia Jul 20, 2022

hyperbolic2346 Jul 20, 2022

ttnghia Jul 20, 2022

bdice Jul 20, 2022 •

edited

Loading

ttnghia Jul 20, 2022 •

edited

Loading

PointKernel Jul 20, 2022

ttnghia Jul 20, 2022

PointKernel Jul 20, 2022

hyperbolic2346 left a comment

ttnghia commented Jul 20, 2022

ttnghia commented Jul 20, 2022

PointKernel commented Jul 20, 2022

ttnghia commented Jul 21, 2022

ttnghia commented Jul 21, 2022 •

edited

Loading

This comment was marked as off-topic.

ttnghia commented Jul 21, 2022 •

edited

Loading

ttnghia commented Jul 21, 2022

PointKernel commented Jul 21, 2022 •

edited

Loading

ttnghia commented Jul 21, 2022 •

edited

Loading

ttnghia commented Jul 22, 2022

Temporarily reverse semi-anti-join implementation #11310

Temporarily reverse semi-anti-join implementation #11310

Conversation

ttnghia commented Jul 20, 2022 • edited Loading

ttnghia commented Jul 20, 2022

abellina commented Jul 20, 2022

abellina commented Jul 20, 2022

razajafri commented Jul 20, 2022

ttnghia commented Jul 20, 2022

bdice Jul 20, 2022

Choose a reason for hiding this comment

ttnghia Jul 20, 2022

Choose a reason for hiding this comment

hyperbolic2346 Jul 20, 2022

Choose a reason for hiding this comment

ttnghia Jul 20, 2022

Choose a reason for hiding this comment

bdice Jul 20, 2022 • edited Loading

Choose a reason for hiding this comment

ttnghia Jul 20, 2022 • edited Loading

Choose a reason for hiding this comment

PointKernel Jul 20, 2022

Choose a reason for hiding this comment

ttnghia Jul 20, 2022

Choose a reason for hiding this comment

PointKernel Jul 20, 2022

Choose a reason for hiding this comment

hyperbolic2346 left a comment

Choose a reason for hiding this comment

ttnghia commented Jul 20, 2022

ttnghia commented Jul 20, 2022

PointKernel commented Jul 20, 2022

ttnghia commented Jul 21, 2022

ttnghia commented Jul 21, 2022 • edited Loading

This comment was marked as off-topic.

ttnghia commented Jul 21, 2022 • edited Loading

ttnghia commented Jul 21, 2022

PointKernel commented Jul 21, 2022 • edited Loading

ttnghia commented Jul 21, 2022 • edited Loading

ttnghia commented Jul 22, 2022

ttnghia commented Jul 20, 2022 •

edited

Loading

bdice Jul 20, 2022 •

edited

Loading

ttnghia Jul 20, 2022 •

edited

Loading

ttnghia commented Jul 21, 2022 •

edited

Loading

ttnghia commented Jul 21, 2022 •

edited

Loading

PointKernel commented Jul 21, 2022 •

edited

Loading

ttnghia commented Jul 21, 2022 •

edited

Loading