Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Temporarily reverse semi-anti-join implementation #11310

Closed
wants to merge 1 commit into from

Conversation

ttnghia
Copy link
Contributor

@ttnghia ttnghia commented Jul 20, 2022

The new implementation in semi-join and anti-join uses cuco::static_multimap that has performance issue with input tables having too many duplicate rows. This was not anticipated when refactoring semi-anti-join in #11100. Completely fixing this could involve more work from cuco and review time. However, this needs to be worked around ASAP to unblock spark-rapids's daily performance benchmark.

This PR temporarily reverses the implementation of semi-anti-join to its old state while waiting for a complete fix to come up later (tracked by the issue #11313).

Closes #11299.

@ttnghia ttnghia added bug Something isn't working 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS non-breaking Non-breaking change labels Jul 20, 2022
@ttnghia ttnghia requested a review from a team as a code owner July 20, 2022 16:17
@ttnghia ttnghia requested a review from hyperbolic2346 July 20, 2022 16:17
@ttnghia ttnghia self-assigned this Jul 20, 2022
@ttnghia ttnghia requested review from jrhemstad and PointKernel July 20, 2022 16:17
@ttnghia ttnghia requested a review from abellina July 20, 2022 16:17
@ttnghia
Copy link
Contributor Author

ttnghia commented Jul 20, 2022

Rerun tests.

@abellina
Copy link
Contributor

@ttnghia this workaround works and now semi joins are not hanging anymore.

@abellina
Copy link
Contributor

Build failing with:

conda.CondaMultiError: HTTP 403 FORBIDDEN for url <https://conda.anaconda.org/nvidia/linux-64/cudatoolkit-11.5.1-hcf5317a_9.tar.bz2>

@razajafri razajafri changed the title Temprarily reverse semi-anti-join implementation Temporarily reverse semi-anti-join implementation Jul 20, 2022
@razajafri
Copy link
Contributor

Why not just revert #11100?

@ttnghia
Copy link
Contributor Author

ttnghia commented Jul 20, 2022

Why not just revert #11100?

Reply again for other reviewers: #11100 has other changes not just for semi-anti-join. Reversing it entirely will break other code.

// `map.contains` inside the `thrust::copy_if` kernel. However, that led to increasing register
// usage and reducing performance, as reported here: https://github.com/rapidsai/cudf/pull/10511.
auto const flagged =
cudf::detail::contains(right_keys, left_keys, compare_nulls, nan_equality::ALL_EQUAL, stream);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this issue affect other uses of cudf::detail::contains? Should we be changing that function's implementation instead of just the semi-join implementation? (I haven't formed an opinion on this question yet, need to read more code first.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right question. Changing the implementation will completely fix this, but requires new FEA from cuco, which is under way: NVIDIA/cuCollections#191

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This concerns me as well. It seems an issue with lots of duplicate keys in general and we just found this instance to be a problem so far.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the only use case of cudf::detail::contains is in lists operations.

Copy link
Contributor

@bdice bdice Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the goals of #11100 was to reduce the number of unique functions implementing the same (or similar) logic. Is it possible to change cudf::detail::contains and leave semi_join.cu untouched?

Copy link
Contributor Author

@ttnghia ttnghia Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sound reasonable. But I'll address that concern in another separate PR. I would like to keep cudf::detail::contains separated from semi-anti-join for 22.08 to prevent any last-minute surprising performance issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can ask Jake for a quick review since the change is basically the same as NVIDIA/cuCollections#175. The issue must affect other use cases of detail::contains but just not unveiled by the existing benchmarks yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great. If we have #cuco/191 merged quickly then I can have a complete fix up for detail::contains without a temp fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on it now, should be ready very quickly.

Copy link
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate include, but otherwise looks ok. I understand this is a revert of a portion of a PR, so the code isn't new.

cpp/src/join/semi_join.cu Show resolved Hide resolved
@ttnghia
Copy link
Contributor Author

ttnghia commented Jul 20, 2022

Rerun tests.

1 similar comment
@ttnghia
Copy link
Contributor Author

ttnghia commented Jul 20, 2022

Rerun tests.

@PointKernel
Copy link
Member

Based on the offline discussion, a temporary workaround to fix the issue could be creating a single map of row hash values and row indices and then using cuco::static_map::pair_insert and cuco::static_map::pair_contains to insert/find distinct rows. This solution requires an ambiguous pair_insert API from cuco::static_map and the proper way to clean up the whole cudf::detail::contains logic is to use a set.

For this PR, we will revert back to the implementation of a single map of row indices with idle payloads.

There will be a future cleanup once cuco::static_set is ready.

@ttnghia
Copy link
Contributor Author

ttnghia commented Jul 21, 2022

Rerun tests.

@ttnghia
Copy link
Contributor Author

ttnghia commented Jul 21, 2022

Here is more clarification regarding the concern above, in case you guys are still unclear:

  • After Refactor semi_anti_join #11100 merged, semi-anti-joins became dependent on cudf::detail::contains, which has a totally new implementation from the old implementation of semi-anti-joins, not just copying the code over there. Such new implementation was tested for the first time in Spark and was immediately reported as having performance issue in [BUG] performance regression after semi_anti_join refactor #11299.
  • cudf::detail::contains now is used not just in semi-anti-joins but also in set operations. It supports nested types while the old semi-anti-joins implementation doesn't.
  • If I temporarily put the old implementation of *-joins into cudf::detail::contains to fix the performance issue, set operations will crash with nested types. A more comprehensive fix for the performance issue in cudf::detail::contains will require more modification to the new implementation.
  • cudf burn down for 22.08 is about to start, it is a bit risky to fix the new implementation in cudf::detail::contains and call it from *-joins. If I do that and there is another new (bad) surprise in performance, we will be very hurried to fix it for 22.08 release.

Thus, it is safer to separate *-joins from cudf::detail::contains for 22.08 release:

  • *-joins will continue to use their old implementation to avoid new surprises.
  • Addressing performance issue for cudf::detail::contains will be up shortly in a separate PR (also for 22.08 release).
  • *-joins will switch to use cudf::detail::contains again after 22.08 has been released. This will allow us to have enough time to test.

@codecov

This comment was marked as off-topic.

@ttnghia
Copy link
Contributor Author

ttnghia commented Jul 21, 2022

Another reason why I want to defer calling cudf::detail::contains in semi-anti-join is that, from the data given from Spark benchmark (used in #11299), the old implementation runs like:

insert: 1.158599ms
contains: 0.456737ms

while with #11325 (fixing the issue by changing cudf::detail::contains) we have it runs like:

insert: 1.829131ms
contains: 1.859050ms

Yes, the initial benchmark in #11100 was showing that the new implementation gains over 10% performance improvement but this time it is worse (because the new implementation is modified to a "newer" implementation to fix the performance issue, but becomes slower). I suspect that the performance regression here is due to the same reason as seen in #10811, when the input is a structs column having many children.

@ttnghia
Copy link
Contributor Author

ttnghia commented Jul 21, 2022

Rerun tests.

@PointKernel
Copy link
Member

PointKernel commented Jul 21, 2022

The initial goal of #11100 was to extract the contains logic out of semi-join thus it can be used widely across libcudf and minimize redundancy. By partially reverting semi-join for a quick fix, not only do we introduce a follow-up work to get rid of this temporary workaround, but also we are slightly off the initial goal. The 1x slow down in some rare use cases is not too significant to accept. The performance regression definitely needs a comprehensive investigation and we also want to avoid overfitting a too-specific problem when choosing the implementation. As for the scope of this PR, I'm inclined to continue to improve contains and only revert semi-join back to the old implementation when we have to (e.g. things still not working when code freeze comes).

@ttnghia
Copy link
Contributor Author

ttnghia commented Jul 21, 2022

FYI: I pushed an alternative fix (#11325) for the performance issue. It is under testing with Spark-Rapids.

That PR should address the concern you guys mentioned above.

@ttnghia ttnghia added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Jul 22, 2022
@ttnghia
Copy link
Contributor Author

ttnghia commented Jul 22, 2022

Close this as it is covered in a new PR: #11330.

@ttnghia ttnghia closed this Jul 22, 2022
rapids-bot bot pushed a commit that referenced this pull request Jul 22, 2022
…ins` (#11330)

The current implementation of `cudf::detail::contains` can process input with arbitrary nested types. However, it was reported to have severe performance issue when the input tables have many duplicate rows (#11299). In order to fix the issue, #11310 and #11325 was created. 

Unfortunately, #11310 is separating semi-anti-join from `cudf::detail::contains`, causing duplicate implementation. On the other hand, #11325 can address the issue #11299 but semi-anti-join using it still performs worse than the previous semi-anti-join implementation.

The changes in this PR include the following:
 * Fix the performance issue reported in #11299 for the current `cudf::detail::contains` implementation that support nested types.
 * Add a separate code path into `cudf::detail::contains` such that:
     * Input without having lists column (at any nested level) will be processed by the code path that is the same as the old implementation of semi-anti-join. This is to make sure the performance of semi-anti-join will remain the same as before.
     * Input with nested lists column, or NaNs compared as unequal, will be processed by another code path that supports nested types and different NaNs behavior. This will make sure support for nested types will not be dropped.

Closes #11299.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - Bradley Dice (https://github.com/bdice)
  - MithunR (https://github.com/mythrocks)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Mike Wilson (https://github.com/hyperbolic2346)
  - Alessandro Bellina (https://github.com/abellina)

URL: #11330
@ttnghia ttnghia deleted the reverses_semi_join branch July 22, 2022 23:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team 5 - DO NOT MERGE Hold off on merging; see PR for details bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] performance regression after semi_anti_join refactor
6 participants