Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize left_semi_join by materializing the gather mask #10511

Merged
merged 3 commits into from
May 5, 2022

Conversation

cheinger
Copy link
Contributor

@cheinger cheinger commented Mar 24, 2022

Closes #10464

Updates the left_semi_join to materialize the gather mask instead of generating it via a transform iterator.

Including the map.contains in the gather call reduced occupancy due to increasing register usage. As a result, explicitly materializing the gather mask is faster.

@cheinger cheinger requested a review from a team as a code owner March 24, 2022 20:02
@cheinger cheinger requested review from trxcllnt and codereport March 24, 2022 20:02
@GPUtester
Copy link
Collaborator

Can one of the admins verify this patch?

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Mar 24, 2022
@PointKernel PointKernel added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Mar 29, 2022
Up to 20x faster. Separated hash table lookup from copy_if because
increased register usage significantly limited occupancy of this
kernel.
@jrhemstad
Copy link
Contributor

jrhemstad commented Apr 5, 2022

@cheinger so to be clear, the performance improvement didn't come from using cub::DeviceSelect::Flagged, but instead from pulling the map::contains function out of the copy_if by materializing the predicate as a separate array?

@cheinger
Copy link
Contributor Author

cheinger commented Apr 5, 2022

@jrhemstad correct. I updated the gitlab issue with a more detailed explanation

@jrhemstad
Copy link
Contributor

@cheinger could you update the PR description to provide a short summary? The PR description goes into the CHANGELOG.

@PointKernel
Copy link
Member

ok to test

Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Can you please update the PR title accordingly? It would be useful to also include your performance analysis (here) in the PR description. Did you notice any performance changes in semi join benchmarks?

cpp/src/join/semi_join.cu Show resolved Hide resolved
cpp/src/join/semi_join.cu Outdated Show resolved Hide resolved
@PointKernel PointKernel added the Performance Performance related issue label Apr 5, 2022
@sevagh
Copy link
Contributor

sevagh commented Apr 5, 2022

ok to test

1 similar comment
@sevagh
Copy link
Contributor

sevagh commented Apr 5, 2022

ok to test

@codecov

This comment was marked as outdated.

@jrhemstad
Copy link
Contributor

@PointKernel can you re-review/approve?

@PointKernel PointKernel changed the title Optimize left_semi_join by using cub::DeviceSelect::Flagged instead of thrust::copy_if Optimize left_semi_join by materializing the gather mask May 5, 2022
@jrhemstad
Copy link
Contributor

add to whitelist

@ajschmidt8
Copy link
Member

add to allowlist

@PointKernel
Copy link
Member

rerun tests

@PointKernel
Copy link
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit ee26fbe into rapidsai:branch-22.06 May 5, 2022
@GregoryKimball
Copy link
Contributor

Thank you @cheinger for adding this optimization! I'm seeing a 15-30% reduction in compute time for our JOIN benchmarks as a result of this change.
image

@cheinger
Copy link
Contributor Author

@GregoryKimball Sweet! Happy to help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Performance Performance related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Left Semi Join much slower than Inner join
7 participants