Optimize `left_semi_join` by materializing the gather mask #10511

Merged

rapids-bot merged 3 commits into rapidsai:branch-22.06 from cheinger:branch-22.06

May 5, 2022

Contributor

cheinger commented Mar 24, 2022 •

edited by jrhemstad

Closes #10464

Updates `left_semi_join` to materialize the gather mask instead of generating it on the fly through a transform iterator.

Calling `map.contains` inside the gather call increased register usage, which reduced occupancy. Explicitly materializing the gather mask in a separate pass is therefore faster.
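As a rough illustration of the structural change described above, here is a minimal host-side C++ sketch; it is not the code from this PR. `std::unordered_set` stands in for the device hash map, and the function names `semi_join_fused` and `semi_join_materialized` are hypothetical. On a CPU there is no occupancy effect, so the sketch shows only the shape of the two approaches: probing inside the selection step versus writing the probe results to a mask first and then selecting from that mask.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical names for illustration only; std::unordered_set is an
// assumption standing in for the device-side hash map used by the real join.
using index_vector = std::vector<std::int32_t>;

// Fused form: the membership probe runs inside the selection loop, which is
// analogous to calling map.contains from within the gather/copy_if predicate.
index_vector semi_join_fused(std::vector<std::int32_t> const& left_keys,
                             std::unordered_set<std::int32_t> const& right_keys)
{
  index_vector out;
  for (std::size_t i = 0; i < left_keys.size(); ++i) {
    if (right_keys.count(left_keys[i]) > 0) { out.push_back(static_cast<std::int32_t>(i)); }
  }
  return out;
}

// Materialized form: pass 1 writes one flag per left row into a mask array,
// pass 2 selects row indices using only that precomputed mask.
index_vector semi_join_materialized(std::vector<std::int32_t> const& left_keys,
                                    std::unordered_set<std::int32_t> const& right_keys)
{
  // Pass 1: probe the set once per left row and store the result.
  std::vector<std::uint8_t> mask(left_keys.size());
  std::transform(left_keys.begin(), left_keys.end(), mask.begin(), [&](std::int32_t key) {
    return static_cast<std::uint8_t>(right_keys.count(key) > 0);
  });

  // Pass 2: a cheap stencil-driven selection that no longer touches the set.
  index_vector out;
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i] != 0) { out.push_back(static_cast<std::int32_t>(i)); }
  }
  return out;
}
```

Both functions return the same left-row indices. The materialized form trades an extra mask allocation and a second pass for a simpler selection step, which is the trade-off this PR reports as a net win on the GPU.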

cheinger requested a review from a team as a code owner

March 24, 2022 20:02

cheinger requested review from trxcllnt and codereport

March 24, 2022 20:02

Collaborator

GPUtester commented Mar 24, 2022

Can one of the admins verify this patch?

github-actions bot added the libcudf label

cheinger mentioned this pull request

[BUG] Left Semi Join much slower than Inner join #10464

Closed

jrhemstad reviewed

cpp/src/join/semi_join.cu (outdated; five review threads)

PointKernel added the improvement and non-breaking labels


Optimized left_semi_join

52df19b

Up to 20x faster. Separated hash table lookup from copy_if because
increased register usage significantly limited occupancy of this
kernel.

cheinger force-pushed the branch-22.06 branch from 0fafead to 52df19b Compare

April 4, 2022 03:24


clang format

7f451e6

Contributor

jrhemstad commented Apr 5, 2022 •

edited

@cheinger so to be clear, the performance improvement didn't come from using cub::DeviceSelect::Flagged, but instead from pulling the map::contains function out of the copy_if by materializing the predicate as a separate array?

Contributor Author

cheinger commented Apr 5, 2022

@jrhemstad correct. I updated the gitlab issue with a more detailed explanation

Contributor

jrhemstad commented Apr 5, 2022

@cheinger could you update the PR description to provide a short summary? The PR description goes into the CHANGELOG.

Member

PointKernel commented Apr 5, 2022

ok to test

PointKernel reviewed

Member

PointKernel left a comment •

edited

Nice work. Can you please update the PR title accordingly? It would be useful to also include your performance analysis (here) in the PR description. Did you notice any performance changes in semi join benchmarks?

cpp/src/join/semi_join.cu

cpp/src/join/semi_join.cu (outdated)

PointKernel added the Performance label

Contributor

sevagh commented Apr 5, 2022

ok to test

This comment was marked as outdated.

jrhemstad approved these changes

Contributor

jrhemstad commented May 5, 2022

@PointKernel can you re-review/approve?


Update cpp/src/join/semi_join.cu

5e0f726

PointKernel changed the title ~~Optimize left_semi_join by using cub::DeviceSelect::Flagged instead of thrust::copy_if~~ Optimize left_semi_join by materializing the gather mask

Contributor

jrhemstad commented May 5, 2022

add to whitelist

Member

ajschmidt8 commented May 5, 2022

add to allowlist

Member

PointKernel commented May 5, 2022

rerun tests

PointKernel approved these changes

Member

PointKernel commented May 5, 2022

@gpucibot merge

rapids-bot bot merged commit ee26fbe into rapidsai:branch-22.06

jrhemstad mentioned this pull request

Improve distinct by using cuco::static_map::retrieve_all #10916

Merged

Contributor

GregoryKimball commented May 31, 2022

Thank you @cheinger for adding this optimization! I'm seeing a 15-30% reduction in compute time for our JOIN benchmarks as a result of this change.

Contributor Author

cheinger commented May 31, 2022

@GregoryKimball Sweet! Happy to help!

ttnghia mentioned this pull request

Refactor semi_anti_join #11100

Merged

ttnghia mentioned this pull request

Support nth_element for window functions #11158

Merged

Reviewers

jrhemstad approved these changes

PointKernel approved these changes

trxcllnt: awaiting requested review (code owner, automatically assigned from rapidsai/cudf-cpp-codeowners)

codereport: awaiting requested review (code owner, automatically assigned from rapidsai/cudf-cpp-codeowners)

Labels

improvement, libcudf, non-breaking, Performance