
[BUG] Left Semi Join much slower than Inner join #10464

Closed
cheinger opened this issue Mar 19, 2022 · 4 comments · Fixed by #10511
Labels
bug Something isn't working

Comments

@cheinger
Contributor

Describe the bug
I’ve noticed that Left Semi Join can be an order of magnitude slower than Inner Join. Below you can see the 10x slowdown when we switch out Inner Join for Left Semi Join (same data & table sizes).

Inner Join: (screenshot)

Left Semi Join: (screenshot)

I did some further investigation and found that the cause of this slowdown is the thrust::copy_if in left_semi_anti_join that copies the selected row indices found in the hash table. I switched this out for cub::DeviceSelect::Flagged and observed an 18x performance improvement, as seen below:

Before: (screenshot)

After: (screenshot)
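For reference, the `cub::DeviceSelect::Flagged` pattern looks roughly like the sketch below. This is not the actual libcudf code: it assumes a `d_flags` buffer has already been filled by a separate pass that performs the hash-table lookups, and the function and parameter names (`select_flagged`, `d_in`, `d_out`, `d_num_selected`) are illustrative. Error checking is omitted.

```cpp
#include <cub/cub.cuh>

#include <thrust/device_vector.h>

#include <cstddef>
#include <cstdint>

// Minimal sketch: select row indices whose flag is set using
// cub::DeviceSelect::Flagged. d_in holds candidate row indices, d_flags holds
// one flag per row (1 = key found in the hash table, 0 = not found), d_out
// receives the selected indices and d_num_selected the number kept.
void select_flagged(int const* d_in,
                    std::uint8_t const* d_flags,
                    int* d_out,
                    int* d_num_selected,
                    int num_rows,
                    cudaStream_t stream)
{
  // First call with a null temporary buffer only queries the storage size CUB needs.
  void* d_temp_storage   = nullptr;
  std::size_t temp_bytes = 0;
  cub::DeviceSelect::Flagged(
    d_temp_storage, temp_bytes, d_in, d_flags, d_out, d_num_selected, num_rows, stream);

  // Second call performs the actual flag-based selection.
  thrust::device_vector<char> temp(temp_bytes);
  d_temp_storage = thrust::raw_pointer_cast(temp.data());
  cub::DeviceSelect::Flagged(
    d_temp_storage, temp_bytes, d_in, d_flags, d_out, d_num_selected, num_rows, stream);
}
```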

Expected behavior
I would expect Left Semi Join and Inner Join to be somewhat comparable in performance, and if anything, Inner Join to be slower because we have to handle duplicate keys.

Environment details
env_details.log

@cheinger cheinger added Needs Triage Need team to review and classify bug Something isn't working labels Mar 19, 2022
@cheinger cheinger changed the title [FEA] Left Semi Join much slower than Inner join [BUG] Left Semi Join much slower than Inner join Mar 19, 2022
@cheinger
Contributor Author

I have created a PR with a fix: #10511

@jrhemstad
Contributor

I'm more interested in knowing why thrust::copy_if is so much slower than cub::DeviceSelect::Flagged. If they don't share the same implementation, then they should be made to.

@cheinger
Contributor Author

cheinger commented Apr 5, 2022

@jrhemstad I did some further investigation and determined that the reason thrust::copy_if was slower than cub::DeviceSelect::Flagged was actually the predicate lambda we passed to copy_if. The lambda performed the hash table lookup, which significantly increased the register usage of the thrust::copy_if kernel and therefore reduced the kernel's occupancy and its performance. So instead of switching to cub::DeviceSelect::Flagged, I continue to use thrust::copy_if, but the lambda now just returns the corresponding entry from a pre-computed flag array, and the performance is comparable to cub::DeviceSelect::Flagged. Below you can see the occupancy increased from 37% to 72%.

thrust::copy_if with original lambda: (screenshot)

thrust::copy_if with flagged lambda: (screenshot)

I have updated my PR to reflect these changes: #10511
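As a concrete illustration of the two patterns described in the comment above, the following self-contained sketch contrasts a copy_if whose predicate performs the hash-table lookup itself with one that only reads a pre-materialized flag. `fake_map` is a stand-in for the real hash map used by left_semi_anti_join, the buffer names are made up, and none of this is the actual libcudf code; compiling device lambdas like these requires nvcc's --extended-lambda flag.

```cpp
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/transform.h>

// Illustrative stand-in for the real hash map: contains() is the work that
// makes the predicate expensive in terms of registers.
struct fake_map {
  __device__ bool contains(int row) const { return (row % 3) == 0; }
};

int main()
{
  int const num_rows = 1'000'000;
  fake_map map{};

  thrust::device_vector<int> out(num_rows);
  auto rows_begin = thrust::make_counting_iterator<int>(0);

  // 1) Original pattern: the copy_if predicate performs the hash-table lookup
  //    itself. The lookup code is compiled into the copy_if kernel, which
  //    raises its register usage and lowers occupancy.
  auto out_end = thrust::copy_if(
    thrust::device, rows_begin, rows_begin + num_rows, out.begin(),
    [map] __device__(int row) { return map.contains(row); });

  // 2) "Flagged" pattern: do the lookups in a separate, simple transform that
  //    materializes one flag per row, then let copy_if's predicate do nothing
  //    more than read that flag. The copy_if kernel stays lightweight.
  thrust::device_vector<bool> flags(num_rows);
  thrust::transform(thrust::device, rows_begin, rows_begin + num_rows,
                    flags.begin(),
                    [map] __device__(int row) { return map.contains(row); });

  bool const* d_flags = thrust::raw_pointer_cast(flags.data());
  auto out_end2 = thrust::copy_if(
    thrust::device, rows_begin, rows_begin + num_rows, out.begin(),
    [d_flags] __device__(int row) { return d_flags[row]; });

  (void)out_end;
  (void)out_end2;
  return 0;
}
```

Both variants produce the same selected indices; the difference is only where the expensive contains() work gets compiled, which is what drives the occupancy difference shown in the profiles above.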

@github-actions

github-actions bot commented May 5, 2022

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

rapids-bot bot pushed a commit that referenced this issue May 5, 2022
Closes #10464

Updates the `left_semi_join` to materialize the gather mask instead of generating it via a transform iterator.

Including the `map.contains` in the `gather` call reduced occupancy due to increased register usage. As a result, explicitly materializing the gather mask is faster.

Authors:
  - Xavier Simmons (https://github.com/cheinger)
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - Jake Hemstad (https://github.com/jrhemstad)
  - Yunsong Wang (https://github.com/PointKernel)

URL: #10511
@bdice bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024