Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Roll back ipow changes due to register pressure. #15242

Merged
merged 1 commit into from
Mar 11, 2024

Conversation

pmattione-nvidia
Copy link
Contributor

@pmattione-nvidia pmattione-nvidia commented Mar 6, 2024

The addition of an array of integers in this function placed too much register pressure on our code base. This function is used by the fixed_point constructor and cast operators, so it potentially affects every kernel. Too many unrelated kernels were impacted and suffered performance degradations to justify this change. This reverts the algorithm introduced in #15110 to what it was previously, with some very minor tweaks.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@pmattione-nvidia pmattione-nvidia added libcudf Affects libcudf (C++/CUDA) code. Performance Performance related issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Mar 6, 2024
@pmattione-nvidia pmattione-nvidia requested a review from a team as a code owner March 6, 2024 19:22
Copy link

copy-pr-bot bot commented Mar 6, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@GregoryKimball
Copy link
Contributor

Adding @shrshi and @PointKernel as reviewers based on their previous analysis of #15110

@PointKernel
Copy link
Member

@pmattione-nvidia I recall you mentioned that using a smaller lookup table can get about the same performance without introducing too much register pressure. How did that go?

@pmattione-nvidia
Copy link
Contributor Author

I tried that and everything I could think of, but several tests were still just too sensitive. It's not clear to me that they were even using decimal types at all, but they were slowing down significantly regardless (a few tests were still 5-10% slower).

@PointKernel
Copy link
Member

LGTM

I pinned the original PR in the PR description for easy backtrace. We can revisit the recursive solution once the mixed join has been refactored with the new cuco set (which is supposed to relieve the register pressure a bit). Also, I've sent you the instructions for setting up signed commits via slack. The CI will automatically once it's done.

@pmattione-nvidia
Copy link
Contributor Author

/ok to test

Copy link
Contributor

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. This branch might need an upmerge.

@pmattione-nvidia
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 63c9ed7 into rapidsai:branch-24.04 Mar 11, 2024
74 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Performance Performance related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants