-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Roll back ipow changes due to register pressure. #15242
Conversation
Adding @shrshi and @PointKernel as reviewers based on their previous analysis of #15110 |
@pmattione-nvidia I recall you mentioned that using a smaller lookup table can get about the same performance without introducing too much register pressure. How did that go? |
I tried that and everything I could think of, but several tests were still just too sensitive. It's not clear to me that they were even using decimal types at all, but they were slowing down significantly regardless (a few tests were still 5-10% slower). |
LGTM I pinned the original PR in the PR description for easy backtrace. We can revisit the recursive solution once the mixed join has been refactored with the new cuco set (which is supposed to relieve the register pressure a bit). Also, I've sent you the instructions for setting up signed commits via slack. The CI will automatically once it's done. |
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. This branch might need an upmerge.
/merge |
The addition of an array of integers in this function placed too much register pressure on our code base. This function is used by the fixed_point constructor and cast operators, so it potentially affects every kernel. Too many unrelated kernels were impacted and suffered performance degradations to justify this change. This reverts the algorithm introduced in #15110 to what it was previously, with some very minor tweaks.
Checklist