-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NVPTX] Fix 64 bits rotations with large shift values #89399
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall.
declare i64 @llvm.fshr.i64(i64, i64, i64) | ||
|
||
; SM35: rotl64 | ||
define i64 @rotl64(i64 %a, i64 %n) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test should probably be converted to use llvm/utils/update_llc_test_checks.py
We do care about the arguments and the exact instruction sequences here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^^^ we still want to improve the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right! Updated with the script now, it was confusing for a little bit because it doesn't understand -march
so I had to swap to --mtriple
.
38eca33
to
138f196
Compare
@Artem-B I don't have commit permissions, would you mind landing this? I'll try to follow up with the suggestions when I have time. |
ROTL and ROTR can take a shift amount larger than the element size, in which case the effective shift amount should be the shift amount modulo the element size. This patch adds the modulo step when the shift amount isn't known at compile time. Without it the existing implementation would end up shifting beyond the type size and give incorrect results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
; SM35-NEXT: ld.param.u64 %rd1, [rotate64_param_0]; | ||
; SM35-NEXT: { | ||
; SM35-NEXT: .reg .b32 %dummy; | ||
; SM35-NEXT: mov.b64 {%dummy,%r1}, %rd1; | ||
; SM35-NEXT: } | ||
; SM35-NEXT: { | ||
; SM35-NEXT: .reg .b32 %dummy; | ||
; SM35-NEXT: mov.b64 {%r2,%dummy}, %rd1; | ||
; SM35-NEXT: } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a minor optimization opportunity for the future -- this could be done as mov.b64 {%r2, %r1}, %rd1
.
ROTL and ROTR can take a shift amount larger than the element size, in which case the effective shift amount should be the shift amount modulo the element size.
This patch adds the modulo step when the shift amount isn't known at compile time. Without it the existing implementation would end up shifting beyond the type size and give incorrect results.
cc @Artem-B