[NVPTX] Fix 64 bits rotations with large shift values #89399

npmiller · 2024-04-19T15:14:48Z

ROTL and ROTR can take a shift amount larger than the element size, in which case the effective shift amount should be the shift amount modulo the element size.

This patch adds the modulo step when the shift amount isn't known at compile time. Without it, the existing implementation would shift beyond the type size and produce incorrect results.
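A minimal Python model of the 64-bit rotate semantics, for illustration only. It assumes a naive lowering of the form (a << n) | (a >> (64 - n)) and models the PTX rule that shift amounts are unsigned 32-bit values, with amounts of at least the register width clamped to the width. The helper names are hypothetical, and this is not the patch's actual TableGen code.

```python
MASK64 = (1 << 64) - 1


def ptx_shl64(x, amt):
    # Model of PTX shl.b64: the amount is an unsigned 32-bit value and
    # amounts >= 64 are clamped to 64, which shifts everything out (0).
    amt &= 0xFFFFFFFF
    return (x << min(amt, 64)) & MASK64


def ptx_shr64(x, amt):
    # Model of PTX shr.b64 (logical shift) with the same clamping rule.
    amt &= 0xFFFFFFFF
    return x >> min(amt, 64)


def rotl64_naive(a, n):
    # Hypothetical lowering WITHOUT the modulo step: e.g. for n = 65 both
    # shift amounts are out of range, so the result collapses to 0.
    return ptx_shl64(a, n) | ptx_shr64(a, 64 - n)


def rotl64(a, n):
    # Reduce the amount modulo the element size first (n & 63 == n % 64).
    n &= 63
    return ptx_shl64(a, n) | ptx_shr64(a, 64 - n)


def rotr64(a, n):
    # Same fix for rotate right.
    n &= 63
    return ptx_shr64(a, n) | ptx_shl64(a, 64 - n)


a = 0x8000000000000001
print(hex(rotl64_naive(a, 65)))  # 0x0 (wrong)
print(hex(rotl64(a, 65)))        # 0x3, same as rotating by 1
```

Masking with 63 rather than using % keeps the reduction a single AND, which is also what a bit-width-power-of-two modulo lowers to in hardware.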

cc @Artem-B

Artem-B

LGTM overall.

Artem-B · 2024-04-19T17:55:07Z

llvm/test/CodeGen/NVPTX/rotate.ll

+declare i64 @llvm.fshr.i64(i64, i64, i64)
+
+; SM35: rotl64
+define i64 @rotl64(i64 %a, i64 %n) {


This test should probably be converted to use llvm/utils/update_llc_test_checks.py. We do care about the arguments and the exact instruction sequences here.

^^^ we still want to improve the test.

Oh right! I've updated the test with the script now. It was confusing for a bit because the script doesn't understand -march, so I had to switch to --mtriple.
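For context on the llvm.fshr.i64 declaration in the test: in LLVM IR a rotate is expressed as a funnel shift whose two data operands are the same value, and the LangRef defines the funnel-shift amount modulo the bit width. The Python sketch below models llvm.fshl.i64 and llvm.fshr.i64 that way so the expected result for an oversized shift amount can be checked by hand; the function names are hypothetical and inputs are assumed to be in [0, 2**64).

```python
MASK64 = (1 << 64) - 1


def fshl64(a, b, c):
    # llvm.fshl.i64: concatenate a (high) and b (low) into a 128-bit value,
    # shift left by c modulo 64, and return the high 64 bits.
    c %= 64
    wide = (a << 64) | b
    return ((wide << c) >> 64) & MASK64


def fshr64(a, b, c):
    # llvm.fshr.i64: same concatenation, shift right by c modulo 64,
    # and return the low 64 bits.
    c %= 64
    wide = (a << 64) | b
    return (wide >> c) & MASK64


def rotl64(x, n):
    # A rotate is a funnel shift with both data operands equal.
    return fshl64(x, x, n)


def rotr64(x, n):
    return fshr64(x, x, n)


x = 0x8000000000000001
print(hex(rotl64(x, 65)))  # 0x3, i.e. rotating by 65 == rotating by 1
```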

llvm/lib/Target/NVPTX/NVPTXInstrInfo.td

npmiller · 2024-04-30T09:00:10Z

@Artem-B I don't have commit permissions; would you mind landing this? I'll try to follow up with the suggestions when I have time.


Artem-B

LGTM.

Artem-B · 2024-05-01T17:11:59Z

llvm/test/CodeGen/NVPTX/rotate.ll

+; SM35-NEXT:    ld.param.u64 %rd1, [rotate64_param_0];
+; SM35-NEXT:    {
+; SM35-NEXT:    .reg .b32 %dummy;
+; SM35-NEXT:    mov.b64 {%dummy,%r1}, %rd1;
+; SM35-NEXT:    }
+; SM35-NEXT:    {
+; SM35-NEXT:    .reg .b32 %dummy;
+; SM35-NEXT:    mov.b64 {%r2,%dummy}, %rd1;
+; SM35-NEXT:    }


Looks like a minor optimization opportunity for the future: the two unpacks could be combined into a single mov.b64 {%r2, %r1}, %rd1.
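A small Python model of the PTX unpack, assuming the documented ordering of mov.b64 {lo, hi}, %rd in which the first destination receives the low 32 bits. It shows that one unpack yields the same two halves as the two separate moves through %dummy. The helper name and the sample value are hypothetical.

```python
def mov_b64_unpack(x):
    # Model of PTX `mov.b64 {lo, hi}, %rd`: the first destination gets
    # bits 0-31 and the second gets bits 32-63.
    return x & 0xFFFFFFFF, (x >> 32) & 0xFFFFFFFF


rd1 = 0x0123456789ABCDEF

# Sequence in the generated test: two unpacks, each discarding one half.
_dummy, r1 = mov_b64_unpack(rd1)  # mov.b64 {%dummy,%r1}, %rd1
r2, _dummy = mov_b64_unpack(rd1)  # mov.b64 {%r2,%dummy}, %rd1

# Suggested single unpack: mov.b64 {%r2, %r1}, %rd1
r2_once, r1_once = mov_b64_unpack(rd1)

assert (r2, r1) == (r2_once, r1_once)
print(hex(r2), hex(r1))  # 0x89abcdef 0x1234567
```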

Artem-B approved these changes Apr 19, 2024


npmiller force-pushed the fix-rot-shifts branch 2 times, most recently from 38eca33 to 138f196, April 29, 2024 15:55

npmiller force-pushed the fix-rot-shifts branch from 138f196 to 93b07c4, May 1, 2024 16:57

Artem-B approved these changes May 1, 2024


Artem-B merged commit 7396ab1 into llvm:main May 1, 2024
3 of 4 checks passed
