[Mips] Incorrect code generated to perform i128 shifting by a truncated amount #71142

chenx97 · 2023-11-03T04:53:33Z

Description:

When compiling the following PoC with LLVM, the generated code shifts an i128 value as if it were a <2 x i64> vector, with each element shifted independently by the given amount. On better-supported architectures, and on Mips targets with older versions of LLVM, i128 shifts behave correctly.

Environment:

Compiler: LLVM 18, commit b120fe8
Target Architecture: mips64el-unknown-linux-gnuabi64
Optimization level: the issue can be observed from O0 to O3

PoC

#include <stdio.h>
#include <stdint.h>

extern __int128 owo(__int128, int64_t);

int main() {
    __int128 a = 0xabcdef00abcdef11;
    a <<= 64;
    a |= 0xabcdef22abcdef33;
    __int128 b = owo(a, 48);
    unsigned long *p = (void *)&b;
    printf("%016lx%016lx\n", p[1], p[0]);
    return 0;
}

target triple = "mips64el-unknown-linux-gnuabi64"

define i128 @owo(i128 %a, i64 %b) {
start:
  %tmp = sext i64 %b to i128
  %_1 = and i128 %tmp, 127
  %_0 = shl i128 %a, %_1
  ret i128 %_0
}

!1 = !{}

Expected Behavior:

The resulting i128 should be ef11abcdef22abcdef33000000000000

Observed Behavior:

The resulting i128 is ef11000000000000ef33000000000000

Analysis:

It looks as though, when shifting by an amount less than 64, both i64 halves are shifted correctly, but the bits shifted out of the lower 64 bits are not copied into the low end of the upper 64 bits. Note that the and operation in the IR applies only to the shift amount, which in this case is well below the limit of an i8.
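The analysis can be checked with a small sketch in C, assuming the i128 is split into two 64-bit halves and the shift amount is strictly between 0 and 64. The helper names shl128_parts and shl128_no_carry are hypothetical and are not taken from LLVM; the second one only models the observed symptom, not the code the backend actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 128-bit left shift built from 64-bit halves.
 * Only valid for 0 < n < 64, which covers the PoC's shift by 48. */
static void shl128_parts(uint64_t hi, uint64_t lo, unsigned n,
                         uint64_t *out_hi, uint64_t *out_lo) {
    assert(n > 0 && n < 64);
    *out_lo = lo << n;
    /* Bits shifted out of the low half must land in the high half. */
    *out_hi = (hi << n) | (lo >> (64 - n));
}

/* Hypothetical model of the observed symptom: each half is shifted on
 * its own and the bits leaving the low half are dropped. */
static void shl128_no_carry(uint64_t hi, uint64_t lo, unsigned n,
                            uint64_t *out_hi, uint64_t *out_lo) {
    assert(n > 0 && n < 64);
    *out_lo = lo << n;
    *out_hi = hi << n;
}
```

With hi = 0xabcdef00abcdef11, lo = 0xabcdef22abcdef33 and n = 48, the first helper yields ef11abcdef22abcd and ef33000000000000, and the second yields ef11000000000000 and ef33000000000000, matching the expected and observed values reported above.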

llvmbot · 2023-11-03T05:10:42Z

@llvm/issue-subscribers-backend-mips

Author: Henry Chen (chenx97)

Cyanoxygen · 2023-11-03T05:10:53Z

Hi all,

This PoC is confirmed positive since LLVM 17.0.2 (not tested with 17.0.0 though).

The last known LLVM version which was confirmed negative is 16.0.6.

If we start with an i128 shift, the initial shift amount would usually have zeros in bit 8 and above. XORing the shift amount with -1 sets those upper bits to 1. If DAGCombiner is able to prove those bits are now 1, the shift that uses the XOR result is replaced with undef, which we don't want. Reduce the XOR constant to VT.bits-1, where VT is half the size of the larger shift type. This avoids toggling the upper bits. The hardware shift instruction only uses the lower bits of the shift amount. I assume the code used NOT because the hardware doesn't use the upper bits, but that isn't compatible with the LLVM poison semantics. Fixes llvm#71142.
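The difference between the two XOR constants can be sketched in C for the 64-bit half type, so VT.bits-1 is 63. This is only an illustration under assumptions: the helper names are hypothetical, and the shape (lo >> 1) >> (n ^ 63) for the bits carried into the high half is assumed here rather than quoted from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: is amt a well-defined shift amount for a
 * 64-bit value? Anything from 64 upwards is out of range. */
static int shift_amount_in_range(uint64_t amt) {
    return amt < 64;
}

/* Hypothetical model of the bits carried into the high half, assuming
 * the shape (lo >> 1) >> (n ^ 63). For n in 0..63, n ^ 63 equals
 * 63 - n, so the total shift is 64 - n without ever shifting by 64. */
static uint64_t carry_into_high(uint64_t lo, uint64_t n) {
    assert(n < 64);
    uint64_t amt = n ^ 63;
    assert(shift_amount_in_range(amt));
    return (lo >> 1) >> amt;
}
```

For n = 48, n ^ 63 is 15 and stays in range, while ~n has every bit above bit 5 set and is far out of range. Masked to the low six bits, both give 15, which is consistent with the remark that the hardware only looks at the lower bits.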

Cyanoxygen · 2023-11-03T07:04:48Z

Hi,

My colleague mentioned that the patch proposed in issue #64794 (which does almost the same thing as the PR) might solve this. After some testing, the result looks good.

Thanks!

Cyanoxygen · 2023-11-03T07:12:10Z

I have a repo that contains some alternate PoC code at https://github.com/Cyanoxygen/llvm-mips-regression.
The problem was discovered by OP while building Rust; we stripped the PoC down to the code in the issue body and in this repo.

The test results are here:

$ make CROSS_COMPILE=mips64el-linux-gnuabi64- TGT=mips64el-unknown-linux-gnuabi64 test
make -C src test
make[1]: Entering directory '/home/cyan/build/llvm-mips-regression/src'
mips64el-linux-gnuabi64-gcc -O0 -o owo main.c owo.ll.o -static
mips64el-linux-gnuabi64-gcc -O0 -o uwu main2.c uwu.ll.o -static
Running owo
92409240924092409240000000000000
Running uwu
92409240924092409240000000000000
make[1]: Leaving directory '/home/cyan/build/llvm-mips-regression/src'
$ vim src/main.c
$ vim src/main.ll # Replace owo with the code in the issue body
$ make CROSS_COMPILE=mips64el-linux-gnuabi64- TGT=mips64el-unknown-linux-gnuabi64 test
make -C src test
make[1]: Entering directory '/home/cyan/build/llvm-mips-regression/src'
llc --filetype=obj -O0 -mtriple=mips64el-unknown-linux-gnuabi64 -o owo.ll.o owo.ll
mips64el-linux-gnuabi64-gcc -O0 -o owo main.c owo.ll.o -static
mips64el-linux-gnuabi64-gcc -O0 -o uwu main2.c uwu.ll.o -static
Running owo
ef11abcdef22abcdef33000000000000
Running uwu
92409240924092409240000000000000
make[1]: Leaving directory '/home/cyan/build/llvm-mips-regression/src'

The value is correct across all of our PoCs.
Please note that the compiled binary was run with QEMU.

EDIT: Updated the output of the second attempt, which replaced the code with the one in the issue body.
EDIT: The patch in the PR worked too.

Fixes #71142. (cherry picked from commit 8d24d39)

github-actions bot added the new issue label Nov 3, 2023

EugeneZelenko added backend:MIPS and removed new issue labels Nov 3, 2023

topperc self-assigned this Nov 3, 2023

topperc mentioned this issue Nov 3, 2023

[Mips] In LowerShift*Parts, xor with bits-1 instead of -1. #71149

Merged

dtcxzyw added the miscompilation label Nov 3, 2023

Cyanoxygen mentioned this issue Nov 3, 2023

64 bit shift is broken in nightly build for MIPS rust-lang/rust#116177

Closed

topperc closed this as completed in #71149 Nov 3, 2023

martn3 mentioned this issue Nov 8, 2023

Cherry-pick of upstreamed LLVM patch that fix MIPS int shift miscompilation rust-lang/llvm-project#156

Closed
