Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Mips] Incorrect code generated to perform i128 shifting by a truncated amount #71142

Closed
chenx97 opened this issue Nov 3, 2023 · 4 comments · Fixed by #71149
Closed

[Mips] Incorrect code generated to perform i128 shifting by a truncated amount #71142

chenx97 opened this issue Nov 3, 2023 · 4 comments · Fixed by #71149

Comments

@chenx97
Copy link

chenx97 commented Nov 3, 2023

Description:

When compiling the following PoC with LLVM, an unexpected behavior happens, where the i128 representation of -1 will be shifted as if it were a <2xi64> vector with each element shifted by the given amount. In better-supported architectures and in older versions of LLVM, the shifting of i128 behaves correctly on Mips targets.

Environment:

  • Compiler: LLVM 18, commit b120fe8
  • Target Architecture: mips64el-unknown-linux-gnuabi64
  • Optimization level: the issue can be observed from O0 to O3

PoC

#include <stdio.h>
#include <stdint.h>

extern __int128 owo(__int128, int64_t);

int main() {
    __int128 a = 0xabcdef00abcdef11;
    a <<= 64;
    a |= 0xabcdef22abcdef33;
    __int128 b = owo(a, 48);
    unsigned long *p = (void *)&b;
    printf("%lx%lx\n", p[1], p[0]);
    return 0;
}
target triple = "mips64el-unknown-linux-gnuabi64"

define i128 @owo(i128 %a, i64 %b) {
start:
  %tmp = sext i64 %b to i128
  %_1 = and i128 %tmp, 127
  %_0 = shl i128 %a, %_1
  ret i128 %_0
}

!1 = !{}

Expected Behavior:

The resulted i128 should be ef11abcdef22abcdef33000000000000

Observed Behavior:

The resulted i128 is ef11000000000000ef33000000000000

Analysis:

The result looks like that when shifting by a number less than 64, although both i64 numbers get correctly shifted, the overflown bits of the lower 64 bits are not copied to the lower part of the higher 64 bits. Note that the and operation in the IR only applies to the number of bits to shift by, which is smaller than the limit of i8 in this case.

@llvmbot
Copy link
Member

llvmbot commented Nov 3, 2023

@llvm/issue-subscribers-backend-mips

Author: Henry Chen (chenx97)

### Description:

When compiling the following PoC with LLVM, an unexpected behavior happens, where the i128 representation of -1 will be shifted as if it were a <2xi64> vector with each element shifted by the given amount. In better-supported architectures and in older versions of LLVM, the shifting of i128 behaves correctly on Mips targets.

Environment:

  • Compiler: LLVM 18, commit b120fe8
  • Target Architecture: mips64el-unknown-linux-gnuabi64
  • Optimization level: the issue can be observed from O0 to O3

PoC

#include &lt;stdio.h&gt;
#include &lt;stdint.h&gt;

extern __int128 owo(__int128, int64_t);

int main() {
    __int128 a = 0xabcdef00abcdef11;
    a &lt;&lt;= 64;
    a |= 0xabcdef22abcdef33;
    __int128 b = owo(a, 48);
    unsigned long *p = (void *)&amp;b;
    printf("%lx%lx\n", p[1], p[0]);
    return 0;
}
target triple = "mips64el-unknown-linux-gnuabi64"

define i128 @<!-- -->owo(i128 %a, i64 %b) {
start:
  %tmp = sext i64 %b to i128
  %_1 = and i128 %tmp, 127
  %_0 = shl i128 %a, %_1
  ret i128 %_0
}

!1 = !{}

Expected Behavior:

The resulted i128 should be ef11abcdef22abcdef33000000000000

Observed Behavior:

The resulted i128 is ef11000000000000ef33000000000000

Analysis:

The result looks like that when shifting by a number less than 64, although both i64 numbers get correctly shifted, the overflown bits of the lower 64 bits are not copied to the lower part of the higher 64 bits. Note that the and operation in the IR only applies to the number of bits to shift by, which is smaller than the limit of i8 in this case.

@Cyanoxygen
Copy link
Contributor

Cyanoxygen commented Nov 3, 2023

Hi all,

This PoC is confirmed positive since LLVM 17.0.2 (not tested with 17.0.0 though).

The last known LLVM version which was confirmed negative is 16.0.6.

@topperc topperc self-assigned this Nov 3, 2023
topperc added a commit to topperc/llvm-project that referenced this issue Nov 3, 2023
If we start with an i128 shift, the initial shift amount would
usually have zeros in bit 8 and above. xoring the shift amount with
-1 will set those upper bits to 1. If DAGCombiner is able to prove those
bits are now 1, then the shift that uses the xor will be replaced
with undef. Which we don't want.

Reduce the xor constant to VT.bits-1 where VT is half the size of
the larger shift type. This avoids toggling the upper bits. The
hardware shift instruction only uses the lower bits of the shift
amount. I assume the code used NOT because the hardware doesn't
use the upper bits, but that isn't compatible with the LLVM poison
semantics.

Fixes llvm#71142.
@Cyanoxygen
Copy link
Contributor

Cyanoxygen commented Nov 3, 2023

Hi,

My colleague mentioned that the patch proposed in issue #64794 (which did almost the same thing with the PR) might solve this. After some testing, the result looked good.

Thanks!

@Cyanoxygen
Copy link
Contributor

Cyanoxygen commented Nov 3, 2023

I have a repo that contains some alternate PoC code at https://github.com/Cyanoxygen/llvm-mips-regression.
The problem was discovered by OP while OP was building Rust, we stripped down the PoC to the code in the issue body and in this repo.

The test result are here:

$ make CROSS_COMPILE=mips64el-linux-gnuabi64- TGT=mips64el-unknown-linux-gnuabi64 test
make -C src test
make[1]: Entering directory '/home/cyan/build/llvm-mips-regression/src'
mips64el-linux-gnuabi64-gcc -O0 -o owo main.c owo.ll.o -static
mips64el-linux-gnuabi64-gcc -O0 -o uwu main2.c uwu.ll.o -static
Running owo
92409240924092409240000000000000
Running uwu
92409240924092409240000000000000
make[1]: Leaving directory '/home/cyan/build/llvm-mips-regression/src'
$ vim src/main.c
$ vim src/main.ll # Replace owo with the code in the issue body
$ make CROSS_COMPILE=mips64el-linux-gnuabi64- TGT=mips64el-unknown-linux-gnuabi64 test
make -C src test
make[1]: Entering directory '/home/cyan/build/llvm-mips-regression/src'
llc --filetype=obj -O0 -mtriple=mips64el-unknown-linux-gnuabi64 -o owo.ll.o owo.ll
mips64el-linux-gnuabi64-gcc -O0 -o owo main.c owo.ll.o -static
mips64el-linux-gnuabi64-gcc -O0 -o uwu main2.c uwu.ll.o -static
Running owo
ef11abcdef22abcdef33000000000000
Running uwu
92409240924092409240000000000000
make[1]: Leaving directory '/home/cyan/build/llvm-mips-regression/src'

The value is correct across all our PoC codes.
Please note that the compiled binary was run with QEMU.

EDIT: updated the output of the second attempt, which replaced the code with the one in issue body.
EDIT: The patch in the PR worked too.

topperc added a commit that referenced this issue Nov 3, 2023
If we start with an i128 shift, the initial shift amount would usually
have zeros in bit 8 and above. xoring the shift amount with -1 will set
those upper bits to 1. If DAGCombiner is able to prove those bits are
now 1, then the shift that uses the xor will be replaced with undef.
Which we don't want.

Reduce the xor constant to VT.bits-1 where VT is half the size of the
larger shift type. This avoids toggling the upper bits. The hardware
shift instruction only uses the lower bits of the shift amount. I assume
the code used NOT because the hardware doesn't use the upper bits, but
that isn't compatible with the LLVM poison semantics.

Fixes #71142.
martn3 pushed a commit to martn3/llvm-project that referenced this issue Nov 7, 2023
If we start with an i128 shift, the initial shift amount would usually
have zeros in bit 8 and above. xoring the shift amount with -1 will set
those upper bits to 1. If DAGCombiner is able to prove those bits are
now 1, then the shift that uses the xor will be replaced with undef.
Which we don't want.

Reduce the xor constant to VT.bits-1 where VT is half the size of the
larger shift type. This avoids toggling the upper bits. The hardware
shift instruction only uses the lower bits of the shift amount. I assume
the code used NOT because the hardware doesn't use the upper bits, but
that isn't compatible with the LLVM poison semantics.

Fixes llvm#71142.

(cherry picked from commit
llvm@8d24d39)
tru pushed a commit that referenced this issue Nov 13, 2023
If we start with an i128 shift, the initial shift amount would usually
have zeros in bit 8 and above. xoring the shift amount with -1 will set
those upper bits to 1. If DAGCombiner is able to prove those bits are
now 1, then the shift that uses the xor will be replaced with undef.
Which we don't want.

Reduce the xor constant to VT.bits-1 where VT is half the size of the
larger shift type. This avoids toggling the upper bits. The hardware
shift instruction only uses the lower bits of the shift amount. I assume
the code used NOT because the hardware doesn't use the upper bits, but
that isn't compatible with the LLVM poison semantics.

Fixes #71142.

(cherry picked from commit 8d24d39)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants