miscompile with -O2 (suspected incorrect loop trip count computation) #23169

kcc · 2015-03-04T21:47:03Z


Bugzilla Link	22795
Resolution	FIXED
Resolved on	Mar 12, 2015 20:53
Version	unspecified
OS	Linux
CC	@hfinkel,@rotateright

Extended Description

% cat z.cc
#include <stdio.h>
#include <stdint.h>
#include <string.h>
int main(int argc, char **argv) {
const int n = 16;
int64_t bins[n] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
for (int ix = 0; ix < n; ++ix)
for (int jx = 0; jx < ix; ++jx)
bins[ix]++;
fprintf(stderr, "%ld\n", bins[10]);
}
% clang++ -O1 z.cc && ./a.out
10
% clang++ -O2 z.cc && ./a.out
4294967306
%

Something is messed up in computation of the loop trip count.
This is r231268 on x86_64 linux.

kcc · 2015-03-10T22:43:54Z

anyone?

rotateright · 2015-03-11T06:28:55Z

Looks like a bug in LSR.

This simplified IR looks ok to me:

target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.10.0"

define i32 @main() {
entry:
%bins = alloca [16 x i64], align 16
%0 = bitcast [16 x i64]* %bins to i8*
call void @llvm.memset.p0i8.i64(i8* %0, i8 0, i64 128, i32 16, i1 false)
br label %preheader

preheader: ; preds = %for.inc.1, %entry
%v11 = phi i64 [ 0, %entry ], [ %next12.1, %for.inc.1 ]
%iv = phi i32 [ -1, %entry ], [ %next.1, %for.inc.1 ]
%cmp = icmp sgt i64 %v11, 0
br i1 %cmp, label %for.body, label %for.inc

for.body: ; preds = %preheader
%zext = zext i32 %iv to i64
%arrayidx = getelementptr [16 x i64], [16 x i64]* %bins, i64 0, i64 %v11
%loaded = load i64, i64* %arrayidx, align 8
%add = add i64 %loaded, 1
%add2 = add i64 %add, %zext
store i64 %add2, i64* %arrayidx, align 8
br label %for.inc

for.inc: ; preds = %for.body, %preheader
%next12 = add nuw nsw i64 %v11, 1
%next = add nsw i32 %iv, 1
br i1 true, label %for.body.1, label %for.inc.1

end: ; preds = %for.inc.1
%arrayidx8 = getelementptr [16 x i64], [16 x i64]* %bins, i64 0, i64 2
%load = load i64, i64* %arrayidx8, align 16
%shr4 = lshr i64 %load, 32
%conv = trunc i64 %shr4 to i32
ret i32 %conv

for.body.1: ; preds = %for.inc
%zext.1 = zext i32 %next to i64
%arrayidx.1 = getelementptr [16 x i64], [16 x i64]* %bins, i64 0, i64 %next12
%loaded.1 = load i64, i64* %arrayidx.1, align 8
%add.1 = add i64 %loaded.1, 1
%add2.1 = add i64 %add.1, %zext.1
store i64 %add2.1, i64* %arrayidx.1, align 8
br label %for.inc.1

for.inc.1: ; preds = %for.body.1, %for.inc
%next12.1 = add nuw nsw i64 %next12, 1
%next.1 = add nuw nsw i32 %next, 1
%exitcond.1 = icmp eq i64 %next12.1, 16
br i1 %exitcond.1, label %end, label %preheader
}

; Function Attrs: nounwind
declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1) #0

rotateright · 2015-03-11T06:29:36Z

$ ./llc temp.ll -o - | ./clang -x assembler - ; ./a.out ; echo $?
1
$ ./llc -disable-lsr temp.ll -o - | ./clang -x assembler - ; ./a.out ; echo $?
0

llvmbot · 2015-03-13T01:08:17Z

I think SCEV is feeding bad data to LSR. From "opt -analyze -scalar-evolution" on your reduced testcase:

%zext = zext i32 %iv to i64
--> {4294967295,+,2}<%preheader> Exits: 4294967309

This is:

%iv = phi i32 [ -1, %entry ], [ %next.1, %for.inc.1 ]
%next = add nsw i32 %iv, 1
%next.1 = add nuw nsw i32 %next, 1

If you remove the 'nsw' and 'nuw' parts from the .ll, you get:

%zext = zext i32 %iv to i64
--> (zext i32 {-1,+,2}<%preheader> to i64) Exits: 13

I think the nsw and nuw bits are all correct in the .ll, it's something inside SCEV going wrong with wrapping handling.

Reasoning I think the no-wrap bits are correct: %next is correct, its first computation is "-1 + 1" which unsigned-wraps but does not signed-wrap. It then does [0..12]+1 none of which wrap in any way. %next.1 is correct, its first add is "0 + 1" which doesn't wrap, and it goes through [0..13]+1, none of which wrap.

llvmbot · 2015-03-13T02:00:01Z

Oh, the problem isn't %zext, it's %iv:

%iv = phi i32 [ -1, %entry ], [ %next.1, %for.inc.1 ]
--> {-1,+,2}<%preheader> Exits: 13

That's not right. Starting -1, then iteration 1 it goes "-1 + 2" which clearly violates nuw.

Here's the bug in createNodeForPHI:

3531 // If the increment doesn't overflow, then neither the add
rec nor
3532 // the post-increment will overflow.
3533 if (const AddOperator *OBO = dyn_cast(BEValue
V)) {
3534 if (OBO->hasNoUnsignedWrap())
3535 Flags = setFlags(Flags, SCEV::FlagNUW);
3536 if (OBO->hasNoSignedWrap())
3537 Flags = setFlags(Flags, SCEV::FlagNSW);

OBO in this case is "%next12.1 = add nuw nsw i64 %next12, 1" but that's not enough because OBO is just the tail of a chain of computation; OBO->getOperand(0) != PN.

llvmbot · 2015-03-13T03:53:41Z

Fixed in r232134.

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 9, 2021

DavidSpickett mentioned this issue May 3, 2024

lldb-server --min/max-gdbserver-port port map is not shared between processes on Linux #47549

Closed

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

miscompile with -O2 (suspected incorrect loop trip count computation) #23169

miscompile with -O2 (suspected incorrect loop trip count computation) #23169

kcc commented Mar 4, 2015

kcc commented Mar 10, 2015

rotateright commented Mar 11, 2015

rotateright commented Mar 11, 2015

llvmbot commented Mar 13, 2015

llvmbot commented Mar 13, 2015

llvmbot commented Mar 13, 2015

miscompile with -O2 (suspected incorrect loop trip count computation) #23169

miscompile with -O2 (suspected incorrect loop trip count computation) #23169

Comments

kcc commented Mar 4, 2015

Extended Description

kcc commented Mar 10, 2015

rotateright commented Mar 11, 2015

rotateright commented Mar 11, 2015

llvmbot commented Mar 13, 2015

llvmbot commented Mar 13, 2015

llvmbot commented Mar 13, 2015