Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

miscompile with -O2 (suspected incorrect loop trip count computation) #23169

Closed
kcc opened this issue Mar 4, 2015 · 6 comments
Closed

miscompile with -O2 (suspected incorrect loop trip count computation) #23169

kcc opened this issue Mar 4, 2015 · 6 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@kcc
Copy link
Contributor

kcc commented Mar 4, 2015

Bugzilla Link 22795
Resolution FIXED
Resolved on Mar 12, 2015 20:53
Version unspecified
OS Linux
CC @hfinkel,@rotateright

Extended Description

% cat z.cc
#include <stdio.h>
#include <stdint.h>
#include <string.h>
int main(int argc, char **argv) {
const int n = 16;
int64_t bins[n] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
for (int ix = 0; ix < n; ++ix)
for (int jx = 0; jx < ix; ++jx)
bins[ix]++;
fprintf(stderr, "%ld\n", bins[10]);
}
% clang++ -O1 z.cc && ./a.out
10
% clang++ -O2 z.cc && ./a.out
4294967306
%

Something is messed up in computation of the loop trip count.
This is r231268 on x86_64 linux.

@kcc
Copy link
Contributor Author

kcc commented Mar 10, 2015

anyone?

@rotateright
Copy link
Contributor

Looks like a bug in LSR.

This simplified IR looks ok to me:

target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.10.0"

define i32 @​main() {
entry:
%bins = alloca [16 x i64], align 16
%0 = bitcast [16 x i64]* %bins to i8*
call void @​llvm.memset.p0i8.i64(i8* %0, i8 0, i64 128, i32 16, i1 false)
br label %preheader

preheader: ; preds = %for.inc.1, %entry
%v11 = phi i64 [ 0, %entry ], [ %next12.1, %for.inc.1 ]
%iv = phi i32 [ -1, %entry ], [ %next.1, %for.inc.1 ]
%cmp = icmp sgt i64 %v11, 0
br i1 %cmp, label %for.body, label %for.inc

for.body: ; preds = %preheader
%zext = zext i32 %iv to i64
%arrayidx = getelementptr [16 x i64], [16 x i64]* %bins, i64 0, i64 %v11
%loaded = load i64, i64* %arrayidx, align 8
%add = add i64 %loaded, 1
%add2 = add i64 %add, %zext
store i64 %add2, i64* %arrayidx, align 8
br label %for.inc

for.inc: ; preds = %for.body, %preheader
%next12 = add nuw nsw i64 %v11, 1
%next = add nsw i32 %iv, 1
br i1 true, label %for.body.1, label %for.inc.1

end: ; preds = %for.inc.1
%arrayidx8 = getelementptr [16 x i64], [16 x i64]* %bins, i64 0, i64 2
%load = load i64, i64* %arrayidx8, align 16
%shr4 = lshr i64 %load, 32
%conv = trunc i64 %shr4 to i32
ret i32 %conv

for.body.1: ; preds = %for.inc
%zext.1 = zext i32 %next to i64
%arrayidx.1 = getelementptr [16 x i64], [16 x i64]* %bins, i64 0, i64 %next12
%loaded.1 = load i64, i64* %arrayidx.1, align 8
%add.1 = add i64 %loaded.1, 1
%add2.1 = add i64 %add.1, %zext.1
store i64 %add2.1, i64* %arrayidx.1, align 8
br label %for.inc.1

for.inc.1: ; preds = %for.body.1, %for.inc
%next12.1 = add nuw nsw i64 %next12, 1
%next.1 = add nuw nsw i32 %next, 1
%exitcond.1 = icmp eq i64 %next12.1, 16
br i1 %exitcond.1, label %end, label %preheader
}

; Function Attrs: nounwind
declare void @​llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1) #​0

@rotateright
Copy link
Contributor

$ ./llc temp.ll -o - | ./clang -x assembler - ; ./a.out ; echo $?
1
$ ./llc -disable-lsr temp.ll -o - | ./clang -x assembler - ; ./a.out ; echo $?
0

@llvmbot
Copy link
Member

llvmbot commented Mar 13, 2015

I think SCEV is feeding bad data to LSR. From "opt -analyze -scalar-evolution" on your reduced testcase:

%zext = zext i32 %iv to i64
--> {4294967295,+,2}<%preheader> Exits: 4294967309

This is:

%iv = phi i32 [ -1, %entry ], [ %next.1, %for.inc.1 ]
%next = add nsw i32 %iv, 1
%next.1 = add nuw nsw i32 %next, 1

If you remove the 'nsw' and 'nuw' parts from the .ll, you get:

%zext = zext i32 %iv to i64
--> (zext i32 {-1,+,2}<%preheader> to i64) Exits: 13

I think the nsw and nuw bits are all correct in the .ll, it's something inside SCEV going wrong with wrapping handling.

Reasoning I think the no-wrap bits are correct: %next is correct, its first computation is "-1 + 1" which unsigned-wraps but does not signed-wrap. It then does [0..12]+1 none of which wrap in any way. %next.1 is correct, its first add is "0 + 1" which doesn't wrap, and it goes through [0..13]+1, none of which wrap.

@llvmbot
Copy link
Member

llvmbot commented Mar 13, 2015

Oh, the problem isn't %zext, it's %iv:

%iv = phi i32 [ -1, %entry ], [ %next.1, %for.inc.1 ]
--> {-1,+,2}<%preheader> Exits: 13

That's not right. Starting -1, then iteration 1 it goes "-1 + 2" which clearly violates nuw.

Here's the bug in createNodeForPHI:

3531 // If the increment doesn't overflow, then neither the add
rec nor
3532 // the post-increment will overflow.
3533 if (const AddOperator *OBO = dyn_cast(BEValue
V)) {
3534 if (OBO->hasNoUnsignedWrap())
3535 Flags = setFlags(Flags, SCEV::FlagNUW);
3536 if (OBO->hasNoSignedWrap())
3537 Flags = setFlags(Flags, SCEV::FlagNSW);

OBO in this case is "%next12.1 = add nuw nsw i64 %next12, 1" but that's not enough because OBO is just the tail of a chain of computation; OBO->getOperand(0) != PN.

@llvmbot
Copy link
Member

llvmbot commented Mar 13, 2015

Fixed in r232134.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

3 participants