Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a special GC AS for array ptrs #28251

Merged
merged 1 commit into from
Jul 28, 2018
Merged

Add a special GC AS for array ptrs #28251

merged 1 commit into from
Jul 28, 2018

Conversation

Keno
Copy link
Member

@Keno Keno commented Jul 24, 2018

The array data pointer is somewhat special. It points to a chunk
for memory that is effectively managed by the GC, but is not itself
a GC-tracked value. However, it is also not quite an interior pointer
into the array, since it may be an external allocation (or at the
more immediate IR level it is derived using a load rather than
a gep). We could try to make Derived do both, but the semantics
turn out to be rather different, so add a new kind of AS Loaded,
that handles precisely this situation: It roots the object that it
was loaded from while it is live.

Fixes #27955

Current status: Seems to fix the issue, but seems to trigger LLVM bugs (optimizations introducing illegal ptrtoint).

@ararslan ararslan added GC Garbage collector bugfix This change fixes an existing bug labels Jul 24, 2018
@Keno
Copy link
Member Author

Keno commented Jul 24, 2018

Bugpointed test case for the LLVM bug:

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-878a985.bc"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:10:11:12:13"
target triple = "x86_64-unknown-linux-gnu"

%jl_value_t = type opaque
%jl_array_t = type { i8 addrspace(13)*, i64, i16, i16, i32 }

declare i64 @julia_steprange_last_4949()

define void @"japi1_align!_9477"(%jl_value_t addrspace(10)**) #0 {
top:
  %1 = getelementptr inbounds %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %0, i64 0
  %2 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %1, !nonnull !1, !dereferenceable !2, !align !3
  %3 = getelementptr inbounds i8, i8 addrspace(11)* null, i64 12
  %4 = bitcast i8 addrspace(11)* %3 to i32 addrspace(11)*
  %5 = load i32, i32 addrspace(11)* %4, align 4, !tbaa !4
  %6 = sub i32 0, %5
  %7 = call i64 @julia_steprange_last_4949()
  br label %L26

L26:                                              ; preds = %L26, %top
  %value_phi3 = phi i64 [ 0, %top ], [ %spec.select, %L26 ]
  %8 = addrspacecast %jl_value_t addrspace(10)* %2 to %jl_value_t addrspace(11)*
  %9 = bitcast %jl_value_t addrspace(11)* %8 to %jl_value_t addrspace(10)* addrspace(11)*
  %10 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %9, !tbaa !4, !nonnull !1, !dereferenceable !9, !align !2
  %11 = sub i64 %value_phi3, 1
  %12 = mul i64 %11, 1
  %13 = add i64 0, %12
  %14 = addrspacecast %jl_value_t addrspace(10)* %10 to %jl_value_t addrspace(11)*
  %15 = bitcast %jl_value_t addrspace(11)* %14 to %jl_array_t addrspace(11)*
  %16 = getelementptr inbounds %jl_array_t, %jl_array_t addrspace(11)* %15, i32 0, i32 0
  %17 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %16, !tbaa !10, !nonnull !1
  %18 = bitcast i8 addrspace(13)* %17 to i32 addrspace(13)*
  %19 = getelementptr inbounds i32, i32 addrspace(13)* %18, i64 %13
  %20 = load i32, i32 addrspace(13)* %19, align 4, !tbaa !13
  %21 = addrspacecast %jl_value_t addrspace(10)* %2 to %jl_value_t addrspace(11)*
  %22 = bitcast %jl_value_t addrspace(11)* %21 to %jl_value_t addrspace(10)* addrspace(11)*
  %23 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %22, !tbaa !4, !nonnull !1, !dereferenceable !9, !align !2
  %24 = sext i32 %6 to i64
  %25 = add i64 %value_phi3, %24
  %26 = sub i64 %25, 1
  %27 = mul i64 %26, 1
  %28 = add i64 0, %27
  %29 = addrspacecast %jl_value_t addrspace(10)* %23 to %jl_value_t addrspace(11)*
  %30 = bitcast %jl_value_t addrspace(11)* %29 to %jl_array_t addrspace(11)*
  %31 = getelementptr inbounds %jl_array_t, %jl_array_t addrspace(11)* %30, i32 0, i32 0
  %32 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %31, !tbaa !10, !nonnull !1
  %33 = bitcast i8 addrspace(13)* %32 to i32 addrspace(13)*
  %34 = getelementptr inbounds i32, i32 addrspace(13)* %33, i64 %28
  store i32 %20, i32 addrspace(13)* %34, align 4, !tbaa !13
  %35 = icmp eq i64 %value_phi3, %7
  %36 = zext i1 %35 to i8
  %37 = trunc i8 %36 to i1
  %38 = xor i1 %37, true
  %39 = add i64 %value_phi3, -1
  %spec.select = select i1 %38, i64 %39, i64 undef
  %spec.select1 = select i1 %38, i8 0, i8 1
  %40 = xor i8 %spec.select1, 1
  %41 = trunc i8 %40 to i1
  %42 = xor i1 %41, true
  br i1 %42, label %L45, label %L26

L45:                                              ; preds = %L26
  ret void
}

attributes #0 = { "thunk" }

!llvm.module.flags = !{!0}

!0 = !{i32 1, !"Debug Info Version", i32 3}
!1 = !{}
!2 = !{i64 16}
!3 = !{i64 8}
!4 = !{!5, !5, i64 0}
!5 = !{!"jtbaa_mutab", !6, i64 0}
!6 = !{!"jtbaa_value", !7, i64 0}
!7 = !{!"jtbaa_data", !8, i64 0}
!8 = !{!"jtbaa"}
!9 = !{i64 40}
!10 = !{!11, !11, i64 0}
!11 = !{!"jtbaa_arrayptr", !12, i64 0}
!12 = !{!"jtbaa_array", !8, i64 0}
!13 = !{!14, !14, i64 0}
!14 = !{!"jtbaa_arraybuf", !7, i64 0}

@Keno
Copy link
Member Author

Keno commented Jul 24, 2018

Introduced by the vectorizer:

vector.scevcheck:                                 ; preds = %top
  %14 = sub i64 0, %4
  %15 = add i64 %12, -1
  %scevgep = getelementptr i32, i32 addrspace(13)* %10, i64 %15
  %scevgep2 = ptrtoint i32 addrspace(13)* %scevgep to i64
  %mul = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 4, i64 %14)
  %mul.result = extractvalue { i64, i1 } %mul, 0
  %mul.overflow = extractvalue { i64, i1 } %mul, 1
  %16 = add i64 %scevgep2, %mul.result
  %17 = sub i64 %scevgep2, %mul.result
  %18 = icmp ugt i64 %17, %scevgep2
  %19 = icmp ult i64 %16, %scevgep2
  %20 = select i1 true, i1 %18, i1 %19
  %21 = or i1 %20, %mul.overflow
  %22 = or i1 false, %21
  %scevgep3 = getelementptr i32, i32 addrspace(13)* %10, i64 -1
  %scevgep34 = ptrtoint i32 addrspace(13)* %scevgep3 to i64
  %mul5 = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 4, i64 %14)
  %mul.result6 = extractvalue { i64, i1 } %mul5, 0
  %mul.overflow7 = extractvalue { i64, i1 } %mul5, 1
  %23 = add i64 %scevgep34, %mul.result6
  %24 = sub i64 %scevgep34, %mul.result6
  %25 = icmp ugt i64 %24, %scevgep34
  %26 = icmp ult i64 %23, %scevgep34
  %27 = select i1 true, i1 %25, i1 %26
  %28 = or i1 %27, %mul.overflow7
  %29 = or i1 %22, %28
  br i1 %29, label %scalar.ph, label %vector.memcheck

@Keno
Copy link
Member Author

Keno commented Jul 24, 2018

loop-vectorize only test case:

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-878a985.bc"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:10:11:12:13"
target triple = "x86_64-unknown-linux-gnu"

%jl_value_t = type opaque
%jl_array_t = type { i8 addrspace(13)*, i64, i16, i16, i32 }

declare i64 @julia_steprange_last_4949()

define void @"japi1_align!_9477"(%jl_value_t addrspace(10)**) #0 {
top:
  %1 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %0, align 8, !nonnull !1, !dereferenceable !2, !align !3
  %2 = load i32, i32* inttoptr (i64 12 to i32*), align 4, !tbaa !4
  %3 = sub i32 0, %2
  %4 = call i64 @julia_steprange_last_4949()
  %5 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to %jl_value_t addrspace(10)* addrspace(11)*
  %7 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %6, align 8, !tbaa !4, !nonnull !1, !dereferenceable !9, !align !2
  %8 = addrspacecast %jl_value_t addrspace(10)* %7 to %jl_value_t addrspace(11)*
  %9 = bitcast %jl_value_t addrspace(11)* %8 to i32 addrspace(13)* addrspace(11)*
  %10 = load i32 addrspace(13)*, i32 addrspace(13)* addrspace(11)* %9, align 8, !tbaa !10, !nonnull !1
  %11 = sext i32 %3 to i64
  br label %L26

L26:                                              ; preds = %L26, %top
  %value_phi3 = phi i64 [ 0, %top ], [ %12, %L26 ]
  %12 = add i64 %value_phi3, -1
  %13 = getelementptr inbounds i32, i32 addrspace(13)* %10, i64 %12
  %14 = load i32, i32 addrspace(13)* %13, align 4, !tbaa !13
  %15 = add i64 %12, %11
  %16 = getelementptr inbounds i32, i32 addrspace(13)* %10, i64 %15
  store i32 %14, i32 addrspace(13)* %16, align 4, !tbaa !13
  %17 = icmp eq i64 %value_phi3, %4
  br i1 %17, label %L45, label %L26

L45:                                              ; preds = %L26
  ret void
}

attributes #0 = { "thunk" }

!llvm.module.flags = !{!0}

!0 = !{i32 1, !"Debug Info Version", i32 3}
!1 = !{}
!2 = !{i64 16}
!3 = !{i64 8}
!4 = !{!5, !5, i64 0}
!5 = !{!"jtbaa_mutab", !6, i64 0}
!6 = !{!"jtbaa_value", !7, i64 0}
!7 = !{!"jtbaa_data", !8, i64 0}
!8 = !{!"jtbaa"}
!9 = !{i64 40}
!10 = !{!11, !11, i64 0}
!11 = !{!"jtbaa_arrayptr", !12, i64 0}
!12 = !{!"jtbaa_array", !8, i64 0}
!13 = !{!14, !14, i64 0}
!14 = !{!"jtbaa_arraybuf", !7, i64 0}

@Keno
Copy link
Member Author

Keno commented Jul 24, 2018

Filed upstream as https://bugs.llvm.org/show_bug.cgi?id=38290

@Keno
Copy link
Member Author

Keno commented Jul 24, 2018

Candidate patch:

diff --git a/lib/Analysis/ScalarEvolutionExpander.cpp b/lib/Analysis/ScalarEvolutionExpander.cpp
index 7f76f057216..3fd1f19e70b 100644
--- a/lib/Analysis/ScalarEvolutionExpander.cpp
+++ b/lib/Analysis/ScalarEvolutionExpander.cpp
@@ -2157,8 +2157,9 @@ Value *SCEVExpander::generateOverflowCheck(const SCEVAddRecExpr *AR,
   const SCEV *Step = AR->getStepRecurrence(SE);
   const SCEV *Start = AR->getStart();

+  Type *ARType = AR->getType();
   unsigned SrcBits = SE.getTypeSizeInBits(ExitCount->getType());
-  unsigned DstBits = SE.getTypeSizeInBits(AR->getType());
+  unsigned DstBits = SE.getTypeSizeInBits(ARType);

   // The expression {Start,+,Step} has nusw/nssw if
   //   Step < 0, Start - |Step| * Backedge <= Start
@@ -2174,7 +2175,6 @@ Value *SCEVExpander::generateOverflowCheck(const SCEVAddRecExpr *AR,

   Value *StepValue = expandCodeFor(Step, Ty, Loc);
   Value *NegStepValue = expandCodeFor(SE.getNegativeSCEV(Step), Ty, Loc);
-  Value *StartValue = expandCodeFor(Start, Ty, Loc);

   ConstantInt *Zero =
       ConstantInt::get(Loc->getContext(), APInt::getNullValue(DstBits));
@@ -2197,8 +2197,23 @@ Value *SCEVExpander::generateOverflowCheck(const SCEVAddRecExpr *AR,
   // Compute:
   //   Start + |Step| * Backedge < Start
   //   Start - |Step| * Backedge > Start
-  Value *Add = Builder.CreateAdd(StartValue, MulV);
-  Value *Sub = Builder.CreateSub(StartValue, MulV);
+
+  Value *StartValue, *Add, *Sub;
+  // Expand as either integer arithmetic or GEP, depending
+  // on whether we're allowed to transition to the integer
+  // domain
+  if (DL.isNonIntegralPointerType(ARType)) {
+    PointerType *ARPtrTy = cast<PointerType>(ARType);
+    StartValue = expandCodeFor(Start, ARType, Loc);
+    const SCEV *MulS = SE.getSCEV(MulV);
+    const SCEV *const StepArray[2] = { MulS, SE.getNegativeSCEV(MulS) };
+    Add = expandAddToGEP(&StepArray[0], &StepArray[1], ARPtrTy, Ty, StartValue);
+    Sub = expandAddToGEP(&StepArray[1], &StepArray[2], ARPtrTy, Ty, StartValue);
+  } else {
+    StartValue = expandCodeFor(Start, Ty, Loc);
+    Add = Builder.CreateAdd(StartValue, MulV);
+    Sub = Builder.CreateSub(StartValue, MulV);
+  }

   Value *EndCompareGT = Builder.CreateICmp(
       Signed ? ICmpInst::ICMP_SGT : ICmpInst::ICMP_UGT, Sub, StartValue);

@Keno
Copy link
Member Author

Keno commented Jul 25, 2018

Looks like there's another issue in loop-reduce. Bugpoint reduction:

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-b903c88.bc"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:10:11:12:13"
target triple = "x86_64-unknown-linux-gnu"

define void @"japi1_square!_7021"() #0 {
top:
  br label %L28

L28:                                              ; preds = %L28, %top
  %value_phi3 = phi i64 [ 1, %top ], [ %5, %L28 ]
  %0 = add i64 %value_phi3, -1
  %1 = getelementptr inbounds i32, i32 addrspace(13)* undef, i64 %0
  %2 = load i32, i32 addrspace(13)* %1, align 4, !tbaa !1
  %3 = add i64 %0, undef
  %4 = getelementptr inbounds i32, i32 addrspace(13)* undef, i64 %3
  store i32 %2, i32 addrspace(13)* %4, align 4, !tbaa !1
  %5 = add i64 %value_phi3, 1
  br label %L28
}

attributes #0 = { "target-cpu"="skylake" "target-features"="+xsaves,+xsavec,+xsaveopt,+prfchw,+lzcnt,+sahf,+pku,+avx512vl,+avx512bw,+avx512cd,+clwb,+clflushopt,+adx,+rdseed,+avx512dq,+avx512f,+mpx,+rtm,+bmi2,+avx2,+bmi,+fsgsbase,+rdrnd,+f16c,+avx,+xsave,+aes,+popcnt,+movbe,+sse4.2,+sse4.1,+cx16,+fma,+ssse3,+pclmul,+sse3,-avx512ifma,-avx512pf,-avx512er,-sha,-prefetchwt1,-avx512vbmi,-avx512vpopcntdq,-sse4a,-xop,-lwp,-fma4,-tbm,-mwaitx,-clzero,+sse2,+mmx,+fxsr" "thunk" }

!llvm.module.flags = !{!0}

!0 = !{i32 1, !"Debug Info Version", i32 3}
!1 = !{!2, !2, i64 0}
!2 = !{!"jtbaa_arraybuf", !3, i64 0}
!3 = !{!"jtbaa_data", !4, i64 0}
!4 = !{!"jtbaa"}

@Keno
Copy link
Member Author

Keno commented Jul 25, 2018

Loop reduce bug looks fixed on upstream master. I'll bisect to find the relevant commit

@Keno
Copy link
Member Author

Keno commented Jul 26, 2018

The commit that fixes LSR is:

commit ab60b05a472e8651cbe53c19513b7e62b9ff32df
Author: Mikael Holmen <[email protected]>
Date:   Thu Feb 1 06:38:34 2018 +0000

    [LSR] Don't force bases of foldable formulae to the final type.

    Summary:
    Before emitting code for scaled registers, we prevent
    SCEVExpander from hoisting any scaled addressing mode
    by emitting all the bases first. However, these bases
    are being forced to the final type, resulting in some
    odd code.

    For example, if the type of the base is an integer and
    the final type is a pointer, we will emit an inttoptr
    for the base, a ptrtoint for the scale, and then a
    'reverse' GEP where the GEP pointer is actually the base
    integer and the index is the pointer. It's more intuitive
    to use the pointer as a pointer and the integer as index.

    Patch by: Bevin Hansson

    Reviewers: atrick, qcolombet, sanjoy

    Reviewed By: qcolombet

    Subscribers: llvm-commits

    Differential Revision: https://reviews.llvm.org/D42103

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323946 91177308-0d34-0410-b5e6-96231b3b80d8

@Keno
Copy link
Member Author

Keno commented Jul 26, 2018

Ok, everything builds and passes with those two patches. I'll go ahead and do those patches as a separate PR for ease of integration.

MDNode *tbaa = arraytype_constshape(tinfo.typ) ? tbaa_const : tbaa_arrayptr;
PointerType *PT = cast<PointerType>(addr->getType());
PointerType *PPT = cast<PointerType>(PT->getElementType());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addr->getResultElementType()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need the reference to PT anyway, so I think this is fine.

if (auto PtrT = dyn_cast<PointerType>(LI->getType())) {
if (PtrT->getAddressSpace() == AddressSpace::Loaded) {
CurrentV = LI->getPointerOperand();
if (getValueAddrSpace(CurrentV) == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be if (!isSpecialPtr(getValueAddrSpace(CurrentV))), or is the idea that we only want to allow the base pointer to be either Tracked or None

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't really have non-gc address spaces in a target that can do GC, so I don't think it matters too much, but I'll change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they support having code in the gc address space, as long as it eventually (after optimizations) doesn't end up in the final code. I think only maleadt would know if this makes a difference here, so I'm probably just being overly conservative.

@@ -211,6 +211,11 @@ three different address spaces (their numbers are defined in `src/codegen_shared
future), but unlike the other pointers need not be rooted if passed to a
call (they do still need to be rooted if they are live across another safepoint
between the definition and the call).
- Pointers loaded from tracked object (currently 13): This is used by arrays,
which themselves contain a data to the managed data. This data area is owned
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"a pointer to"

@Keno
Copy link
Member Author

Keno commented Jul 26, 2018

First patch is submitted upstream as https://reviews.llvm.org/D49832.

@Keno
Copy link
Member Author

Keno commented Jul 26, 2018

Committed upstream as https://reviews.llvm.org/rL338072 / https://reviews.llvm.org/rL338073, but we can keep the D49832 version. It's functionally equivalent.

The array data pointer is somewhat special. It points to a chunk
for memory that is effectively managed by the GC, but is not itself
a GC-tracked value. However, it is also not quite an interior pointer
into the array, since it may be an external allocation (or at the
more immediate IR level it is derived using a load rather than
a gep). We could try to make Derived do both, but the semantics
turn out to be rather different, so add a new kind of AS `Loaded`,
that handles precisely this situation: It roots the object that it
was loaded from while it is live.

Fixes #27955
@Keno Keno changed the title WIP: Add a special GC AS for array ptrs Add a special GC AS for array ptrs Jul 28, 2018
@Keno Keno merged commit 3cfc131 into master Jul 28, 2018
@martinholters martinholters deleted the kf/gcloaded branch July 30, 2018 06:44
KristofferC pushed a commit that referenced this pull request Feb 11, 2019
The array data pointer is somewhat special. It points to a chunk
for memory that is effectively managed by the GC, but is not itself
a GC-tracked value. However, it is also not quite an interior pointer
into the array, since it may be an external allocation (or at the
more immediate IR level it is derived using a load rather than
a gep). We could try to make Derived do both, but the semantics
turn out to be rather different, so add a new kind of AS `Loaded`,
that handles precisely this situation: It roots the object that it
was loaded from while it is live.

Fixes #27955
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug GC Garbage collector
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Corrupted memory?
3 participants