Add a special GC AS for array ptrs #28251

Keno · 2018-07-24T01:41:41Z

The array data pointer is somewhat special. It points to a chunk
for memory that is effectively managed by the GC, but is not itself
a GC-tracked value. However, it is also not quite an interior pointer
into the array, since it may be an external allocation (or at the
more immediate IR level it is derived using a load rather than
a gep). We could try to make Derived do both, but the semantics
turn out to be rather different, so add a new kind of AS Loaded,
that handles precisely this situation: It roots the object that it
was loaded from while it is live.

Fixes #27955

Current status: Seems to fix the issue, but seems to trigger LLVM bugs (optimizations introducing illegal ptrtoint).

Keno · 2018-07-24T20:53:04Z

Bugpointed test case for the LLVM bug:

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-878a985.bc"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:10:11:12:13"
target triple = "x86_64-unknown-linux-gnu"

%jl_value_t = type opaque
%jl_array_t = type { i8 addrspace(13)*, i64, i16, i16, i32 }

declare i64 @julia_steprange_last_4949()

define void @"japi1_align!_9477"(%jl_value_t addrspace(10)**) #0 {
top:
  %1 = getelementptr inbounds %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %0, i64 0
  %2 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %1, !nonnull !1, !dereferenceable !2, !align !3
  %3 = getelementptr inbounds i8, i8 addrspace(11)* null, i64 12
  %4 = bitcast i8 addrspace(11)* %3 to i32 addrspace(11)*
  %5 = load i32, i32 addrspace(11)* %4, align 4, !tbaa !4
  %6 = sub i32 0, %5
  %7 = call i64 @julia_steprange_last_4949()
  br label %L26

L26:                                              ; preds = %L26, %top
  %value_phi3 = phi i64 [ 0, %top ], [ %spec.select, %L26 ]
  %8 = addrspacecast %jl_value_t addrspace(10)* %2 to %jl_value_t addrspace(11)*
  %9 = bitcast %jl_value_t addrspace(11)* %8 to %jl_value_t addrspace(10)* addrspace(11)*
  %10 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %9, !tbaa !4, !nonnull !1, !dereferenceable !9, !align !2
  %11 = sub i64 %value_phi3, 1
  %12 = mul i64 %11, 1
  %13 = add i64 0, %12
  %14 = addrspacecast %jl_value_t addrspace(10)* %10 to %jl_value_t addrspace(11)*
  %15 = bitcast %jl_value_t addrspace(11)* %14 to %jl_array_t addrspace(11)*
  %16 = getelementptr inbounds %jl_array_t, %jl_array_t addrspace(11)* %15, i32 0, i32 0
  %17 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %16, !tbaa !10, !nonnull !1
  %18 = bitcast i8 addrspace(13)* %17 to i32 addrspace(13)*
  %19 = getelementptr inbounds i32, i32 addrspace(13)* %18, i64 %13
  %20 = load i32, i32 addrspace(13)* %19, align 4, !tbaa !13
  %21 = addrspacecast %jl_value_t addrspace(10)* %2 to %jl_value_t addrspace(11)*
  %22 = bitcast %jl_value_t addrspace(11)* %21 to %jl_value_t addrspace(10)* addrspace(11)*
  %23 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %22, !tbaa !4, !nonnull !1, !dereferenceable !9, !align !2
  %24 = sext i32 %6 to i64
  %25 = add i64 %value_phi3, %24
  %26 = sub i64 %25, 1
  %27 = mul i64 %26, 1
  %28 = add i64 0, %27
  %29 = addrspacecast %jl_value_t addrspace(10)* %23 to %jl_value_t addrspace(11)*
  %30 = bitcast %jl_value_t addrspace(11)* %29 to %jl_array_t addrspace(11)*
  %31 = getelementptr inbounds %jl_array_t, %jl_array_t addrspace(11)* %30, i32 0, i32 0
  %32 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %31, !tbaa !10, !nonnull !1
  %33 = bitcast i8 addrspace(13)* %32 to i32 addrspace(13)*
  %34 = getelementptr inbounds i32, i32 addrspace(13)* %33, i64 %28
  store i32 %20, i32 addrspace(13)* %34, align 4, !tbaa !13
  %35 = icmp eq i64 %value_phi3, %7
  %36 = zext i1 %35 to i8
  %37 = trunc i8 %36 to i1
  %38 = xor i1 %37, true
  %39 = add i64 %value_phi3, -1
  %spec.select = select i1 %38, i64 %39, i64 undef
  %spec.select1 = select i1 %38, i8 0, i8 1
  %40 = xor i8 %spec.select1, 1
  %41 = trunc i8 %40 to i1
  %42 = xor i1 %41, true
  br i1 %42, label %L45, label %L26

L45:                                              ; preds = %L26
  ret void
}

attributes #0 = { "thunk" }

!llvm.module.flags = !{!0}

!0 = !{i32 1, !"Debug Info Version", i32 3}
!1 = !{}
!2 = !{i64 16}
!3 = !{i64 8}
!4 = !{!5, !5, i64 0}
!5 = !{!"jtbaa_mutab", !6, i64 0}
!6 = !{!"jtbaa_value", !7, i64 0}
!7 = !{!"jtbaa_data", !8, i64 0}
!8 = !{!"jtbaa"}
!9 = !{i64 40}
!10 = !{!11, !11, i64 0}
!11 = !{!"jtbaa_arrayptr", !12, i64 0}
!12 = !{!"jtbaa_array", !8, i64 0}
!13 = !{!14, !14, i64 0}
!14 = !{!"jtbaa_arraybuf", !7, i64 0}

Keno · 2018-07-24T20:55:20Z

Introduced by the vectorizer:

vector.scevcheck:                                 ; preds = %top
  %14 = sub i64 0, %4
  %15 = add i64 %12, -1
  %scevgep = getelementptr i32, i32 addrspace(13)* %10, i64 %15
  %scevgep2 = ptrtoint i32 addrspace(13)* %scevgep to i64
  %mul = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 4, i64 %14)
  %mul.result = extractvalue { i64, i1 } %mul, 0
  %mul.overflow = extractvalue { i64, i1 } %mul, 1
  %16 = add i64 %scevgep2, %mul.result
  %17 = sub i64 %scevgep2, %mul.result
  %18 = icmp ugt i64 %17, %scevgep2
  %19 = icmp ult i64 %16, %scevgep2
  %20 = select i1 true, i1 %18, i1 %19
  %21 = or i1 %20, %mul.overflow
  %22 = or i1 false, %21
  %scevgep3 = getelementptr i32, i32 addrspace(13)* %10, i64 -1
  %scevgep34 = ptrtoint i32 addrspace(13)* %scevgep3 to i64
  %mul5 = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 4, i64 %14)
  %mul.result6 = extractvalue { i64, i1 } %mul5, 0
  %mul.overflow7 = extractvalue { i64, i1 } %mul5, 1
  %23 = add i64 %scevgep34, %mul.result6
  %24 = sub i64 %scevgep34, %mul.result6
  %25 = icmp ugt i64 %24, %scevgep34
  %26 = icmp ult i64 %23, %scevgep34
  %27 = select i1 true, i1 %25, i1 %26
  %28 = or i1 %27, %mul.overflow7
  %29 = or i1 %22, %28
  br i1 %29, label %scalar.ph, label %vector.memcheck

Keno · 2018-07-24T21:03:20Z

loop-vectorize only test case:

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-878a985.bc"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:10:11:12:13"
target triple = "x86_64-unknown-linux-gnu"

%jl_value_t = type opaque
%jl_array_t = type { i8 addrspace(13)*, i64, i16, i16, i32 }

declare i64 @julia_steprange_last_4949()

define void @"japi1_align!_9477"(%jl_value_t addrspace(10)**) #0 {
top:
  %1 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %0, align 8, !nonnull !1, !dereferenceable !2, !align !3
  %2 = load i32, i32* inttoptr (i64 12 to i32*), align 4, !tbaa !4
  %3 = sub i32 0, %2
  %4 = call i64 @julia_steprange_last_4949()
  %5 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to %jl_value_t addrspace(10)* addrspace(11)*
  %7 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %6, align 8, !tbaa !4, !nonnull !1, !dereferenceable !9, !align !2
  %8 = addrspacecast %jl_value_t addrspace(10)* %7 to %jl_value_t addrspace(11)*
  %9 = bitcast %jl_value_t addrspace(11)* %8 to i32 addrspace(13)* addrspace(11)*
  %10 = load i32 addrspace(13)*, i32 addrspace(13)* addrspace(11)* %9, align 8, !tbaa !10, !nonnull !1
  %11 = sext i32 %3 to i64
  br label %L26

L26:                                              ; preds = %L26, %top
  %value_phi3 = phi i64 [ 0, %top ], [ %12, %L26 ]
  %12 = add i64 %value_phi3, -1
  %13 = getelementptr inbounds i32, i32 addrspace(13)* %10, i64 %12
  %14 = load i32, i32 addrspace(13)* %13, align 4, !tbaa !13
  %15 = add i64 %12, %11
  %16 = getelementptr inbounds i32, i32 addrspace(13)* %10, i64 %15
  store i32 %14, i32 addrspace(13)* %16, align 4, !tbaa !13
  %17 = icmp eq i64 %value_phi3, %4
  br i1 %17, label %L45, label %L26

L45:                                              ; preds = %L26
  ret void
}

attributes #0 = { "thunk" }

!llvm.module.flags = !{!0}

!0 = !{i32 1, !"Debug Info Version", i32 3}
!1 = !{}
!2 = !{i64 16}
!3 = !{i64 8}
!4 = !{!5, !5, i64 0}
!5 = !{!"jtbaa_mutab", !6, i64 0}
!6 = !{!"jtbaa_value", !7, i64 0}
!7 = !{!"jtbaa_data", !8, i64 0}
!8 = !{!"jtbaa"}
!9 = !{i64 40}
!10 = !{!11, !11, i64 0}
!11 = !{!"jtbaa_arrayptr", !12, i64 0}
!12 = !{!"jtbaa_array", !8, i64 0}
!13 = !{!14, !14, i64 0}
!14 = !{!"jtbaa_arraybuf", !7, i64 0}

Keno · 2018-07-24T21:07:12Z

Filed upstream as https://bugs.llvm.org/show_bug.cgi?id=38290

Keno · 2018-07-24T21:55:41Z

Candidate patch:

diff --git a/lib/Analysis/ScalarEvolutionExpander.cpp b/lib/Analysis/ScalarEvolutionExpander.cpp
index 7f76f057216..3fd1f19e70b 100644
--- a/lib/Analysis/ScalarEvolutionExpander.cpp
+++ b/lib/Analysis/ScalarEvolutionExpander.cpp
@@ -2157,8 +2157,9 @@ Value *SCEVExpander::generateOverflowCheck(const SCEVAddRecExpr *AR,
   const SCEV *Step = AR->getStepRecurrence(SE);
   const SCEV *Start = AR->getStart();

+  Type *ARType = AR->getType();
   unsigned SrcBits = SE.getTypeSizeInBits(ExitCount->getType());
-  unsigned DstBits = SE.getTypeSizeInBits(AR->getType());
+  unsigned DstBits = SE.getTypeSizeInBits(ARType);

   // The expression {Start,+,Step} has nusw/nssw if
   //   Step < 0, Start - |Step| * Backedge <= Start
@@ -2174,7 +2175,6 @@ Value *SCEVExpander::generateOverflowCheck(const SCEVAddRecExpr *AR,

   Value *StepValue = expandCodeFor(Step, Ty, Loc);
   Value *NegStepValue = expandCodeFor(SE.getNegativeSCEV(Step), Ty, Loc);
-  Value *StartValue = expandCodeFor(Start, Ty, Loc);

   ConstantInt *Zero =
       ConstantInt::get(Loc->getContext(), APInt::getNullValue(DstBits));
@@ -2197,8 +2197,23 @@ Value *SCEVExpander::generateOverflowCheck(const SCEVAddRecExpr *AR,
   // Compute:
   //   Start + |Step| * Backedge < Start
   //   Start - |Step| * Backedge > Start
-  Value *Add = Builder.CreateAdd(StartValue, MulV);
-  Value *Sub = Builder.CreateSub(StartValue, MulV);
+
+  Value *StartValue, *Add, *Sub;
+  // Expand as either integer arithmetic or GEP, depending
+  // on whether we're allowed to transition to the integer
+  // domain
+  if (DL.isNonIntegralPointerType(ARType)) {
+    PointerType *ARPtrTy = cast<PointerType>(ARType);
+    StartValue = expandCodeFor(Start, ARType, Loc);
+    const SCEV *MulS = SE.getSCEV(MulV);
+    const SCEV *const StepArray[2] = { MulS, SE.getNegativeSCEV(MulS) };
+    Add = expandAddToGEP(&StepArray[0], &StepArray[1], ARPtrTy, Ty, StartValue);
+    Sub = expandAddToGEP(&StepArray[1], &StepArray[2], ARPtrTy, Ty, StartValue);
+  } else {
+    StartValue = expandCodeFor(Start, Ty, Loc);
+    Add = Builder.CreateAdd(StartValue, MulV);
+    Sub = Builder.CreateSub(StartValue, MulV);
+  }

   Value *EndCompareGT = Builder.CreateICmp(
       Signed ? ICmpInst::ICMP_SGT : ICmpInst::ICMP_UGT, Sub, StartValue);

Keno · 2018-07-25T16:23:17Z

Looks like there's another issue in loop-reduce. Bugpoint reduction:

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-b903c88.bc"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:10:11:12:13"
target triple = "x86_64-unknown-linux-gnu"

define void @"japi1_square!_7021"() #0 {
top:
  br label %L28

L28:                                              ; preds = %L28, %top
  %value_phi3 = phi i64 [ 1, %top ], [ %5, %L28 ]
  %0 = add i64 %value_phi3, -1
  %1 = getelementptr inbounds i32, i32 addrspace(13)* undef, i64 %0
  %2 = load i32, i32 addrspace(13)* %1, align 4, !tbaa !1
  %3 = add i64 %0, undef
  %4 = getelementptr inbounds i32, i32 addrspace(13)* undef, i64 %3
  store i32 %2, i32 addrspace(13)* %4, align 4, !tbaa !1
  %5 = add i64 %value_phi3, 1
  br label %L28
}

attributes #0 = { "target-cpu"="skylake" "target-features"="+xsaves,+xsavec,+xsaveopt,+prfchw,+lzcnt,+sahf,+pku,+avx512vl,+avx512bw,+avx512cd,+clwb,+clflushopt,+adx,+rdseed,+avx512dq,+avx512f,+mpx,+rtm,+bmi2,+avx2,+bmi,+fsgsbase,+rdrnd,+f16c,+avx,+xsave,+aes,+popcnt,+movbe,+sse4.2,+sse4.1,+cx16,+fma,+ssse3,+pclmul,+sse3,-avx512ifma,-avx512pf,-avx512er,-sha,-prefetchwt1,-avx512vbmi,-avx512vpopcntdq,-sse4a,-xop,-lwp,-fma4,-tbm,-mwaitx,-clzero,+sse2,+mmx,+fxsr" "thunk" }

!llvm.module.flags = !{!0}

!0 = !{i32 1, !"Debug Info Version", i32 3}
!1 = !{!2, !2, i64 0}
!2 = !{!"jtbaa_arraybuf", !3, i64 0}
!3 = !{!"jtbaa_data", !4, i64 0}
!4 = !{!"jtbaa"}

Keno · 2018-07-25T23:27:38Z

Loop reduce bug looks fixed on upstream master. I'll bisect to find the relevant commit

Keno · 2018-07-26T00:26:50Z

The commit that fixes LSR is:

commit ab60b05a472e8651cbe53c19513b7e62b9ff32df
Author: Mikael Holmen <[email protected]>
Date:   Thu Feb 1 06:38:34 2018 +0000

    [LSR] Don't force bases of foldable formulae to the final type.

    Summary:
    Before emitting code for scaled registers, we prevent
    SCEVExpander from hoisting any scaled addressing mode
    by emitting all the bases first. However, these bases
    are being forced to the final type, resulting in some
    odd code.

    For example, if the type of the base is an integer and
    the final type is a pointer, we will emit an inttoptr
    for the base, a ptrtoint for the scale, and then a
    'reverse' GEP where the GEP pointer is actually the base
    integer and the index is the pointer. It's more intuitive
    to use the pointer as a pointer and the integer as index.

    Patch by: Bevin Hansson

    Reviewers: atrick, qcolombet, sanjoy

    Reviewed By: qcolombet

    Subscribers: llvm-commits

    Differential Revision: https://reviews.llvm.org/D42103

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323946 91177308-0d34-0410-b5e6-96231b3b80d8

Keno · 2018-07-26T00:56:08Z

Ok, everything builds and passes with those two patches. I'll go ahead and do those patches as a separate PR for ease of integration.

vtjnash · 2018-07-26T01:05:03Z

src/cgutils.cpp

    MDNode *tbaa = arraytype_constshape(tinfo.typ) ? tbaa_const : tbaa_arrayptr;
+    PointerType *PT = cast<PointerType>(addr->getType());
+    PointerType *PPT = cast<PointerType>(PT->getElementType());


addr->getResultElementType()?

I need the reference to PT anyway, so I think this is fine.

vtjnash · 2018-07-26T01:18:22Z

src/llvm-late-gc-lowering.cpp

+            if (auto PtrT = dyn_cast<PointerType>(LI->getType())) {
+                if (PtrT->getAddressSpace() == AddressSpace::Loaded) {
+                    CurrentV = LI->getPointerOperand();
+                    if (getValueAddrSpace(CurrentV) == 0) {


should this be if (!isSpecialPtr(getValueAddrSpace(CurrentV))), or is the idea that we only want to allow the base pointer to be either Tracked or None

We don't really have non-gc address spaces in a target that can do GC, so I don't think it matters too much, but I'll change it.

I think they support having code in the gc address space, as long as it eventually (after optimizations) doesn't end up in the final code. I think only maleadt would know if this makes a difference here, so I'm probably just being overly conservative.

vtjnash · 2018-07-26T01:22:56Z

doc/src/devdocs/llvm.md

@@ -211,6 +211,11 @@ three different address spaces (their numbers are defined in `src/codegen_shared
  future), but unlike the other pointers need not be rooted if passed to a
  call (they do still need to be rooted if they are live across another safepoint
  between the definition and the call).
+- Pointers loaded from tracked object (currently 13): This is used by arrays,
+  which themselves contain a data to the managed data. This data area is owned


"a pointer to"

Keno · 2018-07-26T02:21:54Z

First patch is submitted upstream as https://reviews.llvm.org/D49832.

Keno · 2018-07-26T21:57:41Z

Committed upstream as https://reviews.llvm.org/rL338072 / https://reviews.llvm.org/rL338073, but we can keep the D49832 version. It's functionally equivalent.

The array data pointer is somewhat special. It points to a chunk for memory that is effectively managed by the GC, but is not itself a GC-tracked value. However, it is also not quite an interior pointer into the array, since it may be an external allocation (or at the more immediate IR level it is derived using a load rather than a gep). We could try to make Derived do both, but the semantics turn out to be rather different, so add a new kind of AS `Loaded`, that handles precisely this situation: It roots the object that it was loaded from while it is live. Fixes #27955

ararslan added GC Garbage collector bugfix This change fixes an existing bug labels Jul 24, 2018

Keno mentioned this pull request Jul 25, 2018

enable abstract interpreter to constant prop even if method source is generated #28282

Closed

vtjnash approved these changes Jul 26, 2018

View reviewed changes

vtjnash reviewed Jul 26, 2018

View reviewed changes

Keno force-pushed the kf/gcloaded branch from 0ee8ce9 to 4d3a223 Compare July 28, 2018 19:19

Keno changed the title ~~WIP: Add a special GC AS for array ptrs~~ Add a special GC AS for array ptrs Jul 28, 2018

Keno merged commit 3cfc131 into master Jul 28, 2018

martinholters deleted the kf/gcloaded branch July 30, 2018 06:44

abelsiqueira mentioned this pull request Sep 16, 2018

Bump julia min JuliaSmoothOptimizers/AmplNLReader.jl#48

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a special GC AS for array ptrs #28251

Add a special GC AS for array ptrs #28251

Keno commented Jul 24, 2018

Keno commented Jul 24, 2018

Keno commented Jul 24, 2018

Keno commented Jul 24, 2018

Keno commented Jul 24, 2018

Keno commented Jul 24, 2018

Keno commented Jul 25, 2018

Keno commented Jul 25, 2018

Keno commented Jul 26, 2018

Keno commented Jul 26, 2018

vtjnash Jul 26, 2018

Keno Jul 28, 2018

vtjnash Jul 26, 2018

Keno Jul 28, 2018

vtjnash Jul 28, 2018

vtjnash Jul 26, 2018

Keno commented Jul 26, 2018 •

edited

Loading

Keno commented Jul 26, 2018 •

edited

Loading

Add a special GC AS for array ptrs #28251

Add a special GC AS for array ptrs #28251

Conversation

Keno commented Jul 24, 2018

Keno commented Jul 24, 2018

Keno commented Jul 24, 2018

Keno commented Jul 24, 2018

Keno commented Jul 24, 2018

Keno commented Jul 24, 2018

Keno commented Jul 25, 2018

Keno commented Jul 25, 2018

Keno commented Jul 26, 2018

Keno commented Jul 26, 2018

vtjnash Jul 26, 2018

Choose a reason for hiding this comment

Keno Jul 28, 2018

Choose a reason for hiding this comment

vtjnash Jul 26, 2018

Choose a reason for hiding this comment

Keno Jul 28, 2018

Choose a reason for hiding this comment

vtjnash Jul 28, 2018

Choose a reason for hiding this comment

vtjnash Jul 26, 2018

Choose a reason for hiding this comment

Keno commented Jul 26, 2018 • edited Loading

Keno commented Jul 26, 2018 • edited Loading

Keno commented Jul 26, 2018 •

edited

Loading

Keno commented Jul 26, 2018 •

edited

Loading