Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary GC root for select/phi #22421

Closed
yuyichao opened this issue Jun 18, 2017 · 3 comments
Closed

Unnecessary GC root for select/phi #22421

yuyichao opened this issue Jun 18, 2017 · 3 comments
Labels
compiler:codegen Generation of LLVM IR and native code

Comments

@yuyichao
Copy link
Contributor

Seems that global constant in select isn't handled very well. The following function shouldn't need a GC root since both input of a are globally rooted.

julia> const a1 = [1, 2, 3]
3-element Array{Int64,1}:
 1
 2
 3

julia> const a2 = [1, 2, 3]
3-element Array{Int64,1}:
 1
 2
 3

julia> function f(b)
           a = b ? a1 : a2
           sin(a[1])
           sin(a[1])
       end
f (generic function with 1 method)

LLVM IR.

define double @julia_f_63194(i8) #0 !dbg !5 {
top:
  %gcframe11 = alloca [3 x %jl_value_t addrspace(10)*], align 8
  %gcframe11.sub = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe11, i64 0, i64 0
  %1 = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe11, i64 0, i64 1
  %2 = bitcast %jl_value_t addrspace(10)** %1 to i8*
  call void @llvm.memset.p0i8.i32(i8* %2, i8 0, i32 16, i32 8, i1 false)
  %ptls_i8 = call i8* asm "movq %fs:0, $0;\0Aaddq $$-10928, $0", "=r,~{dirflag},~{fpsr},~{flags}"() #4
  %3 = bitcast [3 x %jl_value_t addrspace(10)*]* %gcframe11 to i64*
  store i64 2, i64* %3, align 8
  %4 = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe11, i64 0, i64 1
  %5 = bitcast i8* %ptls_i8 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = bitcast %jl_value_t addrspace(10)** %4 to i64*
  store i64 %6, i64* %7, align 8
  %8 = bitcast i8* %ptls_i8 to %jl_value_t addrspace(10)***
  store %jl_value_t addrspace(10)** %gcframe11.sub, %jl_value_t addrspace(10)*** %8, align 8
  %9 = and i8 %0, 1
  %10 = icmp eq i8 %9, 0
  %. = select i1 %10, %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140705532235504 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140705531694720 to %jl_value_t*) to %jl_value_t addrspace(10)*)
  %11 = addrspacecast %jl_value_t addrspace(10)* %. to %jl_value_t addrspace(11)*
  %12 = bitcast %jl_value_t addrspace(11)* %11 to %jl_value_t addrspace(10)* addrspace(11)*
  %13 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %12, i64 3
  %14 = bitcast %jl_value_t addrspace(10)* addrspace(11)* %13 to i64 addrspace(11)*
  %15 = load i64, i64 addrspace(11)* %14, align 8
  %16 = icmp eq i64 %15, 0
  br i1 %16, label %oob, label %idxend

oob:                                              ; preds = %top
  %17 = alloca i64, align 8
  store i64 1, i64* %17, align 8
  %18 = addrspacecast %jl_value_t addrspace(10)* %. to %jl_value_t addrspace(12)*
  %19 = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe11, i64 0, i64 2
  store %jl_value_t addrspace(10)* %., %jl_value_t addrspace(10)** %19, align 8
  call void @jl_bounds_error_ints(%jl_value_t addrspace(12)* %18, i64* nonnull %17, i64 1)
  unreachable

idxend:                                           ; preds = %top
  %20 = bitcast %jl_value_t addrspace(11)* %11 to i64* addrspace(11)*
  %21 = load i64*, i64* addrspace(11)* %20, align 8
  %22 = load i64, i64* %21, align 8
  %23 = sitofp i64 %22 to double
  %24 = getelementptr inbounds [3 x %jl_value_t addrspace(10)*], [3 x %jl_value_t addrspace(10)*]* %gcframe11, i64 0, i64 2
  store %jl_value_t addrspace(10)* %., %jl_value_t addrspace(10)** %24, align 8
  %25 = call double inttoptr (i64 140705525129632 to double (double)*)(double %23)
  %26 = fcmp ord double %25, 0.000000e+00
  br i1 %26, label %L21, label %if4

if4:                                              ; preds = %idxend
  call void @jl_throw(%jl_value_t addrspace(12)* addrspacecast (%jl_value_t* inttoptr (i64 140706069952400 to %jl_value_t*) to %jl_value_t addrspace(12)*))
  unreachable

L21:                                              ; preds = %idxend
  %27 = load i64, i64 addrspace(11)* %14, align 8
  %28 = icmp eq i64 %27, 0
  br i1 %28, label %oob6, label %idxend7

oob6:                                             ; preds = %L21
  %29 = alloca i64, align 8
  store i64 1, i64* %29, align 8
  %30 = addrspacecast %jl_value_t addrspace(10)* %. to %jl_value_t addrspace(12)*
  call void @jl_bounds_error_ints(%jl_value_t addrspace(12)* %30, i64* nonnull %29, i64 1)
  unreachable

idxend7:                                          ; preds = %L21
  %31 = load i64*, i64* addrspace(11)* %20, align 8
  %32 = load i64, i64* %31, align 8
  %33 = sitofp i64 %32 to double
  %34 = call double inttoptr (i64 140705525129632 to double (double)*)(double %33)
  %35 = fcmp ord double %34, 0.000000e+00
  br i1 %35, label %L37, label %if8

if8:                                              ; preds = %idxend7
  call void @jl_throw(%jl_value_t addrspace(12)* addrspacecast (%jl_value_t* inttoptr (i64 140706069952400 to %jl_value_t*) to %jl_value_t addrspace(12)*))
  unreachable

L37:                                              ; preds = %idxend7
  %36 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %4, align 8
  %37 = bitcast i8* %ptls_i8 to %jl_value_t addrspace(10)**
  store %jl_value_t addrspace(10)* %36, %jl_value_t addrspace(10)** %37, align 8
  ret double %34
}
@yuyichao yuyichao added the compiler:codegen Generation of LLVM IR and native code label Jun 18, 2017
@yuyichao
Copy link
Contributor Author

yuyichao commented Oct 5, 2017

Another similar case:

The following code allocates a GC root slots for c ? a : b. However, none is actually needed since it is always one of the arguments.

julia> @noinline f(a, b, c) = (a, b, c)
f (generic function with 1 method)

julia> function g(a, b, c)
           return f(a, b, c ? a : b)
       end
g (generic function with 1 method)

julia> @code_llvm g(Ref(1), Ref(2), true)
; Function g
; Location: REPL[2]
define %jl_value_t addrspace(10)* @julia_g_64146(%jl_value_t addrspace(10)* dereferenceable(8), %jl_value_t addrspace(10)* dereferenceable(8), i8) {
top:
  %3 = alloca %jl_value_t addrspace(10)*, i32 3
  %gcframe = alloca %jl_value_t addrspace(10)*, i32 3
  %4 = bitcast %jl_value_t addrspace(10)** %gcframe to i8*
  call void @llvm.memset.p0i8.i32(i8* %4, i8 0, i32 24, i32 0, i1 false)
  %ptls_i8 = call i8* asm "movq %fs:0, $0;\0Aaddq $$-10920, $0", "=r,~{dirflag},~{fpsr},~{flags}"()
  %ptls = bitcast i8* %ptls_i8 to %jl_value_t***
; Location: REPL[2]:2
  %5 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %gcframe, i32 0
  %6 = bitcast %jl_value_t addrspace(10)** %5 to i64*
  store i64 2, i64* %6
  %7 = getelementptr %jl_value_t**, %jl_value_t*** %ptls, i32 0
  %8 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %gcframe, i32 1
  %9 = bitcast %jl_value_t addrspace(10)** %8 to %jl_value_t***
  %10 = load %jl_value_t**, %jl_value_t*** %7
  store %jl_value_t** %10, %jl_value_t*** %9
  %11 = bitcast %jl_value_t*** %7 to %jl_value_t addrspace(10)***
  store %jl_value_t addrspace(10)** %gcframe, %jl_value_t addrspace(10)*** %11
  %12 = and i8 %2, 1
  %13 = icmp eq i8 %12, 0
  %. = select i1 %13, %jl_value_t addrspace(10)* %1, %jl_value_t addrspace(10)* %0
  %14 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %gcframe, i32 2
  store %jl_value_t addrspace(10)* %., %jl_value_t addrspace(10)** %14
  %15 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %3, i32 0
  store %jl_value_t addrspace(10)* %0, %jl_value_t addrspace(10)** %15
  %16 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %3, i32 1
  store %jl_value_t addrspace(10)* %1, %jl_value_t addrspace(10)** %16
  %17 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %3, i32 2
  store %jl_value_t addrspace(10)* %., %jl_value_t addrspace(10)** %17
  %18 = call %jl_value_t addrspace(10)* @japi1_f_64146(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140585273278480 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %3, i32 3)
  %19 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %gcframe, i32 1
  %20 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %19
  %21 = getelementptr %jl_value_t**, %jl_value_t*** %ptls, i32 0
  %22 = bitcast %jl_value_t*** %21 to %jl_value_t addrspace(10)**
  store %jl_value_t addrspace(10)* %20, %jl_value_t addrspace(10)** %22
  ret %jl_value_t addrspace(10)* %18
}

@vtjnash
Copy link
Member

vtjnash commented Oct 5, 2017

Seems like the argument input addrspace may be wrong?

@yuyichao
Copy link
Contributor Author

yuyichao commented Oct 5, 2017

No those are fine (this also happens for locally allocated objects) The issue is that the pass treats select (and phi?) insts as new values that are independent with the value they are generated from.

yuyichao added a commit that referenced this issue Oct 7, 2017
If both inputs are rooted then we don't need to root the result.
Rename `LoadRefinements` to `Refinements` since it's not load specific anymore.

Fix #22421
@yuyichao yuyichao changed the title Unnecessary GC root for select constants Unnecessary GC root for select/phi Oct 9, 2017
yuyichao added a commit that referenced this issue Oct 9, 2017
If both inputs are rooted then we don't need to root the result.
Rename `LoadRefinements` to `Refinements` since it's not load specific anymore.

Fix #22421
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code
Projects
None yet
Development

No branches or pull requests

2 participants