Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault/assertion failure/IR verification error due to non-dominated use of SSAValue #46408

Closed
maleadt opened this issue Aug 19, 2022 · 5 comments · Fixed by #46819
Closed
Labels
compiler:codegen Generation of LLVM IR and native code regression Regression in behavior compared to a previous version
Milestone

Comments

@maleadt
Copy link
Member

maleadt commented Aug 19, 2022

As seen on PkgEval:

[15] signal (11): Segmentation fault
in expression starting at /home/pkgeval/.julia/packages/Gnuplot/Iz8LH/test/runtests.jl:72
jl_iintrinsic_2 at /cache/build/default-amdci5-1/julialang/julia-master/src/runtime_intrinsics.c:985
<= at ./int.jl:488 [inlined]
in at ./range.jl:1413 [inlined]
map at ./tuple.jl:299 [inlined]
iterate at ./multidimensional.jl:396 [inlined]
arrays2datablock at /home/pkgeval/.julia/packages/Gnuplot/Iz8LH/src/Gnuplot.jl:430

Looking at the rr trace, turns out we're passing garbage into sle_int:

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007fee329f6c71 in jl_iintrinsic_2 (a=0x7fee2c5f2060, b=0x3, name=0x7fee32b92e15 "sle_int", lambda2=0x7fee329f5860 <jl_intrinsiclambda_cmp>, list=0x7fee32c412a0 <sle_int_list>, cvtb=0, getsign=0x7fee329f5840 <signbitbyte>) at /cache/build/default-amdci5-1/julialang/julia-master/src/runtime_intrinsics.c:985
985	    jl_value_t *tyb = jl_typeof(b);
(rr) p b
$7 = (jl_value_t *) 0x3
L3208:                                            ; preds = %L3207
  %3973 = call {} addrspace(10)* @jl_sle_int({} addrspace(10)* addrspacecast ({}* inttoptr (i64 140660923375712 to {}*) to {} addrspace(10)*), {} addrspace(10)* undef), !dbg !3060
  %3974 = bitcast {} addrspace(10)* %3973 to i8 addrspace(10)*, !dbg !3063
  %3975 = load i8, i8 addrspace(10)* %3974, align 1, !dbg !3063, !tbaa !3064, !range !3065
  %3976 = trunc i8 %3975 to i1, !dbg !3063
  %3977 = and i1 true, %3976, !dbg !3063
  %3978 = call {} addrspace(10)* @jl_sle_int({} addrspace(10)* addrspacecast ({}* inttoptr (i64 140660923375712 to {}*) to {} addrspace(10)*), {} addrspace(10)* undef), !dbg !3060
  %3979 = bitcast {} addrspace(10)* %3978 to i8 addrspace(10)*, !dbg !3063
  %3980 = load i8, i8 addrspace(10)* %3979, align 1, !dbg !3063, !tbaa !3064, !range !3065
  %3981 = trunc i8 %3980 to i1, !dbg !3063
  %3982 = and i1 true, %3981, !dbg !3063
  %3983 = and i1 %3977, %3982, !dbg !3066
  %3984 = xor i1 %3983, true, !dbg !3059
  %3985 = zext i1 %3984 to i8, !dbg !3057
  %3986 = trunc i8 %3985 to i1, !dbg !3057
  %3987 = xor i1 %3986, true, !dbg !3057
  br i1 %3987, label %L3216, label %L3215, !dbg !3057

Note the undef arguments.

Running the above under an assertions build reveals a failed assertion that can be reduced to:

using InteractiveUtils

function e(a::Vararg)
    b = 0
    for c in a
        0 && (b = c)
        if for d in CartesianIndices(a[b]) end
            for d in CartesianIndices(size(b)) end
        end
    end
end

code_llvm(e, Tuple{Array{Int64, 3}, Array{Int64, 2}})
julia: /home/tim/Julia/src/julia/src/codegen.cpp:4573: void emit_ssaval_assign(jl_codectx_t&, ssize_t, jl_value_t*): Assertion `!ctx.ssavalue_assigned.at(ssaidx_0based)' failed.

Bisected to #44557, so cc @ianatol.

@maleadt maleadt added regression Regression in behavior compared to a previous version compiler:codegen Generation of LLVM IR and native code labels Aug 19, 2022
@maleadt
Copy link
Member Author

maleadt commented Aug 19, 2022

The assertion occurs during emission of %412:

  77 ── %411 = π (%14, Matrix{Int64})          │
  │     %412 = Base.arraysize(%411, 1)::Int64  │╻        size
  │     ...
  └────        goto #79

Turns out the destination SSAValue has been assigned already when a later block that refers to SSAValue(412) was emitted earlier in time:

  82 ── %426 = Base.slt_int(%412, 0)::Bool     ││╻╷╷╷╷╷   map
  │     ...
  └────        goto #86

This is because emit_intrinsic calls emit_expr on the SSAValue(412) argument, causing it to get assigned:

julia/src/codegen.cpp

Lines 4907 to 4913 in b9b60fc

if (jl_is_ssavalue(expr)) {
ssize_t idx = ((jl_ssavalue_t*)expr)->id - 1;
assert(idx >= 0);
if (!ctx.ssavalue_assigned.at(idx)) {
ctx.ssavalue_assigned.at(idx) = true; // (assignment, not comparison test)
return jl_cgval_t(); // dead code branch
}

And indeed, this is spotted by our IR verification pass, which complains on a debug build that:

Basic Block 77 does not dominate block 82 (tried to use value 412)

@maleadt maleadt changed the title Segfault due to invalid IR (reduced to failed assertion: !ctx.ssavalue_assigned) Segfault/assertion failure/IR verification error due to non-dominated use of SSAValue Aug 19, 2022
@maleadt
Copy link
Member Author

maleadt commented Aug 19, 2022

Some more digging (a good way to get familiar with this part of the compiler), here's the IR just before SROA:

  76 ── %334 = (isa)(%11, Matrix{Int64})::Bool
  └────        goto #78 if not %334
  77 ── %336 = π (%11, Matrix{Int64})
  │     %337 = Base.arraysize(%336, 1)::Int64
  │     %338 = Base.arraysize(%336, 2)::Int64
  │     %339 = Core.tuple(%337, %338)::Tuple{Int64, Int64}
  └────        goto #79
  78 ──        Core.throw(ErrorException("fatal error in type inference (type bound)"))::Union{}
  └────        unreachable
  79 ┄─ %343 = φ (#73 => (), #75 => %332, #77 => %339)::Union{Tuple{}, Tuple{Int64, Int64}, Tuple{Int64, Int64, Int64}}
  │     %344 = (isa)(%343, Tuple{})::Bool
  └────        goto #81 if not %344
  80 ──        goto #86
  81 ── %347 = (isa)(%343, Tuple{Int64, Int64})::Bool
  └────        goto #83 if not %347
  82 ── %349 = π (%343, Tuple{Int64, Int64})
  │     %350 = Base.getfield(%349, 1, true)::Int64
  │     %351 = Base.slt_int(%350, 0)::Bool

The tuple constructed from %337 and %338 is deconstructed and forwarded to the use in block 82:

  76 ── %399 = (isa)(%11, Matrix{Int64})::Bool
  └────        goto #78 if not %399
  77 ── %401 = π (%11, Matrix{Int64})
  │     %402 = Base.arraysize(%401, 1)::Int64
  │     %403 = Base.arraysize(%401, 2)::Int64
  │            nothing::Tuple{Int64, Int64}
  └────        goto #79
  78 ──        Core.throw(ErrorException("fatal error in type inference (type bound)"))::Union{}
  └────        unreachable
  79 ┄─ %735 = φ (#75 => %394)::Int64
  │     %737 = φ (#75 => %396)::Int64
  │     %736 = φ (#75 => %395)::Int64
  │     %733 = φ (#73 => false, #75 => false, #77 => true)::Bool
  │     %734 = φ (#73 => false, #75 => true, #77 => false)::Bool
  │     %732 = φ (#73 => true, #75 => false, #77 => false)::Bool
  │            nothing::Union{Tuple{}, Tuple{Int64, Int64}, Tuple{Int64, Int64, Int64}}
  │     %409 = %732::Bool
  └────        goto #81 if not %409
  80 ──        goto #86
  81 ── %412 = %733::Bool
  └────        goto #83 if not %412
  82 ──        nothing::Tuple{Int64, Int64}
  │     %415 = %402::Int64
  │     %416 = Base.slt_int(%415, 0)::Bool

That's invalid because BB 82 isn't dominated by BB 77 (as reported by the verifier right after SROA)

Basic Block 77 does not dominate block 82 (tried to use value 402)

@maleadt
Copy link
Member Author

maleadt commented Aug 19, 2022

Another breadcrumb: SROA checks whether the original scalar (e.g. %337) dominates every phi node that uses the aggregate. So here, %337 dominates %343, and the replacement happens. But %337 does not dominate BB 82 where the aggregate is finally used, and will be lifted to, and that is what trips up the verifier.

A simple check if the leaves dominate the target statement works:

diff --git a/base/compiler/ssair/passes.jl b/base/compiler/ssair/passes.jl
index 594b77b386..683927cca4 100644
--- a/base/compiler/ssair/passes.jl
+++ b/base/compiler/ssair/passes.jl
@@ -635,13 +635,30 @@ function perform_lifting!(compact::IncrementalCompact,
         dominates_all = true
         if lazydomtree !== nothing
             domtree = get(lazydomtree)
+            if !dominates_ssa(compact, domtree, the_leaf_val, stmt_val)
+                dominates_all = false
+            else
                 for item in visited_phinodes
                     if !dominates_ssa(compact, domtree, the_leaf_val, item)
                         dominates_all = false
                         break
                     end
                 end
+            end

... but I'm not sure this is the best fix. Shouldn't there be another phi if there another path into the statements' block? I also don't see any goto's in the full IR.

@vchuravy
Copy link
Member

I would have expected that %343 get's replace by a set of Phi node of the scalars. and then the use in %415 becomes a use of those new PhiNodes instead of a direct forward?

@ianatol
Copy link
Member

ianatol commented Aug 19, 2022

Thanks @maleadt, will check this out today

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code regression Regression in behavior compared to a previous version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants