
Segmentation faults with NTuple{2,NTuple{N,Core.VecElement{T}}}, but not NTuple{2N,Core.VecElement{T}} #30426

Closed
chriselrod opened this issue Dec 17, 2018 · 5 comments

Comments

@chriselrod (Contributor) commented Dec 17, 2018

The segfaults occur when N * sizeof(T) == 64.

I am showing results on a (4-day-old) master below, but I see the same behaviour on Julia 1.0.3:

julia> versioninfo()
Julia Version 1.2.0-DEV.12
Commit 77a7d92e91 (2018-12-13 21:20 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libimf
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

julia> bigvec() = ntuple(i -> Core.VecElement(1.0), Val(16))
bigvec (generic function with 1 method)

julia> twovecs() = (ntuple(i -> Core.VecElement(1.0), Val(8)),ntuple(i -> Core.VecElement(1.0), Val(8)))
twovecs (generic function with 1 method)

julia> bigvec()
(VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0))

julia> twovecs()
((VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0)), (VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0), VecElement{Float64}(1.0)))

julia> bigvec() |> typeof |> Base.datatype_alignment
16

julia> twovecs() |> typeof |> Base.datatype_alignment
16

julia> bigvec() |> typeof |> sizeof
128

julia> twovecs() |> typeof |> sizeof
128

julia> using BenchmarkTools

julia> @benchmark bigvec()
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4607182418800017408
  --------------
  minimum time:     0.015 ns (0.00% GC)
  median time:      0.017 ns (0.00% GC)
  mean time:        0.018 ns (0.00% GC)
  maximum time:     2.991 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark twovecs()

signal (11): Segmentation fault
in expression starting at no file:0
#_run#18 at /home/chriselrod/.julia/packages/BenchmarkTools/dtwnm/src/execution.jl:336
unknown function (ip: 0x7f3f05708efb)
jl_fptr_trampoline at /home/chriselrod/Documents/languages/jdev/src/gf.c:1854
jl_apply_generic at /home/chriselrod/Documents/languages/jdev/src/gf.c:2209
inner at ./none:0
jl_fptr_trampoline at /home/chriselrod/Documents/languages/jdev/src/gf.c:1854
jl_apply_generic at /home/chriselrod/Documents/languages/jdev/src/gf.c:2209
jl_apply at /home/chriselrod/Documents/languages/jdev/src/julia.h:1571 [inlined]
jl_f__apply at /home/chriselrod/Documents/languages/jdev/src/builtins.c:556
jl_f__apply_latest at /home/chriselrod/Documents/languages/jdev/src/builtins.c:594
#invokelatest#1 at ./essentials.jl:746 [inlined]
#invokelatest at ./none:0 [inlined]
#run_result#16 at /home/chriselrod/.julia/packages/BenchmarkTools/dtwnm/src/execution.jl:32 [inlined]
#run_result at ./none:0 [inlined]
#run#18 at /home/chriselrod/.julia/packages/BenchmarkTools/dtwnm/src/execution.jl:46
jl_fptr_trampoline at /home/chriselrod/Documents/languages/jdev/src/gf.c:1854
jl_apply_generic at /home/chriselrod/Documents/languages/jdev/src/gf.c:2209
#run at ./none:0 [inlined]
#run at ./none:0 [inlined]
#warmup#21 at /home/chriselrod/.julia/packages/BenchmarkTools/dtwnm/src/execution.jl:79 [inlined]
warmup at /home/chriselrod/.julia/packages/BenchmarkTools/dtwnm/src/execution.jl:79
jl_fptr_trampoline at /home/chriselrod/Documents/languages/jdev/src/gf.c:1854
jl_apply_generic at /home/chriselrod/Documents/languages/jdev/src/gf.c:2209
do_call at /home/chriselrod/Documents/languages/jdev/src/interpreter.c:323
eval_value at /home/chriselrod/Documents/languages/jdev/src/interpreter.c:411
eval_stmt_value at /home/chriselrod/Documents/languages/jdev/src/interpreter.c:362 [inlined]
eval_body at /home/chriselrod/Documents/languages/jdev/src/interpreter.c:759
jl_interpret_toplevel_thunk_callback at /home/chriselrod/Documents/languages/jdev/src/interpreter.c:885
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x7f3f162af98f)
unknown function (ip: 0x7)
jl_interpret_toplevel_thunk at /home/chriselrod/Documents/languages/jdev/src/interpreter.c:894
jl_toplevel_eval_flex at /home/chriselrod/Documents/languages/jdev/src/toplevel.c:764
jl_toplevel_eval_in at /home/chriselrod/Documents/languages/jdev/src/toplevel.c:793
eval at ./boot.jl:328
jl_apply_generic at /home/chriselrod/Documents/languages/jdev/src/gf.c:2209
eval_user_input at /home/chriselrod/Documents/languages/jdev/usr/share/julia/stdlib/v1.2/REPL/src/REPL.jl:85
run_backend at /home/chriselrod/.julia/packages/Revise/gStbk/src/Revise.jl:771
#58 at ./task.jl:259
jl_fptr_trampoline at /home/chriselrod/Documents/languages/jdev/src/gf.c:1854
jl_apply_generic at /home/chriselrod/Documents/languages/jdev/src/gf.c:2209
jl_apply at /home/chriselrod/Documents/languages/jdev/src/julia.h:1571 [inlined]
start_task at /home/chriselrod/Documents/languages/jdev/src/task.c:572
unknown function (ip: 0xffffffffffffffff)
Allocations: 25427851 (Pool: 25422691; Big: 5160); GC: 64
Segmentation fault (core dumped)

When I use a tuple of 2 8-length (64-total-byte) vectors I get a segmentation fault.

When I use a tuple of 2 4-length (32-total-byte) vectors I get no segmentation fault.

It is the number of bytes that matters: using Float32, NTuple{32,...} does not segfault (although the reported allocation count is garbage), while NTuple{2,NTuple{16,...}} does segfault.
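
For concreteness, here is a minimal sketch of the Float32 case just described (the function names are mine, for illustration only):

# Float32 is 4 bytes, so the 64-byte boundary is reached at 16 lanes.
bigvec32() = ntuple(i -> Core.VecElement(1.0f0), Val(32))     # flat 128-byte tuple: no segfault
twovecs32() = (ntuple(i -> Core.VecElement(1.0f0), Val(16)),
               ntuple(i -> Core.VecElement(1.0f0), Val(16)))  # two 64-byte inner vectors: segfaults when benchmarked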

I can reproduce the segmentation faults on both Ryzen and Skylake-X. I can try Haswell later.

This is obviously of most interest on Skylake-X (and other AVX-512) architectures; because tuples of 32-byte vectors don't cause segfaults, on other architectures I would simply not construct these tuples.

The workaround -- concatenating and then sub-setting larger vectors -- is a little awkward.
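
A rough sketch of that workaround, assuming Float64 and 8-lane vectors (the type aliases and helper names here are mine, not from any package):

const Vec8  = NTuple{8,Core.VecElement{Float64}}
const Vec16 = NTuple{16,Core.VecElement{Float64}}

# Carry one flat 16-lane vector instead of a tuple of two 8-lane vectors...
concat(a::Vec8, b::Vec8) = ntuple(i -> i <= 8 ? a[i] : b[i-8], Val(16))

# ...and slice the halves back out wherever they are needed separately.
firsthalf(v::Vec16)  = ntuple(i -> v[i],     Val(8))
secondhalf(v::Vec16) = ntuple(i -> v[i + 8], Val(8))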

@vtjnash (Member) commented Dec 18, 2018

Sounds similar to #21959, although I thought we were handling that at the codegen level in most cases now.

@chriselrod (Contributor, Author) commented Jun 25, 2019

@vtjnash, could it at all be related to this:

julia> @noinline function foo(a)
           v = ntuple(Val(8)) do w Core.VecElement(Float64(w)) end
           a, (v, (a,(1<<10,1<<20)))
       end
foo (generic function with 1 method)

julia> foo(1)
(4607182418800017408, ((VecElement{Float64}(3.0), VecElement{Float64}(4.0), VecElement{Float64}(5.0), VecElement{Float64}(6.0), VecElement{Float64}(7.0), VecElement{Float64}(8.0), VecElement{Float64}(5.0e-324), VecElement{Float64}(5.06e-321)), (1048576, (14690820581056367, 140697086305440))))

julia> reinterpret(Float64, 4607182418800017408)
1.0

julia> reinterpret.(Float64,(1,1<<10))
(5.0e-324, 5.06e-321)

julia> 1<<20
1048576

?
If not, I'll file a new issue.

The memory layout Julia uses for constructing the tuple looks different from the one it uses to deconstruct it.

The first 1::Int disappears and is replaced with a 1::Float64, the first element of the LLVM vector.
That vector is shifted by 16 bytes, so its last two elements instead come from the following tuple.

The last two elements of the tuple are filled with junk.

Any idea about the cause or fix?
Right now I need to @inline more code than I'd like, which lets the compiler avoid constructing the tuples, leaving the memory uncorrupted.
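
For what it's worth, one way to look at the layout codegen actually picks for foo (a diagnostic sketch, not part of the original report) is to dump the IR and check the aggregate types and align annotations on the loads/stores:

using InteractiveUtils   # loaded automatically in the REPL; needed in a script

@code_llvm foo(1)        # inspect the returned aggregate's struct/vector types and alignments
@code_native foo(1)      # and the corresponding vector moves/spills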

@vtjnash (Member) commented Jun 25, 2019

Seems like a new issue: it looks like we're specifying the wrong sort of vector to LLVM for the alignment Julia expects.
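
As a rough illustration of that mismatch (my own sketch, not from the thread): Julia reports 16-byte alignment for the nested tuple type, while each inner vector is 64 bytes wide, the natural alignment LLVM would typically assume for a full 8-lane double vector.

T = NTuple{2,NTuple{8,Core.VecElement{Float64}}}
Base.datatype_alignment(T)                   # 16, matching the REPL output above
sizeof(NTuple{8,Core.VecElement{Float64}})   # 64 bytes per inner vector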

@KristofferC (Member) commented

Code in first post seems to work now. Can this be closed? @chriselrod

@chriselrod (Contributor, Author) commented

Yes.
