
LLVM mis-optimize due to returntwice function #17288

Closed
maleadt opened this issue Jul 5, 2016 · 20 comments

Assignee: yuyichao

Labels:
- bug: Indicates an unexpected problem or unintended behavior
- compiler:codegen: Generation of LLVM IR and native code
- compiler:llvm: For issues that relate to LLVM
- correctness bug ⚠: Bugs that are likely to lead to incorrect results in user code without throwing
- upstream: The issue is with an upstream dependency, e.g. LLVM

Comments

@maleadt (Member) commented Jul 5, 2016

One of my packages (CUDAdrv) has recently started failing on Julia master, with a segfault in typemap.c. I've bisected this issue to e2bd129 (all backtraces and line numbers below are on that commit's tree). I'm not sure where to start debugging this, so I'm reporting it here for now.

This causes a segfault in jl_typemap_level_assoc_exact:

signal (11): Segmentation fault
while loading CUDAdrv/test/core.jl, in expression starting on line 172
sig_match_fast at src/gf.c:1707
jl_apply_generic at src/gf.c:1886
Type at CUDAdrv/src/module.jl:67
unknown function (ip: 0x7f6198ad561e)
jl_call_method_internal at src/julia_internal.h:92
jl_apply_generic at src/gf.c:1931
do_call at src/interpreter.c:65
eval at src/interpreter.c:188
eval_body at src/interpreter.c:469
eval_body at src/interpreter.c:515
jl_interpret_call at src/interpreter.c:573
jl_interpret_toplevel_thunk at src/interpreter.c:580
jl_toplevel_eval_flex at src/toplevel.c:543
jl_parse_eval_all at src/ast.c:700
jl_load at src/toplevel.c:566
jl_load_ at src/toplevel.c:575
include_from_node1 at ./loading.jl:426
unknown function (ip: 0x7f639ef3716c)
jl_call_method_internal at src/julia_internal.h:92
jl_apply_generic at src/gf.c:1931
do_call at src/interpreter.c:65
eval at src/interpreter.c:188
jl_interpret_toplevel_expr at src/interpreter.c:31
jl_toplevel_eval_flex at src/toplevel.c:529
jl_parse_eval_all at src/ast.c:700
jl_load at src/toplevel.c:566
jl_load_ at src/toplevel.c:575
include_from_node1 at ./loading.jl:426
unknown function (ip: 0x7f639ef3716c)
jl_call_method_internal at src/julia_internal.h:92
jl_apply_generic at src/gf.c:1931
process_options at ./client.jl:266
_start at ./client.jl:322
unknown function (ip: 0x7f639ef75124)
jl_call_method_internal at src/julia_internal.h:92
jl_apply_generic at src/gf.c:1931
jl_apply at ui/../src/julia.h:1396
true_main at ui/repl.c:546
main at ui/repl.c:674
unknown function (ip: 0x7f63a5013740)
unknown function (ip: 0x401818)
Allocations: 1748036 (Pool: 1746766; Other: 1270); GC: 4

Running under GDB makes it segfault somewhere else, but I assume it's due to the same problem (jl_typeof(NULL)):

Thread 1 "julia" received signal SIGSEGV, Segmentation fault.
0x00007ffff76c9407 in jl_typemap_level_assoc_exact (cache=0x7ffdf1802950, args=0x7fffffffae90, n=3, offs=1 '\001') at src/typemap.c:788
788         jl_value_t *ty = (jl_value_t*)jl_typeof(a1);


(gdb) l
783 
784 jl_typemap_entry_t *jl_typemap_level_assoc_exact(jl_typemap_level_t *cache, jl_value_t **args, size_t n, int8_t offs)
785 {
786     if (n > offs) {
787         jl_value_t *a1 = args[offs];
788         jl_value_t *ty = (jl_value_t*)jl_typeof(a1);
789         assert(jl_is_datatype(ty));
790         if (ty == (jl_value_t*)jl_datatype_type && cache->targ != (void*)jl_nothing) {
791             union jl_typemap_t ml_or_cache = mtcache_hash_lookup(cache->targ, a1, 1, offs);
792             jl_typemap_entry_t *ml = jl_typemap_assoc_exact(ml_or_cache, args, n, offs+1);

(gdb) call jl_(args[0])
Base.#==()
(gdb) p args[1]
$3 = (jl_value_t *) 0x0
(gdb) call jl_(args[2])
CUDAdrv.CuError(code=209, info=Base.Nullable{String}(isnull=true, value=#<null>))

... with this comparison (against 209 == CUDAdrv.ERROR_NO_BINARY_FOR_GPU) originating from:

try
    @apicall(:cuModuleLoadDataEx,
            (Ptr{CuModule_t}, Ptr{Cchar}, Cuint, Ref{CUjit_option}, Ref{Ptr{Void}}),
            handle_ref, data, length(optionKeys), optionKeys, optionValues)
catch err
    (err == ERROR_NO_BINARY_FOR_GPU || err == ERROR_INVALID_IMAGE) || rethrow(err)
    options = decode(optionKeys, optionValues)
    rethrow(CuError(err.code, options[ERROR_LOG_BUFFER]))
end

I've not been able to reduce the test case: reduced versions no longer reliably trigger the segfault on all my systems, while the full CUDAdrv test suite does. I've tested on two Linux64 systems (one Debian 8, one Arch), with fresh builds without any Makefile flags.

@yuyichao any ideas what might be causing this, or where to look for clues?

@yuyichao (Contributor) commented Jul 5, 2016

An error in TLS access should fail much more reliably and earlier than this. Does undefining JL_ELF_TLS_VARIANT here get rid of the segfault?

@maleadt (Member, Author) commented Jul 5, 2016

It does.

@yuyichao (Contributor) commented Jul 5, 2016

LLVM seems to be generating wrong asm. I've reduced it to something that doesn't need libcuda (it loads a dummy shared library instead), but I can only get a segfault when running it under optirun.

@yuyichao (Contributor) commented Jul 5, 2016

OK, with this I could get a segfault outside optirun:

immutable CuError
    code::Int
    info::Nullable{String}

    CuError(code) = new(code, Nullable{String}())
end

const libcuda = Ref{Ptr{Void}}(C_NULL)

export CuModuleData

"Create a CUDA module from a string containing PTX code."
function CuModuleData(data)
    handle_ref = Ref{Ptr{Void}}()
    options = Dict{Cint,Any}()
    options[0] = Array(UInt8, 1024*1024)
    optionKeys, optionValues = encode(options)

    try
        throw(CuError(ccall(Libdl.dlsym(libcuda[], :cuModuleLoadDataEx), Cint,
                            (Ptr{Ptr{Void}}, Ptr{Cchar}, Cuint, Ref{Cint}, Ref{Ptr{Void}}), handle_ref, data, length(optionKeys), optionKeys, optionValues)))
    catch err
        err == 1
    end
end

@noinline function encode(options::Dict{Cint,Any})
    return Cint[], Ptr{Void}[]
end

CuModuleData("")

@maleadt (Member, Author) commented Jul 5, 2016

Thanks; yes this case also segfaults on both aforementioned systems.

yuyichao added the upstream label Jul 6, 2016
@yuyichao (Contributor) commented Jul 6, 2016

Reproduced with the following C code in clang, so this is an LLVM bug. The program should print 0 on the last line, but it prints 4 when compiled with clang 3.8 at any optimization level.

AFAICT what's happening is that the non-returning branch (the one that ends with longjmp) uses `a + 4` many times, with enough operations in between that the value has to be spilled to the stack (there aren't enough callee-saved registers to keep all of them). LLVM incorrectly reuses the stack slot for `a` on this branch (since, after all, `a` is not used on this branch). However, this is invalid: setjmp can return twice, and the stack store clobbers memory that is still live on the other branch, which runs later.

What's happening in the Julia code is basically the same, with `a` replaced by the thread pointer. After #17116 we emit the pool address directly in the generated code (thread_pointer + offset), and together with #17178 this opens up more opportunities for LLVM to fold the offset calculations (that's the whole point....). For the code above, however, this happens to cause a stack spill (the same pool address is used on the throw branch a few times, so LLVM decided to spill it as a whole), and LLVM reuses the slot holding the original thread pointer, just like in the C code below, causing the segfault.


#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

jmp_buf env;

__attribute__((noinline)) int f2(int v)
{
    __asm__ volatile("":::"memory");
    return v * v;
}

__attribute__((noinline)) int f(int a)
{
    int b = random();
    int c = random();
    int d = random();
    int e = random();
    int f = random();
    int g = random();
    int h = random();
    int i = random();
    double k = f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
    k *= b;
    k -= c;
    k += i;
    if (setjmp(env) == 0) {
        printf("%d\n", a + 4);
        b = random();
        c = random();
        d = random();
        e = random();
        f = random();
        g = random();
        h = random();
        i = random();
        k = f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b;
        k -= c;
        k += i;
        printf("%d\n", a + 4);
        b = random();
        c = random();
        d = random();
        e = random();
        f = random();
        g = random();
        h = random();
        i = random();
        k = f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b;
        k -= c;
        k += i;
        printf("%d\n", a + 4);
        b = random();
        c = random();
        d = random();
        e = random();
        f = random();
        g = random();
        h = random();
        i = random();
        k = f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b;
        k -= c;
        k += i;
        printf("%d\n", a + 4);
        printf("%f\n", k);
        longjmp(env, 1);
    }
    else {
        printf("%d\n", a);
    }
    return a;
}

int main()
{
    return f(0);
}

@yuyichao (Contributor) commented Jul 6, 2016

I'm also pretty sure it is not UB to make `a` non-volatile, since it is not modified after setjmp.
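For context, the rule in question (C11 7.13.2.1) says a non-volatile automatic variable local to the function containing setjmp only has an indeterminate value after the second return if it was modified between the setjmp call and the longjmp. A minimal sketch of that distinction (illustrative names only, separate from the reproducer above):

/* Minimal sketch of the C11 7.13.2.1 rule: a non-volatile automatic variable
 * only becomes indeterminate after the second return from setjmp() if it was
 * *modified* between setjmp() and longjmp().  `a` in the reproducer above is
 * never written after setjmp(), so leaving it non-volatile is well defined. */
#include <setjmp.h>
#include <stdio.h>

static jmp_buf env2;   /* hypothetical name, separate from `env` above */

int g(int a)
{
    int modified = 0;             /* would need volatile to stay reliable */
    if (setjmp(env2) == 0) {
        modified = a + 1;         /* changed between setjmp() and longjmp()... */
        printf("before longjmp: %d\n", modified);
        longjmp(env2, 1);
    }
    /* ...so after the second return `modified` is indeterminate (reading it
     * here would be unreliable), while `a` is fine: it was never written
     * after the setjmp(), so reading it is well defined. */
    printf("after longjmp:  %d\n", a);
    return a;
}

int main(void) { return g(0); }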

@yuyichao (Contributor) commented Jul 6, 2016

Reported upstream as https://llvm.org/bugs/show_bug.cgi?id=28431. This is likely a problem in instruction selection, so I probably can't help much with debugging it further.

@yuyichao (Contributor) commented Jul 6, 2016

And the following Julia code reproduces the same issue on 0.4 too.

@noinline f2(v) = v
@noinline p(v) = println(v)
@noinline r() = rand(Int)

function f(a)
    b0 = r()
    c0 = r()
    d0 = r()
    e0 = r()
    f0 = r()
    g0 = r()
    h0 = r()
    i0 = r()
    k::Float64 = f2(b0) + f2(c0 + f2(d0 + f2(e0 + f2(f0 + f2(g0 + f2(h0 + i0))))));
    k *= b0
    k -= c0
    k += i0
    try
        b = r()
        c = r()
        d = r()
        e = r()
        f = r()
        g = r()
        h = r()
        i = r()
        k += f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b
        k -= c
        k += i
        p(a + 4)
        b = r()
        c = r()
        d = r()
        e = r()
        f = r()
        g = r()
        h = r()
        i = r()
        k += f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b
        k -= c
        k += i
        p(a + 4)
        b = r()
        c = r()
        d = r()
        e = r()
        f = r()
        g = r()
        h = r()
        i = r()
        k += f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b
        k -= c
        k += i
        p(a + 4)
        b = r()
        c = r()
        d = r()
        e = r()
        f = r()
        g = r()
        h = r()
        i = r()
        k += f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b
        k -= c
        k += i
        p(a + 4)
        b = r()
        c = r()
        d = r()
        e = r()
        f = r()
        g = r()
        h = r()
        i = r()
        k += f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b
        k -= c
        k += i
        p(a + 4)
        b = r()
        c = r()
        d = r()
        e = r()
        f = r()
        g = r()
        h = r()
        i = r()
        k += f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b
        k -= c
        k += i
        p(a + 4)
        throw(k)
    catch
        p(a)
    end
end

# @code_llvm f(0)
# @code_native f(0)
f(0)

@maleadt (Member, Author) commented Jul 6, 2016

Impressive sleuthing, thanks for looking into this!
Any workarounds for now? LLVM 27190 doesn't seem terribly active.

@yuyichao (Contributor) commented Jul 6, 2016

> LLVM #27190 doesn't seem terribly active.

It's more active than many other LLVM issues I've seen.

> Any workarounds for now?

Not really. Maybe try moving some code into other functions to keep the register pressure low in the function with the try-catch?
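To make that suggestion concrete, here is a sketch of the same idea applied to the C reproducer above (illustrative only, not the Julia-level change): keep the function that calls setjmp as small as possible and outline the register-pressure-heavy body into a separate noinline function, so there is almost no live state for LLVM to spill around the returns_twice call.

/* Sketch of the suggested workaround, applied to the C reproducer above
 * (illustrative only): the function containing setjmp() holds almost no live
 * values, and all the heavy computation is outlined into a noinline helper. */
#include <setjmp.h>
#include <stdio.h>

static jmp_buf env3;   /* hypothetical name, separate from `env` above */

__attribute__((noinline)) static void heavy_body(int a)
{
    /* all of the random()/f2() arithmetic from the reproducer would go here */
    printf("%d\n", a + 4);
    longjmp(env3, 1);
}

__attribute__((noinline)) int f_outlined(int a)
{
    if (setjmp(env3) == 0)
        heavy_body(a);          /* only `a` is live across the setjmp() */
    else
        printf("%d\n", a);
    return a;
}

int main(void) { return f_outlined(0); }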

maleadt added a commit to JuliaGPU/CUDAdrv.jl that referenced this issue Jul 6, 2016
@yuyichao (Contributor) commented Jul 6, 2016

Also, it's probably better to check the return status directly instead of throwing it eagerly and rethrowing it after checking its type.

@maleadt (Member, Author) commented Jul 6, 2016

Yeah, but this is one of the few places where I'm catching the error and rethrowing it (just to add some additional diagnostic information, so it can be slow). In general, it is directly thrown from within @apicall to the user.

JeffBezanson added the bug and compiler:codegen labels Jul 21, 2016
@JeffBezanson (Member) commented:

I just ran into this, in the particularly nasty form of any error at the REPL causing a segfault.

JeffBezanson added the priority label Jul 21, 2016
@yuyichao (Contributor) commented Jul 21, 2016

This game is ridiculous: we make the code easier for LLVM to optimize, and LLVM mis-optimizes it.....

@yuyichao (Contributor) commented:

I added a workaround in #17543 that should have the least performance impact; it fixes the segfault for my reduced example above, at least.....

However, it blocks the possibility of inlining the allocation ultra-fast path in the near future, which is a very natural next step and is why the pool addresses are used in the first place..........

yuyichao changed the title from "Unexpected NULL arg causes segfault in typemap" to "LLVM mis-optimize due to returntwice function" Jul 22, 2016
mfasi pushed a commit to mfasi/julia that referenced this issue Sep 5, 2016
yuyichao removed the priority label Sep 14, 2016
ViralBShah added the gpu label Sep 7, 2017
yuyichao removed the gpu label Nov 9, 2017
yuyichao self-assigned this Nov 10, 2017
yuyichao added a commit that referenced this issue Nov 11, 2017
LLVM has really bad support for returns_twice functions and can incorrectly move memory operations (both at the IR and machine-code level) due to the missing control-flow edge. By outlining the exception body, we can hide these functions from LLVM completely (they only exist in C code) and prevent all miscompilation. This also makes it much easier to check the correctness of the heap-to-stack allocation optimization, especially since not all memory-operation intrinsics in LLVM have a volatile counterpart.

This will obviously inhibit some valid optimizations too. These are mainly forwarding of memory operations from the caller to the exception body (since the other way around is almost always invalid) and can be improved with some simple IPO.

This also makes it unnecessary to mark any memory operations on the stack with `volatile`; this should also improve optimization in certain cases.

Since we are scanning all the instructions in the outlined code anyway, this also includes a simple optimization to delete exception frames that can't trigger.

This implements a tweaked version of https://discourse.julialang.org/t/avoid-llvm-setjmp-bug/1140

Fix #17288
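As a rough illustration of what the commit message means by hiding the returns_twice function in C code (an assumed sketch, not the actual Julia runtime), the setjmp call can live in a small C helper that runs the outlined body through a function pointer, so the code LLVM optimizes only ever sees an ordinary call:

/* Conceptual sketch only; assumed structure, not the actual Julia runtime.
 * The only call to a returns_twice function lives in this small C helper,
 * and the previously-inline exception body is outlined into a function that
 * the helper invokes through a pointer. */
#include <setjmp.h>
#include <stdio.h>

static jmp_buf handler_env;

static int run_with_handler(void (*body)(void *), void *arg)
{
    if (setjmp(handler_env) == 0) {
        body(arg);
        return 0;               /* body completed normally */
    }
    return 1;                   /* an "exception" arrived via longjmp */
}

static void throw_error(void)
{
    longjmp(handler_env, 1);    /* how the outlined body "throws" */
}

/* the outlined exception body: optimized without ever seeing setjmp() */
static void guarded_body(void *arg)
{
    int *a = arg;
    printf("%d\n", *a + 4);
    throw_error();
}

int main(void)
{
    int a = 0;
    if (run_with_handler(guarded_body, &a))
        printf("%d\n", a);      /* the "catch" block */
    return 0;
}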
@chethega (Contributor) commented:

Cf. https://reviews.llvm.org/D75967

Thanks @yuyichao for providing such a nice reproducer!

Theoretically we could consider including that in our LLVM patchset, regardless of whether upstream accepts it (depending on whether we are confident enough that I didn't break anything else). I am not very confident that LLVM has no other setjmp-related bugs; the MachineIR world is a weird, foreign, wild-west place.

@vtjnash (Member) commented Mar 18, 2021

Something similar to a subset of that patch seems to have been merged in https://reviews.llvm.org/D77767

Keno added the compiler:llvm label Jan 17, 2022
LilithHafner added the correctness bug ⚠ label Aug 7, 2023
@oscardssmith (Member) commented:

Is this bug actually still open? Do we have a function that can reproduce it still?

@Seelengrab (Contributor) commented:

FWIW, the Julia-level reproducer doesn't reproduce for me anymore:

julia> f(0)
4
4
4
4
4
4
0

julia> versioninfo()
Julia Version 1.11.0-DEV.534
Commit a127ab7ceeb (2023-09-22 18:48 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: 24 × AMD Ryzen 9 7900X 12-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
  Threads: 34 on 24 virtual cores
Environment:
  JULIA_PKG_USE_CLI_GIT = true
