
LLVM mis-optimize due to returntwice function #17288

Closed
maleadt opened this issue Jul 5, 2016 · 20 comments

Assignee: yuyichao

Labels:
- bug: Indicates an unexpected problem or unintended behavior
- compiler:codegen: Generation of LLVM IR and native code
- compiler:llvm: For issues that relate to LLVM
- correctness bug ⚠: Bugs that are likely to lead to incorrect results in user code without throwing
- upstream: The issue is with an upstream dependency, e.g. LLVM

Comments

@maleadt (Member) commented Jul 5, 2016

One of my packages (CUDAdrv) has recently started failing on Julia master, with a segfault in typemap.c. I've bisected this issue to e2bd129 (all backtraces and line numbers below are on that commit's tree). I'm not sure where to start debugging this, so I'm reporting it here for now.

This causes a segfault in jl_typemap_level_assoc_exact:

signal (11): Segmentation fault
while loading CUDAdrv/test/core.jl, in expression starting on line 172
sig_match_fast at src/gf.c:1707
jl_apply_generic at src/gf.c:1886
Type at CUDAdrv/src/module.jl:67
unknown function (ip: 0x7f6198ad561e)
jl_call_method_internal at src/julia_internal.h:92
jl_apply_generic at src/gf.c:1931
do_call at src/interpreter.c:65
eval at src/interpreter.c:188
eval_body at src/interpreter.c:469
eval_body at src/interpreter.c:515
jl_interpret_call at src/interpreter.c:573
jl_interpret_toplevel_thunk at src/interpreter.c:580
jl_toplevel_eval_flex at src/toplevel.c:543
jl_parse_eval_all at src/ast.c:700
jl_load at src/toplevel.c:566
jl_load_ at src/toplevel.c:575
include_from_node1 at ./loading.jl:426
unknown function (ip: 0x7f639ef3716c)
jl_call_method_internal at src/julia_internal.h:92
jl_apply_generic at src/gf.c:1931
do_call at src/interpreter.c:65
eval at src/interpreter.c:188
jl_interpret_toplevel_expr at src/interpreter.c:31
jl_toplevel_eval_flex at src/toplevel.c:529
jl_parse_eval_all at src/ast.c:700
jl_load at src/toplevel.c:566
jl_load_ at src/toplevel.c:575
include_from_node1 at ./loading.jl:426
unknown function (ip: 0x7f639ef3716c)
jl_call_method_internal at src/julia_internal.h:92
jl_apply_generic at src/gf.c:1931
process_options at ./client.jl:266
_start at ./client.jl:322
unknown function (ip: 0x7f639ef75124)
jl_call_method_internal at src/julia_internal.h:92
jl_apply_generic at src/gf.c:1931
jl_apply at ui/../src/julia.h:1396
true_main at ui/repl.c:546
main at ui/repl.c:674
unknown function (ip: 0x7f63a5013740)
unknown function (ip: 0x401818)
Allocations: 1748036 (Pool: 1746766; Other: 1270); GC: 4

Running under GDB makes it segfault somewhere else, but I assume it's due to the same problem (jl_typeof(NULL)):

Thread 1 "julia" received signal SIGSEGV, Segmentation fault.
0x00007ffff76c9407 in jl_typemap_level_assoc_exact (cache=0x7ffdf1802950, args=0x7fffffffae90, n=3, offs=1 '\001') at src/typemap.c:788
788         jl_value_t *ty = (jl_value_t*)jl_typeof(a1);


(gdb) l
783 
784 jl_typemap_entry_t *jl_typemap_level_assoc_exact(jl_typemap_level_t *cache, jl_value_t **args, size_t n, int8_t offs)
785 {
786     if (n > offs) {
787         jl_value_t *a1 = args[offs];
788         jl_value_t *ty = (jl_value_t*)jl_typeof(a1);
789         assert(jl_is_datatype(ty));
790         if (ty == (jl_value_t*)jl_datatype_type && cache->targ != (void*)jl_nothing) {
791             union jl_typemap_t ml_or_cache = mtcache_hash_lookup(cache->targ, a1, 1, offs);
792             jl_typemap_entry_t *ml = jl_typemap_assoc_exact(ml_or_cache, args, n, offs+1);

(gdb) call jl_(args[0])
Base.#==()
(gdb) p args[1]
$3 = (jl_value_t *) 0x0
(gdb) call jl_(args[2])
CUDAdrv.CuError(code=209, info=Base.Nullable{String}(isnull=true, value=#<null>))

... with this comparison (against 209 == CUDAdrv.ERROR_NO_BINARY_FOR_GPU) originating from:

try
    @apicall(:cuModuleLoadDataEx,
            (Ptr{CuModule_t}, Ptr{Cchar}, Cuint, Ref{CUjit_option}, Ref{Ptr{Void}}),
            handle_ref, data, length(optionKeys), optionKeys, optionValues)
catch err
    (err == ERROR_NO_BINARY_FOR_GPU || err == ERROR_INVALID_IMAGE) || rethrow(err)
    options = decode(optionKeys, optionValues)
    rethrow(CuError(err.code, options[ERROR_LOG_BUFFER]))
end

I've not been able to reduce the test case: reduced versions no longer reliably trigger the segfault on all my systems, while the full CUDAdrv test suite does. I've tested on two Linux64 systems (one Debian 8, one Arch), with fresh builds without any Makefile flags.

@yuyichao any ideas what might be causing this, or where to look for clues?

@yuyichao (Contributor) commented Jul 5, 2016

An error in TLS access should fail much more reliably and earlier than this. Does undefining JL_ELF_TLS_VARIANT here get rid of the segfault?

@maleadt (Member, Author) commented Jul 5, 2016

It does.

@yuyichao (Contributor) commented Jul 5, 2016

LLVM seems to be generating wrong asm. I've reduced it to something that doesn't need libcuda (it loads a dummy shared library instead), but I can only get a segfault when running it under optirun.

@yuyichao (Contributor) commented Jul 5, 2016

OK, with this I could get a segfault outside optirun:

immutable CuError
    code::Int
    info::Nullable{String}

    CuError(code) = new(code, Nullable{String}())
end

const libcuda = Ref{Ptr{Void}}(C_NULL)

export CuModuleData

"Create a CUDA module from a string containing PTX code."
function CuModuleData(data)
    handle_ref = Ref{Ptr{Void}}()
    options = Dict{Cint,Any}()
    options[0] = Array(UInt8, 1024*1024)
    optionKeys, optionValues = encode(options)

    try
        throw(CuError(ccall(Libdl.dlsym(libcuda[], :cuModuleLoadDataEx), Cint,
                            (Ptr{Ptr{Void}}, Ptr{Cchar}, Cuint, Ref{Cint}, Ref{Ptr{Void}}), handle_ref, data, length(optionKeys), optionKeys, optionValues)))
    catch err
        err == 1
    end
end

@noinline function encode(options::Dict{Cint,Any})
    return Cint[], Ptr{Void}[]
end

CuModuleData("")

@maleadt (Member, Author) commented Jul 5, 2016

Thanks; yes this case also segfaults on both aforementioned systems.

yuyichao added the upstream label Jul 6, 2016
@yuyichao (Contributor) commented Jul 6, 2016

Reproduced with the following C code in clang, so this is an LLVM bug. The program should print 0 on the last line, but it prints 4 when compiled with clang 3.8 at any optimization level.

AFAICT what's happening is that the non-returning branch (the one that ends with longjmp) uses `a + 4` many times, with enough operations in between that the value has to be spilled to the stack (there aren't enough callee-saved registers to keep all of them). LLVM incorrectly reuses the stack slot for `a` on this branch (since, after all, `a` is not used on this branch). However, this is invalid: setjmp can return twice, and the stack store clobbers memory that is still live on the other branch, which runs later.

What's happening in the Julia code is basically the same, with `a` replaced by the thread pointer. After #17116 we emit the pool address directly in the generated code (thread_pointer + offset), and together with #17178 this opens up more opportunities for LLVM to fold the offset calculations (that's the whole point....). For the code above, however, this happens to cause a stack spill (the same pool address is used on the throw branch a few times, so LLVM decided to spill it as a whole), and LLVM reuses the slot holding the original thread pointer, just like in the C code below, causing the segfault.


#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

jmp_buf env;

__attribute__((noinline)) int f2(int v)
{
    __asm__ volatile("":::"memory");
    return v * v;
}

__attribute__((noinline)) int f(int a)
{
    int b = random();
    int c = random();
    int d = random();
    int e = random();
    int f = random();
    int g = random();
    int h = random();
    int i = random();
    double k = f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
    k *= b;
    k -= c;
    k += i;
    if (setjmp(env) == 0) {
        printf("%d\n", a + 4);
        b = random();
        c = random();
        d = random();
        e = random();
        f = random();
        g = random();
        h = random();
        i = random();
        k = f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b;
        k -= c;
        k += i;
        printf("%d\n", a + 4);
        b = random();
        c = random();
        d = random();
        e = random();
        f = random();
        g = random();
        h = random();
        i = random();
        k = f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b;
        k -= c;
        k += i;
        printf("%d\n", a + 4);
        b = random();
        c = random();
        d = random();
        e = random();
        f = random();
        g = random();
        h = random();
        i = random();
        k = f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b;
        k -= c;
        k += i;
        printf("%d\n", a + 4);
        printf("%f\n", k);
        longjmp(env, 1);
    }
    else {
        printf("%d\n", a);
    }
    return a;
}

int main()
{
    return f(0);
}

@yuyichao (Contributor) commented Jul 6, 2016

I'm also pretty sure it is not UB to make `a` non-volatile, since it is not modified after setjmp.
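For context, the rule in question (C11 7.13.2.1) says a non-volatile automatic variable local to the function containing setjmp only has an indeterminate value after the second return if it was modified between the setjmp call and the longjmp. A minimal sketch of that distinction (illustrative names only, separate from the reproducer above):

/* Minimal sketch of the C11 7.13.2.1 rule: a non-volatile automatic variable
 * only becomes indeterminate after the second return from setjmp() if it was
 * *modified* between setjmp() and longjmp().  `a` in the reproducer above is
 * never written after setjmp(), so leaving it non-volatile is well defined. */
#include <setjmp.h>
#include <stdio.h>

static jmp_buf env2;   /* hypothetical name, separate from `env` above */

int g(int a)
{
    int modified = 0;             /* would need volatile to stay reliable */
    if (setjmp(env2) == 0) {
        modified = a + 1;         /* changed between setjmp() and longjmp()... */
        printf("before longjmp: %d\n", modified);
        longjmp(env2, 1);
    }
    /* ...so after the second return `modified` is indeterminate (reading it
     * here would be unreliable), while `a` is fine: it was never written
     * after the setjmp(), so reading it is well defined. */
    printf("after longjmp:  %d\n", a);
    return a;
}

int main(void) { return g(0); }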

@yuyichao (Contributor) commented Jul 6, 2016

Reported upstream as https://llvm.org/bugs/show_bug.cgi?id=28431. This is likely a problem in instruction selection, so I probably can't help much with debugging it further.

@yuyichao (Contributor) commented Jul 6, 2016

And the following Julia code reproduces the same issue on 0.4 too.

@noinline f2(v) = v
@noinline p(v) = println(v)
@noinline r() = rand(Int)

function f(a)
    b0 = r()
    c0 = r()
    d0 = r()
    e0 = r()
    f0 = r()
    g0 = r()
    h0 = r()
    i0 = r()
    k::Float64 = f2(b0) + f2(c0 + f2(d0 + f2(e0 + f2(f0 + f2(g0 + f2(h0 + i0))))));
    k *= b0
    k -= c0
    k += i0
    try
        b = r()
        c = r()
        d = r()
        e = r()
        f = r()
        g = r()
        h = r()
        i = r()
        k += f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b
        k -= c
        k += i
        p(a + 4)
        b = r()
        c = r()
        d = r()
        e = r()
        f = r()
        g = r()
        h = r()
        i = r()
        k += f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b
        k -= c
        k += i
        p(a + 4)
        b = r()
        c = r()
        d = r()
        e = r()
        f = r()
        g = r()
        h = r()
        i = r()
        k += f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b
        k -= c
        k += i
        p(a + 4)
        b = r()
        c = r()
        d = r()
        e = r()
        f = r()
        g = r()
        h = r()
        i = r()
        k += f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b
        k -= c
        k += i
        p(a + 4)
        b = r()
        c = r()
        d = r()
        e = r()
        f = r()
        g = r()
        h = r()
        i = r()
        k += f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b
        k -= c
        k += i
        p(a + 4)
        b = r()
        c = r()
        d = r()
        e = r()
        f = r()
        g = r()
        h = r()
        i = r()
        k += f2(b) + f2(c + f2(d + f2(e + f2(f + f2(g + f2(h + i))))));
        k *= b
        k -= c
        k += i
        p(a + 4)
        throw(k)
    catch
        p(a)
    end
end

# @code_llvm f(0)
# @code_native f(0)
f(0)

@maleadt (Member, Author) commented Jul 6, 2016

Impressive sleuthing, thanks for looking into this!
Any workarounds for now? LLVM 27190 doesn't seem terribly active.

@yuyichao (Contributor) commented Jul 6, 2016

> LLVM #27190 doesn't seem terribly active.

It's more active than many other LLVM issues I've seen.

> Any workarounds for now?

Not really. Maybe try moving some code into other functions to keep the register pressure low in the function with the try-catch?
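To make that suggestion concrete, here is a sketch of the same idea applied to the C reproducer above (illustrative only, not the Julia-level change): keep the function that calls setjmp as small as possible and outline the register-pressure-heavy body into a separate noinline function, so there is almost no live state for LLVM to spill around the returns_twice call.

/* Sketch of the suggested workaround, applied to the C reproducer above
 * (illustrative only): the function containing setjmp() holds almost no live
 * values, and all the heavy computation is outlined into a noinline helper. */
#include <setjmp.h>
#include <stdio.h>

static jmp_buf env3;   /* hypothetical name, separate from `env` above */

__attribute__((noinline)) static void heavy_body(int a)
{
    /* all of the random()/f2() arithmetic from the reproducer would go here */
    printf("%d\n", a + 4);
    longjmp(env3, 1);
}

__attribute__((noinline)) int f_outlined(int a)
{
    if (setjmp(env3) == 0)
        heavy_body(a);          /* only `a` is live across the setjmp() */
    else
        printf("%d\n", a);
    return a;
}

int main(void) { return f_outlined(0); }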

maleadt added a commit to JuliaGPU/CUDAdrv.jl that referenced this issue Jul 6, 2016
@yuyichao (Contributor) commented Jul 6, 2016

Also, it's probably better to check the return status directly instead of throwing it eagerly and rethrowing it after checking its type.

@maleadt (Member, Author) commented Jul 6, 2016

Yeah, but this is one of the few places where I'm catching the error and rethrowing it (just to add some additional diagnostic information, so it can be slow). In general, it is directly thrown from within @apicall to the user.

JeffBezanson added the bug and compiler:codegen labels Jul 21, 2016
@JeffBezanson (Member) commented:

I just ran into this, in the particularly nasty form of any error at the REPL causing a segfault.

JeffBezanson added the priority label Jul 21, 2016
@yuyichao (Contributor) commented Jul 21, 2016

This game is ridiculous: we make the code easier for LLVM to optimize, and LLVM mis-optimizes it.....

@yuyichao (Contributor) commented:

I added a workaround in #17543 that should have the least performance impact; it fixes the segfault for my reduced example above, at least.....

However, it blocks the possibility of inlining the allocation ultra-fast path in the near future, which is a very natural next step and is why the pool addresses are used in the first place..........

yuyichao changed the title from "Unexpected NULL arg causes segfault in typemap" to "LLVM mis-optimize due to returntwice function" Jul 22, 2016
mfasi pushed a commit to mfasi/julia that referenced this issue Sep 5, 2016
yuyichao removed the priority label Sep 14, 2016
ViralBShah added the gpu label Sep 7, 2017
yuyichao removed the gpu label Nov 9, 2017
yuyichao self-assigned this Nov 10, 2017
yuyichao added a commit that referenced this issue Nov 11, 2017
LLVM has really bad support for returns_twice functions and can incorrectly move memory operations (both at the IR and machine-code level) due to the missing control-flow edge. By outlining the exception body, we can hide these functions from LLVM completely (they only exist in C code) and prevent all miscompilation. This also makes it much easier to check the correctness of the heap-to-stack allocation optimization, especially since not all memory-operation intrinsics in LLVM have a volatile counterpart.

This will obviously inhibit some valid optimizations too. These are mainly forwarding of memory operations from the caller to the exception body (since the other way around is almost always invalid) and can be improved with some simple IPO.

This also makes it unnecessary to mark any memory operations on the stack with `volatile`; this should also improve optimization in certain cases.

Since we are scanning all the instructions in the outlined code anyway, this also includes a simple optimization to delete exception frames that can't trigger.

This implements a tweaked version of https://discourse.julialang.org/t/avoid-llvm-setjmp-bug/1140

Fix #17288
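As a rough illustration of what the commit message means by hiding the returns_twice function in C code (an assumed sketch, not the actual Julia runtime), the setjmp call can live in a small C helper that runs the outlined body through a function pointer, so the code LLVM optimizes only ever sees an ordinary call:

/* Conceptual sketch only; assumed structure, not the actual Julia runtime.
 * The only call to a returns_twice function lives in this small C helper,
 * and the previously-inline exception body is outlined into a function that
 * the helper invokes through a pointer. */
#include <setjmp.h>
#include <stdio.h>

static jmp_buf handler_env;

static int run_with_handler(void (*body)(void *), void *arg)
{
    if (setjmp(handler_env) == 0) {
        body(arg);
        return 0;               /* body completed normally */
    }
    return 1;                   /* an "exception" arrived via longjmp */
}

static void throw_error(void)
{
    longjmp(handler_env, 1);    /* how the outlined body "throws" */
}

/* the outlined exception body: optimized without ever seeing setjmp() */
static void guarded_body(void *arg)
{
    int *a = arg;
    printf("%d\n", *a + 4);
    throw_error();
}

int main(void)
{
    int a = 0;
    if (run_with_handler(guarded_body, &a))
        printf("%d\n", a);      /* the "catch" block */
    return 0;
}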
@chethega (Contributor) commented:

Cf. https://reviews.llvm.org/D75967

Thanks @yuyichao for providing such a nice reproducer!

Theoretically we could consider including that in our LLVM patchset, regardless of whether upstream accepts it (depending on whether we are confident enough that I didn't break anything else). I am not very confident that LLVM has no other setjmp-related bugs; the MachineIR world is a weird, foreign, wild-west place.

@vtjnash (Member) commented Mar 18, 2021

Something similar to a subset of that patch seems to have been merged in https://reviews.llvm.org/D77767

Keno added the compiler:llvm label Jan 17, 2022
LilithHafner added the correctness bug ⚠ label Aug 7, 2023
@oscardssmith (Member) commented:

Is this bug actually still open? Do we have a function that can reproduce it still?

@Seelengrab (Contributor) commented:

FWIW, the Julia-level reproducer doesn't reproduce for me anymore:

julia> f(0)
4
4
4
4
4
4
0

julia> versioninfo()
Julia Version 1.11.0-DEV.534
Commit a127ab7ceeb (2023-09-22 18:48 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: 24 × AMD Ryzen 9 7900X 12-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
  Threads: 34 on 24 virtual cores
Environment:
  JULIA_PKG_USE_CLI_GIT = true
