ASAN build is broken in 1.5.0-DEV #35338

Closed
tkf opened this issue Apr 1, 2020 · 16 comments · Fixed by #35364

@tkf
Member

tkf commented Apr 1, 2020

Initial title: BUILD_LLVM_CLANG=1 does not work with LLVM 9


I'm trying to build Julia with ASAN as described in https://docs.julialang.org/en/latest/devdocs/sanitizers/ (Context: I'm trying to debug a deadlock problem with my parallel quicksort which may be related to a very rare segfault.)

But it seems BUILD_LLVM_CLANG=1 does not work:

root@459e9488a96a:/julia# make USE_BINARYBUILDER_LLVM=0 BUILD_LLVM_CLANG=1
/bin/tar: This does not look like a tar archive
xz: (stdin): File format not recognized
/bin/tar: Child returned status 1
/bin/tar: Error is not recoverable: exiting now
/julia/deps/llvm.mk:312: recipe for target '/julia/deps/srccache/llvm-9.0.1/source-extracted' failed
make[1]: *** [/julia/deps/srccache/llvm-9.0.1/source-extracted] Error 2
Makefile:60: recipe for target 'julia-deps' failed
make: *** [julia-deps] Error 2
root@459e9488a96a:/julia# file deps/srccache/cfe-9.0.1.src.tar.xz
deps/srccache/cfe-9.0.1.src.tar.xz: empty

Looking at the LLVM 9.0.1 release, it looks like there is no file named cfe-9.0.1.src.tar.xz. The LLVM 8.0.1 release does contain cfe-8.0.1.src.tar.xz, though.

@vchuravy
Member

vchuravy commented Apr 1, 2020

Seems like they renamed it to clang-9.0.1.src.tar.xz. You can use the prebuilt Yggdrasil binaries as a toolchain.
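
To illustrate the rename, here is a minimal sketch of a helper that picks the source tarball name from the LLVM version. The function `clang_tarball_name` is hypothetical and is not part of Julia's `deps/llvm.mk`. The exact cutoff is an assumption: the thread only establishes that 8.0.1 ships `cfe-…` and 9.0.1 ships `clang-…`.

```c
#include <stdio.h>

/* Hypothetical helper, not taken from Julia's build system.
 * Assumption: releases before 9.0.1 name the Clang sources "cfe",
 * while 9.0.1 and later name them "clang". */
int clang_tarball_name(char *buf, size_t size, int major, int minor, int patch)
{
    /* True for any version >= 9.0.1. */
    int renamed = major > 9 || (major == 9 && (minor > 0 || patch >= 1));
    const char *prefix = renamed ? "clang" : "cfe";

    /* Returns the number of characters that the full name needs,
     * as snprintf does, so callers can detect truncation. */
    return snprintf(buf, size, "%s-%d.%d.%d.src.tar.xz",
                    prefix, major, minor, patch);
}
```

Calling it with `(9, 0, 1)` produces the name vchuravy mentions, and `(8, 0, 1)` produces the old `cfe-8.0.1.src.tar.xz` name.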

@tkf
Member Author

tkf commented Apr 2, 2020

> You can use the prebuilt Yggdrasil binaries as a toolchain.

I've tried

cd usr/tools
ln -s clang clang++
cd -
make clean
make CC=$PWD/usr/tools/clang CXX=$PWD/usr/tools/clang++ \
    LLVM_CONFIG=$PWD/usr/tools/llvm-config USECLANG=1 SANITIZE=1

but I get

/julia/src/aotcompile.cpp:586:13: error: use of undeclared identifier 'createAddressSanitizerFunctionPass'
    PM->add(createAddressSanitizerFunctionPass());
            ^
1 error generated.
Makefile:165: recipe for target 'aotcompile.o' failed
make[1]: *** [aotcompile.o] Error 1
Makefile:75: recipe for target 'julia-src-release' failed
make: *** [julia-src-release] Error 2

Or am I doing something wrong?

@vchuravy
Member

vchuravy commented Apr 2, 2020

No, that should work, but ASAN is not a tested configuration, so you are likely running into the fact that the declaration moved from
llvm/include/llvm/Transforms/Instrumentation.h to
llvm/include/llvm/Transforms/Instrumentation/AddressSanitizer.h and aotcompile is now including the wrong file.

@tkf
Member Author

tkf commented Apr 2, 2020

Thanks! So I just added one more include

diff --git a/src/aotcompile.cpp b/src/aotcompile.cpp
index 15623efe5a..c33010ccb0 100644
--- a/src/aotcompile.cpp
+++ b/src/aotcompile.cpp
@@ -23,6 +23,7 @@
 #include <llvm/Transforms/Vectorize.h>
 #if defined(JL_ASAN_ENABLED)
 #include <llvm/Transforms/Instrumentation.h>
+#include <llvm/Transforms/Instrumentation/AddressSanitizer.h>
 #endif
 #include <llvm/Transforms/Scalar/GVN.h>
 #include <llvm/Transforms/IPO/AlwaysInliner.h>

and now CC src/aotcompile.o compiles:

root@459e9488a96a:/julia# make CC=$PWD/usr/tools/clang CXX=$PWD/usr/tools/clang++ LLVM_CONFIG=$PWD/usr/tools/llvm-config USECLANG=1 SANITIZE=1
    CC src/aotcompile.o
    CC src/debuginfo.o
    CC src/disasm.o
    CC src/llvm-simdloop.o
    CC src/llvm-muladd.o
    CC src/llvm-final-gc-lowering.o
    CC src/llvm-pass-helpers.o
    CC src/llvm-late-gc-lowering.o
    CC src/llvm-lower-handlers.o
    CC src/llvm-gc-invariant-verifier.o
    CC src/llvm-propagate-addrspaces.o
    CC src/llvm-multiversioning.o
    CC src/llvm-alloc-opt.o
    CC src/cgmemmgr.o
    CC src/llvm-api.o
    LINK usr/lib/libjulia.so.1.5
clang-9: warning: argument unused during compilation: '-mllvm -asan-stack=0' [-Wunused-command-line-argument]
    CC ui/repl.o
    LINK usr/bin/julia
clang-9: warning: argument unused during compilation: '-mllvm -asan-stack=0' [-Wunused-command-line-argument]
/julia/usr/lib/libjulia.so: undefined reference to `llvm::cfg::Update<llvm::BasicBlock*>::dump() const'
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)
Makefile:86: recipe for target '/julia/usr/bin/julia' failed
make[1]: *** [/julia/usr/bin/julia] Error 1
Makefile:78: recipe for target 'julia-ui-release' failed
make: *** [julia-ui-release] Error 2

...but it stops with a different error.

@tkf
Member Author

tkf commented Apr 2, 2020

I added CXXFLAGS += -DNDEBUG in Make.user and now get yet another error:

root@459e9488a96a:/julia# make clean
...
root@459e9488a96a:/julia# cat Make.user
CXXFLAGS += -DNDEBUG
root@459e9488a96a:/julia# make CC=$PWD/usr/tools/clang CXX=$PWD/usr/tools/clang++ LLVM_CONFIG=$PWD/usr/tools/llvm-config USECLANG=1 SANITIZE=1
    PERL base/pcre_h.jl
    PERL base/errno_h.jl
...
    JULIA usr/libcorecompiler.ji
==18597==LeakSanitizer has encountered a fatal error.
==18597==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==18597==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
sysimage.mk:61: recipe for target '/julia/usr/lib/julia/corecompiler.ji' failed
make[1]: *** [/julia/usr/lib/julia/corecompiler.ji] Error 1
Makefile:81: recipe for target 'julia-sysimg-ji' failed
make: *** [julia-sysimg-ji] Error 2

@tkf
Member Author

tkf commented Apr 2, 2020

This was because I was using docker (#35341 (comment)). Now I get:

    JULIA usr/libcorecompiler.ji

=================================================================
==12535==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1451856 byte(s) in 1 object(s) allocated from:
    #0 0x4a8cf0  (/julia/usr/bin/julia+0x4a8cf0)
    #1 0x7f635468ab70  (/julia/usr/bin/../lib/libjulia.so.1+0x43ab70)

Direct leak of 79872 byte(s) in 89 object(s) allocated from:
    #0 0x4a8cf0  (/julia/usr/bin/julia+0x4a8cf0)
    #1 0x7f635468096f  (/julia/usr/bin/../lib/libjulia.so.1+0x43096f)

Direct leak of 282 byte(s) in 21 object(s) allocated from:
    #0 0x4a8cf0  (/julia/usr/bin/julia+0x4a8cf0)
    #1 0x7f6354452e75  (/julia/usr/bin/../lib/libjulia.so.1+0x202e75)
    #2 0x7f634a5fa716  (/memfd:julia-codegen (deleted)+0x2716)
    #3 0x7f6354484c86  (/julia/usr/bin/../lib/libjulia.so.1+0x234c86)
    #4 0x7f63544857cb  (/julia/usr/bin/../lib/libjulia.so.1+0x2357cb)
    #5 0x7f63544ca6df  (/julia/usr/bin/../lib/libjulia.so.1+0x27a6df)
    #6 0x7f6354461f78  (/julia/usr/bin/../lib/libjulia.so.1+0x211f78)
    #7 0x7f63544cdc07  (/julia/usr/bin/../lib/libjulia.so.1+0x27dc07)
    #8 0x7f634a5f980a  (/memfd:julia-codegen (deleted)+0x180a)
    #9 0x7f6354484c86  (/julia/usr/bin/../lib/libjulia.so.1+0x234c86)
    #10 0x7f63544857cb  (/julia/usr/bin/../lib/libjulia.so.1+0x2357cb)
    #11 0x7f63544ca6df  (/julia/usr/bin/../lib/libjulia.so.1+0x27a6df)
    #12 0x7f63544c7d5a  (/julia/usr/bin/../lib/libjulia.so.1+0x277d5a)
    #13 0x7f63544c9028  (/julia/usr/bin/../lib/libjulia.so.1+0x279028)
    #14 0x7f63544cd6e3  (/julia/usr/bin/../lib/libjulia.so.1+0x27d6e3)
    #15 0x7f634a5f8b0a  (/memfd:julia-codegen (deleted)+0xb0a)
    #16 0x7f6354484c86  (/julia/usr/bin/../lib/libjulia.so.1+0x234c86)
    #17 0x7f63544857cb  (/julia/usr/bin/../lib/libjulia.so.1+0x2357cb)
    #18 0x7f63544ca6df  (/julia/usr/bin/../lib/libjulia.so.1+0x27a6df)
    #19 0x7f6354461f78  (/julia/usr/bin/../lib/libjulia.so.1+0x211f78)
    #20 0x7f63544cdb73  (/julia/usr/bin/../lib/libjulia.so.1+0x27db73)
    #21 0x4f1bd4  (/julia/usr/bin/julia+0x4f1bd4)
    #22 0x4f1584  (/julia/usr/bin/julia+0x4f1584)
    #23 0x4f1284  (/julia/usr/bin/julia+0x4f1284)
    #24 0x7f6352f15b96  (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)

... more ...

SUMMARY: AddressSanitizer: 1603973 byte(s) leaked in 1286 allocation(s).
sysimage.mk:61: recipe for target '/julia/usr/lib/julia/corecompiler.ji' failed
make[1]: *** [/julia/usr/lib/julia/corecompiler.ji] Error 1
Makefile:81: recipe for target 'julia-sysimg-ji' failed
make: *** [julia-sysimg-ji] Error 2

Does this mean the Julia compiler has some leaks?

@tkf
Member Author

tkf commented Apr 2, 2020

https://docs.julialang.org/en/latest/devdocs/sanitizers/#Address-Sanitizer-(ASAN)-1 says

For now, Julia also sets detect_leaks=0, but this should be removed in the future.

So I guess that's actually expected? Maybe it was removed by accident or something?

With ASAN_OPTIONS=detect_leaks=0:allow_user_segv_handler=1 make CC=$PWD/usr/tools/clang CXX=$PWD/usr/tools/clang++ LLVM_CONFIG=$PWD/usr/tools/llvm-config USECLANG=1 SANITIZE=1 now I have:

...
Stdlibs total  ──164.869968 seconds
Sysimage built. Summary:
Total ─────── 260.440745 seconds 
Base: ───────  95.570752 seconds 36.6958%
Stdlibs: ──── 164.869968 seconds 63.3042%
    JULIA usr/libsys-o.a
Generating precompile statements...ERROR: LoadError: InexactError: check_top_bit(UInt64, -4919131752989213765)
Stacktrace:
 [1] throw_inexacterror(::Symbol, ::Type{UInt64}, ::Int64) at ./boot.jl:557
 [2] check_top_bit at ./boot.jl:571 [inlined]
 [3] toUInt64 at ./boot.jl:682 [inlined]
 [4] UInt64 at ./boot.jl:712 [inlined]
 [5] convert(::Type{UInt64}, ::Int64) at ./number.jl:7
 [6] uv_alloc_buf(::Ptr{Nothing}, ::UInt64, ::Ptr{Nothing}) at ./stream.jl:533
 [7] poptaskref(::Base.InvasiveLinkedListSynchronized{Task}) at ./task.jl:702
 [8] wait at ./task.jl:709 [inlined]
 [9] wait(::Base.GenericCondition{ReentrantLock}) at ./condition.jl:106
 [10] (::Base.var"#541#542"{Bool,Base.BufferStream,UInt8})() at ./stream.jl:1260
 [11] lock(::Base.var"#541#542"{Bool,Base.BufferStream,UInt8}, ::ReentrantLock) at ./lock.jl:161
 [12] lock at ./condition.jl:78 [inlined]
 [13] #readuntil#540 at ./stream.jl:1258 [inlined]
 [14] readuntil_string at ./io.jl:716 [inlined]
 [15] readuntil(::Base.BufferStream, ::String; keep::Bool) at ./io.jl:839
 [16] readuntil at ./io.jl:836 [inlined]
 [17] (::Main.anonymous.var"#2#6"{UInt64,Base.DevNull})(::String, ::IOStream) at /julia/contrib/generate_precompile.jl:134
 [18] mktemp(::Main.anonymous.var"#2#6"{UInt64,Base.DevNull}, ::String) at ./file.jl:659
 [19] mktemp at ./file.jl:657 [inlined]
 [20] generate_precompile_statements() at /julia/contrib/generate_precompile.jl:80
 [21] top-level scope at /julia/contrib/generate_precompile.jl:190
in expression starting at /julia/contrib/generate_precompile.jl:3
*** This error is usually fixed by running `make clean`. If the error persists, try `make cleanall`. ***
sysimage.mk:86: recipe for target '/julia/usr/lib/julia/sys-o.a' failed
make[1]: *** [/julia/usr/lib/julia/sys-o.a] Error 1
Makefile:87: recipe for target 'julia-sysimg-release' failed
make: *** [julia-sysimg-release] Error 2

It's not the failure mode I expect from an ASAN-enabled julia, but it seems to help a bug manifest?

@tkf
Member Author

tkf commented Apr 3, 2020

I ran the same command again and then somehow now it works:

root@919704e44cd5:/julia# ASAN_OPTIONS=detect_leaks=0:allow_user_segv_handler=1 make CC=$PWD/usr/tools/clang CXX=$PWD/usr/tools/clang++ LLVM_CONFIG=$PWD/usr/tools/llvm-config USECLANG=1 SANITIZE=1
    JULIA usr/libsys-o.a
Generating precompile statements... 1718 generated in 481.379059 seconds (overhead 297.103823 seconds)
    LINK usr/libsys.so
    CC usr/lib/libccalltest.so
    CC usr/lib/libllvmcalltest.so

I re-built it after make clean but I couldn't get the InexactError above again. It also somehow made #35341 more reproducible: #35341 (comment)

@maleadt
Member

maleadt commented Apr 3, 2020

Yeah, most of the leaks are expected (when I last looked at them). I don't know why the default ASAN options aren't picked up anymore. You should put them there manually, as you did.

@Keno
Member

Keno commented Apr 4, 2020

> -4919131752989213765

This is 0xbb repeated, which is what the GC fills memory it frees with in MEMDEBUG mode. Looks like this found an actual bug.
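
The arithmetic behind that observation can be checked with a small sketch: fill an 8-byte integer with one byte value and read it back as a signed 64-bit number. The helper name `fill_pattern_as_int64` is made up for this illustration and does not exist in Julia's sources.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only.
 * Sets every byte of an int64_t to `byte` and returns the result.
 * Because all eight bytes are identical, byte order does not matter. */
int64_t fill_pattern_as_int64(unsigned char byte)
{
    int64_t value;
    memset(&value, byte, sizeof value);
    return value;
}
```

`fill_pattern_as_int64(0xbb)` yields 0xbbbbbbbbbbbbbbbb, whose top bit is set, so as a two's-complement `int64_t` it is the negative number in the error message. That is also why the conversion to `UInt64` in `uv_alloc_buf` fails with an `InexactError`.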

@Keno
Member

Keno commented Apr 4, 2020

That said, I don't really see how this can happen. That variable should never be a boxed GC value. If you can reproduce it again, I'll take an rr trace ;).

@tkf tkf changed the title BUILD_LLVM_CLANG=1 does not work with LLVM 9 ASAN build is broken in 1.5.0-DEV Apr 5, 2020
@tkf
Member Author

tkf commented Apr 5, 2020

Re ASAN_OPTIONS, we still have

julia/src/init.c

Lines 53 to 60 in c973ad8

#ifdef JL_ASAN_ENABLED
JL_DLLEXPORT const char* __asan_default_options() {
    return "allow_user_segv_handler=1:detect_leaks=0";
    // FIXME: enable LSAN after fixing leaks & defining __lsan_default_suppressions(),
    //        or defining __lsan_default_options = exitcode=0 once publicly available
    //        (here and in flisp/flmain.c)
}
#endif

but

julia> unsafe_string(ccall(:__asan_default_options, Cstring, ()))
""

I guess this is supposed to return "allow_user_segv_handler=1:detect_leaks=0"?

@maleadt
Member

maleadt commented Apr 6, 2020

Yes, that looks wrong...

@Keno
Member

Keno commented Apr 6, 2020

I think the issue may be that that function moved from the executable to the library. Perhaps we need to move it back into the executable.
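
A minimal sketch of what moving it back could look like: the hook is defined in a source file that is linked directly into the executable rather than into libjulia. Which file that should be, and whether this alone fixes the lookup, are assumptions here and not something the thread confirms.

```c
/* Sketch only: place this in a translation unit that ends up in the
 * executable itself, not in the shared library.
 * The ASAN runtime calls this hook at startup to obtain default options;
 * values in the ASAN_OPTIONS environment variable still override them. */
const char *__asan_default_options(void)
{
    return "allow_user_segv_handler=1:detect_leaks=0";
}
```

With the hook found by the runtime, the manual `ASAN_OPTIONS=detect_leaks=0:allow_user_segv_handler=1` prefix used earlier in the thread should no longer be needed.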

@Keno
Member

Keno commented Apr 6, 2020

Also, we should annotate jl_ and any other functions we use during debugging as no_instrument, so that they can be used in a GDB session without triggering faults on invalid data.
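
As a rough sketch of such an annotation, Clang and GCC both document a `no_sanitize_address` function attribute that disables ASAN instrumentation for one function. Reading "no_instrument" as this attribute is an assumption, and `DEBUG_NO_ASAN` and `debug_peek_word` are hypothetical names, not Julia's.

```c
#include <stdint.h>

/* Use the attribute when the compiler supports it, else expand to nothing. */
#if defined(__has_attribute)
#  if __has_attribute(no_sanitize_address)
#    define DEBUG_NO_ASAN __attribute__((no_sanitize_address))
#  endif
#endif
#ifndef DEBUG_NO_ASAN
#  define DEBUG_NO_ASAN
#endif

/* Hypothetical debugging helper: reads one machine word from `addr`.
 * With the attribute applied, ASAN does not insert its checks into this
 * function, so calling it from a debugger on poisoned memory does not
 * produce an ASAN report. The address must still be mapped. */
DEBUG_NO_ASAN uintptr_t debug_peek_word(const void *addr)
{
    return *(const volatile uintptr_t *)addr;
}
```

The same macro could be placed in front of `jl_` and similar debugging entry points.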
