Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ThreadSafeModule/Context on Julia 1.9 #324

Merged
merged 9 commits into from
May 30, 2022
Merged

Use ThreadSafeModule/Context on Julia 1.9 #324

merged 9 commits into from
May 30, 2022

Conversation

jpsamaroo
Copy link
Member

From changes originally by @collinwarner

@maleadt
Copy link
Member

maleadt commented May 3, 2022

Maybe add a rule here to not run the Buildkite reverse CI while the PR is in draft -- those tests are fairly expensive. CUDA.jl has such a check for some of the more expensive jobs.

@jpsamaroo
Copy link
Member Author

@pchintalapudi reminder that we need a fix for jl_dump_function_ir

@maleadt
Copy link
Member

maleadt commented May 9, 2022

We can always temporarily use LLVM.jl to render the IR on 1.9, but we need to get this fixed in time of course.

@pchintalapudi
Copy link
Contributor

pchintalapudi commented May 9, 2022

@pchintalapudi reminder that we need a fix for jl_dump_function_ir

Does this PR depend on JuliaLang/julia#44960?

@jpsamaroo
Copy link
Member Author

Does this PR depend on JuliaLang/julia#44960?

No, not yet. I pulled out the GCN changes on this branch to just focus on updates for 1.9.

This will now depend on JuliaLang/julia#45251, which is the necessary fix to jl_dump_function_ir.

@jpsamaroo
Copy link
Member Author

JuliaLang/julia#45252 is working out even better, passing all tests for me locally.

@jpsamaroo jpsamaroo marked this pull request as ready for review May 11, 2022 19:38
@maleadt
Copy link
Member

maleadt commented May 11, 2022

CI is very unhappy?

@jpsamaroo
Copy link
Member Author

Ahh, yeah, I guess I just have to always leave the struct in. Pushed.

@jpsamaroo jpsamaroo marked this pull request as draft May 12, 2022 15:57
@jpsamaroo
Copy link
Member Author

Back to draft while I figure out the GCN failures...

@maleadt
Copy link
Member

maleadt commented May 18, 2022

Any update?

The asserts build throws an assertion:

julia: /workspace/srcdir/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:4758: llvm::SDValue llvm::SelectionDAG::getNode(unsigned int, const llvm::SDLoc&, llvm::EVT, llvm::SDValue, llvm::SDNodeFlags): Assertion `VT.getSizeInBits() == Operand.getValueSizeInBits() && "Cannot BITCAST between types of different sizes!"' failed.

@codecov
Copy link

codecov bot commented May 25, 2022

Codecov Report

Merging #324 (1ac74eb) into master (41099f3) will increase coverage by 1.14%.
The diff coverage is 97.91%.

❗ Current head 1ac74eb differs from pull request most recent head 8420bd7. Consider uploading reports for the commit 8420bd7 to get more accurate results

@@            Coverage Diff             @@
##           master     #324      +/-   ##
==========================================
+ Coverage   75.52%   76.67%   +1.14%     
==========================================
  Files          24       23       -1     
  Lines        2799     2379     -420     
==========================================
- Hits         2114     1824     -290     
+ Misses        685      555     -130     
Impacted Files Coverage Δ
src/precompile.jl 0.00% <0.00%> (ø)
src/driver.jl 91.44% <100.00%> (+2.25%) ⬆️
src/interface.jl 83.51% <100.00%> (+1.09%) ⬆️
src/irgen.jl 84.95% <100.00%> (+0.17%) ⬆️
src/jlgen.jl 72.63% <100.00%> (+2.34%) ⬆️
src/reflection.jl 39.66% <100.00%> (+3.70%) ⬆️
src/rtlib.jl 93.54% <100.00%> (+0.07%) ⬆️
src/optim.jl 89.05% <0.00%> (-0.50%) ⬇️
src/GPUCompiler.jl 100.00% <0.00%> (ø)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 41099f3...8420bd7. Read the comment docs.

@jpsamaroo jpsamaroo marked this pull request as ready for review May 25, 2022 07:59
@jpsamaroo
Copy link
Member Author

Not sure what's up with the macOS failure? I don't have a system to reproduce on.

@vchuravy
Copy link
Member

cc: @pchintalapudi for the mac error?

src/reflection.jl Outdated Show resolved Hide resolved
test/definitions/native.jl Outdated Show resolved Hide resolved
test/definitions/native.jl Outdated Show resolved Hide resolved
@maleadt maleadt force-pushed the jps/tsmod-tsctx branch 3 times, most recently from 712f617 to d220d68 Compare May 25, 2022 21:46
@maleadt
Copy link
Member

maleadt commented May 25, 2022

There's a failed assertion on 1.8 (only tried macOS) too:

Assertion failed: (New->getType() == getType() && "replaceAllUses of value with new value of different type!"), function doRAUW, file /workspace/srcdir/llvm-project/llvm/lib/IR/Value.cpp, line 494.

No useful backtrace of course, this being JITLink on 1.8/LLVM 13. I'll try with a debug build.

EDIT: never mind, this one is an issue with the new Metal backend. I'll have a look at the 1.9 macOS failure tomorrow.

@maleadt
Copy link
Member

maleadt commented May 25, 2022

I can't reproduce the failure, but that's using a significantly more recent version of Julia: DEV-649 instead of -439 on CI. Looks like the nightlies are very much out of date?

@maleadt
Copy link
Member

maleadt commented May 28, 2022

I was fed up with nightlies being out of date / unsigned / unavailable, so we're just compiling Julia from source now. Looking good, the only thing that now still needs to be addressed is #324 (comment)

@jpsamaroo
Copy link
Member Author

CI is green except for Enzyme, is that related? @vchuravy

@vchuravy
Copy link
Member

Will have to take a closer look at that, but shouldn't hold this up. Since we are testing 1.6 there

src/reflection.jl Outdated Show resolved Hide resolved
src/reflection.jl Outdated Show resolved Hide resolved
src/reflection.jl Outdated Show resolved Hide resolved
@maleadt
Copy link
Member

maleadt commented May 30, 2022

$ ./contrib/commit-name.sh eb230a5
1.9.0-DEV.672

@maleadt
Copy link
Member

maleadt commented May 30, 2022

The error:

terminate called after throwing an instance of 'std::system_error'
  what():  Invalid argument

#2  0x00007ffff7db2535 in abort () from /usr/lib/libc.so.6
#3  0x00007ffff6e62833 in __gnu_cxx::__verbose_terminate_handler () at /usr/src/debug/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#4  0x00007ffff6e6ebfc in __cxxabiv1::__terminate (handler=<optimized out>) at /usr/src/debug/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48
#5  0x00007ffff6e6ec69 in std::terminate () at /usr/src/debug/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58
#6  0x00007ffff6e6eecd in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x7ffff6ff34f0 <typeinfo for std::system_error>, dest=0x7ffff6e9ee60 <std::system_error::~system_error()>)
    at /usr/src/debug/gcc/libstdc++-v3/libsupc++/eh_throw.cc:98
#7  0x00007ffff6e65921 in std::__throw_system_error (__i=22) at /usr/src/debug/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/system_error:202
#8  0x00007ffff6730939 in std::recursive_mutex::lock (this=0x7ffebd67eeb8) at /usr/include/c++/12.1.0/mutex:112
#9  0x00007ffff674561b in std::unique_lock<std::recursive_mutex>::lock (this=0x7fffffff0be0) at /usr/include/c++/12.1.0/bits/unique_lock.h:139
#10 0x00007ffff673a191 in std::unique_lock<std::recursive_mutex>::unique_lock (this=0x7fffffff0be0, __m=...) at /usr/include/c++/12.1.0/bits/unique_lock.h:69
#11 0x00007ffff67309e3 in llvm::orc::ThreadSafeContext::Lock::Lock (this=0x7fffffff0bd0, S=std::shared_ptr<llvm::orc::ThreadSafeContext::State> (empty) = {...})
    at /home/tim/Julia/src/julia/build/dev/usr/include/llvm/ExecutionEngine/Orc/ThreadSafeModule.h:42
#12 0x00007ffff6730b77 in llvm::orc::ThreadSafeContext::getLock (this=0x7fffffff0b40) at /home/tim/Julia/src/julia/build/dev/usr/include/llvm/ExecutionEngine/Orc/ThreadSafeModule.h:69
#13 0x00007ffff67d8e9a in jl_dump_function_ir_impl (dump=0x7fffffff0cd0, strip_ir_metadata=1 '\001', dump_module=0 '\000', debuginfo=0x7fffecc50250 "default") at /home/tim/Julia/src/julia/src/disasm.cpp:505
#14 0x00007fffe01ae08e in julia_#142_1659 (ctx=...) at /home/tim/Julia/pkg/GPUCompiler/src/reflection.jl:134
#15 0x00007fffe01ae2ca in julia_ThreadSafeContext_1878 (f=<error reading variable: Cannot access memory at address 0x1>) at /home/tim/Julia/pkg/LLVM/src/executionengine/ts_module.jl:14
#16 0x00007fffe01ae747 in julia_JuliaContext_1652 (f=<error reading variable: Cannot access memory at address 0x1>) at /home/tim/Julia/pkg/GPUCompiler/src/driver.jl:72
#17 0x00007fffe01ae7fe in jfptr_JuliaContext_1653 ()
#18 0x00007ffff73548f1 in _jl_invoke (F=0x7fffec9a95b0, args=0x7fffffff10c8, nargs=1, mfunc=0x7ffecc30ef90, world=32861) at /home/tim/Julia/src/julia/src/gf.c:2392
#19 0x00007ffff73552a9 in ijl_apply_generic (F=0x7fffec9a95b0, args=0x7fffffff10c8, nargs=1) at /home/tim/Julia/src/julia/src/gf.c:2574
#20 0x00007fffe018e2c7 in julia_#code_llvm#141_1646 (optimize=<optimized out>, raw=0 '\000', debuginfo=<error reading variable: Cannot access memory at address 0x0>, dump_module=0 '\000', kwargs=..., io=...,
    job=0x0) at /home/tim/Julia/pkg/GPUCompiler/src/reflection.jl:125
#21 0x00007fffe018e6a2 in code_llvm () at /home/tim/Julia/pkg/GPUCompiler/src/reflection.jl:122
#22 japi1_#ptx_code_llvm#220_1643 (kwargs=..., io=..., func=0x7ffec9440a48, types=0x7fffe1922100 <jl_system_image_data+1766528>) at /home/tim/Julia/pkg/GPUCompiler/test/definitions/ptx.jl:63
#23 0x00007fffe018e755 in ptx_code_llvm () at /home/tim/Julia/pkg/GPUCompiler/test/definitions/ptx.jl:61

Something looks wrong with the non-RAII dump passed into Julia:

(gdb) p ((llvm::Value*) dump->F)->dump()

Thread 1 "julia-debug" received signal SIGSEGV, Segmentation fault.

@maleadt
Copy link
Member

maleadt commented May 30, 2022

Don't merge this yet; this PR breaks debug info:

; julia> ptx_code_llvm(identity, Tuple{Nothing}; dump_module=true, raw=true)
warning: ignoring debug info with an invalid version (0) in
; ModuleID = 'start'
source_filename = "start"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

;  @ operators.jl:513 within `identity`
define void @julia_identity_1531() local_unnamed_addr #0 !dbg !3 {
top:
  ret void, !dbg !5
}

attributes #0 = { "probe-stack"="inline-asm" }

!llvm.dbg.cu = !{!0}

!0 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !1, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !2, nameTableKind: None)
!1 = !DIFile(filename: "operators.jl", directory: ".")
!2 = !{}
!3 = distinct !DISubprogram(name: "identity", linkageName: "julia_identity_1531", scope: null, file: !1, line: 513, type: !4, scopeLine: 513, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !2)
!4 = !DISubroutineType(types: !2)
!5 = !DILocation(line: 513, scope: !3)

vs 1.8:

; julia> ptx_code_llvm(identity, Tuple{Nothing}; dump_module=true, raw=true)
; ModuleID = 'text'
source_filename = "text"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

;  @ operators.jl:526 within `identity`
define void @julia_identity_754() local_unnamed_addr #0 !dbg !5 {
top:
  ret void, !dbg !7
}

attributes #0 = { "probe-stack"="inline-asm" }

!llvm.module.flags = !{!0, !1}
!llvm.dbg.cu = !{!2}

!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: None)
!3 = !DIFile(filename: "operators.jl", directory: ".")
!4 = !{}
!5 = distinct !DISubprogram(name: "identity", linkageName: "julia_identity_754", scope: null, file: !3, line: 526, type: !6, scopeLine: 526, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!6 = !DISubroutineType(types: !4)
!7 = !DILocation(line: 526, scope: !5)

maleadt added 2 commits May 30, 2022 14:58
Shouldn't be emitted now that 1.9 doesn't set-up the module.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants