Multi-threaded code hanging forever with Julia 1.10 #2261
Comments
Using https://docs.julialang.org/en/v1/stdlib/Profile/#Triggered-During-Execution It looks like we get stuck on entering GC because
|
Could you test #2262 and see if it fixes your issue? |
Thanks for looking into it! I checked out the branch locally, `dev`ed it, and it still deadlocks on Windows. I am happy to run some diagnostics to track it down further, but I am not sure which commands I need to run (the profiler can't be triggered during execution on Windows, if I understood correctly, and the |
Yeah, Windows makes that harder. If you can somehow get a backtrace for all threads, that would help immensely. I could reproduce the hang on Linux before, but can't anymore. Maybe you could try WSL? |
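For reference, here is a minimal sketch of how the triggered profile from the linked Profile docs could be requested on Linux or WSL from a second Julia session. The helper names `profile_signal_cmd` and `request_profile` are made up for illustration and are not part of Julia or CUDA.jl. The sketch assumes a `kill` binary is available and that the PID of the hung process is known (for example, printed with `getpid()`).

```julia
# Sketch only: `profile_signal_cmd` and `request_profile` are hypothetical helper
# names, not part of Julia, Profile, or CUDA.jl.

# Build the shell command that sends SIGUSR1 to the process with the given PID.
# SIGUSR1 is the trigger on Linux (and therefore WSL); macOS and FreeBSD use SIGINFO.
profile_signal_cmd(pid::Integer) = `kill -USR1 $pid`

# Send the signal. The target process collects a short profile and prints the
# report at its next yield point.
request_profile(pid::Integer) = run(profile_signal_cmd(pid))

# Usage, from a second Julia session, with the PID printed by the hung script:
# request_profile(12345)
```

If the process is stuck so that no task ever reaches a yield point, the report may never print, so this is a diagnostic aid rather than a guaranteed way to get a backtrace.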
I managed to reproduce a deadlock with WSL. I ran the program with 4 threads this time. The first iteration of Backtrace
Collected profile
|
Hm but you were able to collect a profile. That means it didn't fully hang at that point. |
I tried the experiment multiple times:
==============================================================
Profile collected. A report will print at the next yield point
==============================================================
^C^C^C^C^C^CWARNING: Force throwing a SIGINT
Segmentation fault
And I never get access to the profile. I added below one of the reports I get in this case:
Backtrace
In a few cases with WSL, I managed to get a profile out by continuing to send interrupt signals. I assume I managed to interrupt one particular function, but I am not sure what is going on here; I get something out roughly once in 10 tries.
==============================================================
Profile collected. A report will print at the next yield point
==============================================================
^C^C^C^C^C^C^C^CWARNING: Force throwing a SIGINT
ERROR: LoadError: InterruptException:
Stacktrace:
[1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
@ Base ./task.jl:931
[2] wait()
@ Base ./task.jl:995
[3] wait(c::Base.GenericCondition{Base.Threads.SpinLock}; first::Bool)
@ Base ./condition.jl:130
[4] wait
@ Base ./condition.jl:125 [inlined]
[5] _wait(t::Task)
@ Base ./task.jl:310
[6] ^Cthreading_run(fun::var"#39#threadsfor_fun#8"{var"#39#threadsfor_fun#7#9"{CuArray{ComplexF32, 4, CUDA.Mem.DeviceBuffer}, Array{ComplexF32, 5}, Base.OneTo{Int64}}}, static::Bool)
@ Base.Threads ./threadingconstructs.jl:166
[7] macro expansion
@ ./threadingconstructs.jl:219 [inlined]
[8] main()
@ Main /path/to/test_deadlock.jl:7
[9] top-level scope
@ /path/to/test_deadlock.jl:15
[10] include(fname::String)
@ Base.MainInclude ./client.jl:489
[11] top-level scope
@ REPL[2]:1
[12] top-level scope
@ /path/to/local/CUDA.jl/src/initialization.jl:206
in expression starting at /path/to/test_deadlock.jl:15 |
Can you try the latest version of the PR (which marks all |
Thanks. I tried the latest version of the PR and can't make my MWE deadlock on WSL or Windows with Julia 1.10 anymore. I then applied the latest version to my original code, which still deadlocks with 1.10 and finishes normally with 1.9. I reduced the new version, which is very similar to the previous one except for the FFT plan. Below are the new MWE and the backtrace obtained from WSL with 8 threads:
using CUDA
using ChunkSplitters
function main()
    data = rand(ComplexF32, (100, 100, 8, 20, 200))
    cu_result = CUDA.zeros(ComplexF32, (100, 100, 20, 200))
    plans = [CUDA.CUFFT.plan_bfft(CUDA.zeros(ComplexF32, (100, 100, 8)), 1:2) for _ in 1:Threads.nthreads()]
    Threads.@threads for (ichunk, chunk) in enumerate(chunks(axes(data, 5); n=Threads.nthreads()))
        for i in chunk
            for t in axes(data, 4)
                cu_result[:, :, t, i] .= sum(plans[ichunk] * CuArray(data[:, :, :, t, i]))
            end
        end
    end
end
println(getpid())
for i in 1:5
    println("Run $i")
    main()
end
Backtrace
|
I extended the PR to cover all libraries, i.e., including cuFFT. Can you test again? |
Problem fixed with the latest version! |
Describe the bug
Thanks for your work on this library. Some of the code I wrote with multi-threading and CUDA hangs forever when using Julia 1.10; it runs correctly with Julia 1.9.4.
I manually reduced the code to the best of my ability using differential testing while still triggering the bug.
To reproduce
The program hangs forever when using 4, 5, 6, 7 and 8 threads (my core count) with julia 1.10. The Minimal Working Example (MWE) for this bug is:
The program finishes normally with Julia 1.9.
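To check each thread count without waiting on a hung process by hand, a small driver along these lines may help. This is only a sketch: `run_with_timeout` is a hypothetical helper, the 600-second limit is an arbitrary choice, and the usage lines assume `julia` is on the PATH and the MWE is saved as `test_deadlock.jl`.

```julia
# Sketch only: `run_with_timeout` is a hypothetical helper, not a Julia API.

# Start `cmd` in the background, wait up to `seconds`, and report whether it
# finished on its own (:finished) or had to be killed (:hung).
# Note: with `wait=false`, the child's output is sent to devnull.
function run_with_timeout(cmd::Cmd, seconds::Real)
    proc = run(cmd; wait=false)
    status = timedwait(() -> process_exited(proc), seconds)
    status == :ok && return :finished
    # A deadlocked process may ignore SIGTERM, so use SIGKILL.
    kill(proc, Base.SIGKILL)
    return :hung
end

# Usage: assumes `julia` is on PATH and the MWE is saved as test_deadlock.jl.
# for n in 4:8
#     println(n, " threads: ", run_with_timeout(`julia --threads=$n test_deadlock.jl`, 600))
# end
```

A result of `:hung` only means the run exceeded the time limit, so the limit should be comfortably larger than a normal run of the script.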
Manifest.toml
Expected behavior
I expect the program to finish (and cu_result to contain the correct result).
Version info
Details for Julia 1.10
CUDA version with Julia 1.10
Version details with Julia 1.9
Details of Julia 1.9
Details on CUDA (Julia 1.9.4):
Additional context
Thanks very much for your help! Please let me know if I can help further with this!