-
-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multithread issue(julia 1.71) #182
Comments
As noted in the issues you linked, Downloads was not threadsafe in Julia 1.7; the Julia 1.8 release is the first one that is threadsafe. As also noted in those issues, you don't need to use threads to do concurrent downloads, you can just uses tasks. |
Thanks for the quick reply. FWIW: I ran the MRE on Julia 1.8beta. The error seems to be gone but - unfortunately if you run the loop maybe 10 times or so (YMMV) you'll get an infinite loop/hanging execution. So the failure mode is different, but it does still seem to fail, unfortunately. One thing I would like to mention is: this problem appears to happen in programs which are not just threaded loops downloading files. For example, someone(our situation) could have a multithreaded client (say Azure.jl) where downloading files is a only a small "part" of what is happening. So the Task suggestion doesn't work as well there. |
I cannot reproduce that on Julia 1.8-beta1 or Julia master on macOS. What operating system are you trying this on and what is the exact MWE? |
_
_ _ _(_)_ | Documentation: https://docs.julialang.org
(_) | (_) (_) |
_ _ _| |_ __ _ | Type "?" for help, "]?" for Pkg help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 1.8.0-beta1 (2022-02-23)
_/ |\__'_|_|_|\__'_| | Official https://julialang.org/ release
|__/ |
julia> using Downloads
julia> Threads.@threads for url in ["1","2","3","4","5","6","7","8"]
resp = Downloads.request("http://google.com"; output="/tmp/cool_$url", method="GET")
end
julia> Threads.@threads for url in ["1","2","3","4","5","6","7","8"]
resp = Downloads.request("http://google.com"; output="/tmp/cool_$url", method="GET")
end
julia> Threads.@threads for url in ["1","2","3","4","5","6","7","8"]
resp = Downloads.request("http://google.com"; output="/tmp/cool_$url", method="GET")
end
julia> Threads.@threads for url in ["1","2","3","4","5","6","7","8"]
resp = Downloads.request("http://google.com"; output="/tmp/cool_$url", method="GET")
end
julia> Threads.@threads for url in ["1","2","3","4","5","6","7","8"]
resp = Downloads.request("http://google.com"; output="/tmp/cool_$url", method="GET")
end
this hangs forever |
I just tried it on the exact same LSB without issues. Not sure what to do given the inability to repro. |
For what is worth, that works for me with Julia julia> @time Threads.@threads for url in 1:400
resp = Downloads.request("http://google.com"; output="/tmp/cool_$url", method="GET")
end
40.885960 seconds (229.03 k allocations: 13.134 MiB, 0.13% compilation time)
julia> Threads.nthreads()
4 |
1.7.3 should be too, since we backported the fixes (#175) |
it works the first randn times that it's run, but if I run it consecutively in the REPL (not hammering it either - ie: waiting for cursor to come back), I get intermittent infinite loop conditions. This is similar to the curl bug - in that I could run the code some number of times before seg faulting. REPL is still frozen from when I made the last post. I could make a little video but its pretty boring to watch. Maybe it's nothing or linked only to the REPL, but I dunno, it seems like undesirable behavior. |
I did the same thing, hitting up in the REPL twenty or so times, running the threaded loop repeatedly. No hangs. I think to debug this further we need some deeper debugging on a system that exhibits the behavior. |
Could run an RR trace on it if I could reproduce it in a script. I'll let you know how that goes - maybe tomorrow. |
I was able to reproduce it in a script, but it is intermittant(IE: may require 3-10 attempts) Script: using Downloads
Threads.@threads for url in ["1","2","3","4","5","6","7","8"]
resp = Downloads.request("http://google.com"; output="/tmp/cool_$url", method="GET")
end
Threads.@threads for url in ["1","2","3","4","5","6","7","8"]
resp = Downloads.request("http://google.com"; output="/tmp/cool_$url", method="GET")
end
Threads.@threads for url in ["1","2","3","4","5","6","7","8"]
resp = Downloads.request("http://google.com"; output="/tmp/cool_$url", method="GET")
end
Threads.@threads for url in ["1","2","3","4","5","6","7","8"]
resp = Downloads.request("http://google.com"; output="/tmp/cool_$url", method="GET")
end
Threads.@threads for url in ["1","2","3","4","5","6","7","8"]
resp = Downloads.request("http://google.com"; output="/tmp/cool_$url", method="GET")
end
println("did not hang") Script is called via: Please try running this a few times in succession (ideally on linux/ubuntu), if that fails I will try to submit an RR. Attempts from the console look like this:
(where the last execution is stuck) |
I get a hang too: julia> using Downloads
julia> for i = 1:100
Threads.@threads for url in ["1","2","3","4","5","6","7","8"]
resp = Downloads.request("http://google.com"; output="/tmp/cool_$url", method="GET")
end
println("$i did not hang")
end
1 did not hang
2 did not hang
3 did not hang
4 did not hang
5 did not hang
6 did not hang
7 did not hang
8 did not hang
9 did not hang
10 did not hang
11 did not hang
12 did not hang
13 did not hang
14 did not hang
15 did not hang
16 did not hang
17 did not hang
18 did not hang
19 did not hang
20 did not hang
21 did not hang
22 did not hang
23 did not hang
24 did not hang
25 did not hang
26 did not hang
27 did not hang
28 did not hang
29 did not hang
30 did not hang
31 did not hang
32 did not hang
33 did not hang
34 did not hang
35 did not hang
36 did not hang
37 did not hang
38 did not hang
39 did not hang
40 did not hang
41 did not hang
42 did not hang
43 did not hang
44 did not hang (Then it hung on 45). When I ctrl-c out: ^C^C^C^C^C^CWARNING: Force throwing a SIGINT
^C^CERROR: InterruptException:
Stacktrace:
[1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
@ Base ./task.jl:834
[2] wait()
@ Base ./task.jl:894
[3] wait(c::Base.GenericCondition{Base.Threads.SpinLock})
@ Base ./condition.jl:124
[4] _wait(t::Task)
@ Base ./task.jl:304
[5] wait
@ ./task.jl:343 [inlined]
[6] threading_run(fun::var"#307#threadsfor_fun#62"{var"#307#threadsfor_fun#61#63"{Vector{String}}}, static::Bool)
@ Base.Threads ./threadingconstructs.jl:38
[7] macro expansion
@ ./threadingconstructs.jl:89 [inlined]
[8] top-level scope
@ ./REPL[4]:2 On julia> versioninfo()
Julia Version 1.8.0-beta1
Commit 7b711ce699 (2022-02-23 15:09 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: 4 × Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, skylake-avx512)
Threads: 4 on 4 virtual cores
shell> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal |
Nice change to the MRE :) I wouldn't be surprised if this issue is different from the original one presented at the start of this (might be google thinking we're doing a DoS), but it does seem to be unintended behavior that could break something in a hard way to diagnose while in production. |
Hm, well if I change it to for i = 1:100
Threads.@threads for url in ["1","2","3","4","5","6","7","8"]
resp = Downloads.request("https://httpbingo.julialang.org"; output="/tmp/cool_$url", method="GET")
end
sleep(1)
println("$i did not hang")
end I get to 46 and then it hangs, same stacktrace. Would hope |
Likely not a DoS issue then. Interesting that yours occurs ~40 some odd iterations and on my machine it's happening much sooner. |
I can only reproduce the google version of this, which seems like it might be them throttling us. |
When I run eric's httpbingo example I stop at roughly the same number of iterations as the google version
(but its stuck on iteration #4) So its reproducible for me. |
On my m1 laptop on rosetta, with ❯ julia-beta
_
_ _ _(_)_ | Documentation: https://docs.julialang.org
(_) | (_) (_) |
_ _ _| |_ __ _ | Type "?" for help, "]?" for Pkg help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 1.8.0-beta1 (2022-02-23)
_/ |\__'_|_|_|\__'_| | Official https://julialang.org/ release
|__/ |
julia> using Downloads
julia> versioninfo()
Julia Version 1.8.0-beta1
Commit 7b711ce699 (2022-02-23 15:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.7.0)
CPU: 8 × Apple M1
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, westmere)
Threads: 4 on 8 virtual cores
Environment:
JULIA_NUM_THREADS = 4
JULIA_PKG_SERVER_REGISTRY_PREFERENCE = eager the |
I ran both trials on a MacOS rosetta and also did not reproduce the bug(neither on Google nor httpbingo). So - I would suggest this is not network throttling, unless common VM IP addresses are flagged on both httpbingo &b google. Even if it is a network thing - I am not sure if an infinite loop is the correct outcome? So it seems like its happening on sporadically on Linux for sure. Don't have access to a windows machine, but that would be interesting too. |
#188 is interesting: can folks who are able to reproduce this indicate whether they have using Libdl
filter!(contains("curl"), dllist()) |
I've got a julia> using Libdl
julia> filter!(contains("curl"), dllist())
1-element Vector{String}:
"/home/ubuntu/.asdf/installs/julia/1.8.0-beta3/bin/../lib/julia/libcurl.so" Then I did However... now I cannot reproduce the original issue, even with the conda folder. (At first that was because I forgot to pass |
I do have a julia> using Libdl
julia> filter!(contains("curl"), dllist())
1-element Vector{String}:
"/home/ubuntu/julia-1.8.0-beta1/bin/../lib/julia/libcurl.so" Was able to reproduce the issue. I did the Was able to reproduce the issue again. Weird that you can't reproduce it anymore @ericphanson |
Just tested on beta3. Interestingly, while running an rr on the httpbingo site I could not reproduce it via the REPL. Leading us back to throttling, however, I was able to get an RR using the google example by executing a script instead. @StefanKarpinski where do we submit public RR's too? I can also send it over RAI slack if that's preferable. ubuntu@ip-172-31-84-172:~$ rr record julia-1.8.0-beta3/bin/julia script.jl -t 4
rr: Saving execution to trace directory `/home/ubuntu/.local/share/rr/julia-15'.
1 did not hang
2 did not hang
3 did not hang
4 did not hang
5 did not hang
6 did not hang
7 did not hang
8 did not hang
9 did not hang
10 did not hang
11 did not hang
12 did not hang
13 did not hang
14 did not hang
15 did not hang
16 did not hang
17 did not hang
18 did not hang
19 did not hang
20 did not hang
21 did not hang
22 did not hang
23 did not hang
24 did not hang
25 did not hang
26 did not hang
27 did not hang
signal (15): Terminated
in expression starting at /home/ubuntu/script.jl:3
unknown function (ip: 0x70000002)
unknown function (ip: 0x7f00b2bff1a3)
traced_raw_syscall at /home/roc/rr/rr/src/preload/syscallbuf.c:278
sys_epoll_wait at /home/roc/rr/rr/src/preload/syscallbuf.c:2278 [inlined]
syscall_hook_internal at /home/roc/rr/rr/src/preload/syscallbuf.c:3287 [inlined]
syscall_hook at /home/roc/rr/rr/src/preload/syscallbuf.c:3454
unknown function (ip: 0x7f00b2bfa0af)
unknown function (ip: 0x7f00b2bfa10e)
unknown function (ip: 0x7f00b2bfa12a)
epoll_wait at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
uv__io_poll at /workspace/srcdir/libuv/src/unix/epoll.c:240
uv_run at /workspace/srcdir/libuv/src/unix/core.c:383
ijl_task_get_next at /buildworker/worker/package_linux64/build/src/partr.c:556
poptask at ./task.jl:919
wait at ./task.jl:928
task_done_hook at ./task.jl:632
jfptr_task_done_hook_57395.clone_1 at /home/ubuntu/julia-1.8.0-beta3/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2358 [inlined]
ijl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2540
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1831 [inlined]
jl_finish_task at /buildworker/worker/package_linux64/build/src/task.c:254
start_task at /buildworker/worker/package_linux64/build/src/task.c:942
unknown function (ip: (nil))
Allocations: 2901 (Pool: 2890; Big: 11); GC: 0 |
Having an rr trace is excellent! RAI Slack would be easiest. |
According to @vtjnash, this may have been solved by JuliaLang/julia#44956. If this cannot be reproduced on Julia master (or the next version of Julia 1.8 when it comes out), then we can consider this fixed. |
Just compiled and ran master on the same machine/configuration experiencing the issue. The issue appears resolved in that release. |
Great. I'm gonna close this then. Thanks for the debugging work! |
The mentioned issues are closed: - JuliaLang/Downloads.jl#110 (closed in Nov 2021 - JuliaLang/Downloads.jl#182 (closed in Apr 2022)
* Remove mention of Downloads.jl having threading issues The mentioned issues are closed: - JuliaLang/Downloads.jl#110 (closed in Nov 2021 - JuliaLang/Downloads.jl#182 (closed in Apr 2022) * Update src/utilities/downloads_backend.jl Co-authored-by: arnaudh <[email protected]> --------- Co-authored-by: mattBrzezinski <[email protected]> Co-authored-by: mattBrzezinski <[email protected]>
Conditions: Ubuntu/MacOS, Julia 1.7.1/1.6.5
Related issues: #94 , #110
Note: the examples here are MRE's and are not what the troublesome code looks like. It seems like a multi-threaded julia application calling Azure/Downloads suffers sporadically from this issue.
The error looks like this(where the last 3 lines repeat seemingly forever):
Can sometimes reproduce with:
JULIA_NUM_THREADS=4 julia171
If this does not reproduce the
curl_multi_socket_action
bug rerun the Threads.@threads loop until it happens (typically takes 1-5 executions).Also sporadically reproduces without the explicit I/O scope/pattern with the same frequency, ie:
Maybe more/less interesting: without the output field(ie:
Downloads.request("http://google.com"; method="GET")
) there is another nondeterministic error that infrequently occurs..:Any help or guidance appreciated
The text was updated successfully, but these errors were encountered: