zig c++ hanging when invoked in parallel #9139
As discussed on IRC, I ran this overnight with a debug build of zig; it keeps running successfully: 266 iterations and counting.
Apparently it needs a …

Process 1

Process 2

Not sure how significant the warning is: I cannot run strace in the container, because it doesn't have ptrace permissions (I forgot to add …).

Zig version 0.8.0-194-gb9e78593b, compiled from https://github.com/ziglang/zig-bootstrap with these changes:
Interesting, are you able to get the output of …?
I have stopped the original and am trying to capture it again with …
Perhaps the conditional is our happy little bug? (Line 2378 in 1f29b75)
A new set of borked processes, with ptrace privileges:

ps auxf
lslocks --output-all | grep zig
gdb process 1
gdb process 2
Not sure why the namespace warning is still there; I am running gdb inside the container. This time I'm leaving these processes to hang for further investigation. Let me know if there is something more I can try.
lsof 5598 ("process 1")
lsof 5605 ("process 2")
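For reference, here is a sketch that gathers the inspection commands mentioned above into one runnable snippet. The pids 5598 and 5605 come from the lsof labels; the exact lsof and gdb flags are assumptions, not copied from the thread, and the outputs themselves were attached to the issue rather than reproduced here.

```sh
# Inspect the two stuck `zig c++` processes (pids taken from the comments above).
ps auxf                                        # process tree inside the container
lslocks --output-all | grep zig                # which file locks zig processes hold or wait on
lsof -p 5598                                   # open files of "process 1"
lsof -p 5605                                   # open files of "process 2"
gdb -p 5598 -batch -ex 'thread apply all bt'   # backtraces of "process 1"
gdb -p 5605 -batch -ex 'thread apply all bt'   # backtraces of "process 2"
```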
For the record, I still have the (quite powerful) VM hanging around with the stuck processes, if you need any more info. Once there is a patch, I am happy to re-run the tests with it.
Attaching the files as requested over IRC: 18c5d15d382debc4dd4079a12f75e0a1.txt
File being compiled:
File being compiled:
File being compiled:
File being compiled:
1. Process 1 locks vector.o and compiles it.
2. Process 2 locks variant.o and compiles it.
3. Process 1 tries to grab the lock on variant.o and waits.
4. Process 2 tries to grab the lock on vector.o and waits.

Deadlock. I think the problem is clear; this seems to be a design flaw. I'm looking into making proper use of shared locks to solve the problem.
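As a rough illustration of that AB/BA pattern (not zig's actual cache-locking code), the same deadlock can be reproduced with flock(1). The lock-file names below are stand-ins for the per-artifact lock files:

```sh
touch vector.lock variant.lock

# "Process 1": lock vector, pretend to compile it, then wait for variant.
(
  flock -x 8      # exclusive lock on vector.lock (fd 8)
  sleep 1         # "compiling vector.o"
  flock -x 9      # blocks: variant.lock is already held by the other process
) 8>>vector.lock 9>>variant.lock &

# "Process 2": the same steps in the opposite order.
(
  flock -x 9      # exclusive lock on variant.lock (fd 9)
  sleep 1         # "compiling variant.o"
  flock -x 8      # blocks: vector.lock is already held by the other process
) 8>>vector.lock 9>>variant.lock &

wait              # never returns: both subshells are deadlocked
```

The sleep between the two acquisitions makes the interleaving deterministic enough that both processes hold their first lock before requesting the second.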
I believe #7596 will solve this.
Yup, the design flaw is verified; reviewing the …

edit: I also get confirmation of 2 deadlocked processes on my own system.
I've implemented a fix for this here: #9258
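For context, the shared-lock idea mentioned above can be sketched with flock(1) as well. This is only an illustration of the concept, not the actual change in #9258: processes that merely consume a finished artifact take shared locks, which do not exclude each other, so only the process producing the artifact needs the exclusive lock.

```sh
touch vector.lock

# Two consumers hold shared (read) locks at the same time -- neither blocks the other.
flock -s vector.lock -c 'echo "consumer 1: using vector.o"; sleep 2' &
flock -s vector.lock -c 'echo "consumer 2: using vector.o"; sleep 2' &

# A producer that needs the exclusive (write) lock waits until all shared locks are released.
flock -x vector.lock -c 'echo "producer: rebuilding vector.o"' &
wait
```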
I am using a combination of zig c++, golang, and bazel to cross-compile a cgo program to Darwin. It compiles the Go stdlib in parallel and sometimes hangs. In a container with a hung build, ps auxf looks as follows:

Both processes are waiting on some lock (pids are different, since I am stracing outside the container):

kill -USR1 did not produce a stack trace. Is there any more information I can provide?

Steps to reproduce on an x86_64-linux machine with a working docker installation:

It fails more often in builds.sr.ht (therefore the test script has --cpuset-cpus=0-1, because builds.sr.ht allocates 2 CPUs), e.g. https://builds.sr.ht/~motiejus/job/526372. On my laptop it failed on the 15th iteration; one iteration takes ~90 seconds.

zig version: 0.9.0-dev.137+86ebd4b97

I know bazel in the loop is cumbersome, but I wasn't able to find an easy way to reproduce it without it.
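The actual test script is not preserved in this thread; a rough sketch of the outer reproduction loop could look like the following, where the image name and build.sh are placeholders for the real bazel + zig c++ cross-compilation setup:

```sh
# Hypothetical reproduction loop: repeat the containerized build on 2 CPUs
# until an iteration hangs. "my-build-image" and "build.sh" are placeholders.
for i in $(seq 1 50); do
    echo "iteration $i"
    docker run --rm --cpuset-cpus=0-1 -v "$PWD:/work" -w /work \
        my-build-image ./build.sh
    echo "iteration $i finished"   # a hung iteration never reaches this line
done
```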