zig c++ hanging when invoked in parallel #9139

Closed
motiejus opened this issue Jun 17, 2021 · 14 comments · Fixed by #9258
Labels
bug (Observed behavior contradicts documented or intended behavior), frontend (Tokenization, parsing, AstGen, Sema, and Liveness)

@motiejus
Contributor

motiejus commented Jun 17, 2021

I am using a combination of zig c++, Go, and Bazel to cross-compile a cgo program to Darwin. The build compiles the Go stdlib in parallel and sometimes hangs. The output of ps auxf in a container with a hung build looks as follows:

root@4ba08cf5c898:/x# ps auxf | cat
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root        5093  0.4  0.0   3968  3472 pts/1    Ss   04:03   0:00 bash
root        5190  0.0  0.0   6700  2952 pts/1    R+   04:03   0:00  \_ ps auxf
root        5191  0.0  0.0   2468   516 pts/1    S+   04:03   0:00  \_ cat
root           1  0.0  0.0 709728 17392 pts/0    Ssl+ Jun16   0:01 ./bazel build -s --platforms @zig_sdk//:aarch64-macos-gnu //test:gognu
root        2423  0.0  0.0 343176 22712 pts/0    Sl+  Jun16   0:12 /root/.cache/bazelisk/downloads/bazelbuild/bazel-4.1.0-linux-x86_64/bin/bazel build -s --platforms @zig_sdk//:aarch64-macos-gnu //test:gognu
root        2428  0.5  3.1 14181376 1017740 ?    Ssl  Jun16   2:11 bazel(x) -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9 --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED -Xverify:none -Djava.util.logging.config.file=/root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/javalog.properties -Dcom.google.devtools.build.lib.util.LogHandlerQuerier.class=com.google.devtools.build.lib.util.SimpleLogHandler$HandlerQuerier -XX:-MaxFDLimit -Djava.library.path=/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/embedded_tools/jdk/lib/jli:/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/embedded_tools/jdk/lib:/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/embedded_tools/jdk/lib/server:/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/ -Dfile.encoding=ISO-8859-1 -jar /root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/A-server.jar --max_idle_secs=10800 --noshutdown_on_low_sys_mem --connect_timeout_secs=30 --output_user_root=/root/.cache/bazel/_bazel_root --install_base=/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220 --install_md5=f95ca91ebc34d56aa0f8ad499de91220 --output_base=/root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9 --workspace_directory=/x --default_system_javabase= --failure_detail_out=/root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/failure_detail.rawproto --expand_configs_in_place --idle_server_tasks --write_command_log --nowatchfs --nofatal_event_bus_exceptions --nowindows_enable_symlinks --client_debug=false --product_name=Bazel --noincompatible_enable_execution_transition --option_sources=
root        2709  0.0  0.0   5796  1632 ?        S    Jun16   0:00  \_ /root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/process-wrapper --timeout=0 --kill_delay=15 bazel-out/host/bin/external/go_sdk/builder stdlib -sdk external/go_sdk -installsuffix darwin_arm64 -out bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_
root        2710  0.0  0.0 707676 13572 ?        Ssl  Jun16   0:01      \_ bazel-out/host/bin/external/go_sdk/builder stdlib -sdk external/go_sdk -installsuffix darwin_arm64 -out bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_
root        2715  0.0  0.0 1159684 31992 ?       Sl   Jun16   0:03          \_ external/go_sdk/bin/go install -toolexec /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/processwrapper-sandbox/3/execroot/bazel-zig-cc/bazel-out/host/bin/external/go_sdk/builder filterbuildid -gcflags=all= -ldflags=all=-trimpath /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/processwrapper-sandbox/3/execroot/bazel-zig-cc -asmflags=all=-trimpath /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/processwrapper-sandbox/3/execroot/bazel-zig-cc/bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_ std runtime/cgo
root        5087  0.0  0.0 152932 30948 ?        Sl   Jun16   0:00              \_ /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/external/zig_sdk/zig c++ -I /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/processwrapper-sandbox/3/execroot/bazel-zig-cc/bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_/src/net -fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2201875783/b087=/tmp/go-build -gno-record-gcc-switches -fno-common -o /tmp/go-build2201875783/b087/_cgo_.o /tmp/go-build2201875783/b087/_cgo_main.o /tmp/go-build2201875783/b087/_x001.o /tmp/go-build2201875783/b087/_x002.o /tmp/go-build2201875783/b087/_x003.o /tmp/go-build2201875783/b087/_x004.o /tmp/go-build2201875783/b087/_x005.o -target aarch64-macos-gnu
root        5088  0.0  0.0 152932 31092 ?        Sl   Jun16   0:00              \_ /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/external/zig_sdk/zig c++ -I /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/processwrapper-sandbox/3/execroot/bazel-zig-cc/bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_/src/os/user -fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2201875783/b036=/tmp/go-build -gno-record-gcc-switches -fno-common -o /tmp/go-build2201875783/b036/_cgo_.o /tmp/go-build2201875783/b036/_cgo_main.o /tmp/go-build2201875783/b036/_x001.o /tmp/go-build2201875783/b036/_x002.o /tmp/go-build2201875783/b036/_x003.o /tmp/go-build2201875783/b036/_x004.o -target aarch64-macos-gnu

Both are waiting on some lock (pids are different, since I am stracing outside the container):

motiejus ~/code/bazel-zig-cc $ sudo strace -p 812414
strace: Process 812414 attached
futex(0x7f3743028b70, FUTEX_WAIT_PRIVATE, 4294967295, NULL^Cstrace: Process 812414 detached
 <detached ...>

motiejus ~/code/bazel-zig-cc $ sudo strace -p 812415
strace: Process 812415 attached
futex(0x7f12064f8b70, FUTEX_WAIT_PRIVATE, 4294967295, NULL^Cstrace: Process 812415 detached
 <detached ...>

kill -USR1 did not produce a stack trace. Is there any more information I can provide? Steps to reproduce on an x86_64-linux machine with a working docker installation:

$ git clone https://git.sr.ht/~motiejus/bazel-zig-cc -b hangzig
$ cd bazel-zig-cc
$ for i in $(seq 1000); do date; echo $i; time ./hangzig; done

It fails more often on builds.sr.ht, e.g. https://builds.sr.ht/~motiejus/job/526372. builds.sr.ht allocates 2 CPUs, which is why the test script passes --cpuset-cpus=0-1. On my laptop it failed on the 15th iteration; each iteration takes ~90 seconds.

zig version: 0.9.0-dev.137+86ebd4b97. I know bazel in the loop is cumbersome, but I wasn't able to find an easy way to reproduce it without it.

@motiejus changed the title from "zig c++ hanging when compiling golang for macos" to "zig c++ hanging when invoked in parallel" Jun 17, 2021
@andrewrk added the bug and frontend labels Jun 17, 2021
@andrewrk added this to the 0.8.1 milestone Jun 17, 2021
@motiejus
Contributor Author

As discussed on IRC, I ran this overnight with a debug build of zig; it keeps running successfully: 266 iterations and counting.

@motiejus
Contributor Author

motiejus commented Jun 18, 2021

Apparently it needs an x86_64-linux-musl build of zig to manifest. Bingo!

Process 1

warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable.  Connect to gdbserver inside the container.
0x000000000781bbae in sccp ()
(gdb) thread apply all bt

Thread 3 (LWP 2874014 "zig"):
#0  0x00000000078093b3 in flock ()
#1  0x0000000002da4d4f in std.os.flock (fd=10, operation=2) at /home/motiejus/code/zig-bootstrap/zig/lib/std/os.zig:4373
#2  0x0000000002b503f5 in std.fs.Dir.createFileZ (self=..., sub_path_c=0x7ff8d0d6f7b0 "18c5d15d382debc4dd4079a12f75e0a1.txt", flags=...) at /home/motiejus/code/zig-bootstrap/zig/lib/std/fs.zig:983
#3  0x0000000002b4bbcb in std.fs.Dir.createFile (self=..., sub_path=..., flags=...) at /home/motiejus/code/zig-bootstrap/zig/lib/std/fs.zig:914
#4  0x0000000002b4715d in Cache.Manifest.hit (self=0x7ff8d0d727c0) at /home/motiejus/code/zig-bootstrap/zig/src/Cache.zig:289
#5  0x0000000002df68ab in Compilation.updateCObject (comp=0x7ff8d1d876b8, c_object=0x7ff8d1d77490, c_obj_prog_node=0x7ffc29381080) at /home/motiejus/code/zig-bootstrap/zig/src/Compilation.zig:2622
#6  0x0000000002df6189 in Compilation.workerUpdateCObject (comp=0x7ff8d1d876b8, c_object=0x7ff8d1d77490, progress_node=0x7ffc29381080, wg=0x7ff8d1d87b40) at /home/motiejus/code/zig-bootstrap/zig/src/Compilation.zig:2494
#7  0x0000000002df8aa9 in ThreadPool.Closure.runFn (runnable=0x7ff8d1d77d40) at /home/motiejus/code/zig-bootstrap/zig/src/ThreadPool.zig:117
#8  0x0000000002be3338 in ThreadPool.Worker.run (worker=0x7ff8d1d83408) at /home/motiejus/code/zig-bootstrap/zig/src/ThreadPool.zig:35
#9  0x0000000002be36e8 in std.Thread.MainFuncs.posixThreadMain (ctx=0x7ff8d1dd9d60) at /home/motiejus/code/zig-bootstrap/zig/lib/std/Thread.zig:299
#10 0x000000000781d09c in start ()
#11 0x0000000000000000 in ?? ()

Thread 2 (LWP 2874013 "zig"):
#0  0x00000000078093b3 in flock ()
#1  0x0000000002da4d4f in std.os.flock (fd=12, operation=2) at /home/motiejus/code/zig-bootstrap/zig/lib/std/os.zig:4373
#2  0x0000000002b503f5 in std.fs.Dir.createFileZ (self=..., sub_path_c=0x7ff8d1d717b0 "67d6c9ffb54c8f0dcc6e5c5e272f66c2.txt", flags=...) at /home/motiejus/code/zig-bootstrap/zig/lib/std/fs.zig:983
#3  0x0000000002b4bbcb in std.fs.Dir.createFile (self=..., sub_path=..., flags=...) at /home/motiejus/code/zig-bootstrap/zig/lib/std/fs.zig:914
#4  0x0000000002b4715d in Cache.Manifest.hit (self=0x7ff8d1d747c0) at /home/motiejus/code/zig-bootstrap/zig/src/Cache.zig:289
#5  0x0000000002df68ab in Compilation.updateCObject (comp=0x7ff8d1d876b8, c_object=0x7ff8d1d773f0, c_obj_prog_node=0x7ffc29381080) at /home/motiejus/code/zig-bootstrap/zig/src/Compilation.zig:2622
#6  0x0000000002df6189 in Compilation.workerUpdateCObject (comp=0x7ff8d1d876b8, c_object=0x7ff8d1d773f0, progress_node=0x7ffc29381080, wg=0x7ff8d1d87b40) at /home/motiejus/code/zig-bootstrap/zig/src/Compilation.zig:2494
#7  0x0000000002df8aa9 in ThreadPool.Closure.runFn (runnable=0x7ff8d1d77cc0) at /home/motiejus/code/zig-bootstrap/zig/src/ThreadPool.zig:117
#8  0x0000000002be3338 in ThreadPool.Worker.run (worker=0x7ff8d1d833d0) at /home/motiejus/code/zig-bootstrap/zig/src/ThreadPool.zig:35
#9  0x0000000002be36e8 in std.Thread.MainFuncs.posixThreadMain (ctx=0x7ff8d1dd9d50) at /home/motiejus/code/zig-bootstrap/zig/lib/std/Thread.zig:299
#10 0x000000000781d09c in start ()
#11 0x0000000000000000 in ?? ()

Thread 1 (LWP 2874009 "zig"):
#0  0x000000000781bbae in sccp ()
#1  0x000000000781c65d in __timedwait_cp ()
#2  0x0000000000000038 in ?? ()
#3  0x0000000000000000 in ?? ()
(gdb) 

Process 2

warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable.  Connect to gdbserver inside the container.
0x000000000781bbae in sccp ()
(gdb) thread apply all bt

Thread 3 (LWP 2874012 "zig"):
#0  0x00000000078093b3 in flock ()
#1  0x0000000002da4d4f in std.os.flock (fd=11, operation=2) at /home/motiejus/code/zig-bootstrap/zig/lib/std/os.zig:4373
#2  0x0000000002b503f5 in std.fs.Dir.createFileZ (self=..., sub_path_c=0x7f7cece577b0 "aeee3b9656aecd6cbe1df612eb49381b.txt", flags=...) at /home/motiejus/code/zig-bootstrap/zig/lib/std/fs.zig:983
#3  0x0000000002b4bbcb in std.fs.Dir.createFile (self=..., sub_path=..., flags=...) at /home/motiejus/code/zig-bootstrap/zig/lib/std/fs.zig:914
#4  0x0000000002b4715d in Cache.Manifest.hit (self=0x7f7cece5a7c0) at /home/motiejus/code/zig-bootstrap/zig/src/Cache.zig:289
#5  0x0000000002df68ab in Compilation.updateCObject (comp=0x7f7cede6f6b8, c_object=0x7f7cede5f440, c_obj_prog_node=0x7ffcd64a1380) at /home/motiejus/code/zig-bootstrap/zig/src/Compilation.zig:2622
#6  0x0000000002df6189 in Compilation.workerUpdateCObject (comp=0x7f7cede6f6b8, c_object=0x7f7cede5f440, progress_node=0x7ffcd64a1380, wg=0x7f7cede6fb40) at /home/motiejus/code/zig-bootstrap/zig/src/Compilation.zig:2494
#7  0x0000000002df8aa9 in ThreadPool.Closure.runFn (runnable=0x7f7cede5fd00) at /home/motiejus/code/zig-bootstrap/zig/src/ThreadPool.zig:117
#8  0x0000000002be3338 in ThreadPool.Worker.run (worker=0x7f7cede6b408) at /home/motiejus/code/zig-bootstrap/zig/src/ThreadPool.zig:35
#9  0x0000000002be36e8 in std.Thread.MainFuncs.posixThreadMain (ctx=0x7f7cedec1d60) at /home/motiejus/code/zig-bootstrap/zig/lib/std/Thread.zig:299
#10 0x000000000781d09c in start ()
#11 0x0000000000000000 in ?? ()

Thread 2 (LWP 2874011 "zig"):
#0  0x000000000781bbae in sccp ()
#1  0x000000000781c65d in __timedwait_cp ()
#2  0x0000000000000000 in ?? ()

Thread 1 (LWP 2874010 "zig"):
#0  0x000000000781bbae in sccp ()
#1  0x000000000781c65d in __timedwait_cp ()
#2  0x0000000000000038 in ?? ()
#3  0x0000000000000000 in ?? ()
(gdb) 

Not sure how significant the warning is: I cannot run strace in the container, because it doesn't have ptrace permissions (I forgot to add --privileged).

Zig version

0.8.0-194-gb9e78593b, compiled from https://github.com/ziglang/zig-bootstrap with these changes:

  • manually updated zig source tree (rm -fr zig; cp -r ../zig .)
  • patched build, so it emits a Debug build instead of Release build.
  • Build command: ./build -j6 x86_64-linux-musl native
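
For readers following the backtraces above: operation=2 in the std.os.flock frames is LOCK_EX on Linux, so every stuck thread is waiting for an exclusive lock on a cache manifest file. One way such a hang could arise (an assumption at this point, not something the traces prove) is two processes taking the same pair of locks in opposite order. Below is a minimal Python sketch of that pattern. The file names a.txt and b.txt and the helper names are hypothetical, and the probes use LOCK_NB so the sketch reports the conflict instead of hanging:

```python
import fcntl
import os
import tempfile


def try_exclusive(fd):
    """Return True if an exclusive flock was granted without blocking."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def simulate_abba(directory):
    """Two hypothetical workers take two lock files in opposite order."""
    path_a = os.path.join(directory, "a.txt")
    path_b = os.path.join(directory, "b.txt")
    # Each worker opens each file itself, as separate processes would.
    # flock treats separate open file descriptions as independent owners.
    w1_a = os.open(path_a, os.O_CREAT | os.O_RDWR)
    w1_b = os.open(path_b, os.O_CREAT | os.O_RDWR)
    w2_a = os.open(path_a, os.O_CREAT | os.O_RDWR)
    w2_b = os.open(path_b, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(w1_a, fcntl.LOCK_EX)  # worker 1 holds A
        fcntl.flock(w2_b, fcntl.LOCK_EX)  # worker 2 holds B
        # With a blocking flock() both of these calls would wait forever.
        return try_exclusive(w1_b), try_exclusive(w2_a)
    finally:
        for fd in (w1_a, w1_b, w2_a, w2_b):
            os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    print(simulate_abba(tmp))
```

On Linux both probes should be refused, which is the non-blocking equivalent of the two flock() calls never returning.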

@LemonBoy
Contributor

Interesting, are you able to get the output of lslocks inside the container?
It's possible that Cache.hit hits some error and never closes the locked file self.manifest_file; I'm not familiar enough with the cache code to tell whether that could be a problem.
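
To illustrate why an unclosed manifest file would matter: a flock lock stays in place until it is explicitly unlocked or the last descriptor for that open file is closed. A small Python sketch of that behaviour follows. The file name manifest.txt and the function names are made up for the example, and this is not the Zig cache code:

```python
import fcntl
import os
import tempfile


def probe(fd):
    """Try a non-blocking exclusive flock; True means it was granted."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def lock_released_on_close(path):
    """Return (granted_while_holder_open, granted_after_holder_closed)."""
    holder = os.open(path, os.O_CREAT | os.O_RDWR)  # stands in for a leaked manifest fd
    waiter = os.open(path, os.O_CREAT | os.O_RDWR)  # stands in for a second zig process
    try:
        fcntl.flock(holder, fcntl.LOCK_EX)
        before = probe(waiter)
        os.close(holder)  # closing the only descriptor drops the lock
        holder = -1
        after = probe(waiter)
        return before, after
    finally:
        for fd in (holder, waiter):
            if fd >= 0:
                os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    print(lock_released_on_close(os.path.join(tmp, "manifest.txt")))
```

If an error path skipped the close, the second caller would stay blocked for as long as the first process lives.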

@motiejus
Contributor Author

motiejus commented Jun 18, 2021

Interesting, are you able to get the output of lslocks inside the container?

I have stopped the original run and am trying to reproduce the hang with --privileged (so I can ptrace inside the container). If/when it happens again, I will re-post the backtraces and the output of lslocks.

@mikdusan
Member

Perhaps the conditional is our happy little bug?

zig/src/main.zig

Line 2378 in 1f29b75

defer if (enable_cache) man.deinit();

@motiejus
Contributor Author

motiejus commented Jun 18, 2021

A new set of borked processes, with ptrace privileges:

ps auxf

root@6cf98e48ebb0:/x# ps auxf | cat
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root        5608  0.0  0.0   4088  3300 pts/1    Ss   15:15   0:00 bash
root        6167  0.0  0.0   6696  2972 pts/1    R+   15:15   0:00  \_ ps auxf
root        6168  0.0  0.0   2464   516 pts/1    S+   15:15   0:00  \_ cat
root           1  0.0  0.0 709980  9060 pts/0    Ssl+ 12:26   0:00 ./bazel build -s --platforms @zig_sdk//:aarch64-macos-gnu //test:gognu
root        2421  0.0  0.0 343876  4896 pts/0    Sl+  12:26   0:03 /root/.cache/bazelisk/downloads/bazelbuild/bazel-4.1.0-linux-x86_64/bin/bazel build -s --platforms @zig_sdk//:aarch64-macos-gnu //test:gognu
root        2426  0.7  0.1 17997740 471468 ?     Ssl  12:26   1:17 bazel(x) -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9 --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED -Xverify:none -Djava.util.logging.config.file=/root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/javalog.properties -Dcom.google.devtools.build.lib.util.LogHandlerQuerier.class=com.google.devtools.build.lib.util.SimpleLogHandler$HandlerQuerier -XX:-MaxFDLimit -Djava.library.path=/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/embedded_tools/jdk/lib/jli:/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/embedded_tools/jdk/lib:/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/embedded_tools/jdk/lib/server:/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/ -Dfile.encoding=ISO-8859-1 -jar /root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/A-server.jar --max_idle_secs=10800 --noshutdown_on_low_sys_mem --connect_timeout_secs=30 --output_user_root=/root/.cache/bazel/_bazel_root --install_base=/root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220 --install_md5=f95ca91ebc34d56aa0f8ad499de91220 --output_base=/root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9 --workspace_directory=/x --default_system_javabase= --failure_detail_out=/root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/failure_detail.rawproto --expand_configs_in_place --idle_server_tasks --write_command_log --nowatchfs --nofatal_event_bus_exceptions --nowindows_enable_symlinks --client_debug=false --product_name=Bazel --noincompatible_enable_execution_transition --option_sources=
root        3006  0.0  0.0   5808  3280 ?        S    12:27   0:00  \_ /root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/linux-sandbox -t 15 -w /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/linux-sandbox/3/execroot/bazel-zig-cc -w /tmp -w /dev/shm -- bazel-out/host/bin/external/go_sdk/builder stdlib -sdk external/go_sdk -installsuffix darwin_arm64 -out bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_
root        3007  0.0  0.0   6836  1244 ?        S    12:27   0:00      \_ /root/.cache/bazel/_bazel_root/install/f95ca91ebc34d56aa0f8ad499de91220/linux-sandbox -t 15 -w /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/linux-sandbox/3/execroot/bazel-zig-cc -w /tmp -w /dev/shm -- bazel-out/host/bin/external/go_sdk/builder stdlib -sdk external/go_sdk -installsuffix darwin_arm64 -out bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_
root        3008  0.0  0.0 707672  3408 ?        Sl   12:27   0:01          \_ bazel-out/host/bin/external/go_sdk/builder stdlib -sdk external/go_sdk -installsuffix darwin_arm64 -out bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_
root        3013  0.0  0.0 1159680 14200 ?       Sl   12:27   0:02              \_ external/go_sdk/bin/go install -toolexec /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/linux-sandbox/3/execroot/bazel-zig-cc/bazel-out/host/bin/external/go_sdk/builder filterbuildid -gcflags=all= -ldflags=all=-trimpath /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/linux-sandbox/3/execroot/bazel-zig-cc -asmflags=all=-trimpath /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/linux-sandbox/3/execroot/bazel-zig-cc/bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_ std runtime/cgo
root        5598  0.0  0.0 162336 26764 ?        Sl   12:28   0:00                  \_ /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/external/zig_sdk/zig c++ -I /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/linux-sandbox/3/execroot/bazel-zig-cc/bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_/src/os/user -fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2367661082/b036=/tmp/go-build -gno-record-gcc-switches -fno-common -o /tmp/go-build2367661082/b036/_cgo_.o /tmp/go-build2367661082/b036/_cgo_main.o /tmp/go-build2367661082/b036/_x001.o /tmp/go-build2367661082/b036/_x002.o /tmp/go-build2367661082/b036/_x003.o /tmp/go-build2367661082/b036/_x004.o -target aarch64-macos-gnu
root        5605  0.0  0.0 162316 26728 ?        Sl   12:28   0:00                  \_ /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/external/zig_sdk/zig c++ -I /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/linux-sandbox/3/execroot/bazel-zig-cc/bazel-out/k8-fastbuild/bin/external/io_bazel_rules_go/stdlib_/src/net -fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2367661082/b087=/tmp/go-build -gno-record-gcc-switches -fno-common -o /tmp/go-build2367661082/b087/_cgo_.o /tmp/go-build2367661082/b087/_cgo_main.o /tmp/go-build2367661082/b087/_x001.o /tmp/go-build2367661082/b087/_x002.o /tmp/go-build2367661082/b087/_x003.o /tmp/go-build2367661082/b087/_x004.o /tmp/go-build2367661082/b087/_x005.o -target aarch64-macos-gnu
root@6cf98e48ebb0:/x# 

lslocks --output-all | grep zig

root@6cf98e48ebb0:/x# lslocks --output-all | grep -E 'BLOCKER|zig'
COMMAND           PID   TYPE SIZE MODE   M START END PATH                          BLOCKER
zig              5605  FLOCK      WRITE* 0     0   0                                  5598
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5605  FLOCK      WRITE* 0     0   0                                  5598
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE* 0     0   0                                  5605
zig              5605  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE* 0     0   0                                  5605
zig              5605  FLOCK      WRITE  0     0   0                               
zig              5598  FLOCK      WRITE  0     0   0                               
root@6cf98e48ebb0:/x# 
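
In the lslocks output above, a MODE ending in * marks a process that is waiting, and BLOCKER is the PID holding the lock it wants. PID 5605 waits on 5598 while 5598 waits on 5605, which is a cycle. A rough Python sketch that extracts the waiting rows and looks for such a cycle is below. The parsing assumes the SIZE and PATH columns are empty, as they are in this paste, and the function names are illustrative only:

```python
SAMPLE = """\
COMMAND           PID   TYPE SIZE MODE   M START END PATH                          BLOCKER
zig              5605  FLOCK      WRITE* 0     0   0                                  5598
zig              5598  FLOCK      WRITE  0     0   0
zig              5598  FLOCK      WRITE* 0     0   0                                  5605
zig              5605  FLOCK      WRITE  0     0   0
"""


def parse_waits(lslocks_text):
    """Map each waiting PID to the set of PIDs blocking it."""
    waits = {}
    for line in lslocks_text.splitlines():
        tokens = line.split()
        # Waiting rows have 8 tokens here: the usual 7 plus the BLOCKER pid.
        if len(tokens) < 8 or not tokens[3].endswith("*"):
            continue
        waits.setdefault(int(tokens[1]), set()).add(int(tokens[-1]))
    return waits


def find_cycle(waits):
    """Return a list of PIDs forming a wait-for cycle, or None."""
    def visit(pid, path):
        if pid in path:
            return path[path.index(pid):]
        for blocker in sorted(waits.get(pid, ())):
            cycle = visit(blocker, path + [pid])
            if cycle:
                return cycle
        return None

    for pid in sorted(waits):
        cycle = visit(pid, [])
        if cycle:
            return cycle
    return None


print(find_cycle(parse_waits(SAMPLE)))
```

For the rows above this reports the pair 5598 and 5605, i.e. the two zig processes are each holding a lock the other one is waiting for.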

gdb process 1

root@6cf98e48ebb0:/x# gdb -p 5598
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 5598
[New LWP 5599]
[New LWP 5600]
thread apply 
warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable.  Connect to gdbserver inside the container.
0x000000000781bbae in sccp ()
(gdb) thread apply all bt

Thread 3 (LWP 5600 "zig"):
#0  0x00000000078093b3 in flock ()
#1  0x0000000002da4d4f in std.os.flock (fd=49, operation=2) at /home/motiejus/code/zig-bootstrap/zig/lib/std/os.zig:4373
#2  0x0000000002b503f5 in std.fs.Dir.createFileZ (self=..., sub_path_c=0x7f9b132cd7b0 "e6952993d9b2971d4f720b163e9e672c.txt", flags=...) at /home/motiejus/code/zig-bootstrap/zig/lib/std/fs.zig:983
#3  0x0000000002b4bbcb in std.fs.Dir.createFile (self=..., sub_path=..., flags=...) at /home/motiejus/code/zig-bootstrap/zig/lib/std/fs.zig:914
#4  0x0000000002b4715d in Cache.Manifest.hit (self=0x7f9b132d07c0) at /home/motiejus/code/zig-bootstrap/zig/src/Cache.zig:289
#5  0x0000000002df68ab in Compilation.updateCObject (comp=0x7f9b142e56b8, c_object=0x7f9b14301170, c_obj_prog_node=0x7ffdde1d6c90) at /home/motiejus/code/zig-bootstrap/zig/src/Compilation.zig:2622
#6  0x0000000002df6189 in Compilation.workerUpdateCObject (comp=0x7f9b142e56b8, c_object=0x7f9b14301170, progress_node=0x7ffdde1d6c90, wg=0x7f9b142e5b40) at /home/motiejus/code/zig-bootstrap/zig/src/Compilation.zig:2494
#7  0x0000000002df8aa9 in ThreadPool.Closure.runFn (runnable=0x7f9b142e6b00) at /home/motiejus/code/zig-bootstrap/zig/src/ThreadPool.zig:117
#8  0x0000000002be3338 in ThreadPool.Worker.run (worker=0x7f9b142e1408) at /home/motiejus/code/zig-bootstrap/zig/src/ThreadPool.zig:35
#9  0x0000000002be36e8 in std.Thread.MainFuncs.posixThreadMain (ctx=0x7f9b14337d60) at /home/motiejus/code/zig-bootstrap/zig/lib/std/Thread.zig:299
#10 0x000000000781d09c in start ()
#11 0x0000000000000000 in ?? ()

Thread 2 (LWP 5599 "zig"):
#0  0x00000000078093b3 in flock ()
#1  0x0000000002da4d4f in std.os.flock (fd=51, operation=2) at /home/motiejus/code/zig-bootstrap/zig/lib/std/os.zig:4373
#2  0x0000000002b503f5 in std.fs.Dir.createFileZ (self=..., sub_path_c=0x7f9b142cf7b0 "f322b09235a58afcec5f96313ce20bb3.txt", flags=...) at /home/motiejus/code/zig-bootstrap/zig/lib/std/fs.zig:983
#3  0x0000000002b4bbcb in std.fs.Dir.createFile (self=..., sub_path=..., flags=...) at /home/motiejus/code/zig-bootstrap/zig/lib/std/fs.zig:914
#4  0x0000000002b4715d in Cache.Manifest.hit (self=0x7f9b142d27c0) at /home/motiejus/code/zig-bootstrap/zig/src/Cache.zig:289
#5  0x0000000002df68ab in Compilation.updateCObject (comp=0x7f9b142e56b8, c_object=0x7f9b142d47f0, c_obj_prog_node=0x7ffdde1d6c90) at /home/motiejus/code/zig-bootstrap/zig/src/Compilation.zig:2622
#6  0x0000000002df6189 in Compilation.workerUpdateCObject (comp=0x7f9b142e56b8, c_object=0x7f9b142d47f0, progress_node=0x7ffdde1d6c90, wg=0x7f9b142e5b40) at /home/motiejus/code/zig-bootstrap/zig/src/Compilation.zig:2494
#7  0x0000000002df8aa9 in ThreadPool.Closure.runFn (runnable=0x7f9b142e6b40) at /home/motiejus/code/zig-bootstrap/zig/src/ThreadPool.zig:117
#8  0x0000000002be3338 in ThreadPool.Worker.run (worker=0x7f9b142e13d0) at /home/motiejus/code/zig-bootstrap/zig/src/ThreadPool.zig:35
#9  0x0000000002be36e8 in std.Thread.MainFuncs.posixThreadMain (ctx=0x7f9b14337d50) at /home/motiejus/code/zig-bootstrap/zig/lib/std/Thread.zig:299
#10 0x000000000781d09c in start ()
#11 0x0000000000000000 in ?? ()

Thread 1 (LWP 5598 "zig"):
#0  0x000000000781bbae in sccp ()
#1  0x000000000781c65d in __timedwait_cp ()
#2  0x0000000000000038 in ?? ()
#3  0x0000000000000000 in ?? ()
(gdb) 

gdb process 2

root@6cf98e48ebb0:/x# gdb -p 5605
Attaching to process 5605
[New LWP 5606]
[New LWP 5607]

warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable.  Connect to gdbserver inside the container.
0x000000000781bbae in sccp ()
(gdb) thread apply all bt

Thread 3 (LWP 5607 "zig"):
#0  0x00000000078093b3 in flock ()
#1  0x0000000002da4d4f in std.os.flock (fd=13, operation=2) at /home/motiejus/code/zig-bootstrap/zig/lib/std/os.zig:4373
#2  0x0000000002b503f5 in std.fs.Dir.createFileZ (self=..., sub_path_c=0x7fa6e4eb77b0 "aeee3b9656aecd6cbe1df612eb49381b.txt", flags=...) at /home/motiejus/code/zig-bootstrap/zig/lib/std/fs.zig:983
#3  0x0000000002b4bbcb in std.fs.Dir.createFile (self=..., sub_path=..., flags=...) at /home/motiejus/code/zig-bootstrap/zig/lib/std/fs.zig:914
#4  0x0000000002b4715d in Cache.Manifest.hit (self=0x7fa6e4eba7c0) at /home/motiejus/code/zig-bootstrap/zig/src/Cache.zig:289
#5  0x0000000002df68ab in Compilation.updateCObject (comp=0x7fa6e5ecf6b8, c_object=0x7fa6e5ebf440, c_obj_prog_node=0x7ffd767af6d0) at /home/motiejus/code/zig-bootstrap/zig/src/Compilation.zig:2622
#6  0x0000000002df6189 in Compilation.workerUpdateCObject (comp=0x7fa6e5ecf6b8, c_object=0x7fa6e5ebf440, progress_node=0x7ffd767af6d0, wg=0x7fa6e5ecfb40) at /home/motiejus/code/zig-bootstrap/zig/src/Compilation.zig:2494
#7  0x0000000002df8aa9 in ThreadPool.Closure.runFn (runnable=0x7fa6e5ebfcc0) at /home/motiejus/code/zig-bootstrap/zig/src/ThreadPool.zig:117
#8  0x0000000002be3338 in ThreadPool.Worker.run (worker=0x7fa6e5ecb408) at /home/motiejus/code/zig-bootstrap/zig/src/ThreadPool.zig:35
#9  0x0000000002be36e8 in std.Thread.MainFuncs.posixThreadMain (ctx=0x7fa6e5f21d60) at /home/motiejus/code/zig-bootstrap/zig/lib/std/Thread.zig:299
#10 0x000000000781d09c in start ()
#11 0x0000000000000000 in ?? ()

Thread 2 (LWP 5606 "zig"):
#0  0x00000000078093b3 in flock ()
#1  0x0000000002da4d4f in std.os.flock (fd=12, operation=2) at /home/motiejus/code/zig-bootstrap/zig/lib/std/os.zig:4373
#2  0x0000000002b503f5 in std.fs.Dir.createFileZ (self=..., sub_path_c=0x7fa6e5eb97b0 "18c5d15d382debc4dd4079a12f75e0a1.txt", flags=...) at /home/motiejus/code/zig-bootstrap/zig/lib/std/fs.zig:983
#3  0x0000000002b4bbcb in std.fs.Dir.createFile (self=..., sub_path=..., flags=...) at /home/motiejus/code/zig-bootstrap/zig/lib/std/fs.zig:914
#4  0x0000000002b4715d in Cache.Manifest.hit (self=0x7fa6e5ebc7c0) at /home/motiejus/code/zig-bootstrap/zig/src/Cache.zig:289
#5  0x0000000002df68ab in Compilation.updateCObject (comp=0x7fa6e5ecf6b8, c_object=0x7fa6e5ebf490, c_obj_prog_node=0x7ffd767af6d0) at /home/motiejus/code/zig-bootstrap/zig/src/Compilation.zig:2622
#6  0x0000000002df6189 in Compilation.workerUpdateCObject (comp=0x7fa6e5ecf6b8, c_object=0x7fa6e5ebf490, progress_node=0x7ffd767af6d0, wg=0x7fa6e5ecfb40) at /home/motiejus/code/zig-bootstrap/zig/src/Compilation.zig:2494
#7  0x0000000002df8aa9 in ThreadPool.Closure.runFn (runnable=0x7fa6e5ebfd00) at /home/motiejus/code/zig-bootstrap/zig/src/ThreadPool.zig:117
#8  0x0000000002be3338 in ThreadPool.Worker.run (worker=0x7fa6e5ecb3d0) at /home/motiejus/code/zig-bootstrap/zig/src/ThreadPool.zig:35
#9  0x0000000002be36e8 in std.Thread.MainFuncs.posixThreadMain (ctx=0x7fa6e5f21d50) at /home/motiejus/code/zig-bootstrap/zig/lib/std/Thread.zig:299
#10 0x000000000781d09c in start ()
#11 0x0000000000000000 in ?? ()

Thread 1 (LWP 5605 "zig"):
#0  0x000000000781bbae in sccp ()
#1  0x000000000781c65d in __timedwait_cp ()
#2  0x0000000000000038 in ?? ()
#3  0x0000000000000000 in ?? ()
(gdb) 

Not sure why the namespace warning is still there; I am running gdb inside the container.

This time I'm leaving these processes hanging for further investigation. Let me know if there is something more I can try.

@motiejus
Contributor Author

lsof 5598 ("process 1")

root@6cf98e48ebb0:/x# lsof -p 5598
COMMAND  PID USER   FD   TYPE DEVICE  SIZE/OFF      NODE NAME
zig     5598 root  cwd    DIR 0,1501      4096  10490372 /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/linux-sandbox/3/execroot/bazel-zig-cc
zig     5598 root  rtd    DIR 0,1501      4096   9830582 /
zig     5598 root  txt    REG 0,1501 184877896  10226850 /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/external/zig_sdk/zig
zig     5598 root  mem    REG  66,16            10226850 /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/external/zig_sdk/zig (path dev=0,1501)
zig     5598 root    0r   CHR    1,3       0t0 233014089 /dev/null
zig     5598 root    1w  FIFO   0,12       0t0 233220439 pipe
zig     5598 root    2w  FIFO   0,12       0t0 233220439 pipe
zig     5598 root    3u   DIR 0,1501      4096  10096918 /tmp/bazel-zig-cc
zig     5598 root    4u   DIR 0,1501      4096  10356793 /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/external/zig_sdk/lib
zig     5598 root    5u   DIR 0,1501      4096  10096918 /tmp/bazel-zig-cc
zig     5598 root    6u   DIR 0,1501     12288  10096931 /tmp/bazel-zig-cc/h
zig     5598 root    7u   DIR 0,1501      4096  14943861 /tmp/bazel-zig-cc/o/29fc05bd1a588c65c9f3f1914783feb9
zig     5598 root    8u   DIR 0,1501     12288  10096931 /tmp/bazel-zig-cc/h
zig     5598 root    9u   DIR 0,1501      4096  10227950 /tmp/bazel-zig-cc/o/fa537869cf0ef12d12daaf8210c67a8c
zig     5598 root   10uW  REG 0,1501     32000  10097112 /tmp/bazel-zig-cc/h/18c5d15d382debc4dd4079a12f75e0a1.txt
zig     5598 root   11uW  REG 0,1501     31809  10097113 /tmp/bazel-zig-cc/h/aeee3b9656aecd6cbe1df612eb49381b.txt
zig     5598 root   12uW  REG 0,1501     32181  10098246 /tmp/bazel-zig-cc/h/67d6c9ffb54c8f0dcc6e5c5e272f66c2.txt
zig     5598 root   13uW  REG 0,1501     21998  10098249 /tmp/bazel-zig-cc/h/e146b019f6b7823f94869aac599b995d.txt
zig     5598 root   14uW  REG 0,1501     21636  10098257 /tmp/bazel-zig-cc/h/22732936d0435aa1d3cbd9558004fa7d.txt
zig     5598 root   15uW  REG 0,1501     38447  10098264 /tmp/bazel-zig-cc/h/0ac2b6ea4037d239f5ce1a0c87c9703e.txt
zig     5598 root   16uW  REG 0,1501     33804  10098298 /tmp/bazel-zig-cc/h/956213cc51abd27b2f58fc8720fd063e.txt
zig     5598 root   17uW  REG 0,1501       382  10098273 /tmp/bazel-zig-cc/h/c558d923cae2aae4c6f1852e1a83c096.txt
zig     5598 root   18uW  REG 0,1501     44077  10096891 /tmp/bazel-zig-cc/h/fe59a1a1c778ef0373e24fb30a3d5ceb.txt
zig     5598 root   19uW  REG 0,1501     34144  10098284 /tmp/bazel-zig-cc/h/724facfa4300748904328995b709c629.txt
zig     5598 root   20uW  REG 0,1501     39138  10098289 /tmp/bazel-zig-cc/h/3ff7ca3b88a1b3f80b588d39dd1ec269.txt
zig     5598 root   21uW  REG 0,1501     33980  10098325 /tmp/bazel-zig-cc/h/e4ea22f9c335f32574fc04246bbe0b2a.txt
zig     5598 root   22uW  REG 0,1501     40012  10098359 /tmp/bazel-zig-cc/h/86b2178f4465f96d6804349130089c77.txt
zig     5598 root   23uW  REG 0,1501     37536  10098327 /tmp/bazel-zig-cc/h/7c500680b3fa8baa8c7fff2324e41a8b.txt
zig     5598 root   24uW  REG 0,1501     35412  10098335 /tmp/bazel-zig-cc/h/5f2de41003fe76f7c5d9aed9b003b0ed.txt
zig     5598 root   25uW  REG 0,1501     31303  10098378 /tmp/bazel-zig-cc/h/5e34d479e7112a43571191f26c6cab2b.txt
zig     5598 root   26uW  REG 0,1501     31304  10098357 /tmp/bazel-zig-cc/h/5d6e930b20979aae2b34f39ded0f1d9c.txt
zig     5598 root   27uW  REG 0,1501     34147  10098391 /tmp/bazel-zig-cc/h/350e15054b17e84fd8aba996819d8945.txt
zig     5598 root   28uW  REG 0,1501     21079  10098382 /tmp/bazel-zig-cc/h/33ed3825f1fdd283eef9e25aef6bd431.txt
zig     5598 root   29uW  REG 0,1501     44078  10098430 /tmp/bazel-zig-cc/h/ba32123a89a019dd21577926a30c54df.txt
zig     5598 root   30uW  REG 0,1501     34324  10098400 /tmp/bazel-zig-cc/h/3b2245b507ac568035dd34c555b16f56.txt
zig     5598 root   31uW  REG 0,1501     44839  10098422 /tmp/bazel-zig-cc/h/7b342b1bf9e933782f20406c3e26f805.txt
zig     5598 root   32uW  REG 0,1501     45156  10098442 /tmp/bazel-zig-cc/h/58703fc6d766dc8332d29c225e903924.txt
zig     5598 root   33uW  REG 0,1501     36999  10098438 /tmp/bazel-zig-cc/h/df7ef0f36afe0c51eaa57367222c2eff.txt
zig     5598 root   34uW  REG 0,1501     49280  10098487 /tmp/bazel-zig-cc/h/1fe48363f68a9920b2d645fe94e5059c.txt
zig     5598 root   35uW  REG 0,1501     32179  10098459 /tmp/bazel-zig-cc/h/2b501604209ee6b83f861f4f2d349c7c.txt
zig     5598 root   36uW  REG 0,1501     34311  10098468 /tmp/bazel-zig-cc/h/30b5685079aa81e90a74e305938525af.txt
zig     5598 root   37uW  REG 0,1501     30940  10098477 /tmp/bazel-zig-cc/h/2b93caa735da77046cd278809938f920.txt
zig     5598 root   38uW  REG 0,1501      2864  10098496 /tmp/bazel-zig-cc/h/e644e8d016106d2bf5bfa9ff9f05ae09.txt
zig     5598 root   39uW  REG 0,1501     21105  10098544 /tmp/bazel-zig-cc/h/88a57927245d2d9e2454d75530193d00.txt
zig     5598 root   40uW  REG 0,1501     49310  10098504 /tmp/bazel-zig-cc/h/74c9940e5ceab29cd40c2ceba73d5c1b.txt
zig     5598 root   41uW  REG 0,1501     31561  10098518 /tmp/bazel-zig-cc/h/7893f9324e214c2aebd5f4d3585bb5af.txt
zig     5598 root   42uW  REG 0,1501     22587  10098530 /tmp/bazel-zig-cc/h/76d081df52185daee610b741ca96cb72.txt
zig     5598 root   43uW  REG 0,1501     34683  10098536 /tmp/bazel-zig-cc/h/49ecae609e51b2d4b6d170cdd8a7d429.txt
zig     5598 root   44uW  REG 0,1501     34172  10098551 /tmp/bazel-zig-cc/h/99b7c72521c5fbbe31d50d4ee7bf7eb8.txt
zig     5598 root   45uW  REG 0,1501     49292  10098567 /tmp/bazel-zig-cc/h/3629e745f59ad2f18c43a557ffb09fcc.txt
zig     5598 root   46uW  REG 0,1501     23624  10098573 /tmp/bazel-zig-cc/h/e7d557bfb26e7c5dec5fca9facff6fa7.txt
zig     5598 root   47uW  REG 0,1501     30928  10098584 /tmp/bazel-zig-cc/h/2d100d6a538b5a3ae3f0f21100a7eb45.txt
zig     5598 root   48uW  REG 0,1501     34140  10098593 /tmp/bazel-zig-cc/h/72237618865593f06209d00c90d67d1c.txt
zig     5598 root   49u   REG 0,1501     31465  10098623 /tmp/bazel-zig-cc/h/e6952993d9b2971d4f720b163e9e672c.txt
zig     5598 root   50uW  REG 0,1501     30934  10098599 /tmp/bazel-zig-cc/h/b10937b28bd9977b449745c634f47c5d.txt
zig     5598 root   51u   REG 0,1501     31107  10098615 /tmp/bazel-zig-cc/h/f322b09235a58afcec5f96313ce20bb3.txt
root@6cf98e48ebb0:/x# 

lsof 5605 ("process 2")

root@6cf98e48ebb0:/x# lsof -p 5605
COMMAND  PID USER   FD   TYPE DEVICE  SIZE/OFF      NODE NAME
zig     5605 root  cwd    DIR 0,1501      4096  10490372 /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/sandbox/linux-sandbox/3/execroot/bazel-zig-cc
zig     5605 root  rtd    DIR 0,1501      4096   9830582 /
zig     5605 root  txt    REG 0,1501 184877896  10226850 /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/external/zig_sdk/zig
zig     5605 root  mem    REG  66,16            10226850 /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/external/zig_sdk/zig (path dev=0,1501)
zig     5605 root    0r   CHR    1,3       0t0 233014089 /dev/null
zig     5605 root    1w  FIFO   0,12       0t0 233212540 pipe
zig     5605 root    2w  FIFO   0,12       0t0 233212540 pipe
zig     5605 root    3u   DIR 0,1501      4096  10096918 /tmp/bazel-zig-cc
zig     5605 root    4u   DIR 0,1501      4096  10356793 /root/.cache/bazel/_bazel_root/cc8755609ad61864910f145119713de9/external/zig_sdk/lib
zig     5605 root    5u   DIR 0,1501      4096  10096918 /tmp/bazel-zig-cc
zig     5605 root    6u   DIR 0,1501     12288  10096931 /tmp/bazel-zig-cc/h
zig     5605 root    7u   DIR 0,1501      4096  14943976 /tmp/bazel-zig-cc/o/88e75e6c22007217207558dfa5cd8f06
zig     5605 root    8u   DIR 0,1501     12288  10096931 /tmp/bazel-zig-cc/h
zig     5605 root    9u   DIR 0,1501      4096  10227950 /tmp/bazel-zig-cc/o/fa537869cf0ef12d12daaf8210c67a8c
zig     5605 root   10uW  REG 0,1501     31107  10098615 /tmp/bazel-zig-cc/h/f322b09235a58afcec5f96313ce20bb3.txt
zig     5605 root   11uW  REG 0,1501     31465  10098623 /tmp/bazel-zig-cc/h/e6952993d9b2971d4f720b163e9e672c.txt
zig     5605 root   12u   REG 0,1501     32000  10097112 /tmp/bazel-zig-cc/h/18c5d15d382debc4dd4079a12f75e0a1.txt
zig     5605 root   13u   REG 0,1501     31809  10097113 /tmp/bazel-zig-cc/h/aeee3b9656aecd6cbe1df612eb49381b.txt
root@6cf98e48ebb0:/x# 
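
A note for reading the two listings above: in lsof's FD column, a trailing `W` marks a write (exclusive) lock on the whole file, while a bare `u` means the file is open read/write with no lock held through that descriptor. Descriptors 12 and 13 of PID 5605 are the same ones passed to `flock` in the gdb backtrace. Below is a minimal Python sketch that separates the two groups, assuming the column layout shown above; the name `split_by_lock` is made up for illustration.

```python
import re

# Four rows copied verbatim from the `lsof -p 5605` listing above.
SAMPLE = """\
zig     5605 root   10uW  REG 0,1501     31107  10098615 /tmp/bazel-zig-cc/h/f322b09235a58afcec5f96313ce20bb3.txt
zig     5605 root   11uW  REG 0,1501     31465  10098623 /tmp/bazel-zig-cc/h/e6952993d9b2971d4f720b163e9e672c.txt
zig     5605 root   12u   REG 0,1501     32000  10097112 /tmp/bazel-zig-cc/h/18c5d15d382debc4dd4079a12f75e0a1.txt
zig     5605 root   13u   REG 0,1501     31809  10097113 /tmp/bazel-zig-cc/h/aeee3b9656aecd6cbe1df612eb49381b.txt
"""

# FD column: descriptor number, access mode (r/w/u), optional lock character.
FD_RE = re.compile(r"^(\d+)([rwu])(\S?)$")


def split_by_lock(lsof_text):
    """Return (held, unlocked) lists of (fd, file name) for regular files.

    `held` contains descriptors whose lock character is `W`.
    """
    held, unlocked = [], []
    for line in lsof_text.splitlines():
        fields = line.split()
        if len(fields) < 9 or fields[4] != "REG":
            continue  # skip headers, directories, pipes, short rows
        match = FD_RE.match(fields[3])
        if not match:
            continue  # skip cwd, rtd, txt, mem entries
        entry = (int(match.group(1)), fields[-1].rsplit("/", 1)[-1])
        (held if match.group(3) == "W" else unlocked).append(entry)
    return held, unlocked


if __name__ == "__main__":
    held, unlocked = split_by_lock(SAMPLE)
    print("exclusive lock held:", held)
    print("open, no lock held:", unlocked)
```

Applied to both listings, the unlocked manifests of one process should show up as `W`-locked in the other.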

@motiejus
Contributor Author

For the record, I still have the (quite powerful) VM hanging around with the stuck processes, if you need any more info.

Once there is a patch, I am happy to re-run the tests with the patch.

@andrewrk
Member

18c5d15d382debc4dd4079a12f75e0a1.txt

File being compiled: lib/libcxx/src/vector.cpp

aeee3b9656aecd6cbe1df612eb49381b.txt

File being compiled: lib/libcxx/src/variant.cpp

e6952993d9b2971d4f720b163e9e672c.txt

File being compiled: lib/libcxx/src/algorithm.cpp

f322b09235a58afcec5f96313ce20bb3.txt

File being compiled: lib/libcxx/src/any.cpp

@andrewrk
Member

Process 1 locks vector.o & compiles it. Process 2 locks variant.o & compiles it. Process 1 tries to grab lock on variant.o and waits. Process 2 tries to grab lock on vector.o and waits. Deadlock.
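
The lock ordering described here can be replayed without actually hanging. The following is a rough Python sketch, not Zig's cache code: it assumes Linux `flock` semantics, where each `open()` creates its own open file description, so two descriptors in one process conflict just as two processes would. The file names and `simulate_abba` are hypothetical. `LOCK_NB` is added so that a call which would block forever (the `operation=2`, i.e. `LOCK_EX`, calls in the backtraces) reports failure instead.

```python
import fcntl
import os
import tempfile


def try_lock_exclusive(fd):
    """Try to take an exclusive flock; return False where a blocking call would wait."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def simulate_abba(dirpath):
    """Replay the order: P1 locks A, P2 locks B, P1 wants B, P2 wants A."""
    a = os.path.join(dirpath, "vector.txt")   # hypothetical manifest names
    b = os.path.join(dirpath, "variant.txt")
    flags = os.O_RDWR | os.O_CREAT
    # Separate open() calls stand in for separate processes.
    p1_a, p2_b = os.open(a, flags, 0o644), os.open(b, flags, 0o644)
    p1_b, p2_a = os.open(b, flags, 0o644), os.open(a, flags, 0o644)
    try:
        return {
            "p1 locks vector": try_lock_exclusive(p1_a),
            "p2 locks variant": try_lock_exclusive(p2_b),
            "p1 locks variant": try_lock_exclusive(p1_b),
            "p2 locks vector": try_lock_exclusive(p2_a),
        }
    finally:
        for fd in (p1_a, p2_b, p1_b, p2_a):
            os.close(fd)  # closing a descriptor releases its flock


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for step, ok in simulate_abba(tmp).items():
            print(step, "->", "acquired" if ok else "would block")
```

The last two steps are the ones that never return when the calls are blocking and neither side releases what it already holds.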

I think the problem is clear: this seems to be a design flaw. I'm looking into making proper use of shared locks to solve the problem.
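
As an illustration of what shared locks allow in general (a sketch under assumptions, not a description of the eventual fix): with `flock`, a holder of an exclusive lock can call `flock` again with `LOCK_SH` to convert it, after which other descriptors can take shared locks on the same file. The conversion is not guaranteed to be atomic. `downgrade_demo` and the file name are invented for this example, and Linux semantics are assumed.

```python
import fcntl
import os
import tempfile


def try_flock(fd, mode):
    """Non-blocking flock; return True if the lock was granted."""
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def downgrade_demo(path):
    """Return (reader blocked while exclusive, reader admitted after downgrade)."""
    writer = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    reader = os.open(path, os.O_RDONLY)  # separate open file description
    try:
        fcntl.flock(writer, fcntl.LOCK_EX)          # "compiling": exclusive
        blocked = not try_flock(reader, fcntl.LOCK_SH)
        fcntl.flock(writer, fcntl.LOCK_SH)          # done writing: convert to shared
        admitted = try_flock(reader, fcntl.LOCK_SH)
        return blocked, admitted
    finally:
        os.close(reader)
        os.close(writer)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(downgrade_demo(os.path.join(tmp, "manifest.txt")))
```

A process that only needs to read a finished artifact then no longer has to wait for another process to exit.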

@andrewrk
Member

I believe #7596 will solve this.

@mikdusan
Member

mikdusan commented Jun 26, 2021

Yup, the design flaw is verified. Reviewing the gdb and lsof outputs, we have this:

  • proc-1 waiting to lock e6952993d9b2971d4f720b163e9e672c.txt and proc-2 has it locked-exclusive
  • proc-1 waiting to lock f322b09235a58afcec5f96313ce20bb3.txt and proc-2 has it locked-exclusive
  • proc-2 waiting to lock aeee3b9656aecd6cbe1df612eb49381b.txt and proc-1 has it locked-exclusive
  • proc-2 waiting to lock 18c5d15d382debc4dd4079a12f75e0a1.txt and proc-1 has it locked-exclusive

edit: I also confirmed 2 deadlocked processes on my own system
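
The four observations in the list above form a cycle in a wait-for graph. A small Python sketch makes that explicit; the data is transcribed from the list, while the helper names `wait_for_graph` and `find_cycle` are hypothetical.

```python
# Which process holds each manifest exclusively (from the list above).
HOLDS = {
    "e6952993d9b2971d4f720b163e9e672c.txt": "proc-2",
    "f322b09235a58afcec5f96313ce20bb3.txt": "proc-2",
    "aeee3b9656aecd6cbe1df612eb49381b.txt": "proc-1",
    "18c5d15d382debc4dd4079a12f75e0a1.txt": "proc-1",
}

# Which manifests each process is blocked on (from the list above).
WAITS = {
    "proc-1": ["e6952993d9b2971d4f720b163e9e672c.txt",
               "f322b09235a58afcec5f96313ce20bb3.txt"],
    "proc-2": ["aeee3b9656aecd6cbe1df612eb49381b.txt",
               "18c5d15d382debc4dd4079a12f75e0a1.txt"],
}


def wait_for_graph(holds, waits):
    """Map each waiting process to the set of processes it is waiting on."""
    return {
        proc: {holds[f] for f in files if f in holds and holds[f] != proc}
        for proc, files in waits.items()
    }


def find_cycle(graph):
    """Depth-first search; return the processes on a cycle, or None."""
    path, finished = [], set()

    def visit(node):
        if node in finished:
            return None
        if node in path:
            return path[path.index(node):]  # back edge: cycle found
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        finished.add(node)
        return None

    for start in sorted(graph):
        cycle = visit(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    print(find_cycle(wait_for_graph(HOLDS, WAITS)))
```

Any cycle in this graph means none of the processes on it can make progress without one of them giving up a lock.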

andrewrk added a commit that referenced this issue Jun 28, 2021
@andrewrk
Member

I've implemented a fix for this here: #9258
However, some work is needed to improve the std lib support for file locking on Windows before it can be merged.

andrewrk added a commit that referenced this issue Jun 29, 2021