Labels: A-process (Area: `std::process` and `std::env`), C-bug (Category: This is a bug.), O-unix (Operating system: Unix-like), T-libs (Relevant to the library team, which will review and decide on the PR/issue.)
The n2 project is a reimplementation of the ninja build system. As such, it launches many subprocesses. For every subprocess, it spawns a thread that runs std::process::Command::spawn().
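On macOS, ninja -j250 runs fine, while n2 -j250 runs out of file descriptors (n2 bug report: evmar/n2#14). It looks like this is due to an FD leak in Rust's standard library.
Command::spawn() in the Rust stdlib unconditionally calls anon_pipe (rust/library/std/src/sys/unix/process/process_unix.rs, line 59 in 5217347). On Linux, anon_pipe calls pipe2 to set CLOEXEC on the pipe atomically (rust/library/std/src/sys/unix/pipe.rs, line 18 in 5217347). But macOS has no pipe2, so there the stdlib instead calls pipe() followed by set_cloexec (rust/library/std/src/sys/unix/pipe.rs, line 35 in 5217347).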
This means there's a window where the pipe is created but cloexec isn't set on the pipe's FDs yet. If a different thread forks in that window, the pipe's fds get leaked.
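To make the window concrete, here is a minimal sketch of the two-step sequence, assuming a Unix-like target. It is not the stdlib's code: the C bindings are written by hand, `two_step_pipe` and `is_cloexec` are hypothetical names, and `set_cloexec` is an illustrative re-implementation. The sketch inspects the flag between the two calls, which is the state a concurrent fork in another thread would copy into the child.

```rust
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::{AsRawFd, FromRawFd, OwnedFd, RawFd};

// Hand-written bindings to the C library, so the sketch needs no external crate.
extern "C" {
    fn pipe(fds: *mut c_int) -> c_int;
    fn fcntl(fd: c_int, cmd: c_int, ...) -> c_int;
}

// fcntl constants; these values are the same on Linux and macOS.
const F_GETFD: c_int = 1;
const F_SETFD: c_int = 2;
const FD_CLOEXEC: c_int = 1;

/// Returns true if the close-on-exec flag is set on `fd`.
fn is_cloexec(fd: RawFd) -> io::Result<bool> {
    let flags = unsafe { fcntl(fd, F_GETFD) };
    if flags == -1 {
        return Err(io::Error::last_os_error());
    }
    Ok(flags & FD_CLOEXEC != 0)
}

/// Illustrative stand-in for the second step: set close-on-exec on `fd`.
fn set_cloexec(fd: RawFd) -> io::Result<()> {
    let flags = unsafe { fcntl(fd, F_GETFD) };
    if flags == -1 {
        return Err(io::Error::last_os_error());
    }
    if unsafe { fcntl(fd, F_SETFD, flags | FD_CLOEXEC) } == -1 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}

/// Creates a pipe in two steps and reports whether both ends were
/// inheritable (no close-on-exec) in between the steps.
fn two_step_pipe() -> io::Result<(OwnedFd, OwnedFd, bool)> {
    let mut fds: [c_int; 2] = [0; 2];
    if unsafe { pipe(fds.as_mut_ptr()) } == -1 {
        return Err(io::Error::last_os_error());
    }
    // Take ownership right away so both ends are closed on every return path.
    let (read_end, write_end) =
        unsafe { (OwnedFd::from_raw_fd(fds[0]), OwnedFd::from_raw_fd(fds[1])) };

    // Race window: a fork + exec in another thread at this point would
    // hand both descriptors to the new program.
    let inheritable_in_window =
        !is_cloexec(read_end.as_raw_fd())? && !is_cloexec(write_end.as_raw_fd())?;

    set_cloexec(read_end.as_raw_fd())?;
    set_cloexec(write_end.as_raw_fd())?;
    Ok((read_end, write_end, inheritable_in_window))
}
```

Calling `two_step_pipe()` should report the descriptors as inheritable inside the window and as close-on-exec afterwards. A single pipe2 call with O_CLOEXEC never exposes that intermediate state.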
The FD leak went away when putting std::process::Command::spawn() behind a mutex, so it does seem like this race is in fact the cause.
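The mutex experiment can be sketched roughly as follows. `SPAWN_LOCK` and `spawn_serialized` are hypothetical names, not taken from n2 or the stdlib. The sketch assumes every spawn in the process goes through the same lock, since a spawn elsewhere could still fork inside the window.

```rust
use std::io;
use std::process::{Child, Command};
use std::sync::Mutex;

// Hypothetical process-wide lock: only one thread at a time may be inside
// Command::spawn(), so no fork can overlap another thread's pipe setup.
static SPAWN_LOCK: Mutex<()> = Mutex::new(());

/// Spawns `cmd` while holding the global lock.
fn spawn_serialized(cmd: &mut Command) -> io::Result<Child> {
    // A poisoned lock only means another thread panicked while holding it.
    // The guard protects no data, so it is fine to keep using it.
    let _guard = SPAWN_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    cmd.spawn()
    // `_guard` is dropped here, after spawn() has returned.
}
```

The lock is held only for the duration of spawn(), not while the child runs, so the children themselves still execute in parallel.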
On the n2 issue, @bnoordhuis remarks "Just throwing it out there: libstd uses posix_spawn() under the right conditions (instead of fork + execve) and macOS has a POSIX_SPAWN_CLOEXEC_DEFAULT attribute that does what its name suggests. Teaching libstd about it probably isn't too hard." This might be an avenue for a fix on macOS, but it's possible to imagine a program that depends on some FDs staying open, and I don't know if there's a way to make POSIX_SPAWN_CLOEXEC_DEFAULT apply only to the 2 fds returned by pipe().
Are you sure it's that one? This is right after the attempt to use posix_spawn, only for when we need the manual fork/exec. So if that's your leaking pipe, POSIX_SPAWN_CLOEXEC_DEFAULT would already be out of the picture.
The other place that calls anon_pipe is Stdio::to_child_stdio() for the MakePipe variant.
That's created by Stdio::piped() passed to one of the Command handles, or by Command::output(). It looks like n2 does use Stdio::piped(), now guarded by its TASK_MUTEX.lock().
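For reference, the two public-API routes mentioned above that ask the stdlib for a pipe look roughly like this. The helper names `run_with_piped_stdout` and `run_with_output` are hypothetical and only serve the illustration; this is a sketch of ordinary `std::process` usage, not of n2's code.

```rust
use std::io::{self, Read};
use std::process::{Command, Stdio};

/// Route 1: explicitly request a pipe for the child's stdout.
fn run_with_piped_stdout(program: &str, args: &[&str]) -> io::Result<String> {
    let mut child = Command::new(program)
        .args(args)
        .stdout(Stdio::piped()) // the parent side of this pipe is created during spawn()
        .spawn()?;

    let mut out = String::new();
    if let Some(mut stdout) = child.stdout.take() {
        stdout.read_to_string(&mut out)?;
    }
    child.wait()?;
    Ok(out)
}

/// Route 2: Command::output() captures stdout and stderr by default,
/// so it requests pipes without any explicit Stdio::piped() call.
fn run_with_output(program: &str, args: &[&str]) -> io::Result<String> {
    let output = Command::new(program).args(args).output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

Either route goes through a pipe-creating spawn, so both would need to sit behind the same guard for a lock-based workaround to cover them.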