-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Add CommandExt::{exec, before_exec} #1359
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,126 @@ | ||
- Feature Name: `process_exec` | ||
- Start Date: 2015-11-09 | ||
- RFC PR: (leave this empty) | ||
- Rust Issue: (leave this empty) | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
Add two methods to the `std::os::unix::process::CommandExt` trait to provide | ||
more control over how processes are spawned on Unix, specifically: | ||
|
||
```rust | ||
fn exec(&mut self) -> io::Error; | ||
fn before_exec<F>(&mut self, f: F) -> &mut Self | ||
where F: FnOnce() -> io::Result<()> + Send + Sync + 'static; | ||
``` | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
Although the standard library's implementation of spawning processes on Unix is | ||
relatively complex, it unfortunately doesn't provide the same flexibility as | ||
calling `fork` and `exec` manually. For example, these sorts of use cases are | ||
not possible with the `Command` API: | ||
|
||
* The `exec` function cannot be called without `fork`. It's often useful on Unix | ||
in doing this to avoid spawning processes or improve debuggability if the | ||
pre-`exec` code was some form of shim. | ||
* Execute other flavorful functions between the fork/exec if necessary. For | ||
example some proposed extensions to the standard library are [dealing with the | ||
controlling tty][tty] or dealing with [session leaders][session]. In theory | ||
any sort of arbitrary code can be run between these two syscalls, and it may | ||
not always be the case the standard library can provide a suitable | ||
abstraction. | ||
|
||
[tty]: https://github.com/rust-lang/rust/pull/28982 | ||
[session]: https://github.com/rust-lang/rust/pull/26470 | ||
|
||
Note that neither of these pieces of functionality are possible on Windows as | ||
there is no equivalent of the `fork` or `exec` syscalls in the standard APIs, so | ||
these are specifically proposed as methods on the Unix extension trait. | ||
|
||
# Detailed design | ||
[design]: #detailed-design | ||
|
||
The following two methods will be added to the | ||
`std::os::unix::process::CommandExt` trait: | ||
|
||
```rust | ||
/// Performs all the required setup by this `Command`, followed by calling the | ||
/// `execvp` syscall. | ||
/// | ||
/// On success this function will not return, and otherwise it will return an | ||
/// error indicating why the exec (or another part of the setup of the | ||
/// `Command`) failed. | ||
/// | ||
/// Note that the process may be in a "broken state" if this function returns in | ||
/// error. For example the working directory, environment variables, signal | ||
/// handling settings, various user/group information, or aspects of stdio | ||
/// file descriptors may have changed. If a "transactional spawn" is required to | ||
/// gracefully handle errors it is recommended to use the cross-platform `spawn` | ||
/// instead. | ||
fn exec(&mut self) -> io::Error; | ||
|
||
/// Schedules a closure to be run just before the `exec` function is invoked. | ||
/// | ||
/// This closure will be run in the context of the child process after the | ||
/// `fork` and other aspects such as the stdio file descriptors and working | ||
/// directory have successfully been changed. Note that this is often a very | ||
/// constrained environment where normal operations like `malloc` or acquiring a | ||
/// mutex are not guaranteed to work (due to other threads perhaps still running | ||
/// when the `fork` was run). | ||
/// | ||
/// The closure is allowed to return an I/O error whose OS error code will be | ||
/// communicated back to the parent and returned as an error from when the spawn | ||
/// was requested. | ||
/// | ||
/// Multiple closures can be registered and they will be called in order of | ||
/// their registration. If a closure returns `Err` then no further closures will | ||
/// be called and the spawn operation will immediately return with a failure. | ||
fn before_exec<F>(&mut self, f: F) -> &mut Self | ||
where F: FnOnce() -> io::Result<()> + Send + Sync + 'static; | ||
``` | ||
|
||
The `exec` function is relatively straightforward as basically the entire spawn | ||
operation minus the `fork`. The stdio handles will be inherited by default if | ||
not otherwise configured. Note that a configuration of `piped` will likely just | ||
end up with a broken half of a pipe on one of the file descriptors. | ||
|
||
The `before_exec` function has extra-restrictive bounds to preserve the same | ||
qualities that the `Command` type has (notably `Send`, `Sync`, and `'static`). | ||
This also happens after all other configuration has happened to ensure that | ||
libraries can take advantage of the other operations on `Command` without having | ||
to reimplement them manually in some circumstances. | ||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
This change is possible to be a breaking change to `Command` as it will no | ||
longer implement all marker traits by default (due to it containing closure | ||
trait objects). While the common marker traits are handled here, it's possible | ||
that there are some traits in the wild in use which this could break. | ||
|
||
Much of the functionality which may initially get funneled through `before_exec` | ||
may actually be best implemented as functions in the standard library itself. | ||
It's likely that many operations are well known across unixes and aren't niche | ||
enough to stay outside the standard library. | ||
|
||
# Alternatives | ||
[alternatives]: #alternatives | ||
|
||
Instead of souping up `Command` the type could instead provide accessors to all | ||
of the configuration that it contains. This would enable this sort of | ||
functionality to be built on crates.io first instead of requiring it to be built | ||
into the standard library to start out with. Note that this may want to end up | ||
in the standard library regardless, however. | ||
|
||
# Unresolved questions | ||
[unresolved]: #unresolved-questions | ||
|
||
* Is it appropriate to run callbacks just before the `exec`? Should they instead | ||
be run before any standard configuration like stdio has run? | ||
* Is it possible to provide "transactional semantics" to the `exec` function | ||
such that it is safe to recover from? Perhaps it's worthwhile to provide | ||
partial transactional semantics in the form of "this can be recovered from so | ||
long as all stdio is inherited". |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this might want to be unsafe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question! I would be ok marking it as such, but I was unable to come up with a concrete reason as to why, in which case I left it as safe.
Do you have an unsafe use-case in mind though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Malloc is in an unspecified, but possibly unusable state" seems like a reasonable one? :P
More generally, it seems like it'll be possible to end up in a state where all of your other threads disappearing a the "wrong" time breaks invariants that some type's safety depends on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I may want to tone down the wording there, but the only known problem to me at least is that you can easily deadlock. Concretely I don't know how to get into a memory unsafe situation, and I'd reason along the lines of if you witness any other memory state of any other thread it's as if it was a concurrent situation anyway, you just happen to always witness the same situation and no progress is made.
I can imagine this causing weird bugs, but not memory safety ones at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
exec
andbefore_exec
should be unsafe and the inline doc should warn more the user why it's important to not use the heap at all to avoid inconsistent memory layout (e.g. when malloc/free is interrupted). Cf. reentrant closure.It would be nice to have a lint to check unsafe code here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like with what @sfackler was saying, though, I don't think these should be unsafe unless they actually lead to memory unsafety. The only problem I know with heap allocation is that it deadlocks, which is not unsafe. I'm also not quite sure what this has to do with reentrancy as it's not as dangerous as, for example, a signal handler.
Could you clarify the situations in which you think running safe arbitrary code here could lead to memory unsafety?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it depend on the other threads of the process. They can deadlock but some code could probably corrupt your data as well. This article worth a read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right this is certainly only a problem if there are other threads in play here, but to reiterate, deadlock is not related to memory safety. Corrupting data certainly is, but I'm just trying to get a handle on how that's possible.
For example, if you clone the address space with a locked mutex, then I'd assume that any future attempts to lock that mutex in the child would simply deadlock. This does not corrupt any data. Similarly if you call a function like
malloc
it was either locked or unlocked in the parent process, and in both cases data corruption wouldn't be possible.The article you linked mentions "Moreover: state of mutexes is undefined" but I don't think that's true because the
pthread_atfork
documentation literally recommends grabbing mutexes before a fork and releasing them afterwards. Taking a look again at thefork
documentation it indicates:It does not explain why, "async-signal-safe" is the only method to be safe after a fork. For example
malloc
is not async-signal-safe but to the best of my knowledge basically all libraries usepthread_atfork
to make it safe to call after the fork anyway. This may mean it's just not meant to work everywhere, however.I'm uncomfortable saying this is unsafe "just because standard X says so" without concretely understanding why, but it's probably good enough for now.