Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add CommandExt::{exec, before_exec} #1359

Merged
merged 1 commit into from
Feb 4, 2016

Conversation

alexcrichton
Copy link
Member

Add two methods to the std::os::unix::process::CommandExt trait to provide
more control over how processes are spawned on Unix, specifically:

fn exec(&mut self) -> io::Error;
fn before_exec<F: FnOnce() + Send + Sync + 'static>(&mut self, f: F) -> &mut Self;

Rendered

Add two methods to the `std::os::unix::process::CommandExt` trait to provide
more control over how processes are spawned on Unix, specifically:

```rust
fn exec(&mut self) -> io::Error;
fn before_exec<F: FnOnce() + Send + Sync + 'static>(&mut self, f: F) -> &mut Self;
```

[Rendered][link]

[link]: https://github.com/alexcrichton/rfcs/blob/process-ext/text/0000-process-ext-unix.md
@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Nov 10, 2015
@alexcrichton alexcrichton self-assigned this Nov 10, 2015
/// Multiple closures can be registered and they will be called in order of
/// their registration. If a closure returns `Err` then no further closures will
/// be called and the spawn operation will immediately return with a failure.
fn before_exec<F>(&mut self, f: F) -> &mut Self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this might want to be unsafe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question! I would be ok marking it as such, but I was unable to come up with a concrete reason as to why, in which case I left it as safe.

Do you have an unsafe use-case in mind though?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Malloc is in an unspecified, but possibly unusable state" seems like a reasonable one? :P

More generally, it seems like it'll be possible to end up in a state where all of your other threads disappearing a the "wrong" time breaks invariants that some type's safety depends on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I may want to tone down the wording there, but the only known problem to me at least is that you can easily deadlock. Concretely I don't know how to get into a memory unsafe situation, and I'd reason along the lines of if you witness any other memory state of any other thread it's as if it was a concurrent situation anyway, you just happen to always witness the same situation and no progress is made.

I can imagine this causing weird bugs, but not memory safety ones at least.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exec and before_exec should be unsafe and the inline doc should warn more the user why it's important to not use the heap at all to avoid inconsistent memory layout (e.g. when malloc/free is interrupted). Cf. reentrant closure.

It would be nice to have a lint to check unsafe code here :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like with what @sfackler was saying, though, I don't think these should be unsafe unless they actually lead to memory unsafety. The only problem I know with heap allocation is that it deadlocks, which is not unsafe. I'm also not quite sure what this has to do with reentrancy as it's not as dangerous as, for example, a signal handler.

Could you clarify the situations in which you think running safe arbitrary code here could lead to memory unsafety?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it depend on the other threads of the process. They can deadlock but some code could probably corrupt your data as well. This article worth a read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right this is certainly only a problem if there are other threads in play here, but to reiterate, deadlock is not related to memory safety. Corrupting data certainly is, but I'm just trying to get a handle on how that's possible.

For example, if you clone the address space with a locked mutex, then I'd assume that any future attempts to lock that mutex in the child would simply deadlock. This does not corrupt any data. Similarly if you call a function like malloc it was either locked or unlocked in the parent process, and in both cases data corruption wouldn't be possible.

The article you linked mentions "Moreover: state of mutexes is undefined" but I don't think that's true because the pthread_atfork documentation literally recommends grabbing mutexes before a fork and releasing them afterwards. Taking a look again at the fork documentation it indicates:

Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.

It does not explain why, "async-signal-safe" is the only method to be safe after a fork. For example malloc is not async-signal-safe but to the best of my knowledge basically all libraries use pthread_atfork to make it safe to call after the fork anyway. This may mean it's just not meant to work everywhere, however.

I'm uncomfortable saying this is unsafe "just because standard X says so" without concretely understanding why, but it's probably good enough for now.

@l0kod
Copy link

l0kod commented Nov 10, 2015

Excellent!

The before_exec could be renamed to post_exec.

@alexcrichton
Copy link
Member Author

@l0kod wouldn't that imply that the callback runs after exec though? Perhaps you meant pre_exec?

@l0kod
Copy link

l0kod commented Nov 11, 2015

@l0kod wouldn't that imply that the callback runs after exec though? Perhaps you meant pre_exec?

Yes, of course :)

@BurntSushi
Copy link
Member

Instead of souping up Command the type could instead provide accessors to all of the configuration that it contains. This would enable this sort of functionality to be built on crates.io first instead of requiring it to be built into the standard library to start out with. Note that this may want to end up in the standard library regardless, however.

This kind of sounds like an attractive alternative. I admit that I don't know much about the uses cases here (like, at all), but why not go this route instead? It seems like it might give others more flexibility?

@alexcrichton
Copy link
Member Author

🔔 This RFC is now entering its week-long final comment period 🔔

@alexcrichton
Copy link
Member Author

@BurntSushi while plausible, I'd personally prefer to push more for this in the standard library itself. While getters will be attractive in their own right, implementing an actually spawn is actually as massive pain (e.g. unix/process.rs and windows/process.rs are quite non-trivial). Certainly a plausible alternative though!

@alexcrichton alexcrichton added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Jan 29, 2016
@alexcrichton
Copy link
Member Author

The libs team discussed this during triage today and the conclusion was to merge, so I will do so shortly!

@alexcrichton alexcrichton merged commit fa5b757 into rust-lang:master Feb 4, 2016
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 10, 2016
This commit implements the `exec` function proposed in [RFC 1359][rfc] which is
a function on the `CommandExt` trait to execute all parts of a `Command::spawn`
without the `fork` on Unix. More details on the function itself can be found in
the comments in the commit.

[rfc]: rust-lang/rfcs#1359

cc rust-lang#31398
bors added a commit to rust-lang/rust that referenced this pull request Feb 10, 2016
These commits are an implementation of rust-lang/rfcs#1359 which is tracked via #31398. The `before_exec` implementation fit easily with the current process spawning framework we have, but unfortunately the `exec` implementation required a bit of a larger refactoring. The stdio handles were all largely managed as implementation details of `std::process` and the `exec` function lived in `std::sys`, so the two didn't have access to one another.

I took this as a sign that a deeper refactoring was necessary, and I personally feel that the end result is cleaner for both Windows and Unix. The commits should be separated nicely for reviewing (or all at once if you're feeling ambitious), but the changes made here were:

* The process spawning on Unix was refactored in to a pre-exec and post-exec function. The post-exec function isn't allowed to do any allocations of any form, and management of transmitting errors back to the parent is managed by the pre-exec function (as it's the one that actually forks).
* Some management of the exit status was pushed into platform-specific modules. On Unix we must cache the return value of `wait` as the pid is consumed after we wait on it, but on Windows we can just keep querying the system because the handle stays valid.
* The `Stdio::None` variant was renamed to `Stdio::Null` to better reflect what it's doing.
* The global lock on `CreateProcess` is now correctly positioned to avoid unintended inheritance of pipe handles that other threads are sending to their child processes. After a more careful reading of the article referenced the race is not in `CreateProcess` itself, but rather the property that handles are unintentionally shared.
* All stdio management now happens in platform-specific modules. This provides a cleaner implementation/interpretation for `FromFraw{Fd,Handle}` for each platform as well as a cleaner transition from a configuration to what-to-do once we actually need to do the spawn.

With these refactorings in place, implementing `before_exec` and `exec` ended up both being pretty trivial! (each in their own commit)
@Centril Centril added A-platform Platform related proposals & ideas A-process Proposals relating to spawning and communicating with processes. A-unix Proposals relating to UNIX. labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-platform Platform related proposals & ideas A-process Proposals relating to spawning and communicating with processes. A-unix Proposals relating to UNIX. final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants