Subprocess support: high-level interface #833

Closed · wants to merge 9 commits

Conversation


oremanj commented Jan 1, 2019

Closes #822.

- Add `NullStream` to the generic streams interface, which acts like `/dev/null`
- Improvements to `Process`, mostly needed or wanted by higher-level functions:
  - Move from `trio.subprocess` to the base `trio` module, so `trio.subprocess` is just a reexport of constants and exceptions
  - Forbid lists with `shell=True` and strings with `shell=False` on UNIX, due to the likelihood of user confusion
  - Add constructor options `shutdown_signal` and `shutdown_timeout` to allow orderly shutdown of subprocesses
  - Fix support for multiple tasks calling `wait()` simultaneously
  - Add a `cancelled` attribute to track whether a subprocess was terminated due to `aclose`/etc. cancellation
  - Add a `failed` attribute to track whether a subprocess exit failure was "our fault" (if we killed it or closed its output streams before it had exited)
  - Add a `join` method to do `aclose` without closing the pipes
  - Add a `shutdown` method to do a cancelled `join`, since with `shutdown_signal`/`shutdown_timeout` this might be more complex than just `kill()`
- Add `ProcessStream`, a single `HalfCloseableStream` wrapper around `Process`
- Add `open_process`, a context manager that provides a `ProcessStream` and can raise `CalledProcessError` if the process fails
- Add our own `CompletedProcess` with `cancelled` and `failed` attributes and consistent naming of the command (we use `command` vs. the stdlib `args`)
- Add `run_process`, like `subprocess.run` but with better defaults
  - Instead of the previously proposed `timeout` and `deadline` arguments, add a `preserve_result=True` option to return the `CompletedProcess` even if cancelled; this seems less invasive

I don't necessarily expect to merge this wholesale, but I wanted to provide a complete and polished interface as an example input to the debate about what the interface ought to be. :-) Particular points of uncertainty for me include the advisability of adding `ProcessStream` instead of just having `open_process` return a `Process`; whether a process exiting with a failure status should cancel the `open_process` block; and whether `preserve_result` is in fact a less invasive API choice than the previously proposed `timeout`/`deadline` arguments.

oremanj requested a review from njsmith, January 1, 2019 06:13

codecov bot commented Jan 1, 2019

Codecov Report

Merging #833 into master will decrease coverage by 0.8%.
The diff coverage is 97.83%.


```diff
@@            Coverage Diff             @@
##           master     #833      +/-   ##
==========================================
- Coverage   99.23%   98.43%   -0.81%
==========================================
  Files         101      101
  Lines       12182    12646     +464
  Branches      882      944      +62
==========================================
+ Hits        12089    12448     +359
- Misses         72      174     +102
- Partials       21       24      +3
```
| Impacted Files | Coverage | Δ |
|---|---|---|
| trio/subprocess.py | 100% <ø> | (ø) ⬆️ |
| trio/_windows_pipes.py | 100% <100%> | (ø) ⬆️ |
| trio/_unix_pipes.py | 97.67% <100%> | (-2.33%) ⬇️ |
| trio/_subprocess.py | 100% <100%> | (ø) ⬆️ |
| trio/__init__.py | 100% <100%> | (ø) ⬆️ |
| trio/tests/test_highlevel_generic.py | 100% <100%> | (ø) ⬆️ |
| trio/_highlevel_generic.py | 86.44% <76.47%> | (-13.56%) ⬇️ |
| trio/tests/test_subprocess.py | 99.03% <98.5%> | (-0.97%) ⬇️ |
| trio/_subprocess_platform/kqueue.py | 0% <0%> | (-100%) ⬇️ |
| trio/_core/_io_kqueue.py | 0% <0%> | (-80.86%) ⬇️ |

... and 14 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2cde5fb...65507c7.


codecov bot commented Jan 1, 2019

Codecov Report

Merging #833 into master will increase coverage by 0.06%.
The diff coverage is 99.82%.


```diff
@@            Coverage Diff             @@
##           master     #833      +/-   ##
==========================================
+ Coverage   99.21%   99.28%   +0.06%
==========================================
  Files         101      101
  Lines       12221    12812     +591
  Branches      894     1007     +113
==========================================
+ Hits        12125    12720     +595
+ Misses         75       71       -4
  Partials       21       21
```
| Impacted Files | Coverage | Δ |
|---|---|---|
| trio/subprocess.py | 100% <ø> | (ø) |
| trio/tests/test_highlevel_generic.py | 100% <100%> | (ø) ⬆️ |
| trio/_windows_pipes.py | 100% <100%> | (ø) ⬆️ |
| trio/_unix_pipes.py | 100% <100%> | (ø) ⬆️ |
| trio/_subprocess.py | 100% <100%> | (ø) ⬆️ |
| trio/_highlevel_generic.py | 100% <100%> | (ø) ⬆️ |
| trio/__init__.py | 100% <100%> | (ø) ⬆️ |
| trio/tests/test_subprocess.py | 99.75% <99.62%> | (-0.25%) ⬇️ |
| trio/tests/test_highlevel_open_tcp_listeners.py | 100% <0%> | (ø) ⬆️ |
| trio/_highlevel_ssl_helpers.py | 100% <0%> | (ø) ⬆️ |

... and 14 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4b3c043...db3be46.

Review comment on the `failed` property in the diff:

```python
def failed(self):
    """Whether this process should be considered to have failed.

    If the process hasn't exited yet, this is None. Otherwise, it is False
```
Member commented:
This seems tricky and error-prone (having it be True, False, or None).

Member Author replied:
I feel like this is the convention I usually see for "optional bool", and it mirrors the `returncode` attribute (which is None or an integer). Would you prefer that accessing `.failed` raise an exception if the process is still running?


oremanj commented Jan 2, 2019

The reported coverage miss is on the blank line after `test_not_a_failure_if_we_signal()` in `test_subprocess.py`, which seems questionable to me. I also saw a single stray coverage miss on #835, on a comment line there. Any clues for fixing this would be appreciated.


njsmith commented Jan 11, 2019

I'll start with a pass through the high-level description. I haven't looked at the code at all :-).

You know that trope of kids growing up and realizing they're turning into their parents? While writing this I'm having some definite flashbacks to Guido demanding I pare down my beautiful proposals to the simplest possible thing, and then a bit further. I don't remember it being very fun to be on the receiving end. So, uh... sorry?

> Add `NullStream` to the generic streams interface, which acts like `/dev/null`

Do we really need this? Why?

> Move from `trio.subprocess` to the base `trio` module, so `trio.subprocess` is just a reexport of constants and exceptions

Let's split this out into #852 so we can discuss the trade-offs properly.

> Forbid lists with `shell=True` and strings with `shell=False` on UNIX due to the likelihood of user confusion

Good idea. Maybe submit this as a standalone PR?

> Fix support for multiple tasks calling `wait()` simultaneously

Oh yeah we definitely want this! Again, maybe a good quick standalone PR?

> Add `ProcessStream`, a single `HalfCloseableStream` wrapper around `Process`
> Add `open_process`, a context manager that provides a `ProcessStream` and can raise `CalledProcessError` if the process fails

Maybe `Process` objects should simply have a `.stdio` attribute, which is `StapledStream(self.stdin, self.stdout)` if those are both `Stream`s, and `None` otherwise? Would that be enough to remove the need for all this?
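
(For concreteness, a rough sketch of what such a property could look like, using trio's existing `StapledStream`; this is illustrative, not code from the PR:)

```python
import trio

class Process:
    ...

    @property
    def stdio(self):
        # StapledStream glues a send stream and a receive stream into one
        # bidirectional Stream; only meaningful when both the child's stdin
        # and stdout were opened as pipes.
        if self.stdin is not None and self.stdout is not None:
            return trio.StapledStream(self.stdin, self.stdout)
        return None
```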

> Add constructor options `shutdown_signal` and `shutdown_timeout` to allow orderly shutdown of subprocesses
> `cancelled` attribute to track whether a subprocess was terminated due to `aclose`/etc cancellation
> `failed` attribute to track whether a subprocess exit failure was "our fault" (if we killed it or closed its output streams before it had exited)
> `join` method to do `aclose` without closing the pipes
> `shutdown` method to do a cancelled `join`, since with `shutdown_signal`/`shutdown_timeout` this might be more complex than just `kill()`

This is... a lot of stuff. What is it all for? Why should we believe that people will use it?

> Add `run_process`, like `subprocess.run` but with better defaults

Good idea :-)

> Instead of the previous `timeout` and `deadline` arguments, add `preserve_result=True` option to return the `CompletedProcess` even if cancelled; this seems less invasive

Huh, that's a really interesting idea! Normally we're all dogmatic, Thou Shalt Not Catch A Cancelled Exception, but this is... actually maybe a good reason to break that rule? Or would it be too error-prone in cases like:

```python
async def outer_function():
    with move_on_after(...):
        ...

# Eventually called by outer_function, separated by like 10 stack frames or something
async def inner_function():
    with move_on_after(...):
        completed_process = await trio.run_process(..., preserve_result=True)
    # Here if completed_process says it failed, we expect it's because of our
    # move_on_after, but maybe it's actually because of the one way out in
    # outer_function?
```

> Add our own `CompletedProcess` with `cancelled` and `failed` attributes and consistent naming of the command (we use `command` vs the stdlib `args`)

Hmm. I see the point, but it also seems a bit awkward to deal with. Where do we put it? Do we need to keep the `trio.subprocess` namespace just so we can put it there? (I would say let's just keep it anonymous and that'd be fine, but we already have issues like #778, so apparently anonymous classes are no longer a thing that users will put up with.)


oremanj commented Jan 13, 2019

Thanks for the feedback. I understand that there's a lot here and I appreciate the backpressure looking for cleaner ways to do things. Can I ask though that you read the documentation changes included in this PR before making decisions about whether something is worth the user-facing complexity or not? I put most of my explanatory effort there, not in the commit message, because docs reach a good deal more people than the commit messages do. :-)

> > Add `NullStream` to the generic streams interface, which acts like `/dev/null`
>
> Do we really need this? Why?

`NullStream` can be removed if we remove `ProcessStream` (I only needed it for `ProcessStream.stderr`). I think I prefer the `.stdio` solution there anyway.

> > Move from `trio.subprocess` to the base `trio` module, so `trio.subprocess` is just a reexport of constants and exceptions
>
> Let's split this out into #852 so we can discuss the trade-offs properly.

Sure. There doesn't seem to be anyone opposed, though, so maybe there's not much to discuss?

> > Fix support for multiple tasks calling `wait()` simultaneously
>
> Oh yeah we definitely want this! Again, maybe a good quick standalone PR?

#854

> > Forbid lists with `shell=True` and strings with `shell=False` on UNIX due to the likelihood of user confusion
>
> Good idea. Maybe submit this as a standalone PR?

This sort of goes along with the section I added to the docs about quoting, and when trying to split it off I realized it's going to be troublesome to isolate those doc changes from the "we're not just reexporting subprocess" ones. So I'm going to leave it for the moment; happy to revisit later when we've decided more about the other stuff.

> > Add `ProcessStream`, a single `HalfCloseableStream` wrapper around `Process`
> > Add `open_process`, a context manager that provides a `ProcessStream` and can raise `CalledProcessError` if the process fails
>
> Maybe `Process` objects should simply have a `.stdio` attribute, which is `StapledStream(self.stdin, self.stdout)` if those are both `Stream`s, and `None` otherwise? Would that be enough to remove the need for all this?

We can use that in place of `ProcessStream`, sure. I think there are still other reasons to have `open_process`, though, rather than just the base `Process` (usage sketched below):

- better defaults (raise on failure and attach pipes to stdin/stdout/stderr out of the box)
- it can only be used as a context manager, which encourages better structured-concurrency type architecture, which I think is in line with Trio's goals
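
(Hypothetical usage of the proposed `open_process`; the API shown here is what this PR proposes, not something already in trio:)

```python
import trio

async def main():
    async with trio.open_process(["grep", "trio"]) as stream:
        await stream.send_all(b"curio\ntrio\nasyncio\n")
        await stream.send_eof()  # half-close: tell the child there's no more input
        output = await stream.receive_some(4096)
        print(output)
    # Leaving the block waits for the child to exit; with the proposed
    # defaults, a nonzero exit status raises CalledProcessError here.

trio.run(main)
```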

> > Add constructor options `shutdown_signal` and `shutdown_timeout` to allow orderly shutdown of subprocesses
> > `cancelled` attribute to track whether a subprocess was terminated due to `aclose`/etc cancellation
> > `failed` attribute to track whether a subprocess exit failure was "our fault" (if we killed it or closed its output streams before it had exited)
> > `join` method to do `aclose` without closing the pipes
> > `shutdown` method to do a cancelled `join`, since with `shutdown_signal`/`shutdown_timeout` this might be more complex than just `kill()`
>
> This is... a lot of stuff. What is it all for? Why should we believe that people will use it?

Shutdown stuff: If you start a subprocess that does nontrivial work, depending on the application it might be considered pretty rude to SIGKILL it without giving it any prior notice that you'd like it to clean up and exit. You might wind up with temporary files still on disk, sub-subprocesses getting reparented to init and running forever, etc. Probably whatever mechanism we use for grace periods (#147) could (should?) replace the explicit specification of a `shutdown_timeout`. I think it's still pretty important to be able to specify what signal gets used, though, since there's not one good choice for everyone: the venerable UNIX default is SIGTERM, but Python programs will out-of-the-box respond better to SIGINT, etc. (I don't feel nearly as strongly about supporting `shutdown_signal=0` meaning "close pipes and wait"; I added it mostly because it's pretty much the only way we can handle this at all for Windows. I don't actually care that much about Windows but I gather that some people do :P)
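
(To make the intended sequence concrete, a rough sketch; `shutdown_signal` and `shutdown_timeout` are the options proposed in this PR, and the control flow is simplified:)

```python
import trio

async def shutdown(proc):
    if proc.shutdown_signal == 0:
        # "Close pipes and wait" mode, the only portable option on Windows
        if proc.stdin is not None:
            await proc.stdin.aclose()
    else:
        # e.g. signal.SIGTERM, or signal.SIGINT for Python children
        proc.send_signal(proc.shutdown_signal)
    with trio.move_on_after(proc.shutdown_timeout):
        await proc.wait()
        return  # the child exited gracefully within the grace period
    # Grace period expired without an exit: escalate to an unconditional kill
    proc.kill()
    await proc.wait()
```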

`Process.cancelled`: This is mostly (entirely?) to give `run_process()` something to fill in `CompletedProcess.cancelled` from. It could be made private.

`Process.failed`: Making errors loud by default is great! But if the user explicitly kills a process, I think they will be confused and saddened when they get a `CalledProcessError` out of that. And simply falling off the bottom of the `open_process()` context manager without reading EOF from both stdout and stderr makes it somewhere between possible and likely that the process will exit with a broken pipe error. I think it's better to have our determination of "should we throw an exception about this process having failed?" be fairly nuanced than to have it be misled by common user actions in a way that might cause them to not check the return code at all. (A rough sketch of the tri-state logic follows below.)
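
(Sketch only; the `_we_signaled_it` and `_closed_output_early` bookkeeping attributes are hypothetical stand-ins for whatever the implementation actually tracks:)

```python
@property
def failed(self):
    # None while the child is still running, mirroring Process.returncode
    if self.returncode is None:
        return None
    if self.returncode == 0:
        return False
    # A nonzero or signal exit doesn't count as a failure if it was "our
    # fault": we signaled the child, or we closed its output pipes before
    # it had exited.
    return not (self._we_signaled_it or self._closed_output_early)
```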

`Process.join`: mostly exists to avoid code duplication for the benefit of `open_process` and `run_process`; it could easily be made private if you think it's too much clutter in the public API.

> > Instead of the previous `timeout` and `deadline` arguments, add `preserve_result=True` option to return the `CompletedProcess` even if cancelled; this seems less invasive
>
> Huh, that's a really interesting idea! Normally we're all dogmatic, Thou Shalt Not Catch A Cancelled Exception, but this is... actually maybe a good reason to break that rule? Or would it be too error-prone in cases like: [...]

It's definitely a power-user feature, and might merit a warning in the docs, and/or a suggestion to put an explicit checkpoint after the call (see the sketch below). I don't mind going back to `timeout`/`deadline` if you think that's less confusing, but I do think we should have some way to support this use case. Maybe that works out to "we need a general way to report partial progress before a cancellation, and then `run_process()` can just use that".
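
(Hypothetical usage of the proposed `preserve_result=True`, showing the explicit-checkpoint pattern; `await trio.sleep(0)` is just a convenient checkpoint:)

```python
import trio

async def run_with_deadline():
    with trio.move_on_after(5):
        result = await trio.run_process(
            ["some-slow-command"], preserve_result=True
        )
    # Explicit checkpoint: if the cancellation actually came from an outer
    # scope rather than our own move_on_after, it gets delivered here
    # instead of being silently swallowed.
    await trio.sleep(0)
    if result.cancelled:
        ...  # we hit our own deadline; the partial CompletedProcess is available
```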

I appreciate the simplicity of Trio's stance that "a cancelled operation didn't happen", but it doesn't necessarily compose very well -- if an operation is built out of multiple other underlying operations that can't readily be rolled back, either the "cancelled = didn't happen" rule has to break or the entire higher-level operation has to be uncancellable once started. I don't think we want to propose the latter, so maybe we should think about a user-friendly way to talk about the circumstances in which the rule gets bent?


njsmith commented Jan 17, 2019

> Can I ask though that you read the documentation changes included in this PR before making decisions about whether something is worth the user-facing complexity or not? I put most of my explanatory effort there, not in the commit message, because docs reach a good deal more people than the commit messages do. :-)

Ah cool, thanks for the heads up :-). Read them now.

At a high level: I think it's great you put this mega-PR together. Like you say in the first comment, it's great for identifying issues. Now that we have this laid out though, I think a divide-and-conquer strategy is going to work best for actually getting changes in :-). So I'm going to be looking for all the uncontroversial bits we can carve off and merge on their own.

In that vein, I see three "obvious" PRs, beyond the ones you already submitted. I'll discuss the other topics more below too, but these are probably the next things to actually work on :-):

- Adding `Process.stdio`
- Adding the prohibition on `shell=True`. I see what you mean about the docs, and I do like the idea of the trio docs teaching people general facts about how processes work, not just the minimal details of our API :-). But this is such an obscure and useless edge case that I think it's OK to put the change in by itself, and then worry about teaching people about quoting in general later. In the unlikely case that anyone runs into this, the error message will explain it well enough.
- The "minimal core" of `run_process`, which doesn't (yet) do anything clever with cancellation/timeouts (rough sketch below).

> I think there are still other reasons to have `open_process`, though, rather than just the base `Process`:
>
> - better defaults (raise on failure and attach pipes to stdin/stdout/stderr out of the box)
> - it can only be used as a context manager, which encourages better structured-concurrency type architecture, which I think is in line with Trio's goals

I feel like I still have a lot of questions here:

- Is capturing stderr the right thing to do? In the cases I'm thinking of, like git hook programs, you usually want to let stderr go to the terminal, I think? And if you capture it but don't realize, and don't start a task to take care of reading from it, then when an error occurs you'll get nothing at best and a deadlock at worst, which is definitely not ideal...
- A number of the subtopics in this thread deal with the complexities around process shutdown and what should count as an error... for `run_process` it's simpler, but raising out of an `__aexit__` method is always fraught (e.g. should you do something different if you got there via an exception?). Do we understand it well enough to implement a `check=True` equivalent for a context manager like this that actually DWIMs? And if we do, would it make sense to add it to `Process` directly?
- We do like context managers around here, but the `Process` API already gives users the choice of either using a context manager or not. So even if this API makes the context manager mandatory, from the user's perspective it's still optional. Is the value from the other `open_process` features enough to make this nudge helpful?
- Auto-cancellation on process failure seems dubious...
- Is spawning a child and interacting with it like this a common enough use case, and does this reduce the complexity of that use case enough, to justify adding this to the top-level `trio` namespace?

So to me this feels like something to come back to after we've dealt with all the easier stuff...

> Shutdown stuff

Okay, yeah, this is interesting and complicated. I feel like there are a few interlocking cases here: what `Process.aclose()` does normally, what `Process.aclose()` does when cancelled, what `run_process` will do when cancelled... this feels like it deserves its own discussion thread? And we might want to defer some of it until after #147 has progressed more.

> `Process.cancelled`, `Process.failed`, `Process.join`

It sounds like these are all mostly helpers for other features, so we can defer worrying about them until we figure out those other features :-)

> I appreciate the simplicity of Trio's stance that "a cancelled operation didn't happen", but it doesn't necessarily compose very well -- if an operation is built out of multiple other underlying operations that can't readily be rolled back, either the "cancelled = didn't happen" rule has to break or the entire higher-level operation has to be uncancellable once started. I don't think we want to propose the latter, so maybe we should think about a user-friendly way to talk about the circumstances in which the rule gets bent?

Yeah, that stance is good (necessary) for fundamental operations like `Lock.acquire` or `sock.recv`, but it doesn't generalize. This also sounds like a good topic for its own discussion thread?


smurfix commented Jan 18, 2019

> Is capturing stderr the right thing to do?

Sometimes.

If I'm a user-facing program that starts some subprocess: probably not, let it go to /dev/stderr. If I'm an OS-level job controller or whatever: yeah, failure reports need to include stderr; you can't let it go to syslog (or, worse, /dev/null), because that's associated with my job controller, not with the job.

The most common case is probably the former, so that should be the default IMHO.


njsmith commented Jun 11, 2019

@oremanj I think this was superseded by #872? I'm going to mark it closed based on that belief, but it might be worth checking if there's anything else you want to salvage out of it.

njsmith closed this Jun 11, 2019