Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subprocess support #4

Closed
njsmith opened this issue Jan 22, 2017 · 37 comments · Fixed by #791
Closed

Subprocess support #4

njsmith opened this issue Jan 22, 2017 · 37 comments · Fixed by #791

Comments

@njsmith
Copy link
Member

njsmith commented Jan 22, 2017

Edit: if you're just coming to this issue, then this comment has a good overview of what there is to do: #4 (comment)


Original description

Lots of annoying and fiddly details, but important!

I think for waiting, the best plan is to just give up on SIGCHLD (seriously, SIGCHLD is the worst) and park a thread in waitpid for each child process. Threads are lighter weight than processes so one thread-per-process shouldn't be a big deal. At least on Linux - if we're feeling ambitious we can do better on kqueue platforms. On Windows, it depends on what the state of our WaitFor{Multiple,Single}Object-fu is.

@njsmith
Copy link
Member Author

njsmith commented Mar 15, 2017

Looked at this some more today. We want to provide our own version of the subprocess API, in particular run and Popen. (We can drop the legacy APIs like check_call in favor of run.) run needs to be copied and rewritten, but it's only a few lines. Popen is the interesting bit. I'm thinking we should use the real subprocess.Popen to do the actual spawning and setting up of pipes – it's just way too complicated to try and replicate. Then once it's spawned, we can steal the file descriptors and do our own waiting.

Communication: we should expose pipes as Stream objects (which means that this issue depends on the Stream part of #73). We'll need to write our own communicate but this should be extremely easy with trio :-).

Waiting: this we definitely want to implement ourselves. I think we might need to hack at subprocess.Popen to stop it from doing this itself – normally it tries very hard to reap the child, but we're going to have to take that over. (Well... I guess the alternative is: (a) Linux spawns a thread in popen.wait(), (b) other systems block until the process is done and then call popen.wait()?) And then we need our actual implementation, which relies on WaitForSingleObject on Windows, kqueue cleverness on MacOS, and waitpid on Linux.

What else: there are other attributes, in particular terminate-and-friends, but I think everything else can be pretty much passed through directly.

@njsmith
Copy link
Member Author

njsmith commented Apr 21, 2017

Might be worth checking out some alternative subprocess APIs like delegator.py and pexpect (which relies on ptyprocess for a lot of the heavy lifting).

From a quick skim, my tentative conclusion is that the main advantage of delegator.py over subprocess is opinionated defaults, some cute but not obviously compelling API ideas like the block=True option and block() method, and the wrapping of pexpect (though this is also a bit dicey because it explicitly wraps the more-portable but not-really-correct-on-Unix no-pty version of pexpect). The pipeline idea is also handy, but it's basically just a tiny bit of sugar (and doesn't even set up proper unix pipelines!), not something we need to worry about in our lowest-level async subprocess api.

And pexpect looks really cool, but is a bit high level for a basic subprocess spawning API, plus has a really terrifying implementation (lots of code making all kinds of raw POSIX calls directly with workarounds for IRIX and SunOS terminal ioctl weirdnesses, that sort of thing).

@njsmith
Copy link
Member Author

njsmith commented Jun 12, 2017

There's another set of issues that it looks like I didn't write down yet, regarding how to implement pipes (i.e., when the users passes subprocess.PIPE for one of the stdin=, stdout=, stderr=). This is pretty messy.

What subprocess does by default is to create pipe objects on Unix likes (os.pipe), and named pipes on Windows (totally different from Unix pipes). If we want to go along with this, then we can do that by writing custom SendStream and ReceiveStream classes, that on Unix use wait_readable/wait_writable, and the on Windows need some custom wrappers of ReadFile/WriteFile and IOCP. This is a substantial chunk of custom code though. (Note that while we have some of the infrastructure for IOCP in place, none of it is actually used for anything right now! See also #52.)

Alternatively, there's a cute hack we might be able to use: we already have perfectly good code for wrapping sockets. And on Unix, using a socketpair is a perfectly reasonable alternative to using a pipe; we'd just have to notice when the user asked for a subprocess.PIPE, and in that case call socketpair and pass in the fd to the real subprocess.Popen. No big deal.

What about Windows? Well, it turns out that on Windows, a little known fact is that you can use a native socket object as the stdin/stdout/stderr for a subprocess, if and only if you create that socket without passing WSA_FLAG_OVERLAPPED. (This is a magic flag that makes it possible to use that socket with IOCP. Why is there a flag for this, you may ask. Why does it also affect subprocesses, you may wonder. Well, Windows is full of mysteries.) It turns out that if you create a socket the normal way on Python, by calling socket.socket() or socket.socketpair(), then it does pass WSA_FLAG_OVERLAPPED. But it would be pretty easy to expose the C-level WSASocket call with cffi and then use it to create a socket without passing this flag, and then wrap it in a Python socket object. Doing things this way would mean we get to re-use all our socket code instead of having to write brand new wrappers for named pipes. The challenges are:

  • We need to wrap WSASocket: no big deal

  • We need to reimplement socketpair: this is pretty simple. You can look at how the stdlib does it (the Windows socketpair implementation is in python!), or here's a version in C.

  • We need to convince subprocess.Popen to accept the resulting socket objects as valid stdin/stdout. This... might be a huge problem or might be trivial. The problem is that subprocess.Popen expects to be passed an "OSF file descriptor", rather than a "handle". On windows, "handles" are the native objects, and "file descriptors" are these things that the C library emulates. There are functions to convert a handle into a file descriptor and vice-versa. And subprocess.Popen wants to take a file descriptor, unwrap it to get the handle, and then use that. What I'm not sure of though is whether the C library will let us wrap a socket handle in a file descriptor.

@njsmith
Copy link
Member Author

njsmith commented Jun 12, 2017

Well, this is promising:

In [1]: import socket, msvcrt

In [2]: s = socket.socket()

In [3]: s.fileno()
Out[3]: 472

In [4]: msvcrt.open_osfhandle(s.fileno(), 0)
Out[4]: 6

In [5]: msvcrt.get_osfhandle(6)
Out[5]: 472

@njsmith
Copy link
Member Author

njsmith commented Jun 13, 2017

If we want to go along with this, then we can do that by writing custom SendStream and ReceiveStream classes, that on Unix use wait_readable/wait_writable, and the on Windows need some custom wrappers of ReadFile/WriteFile and IOCP

It did occur to me that at least the Unix part of this approach is the same code we need for #174. The Windows part is different though.

@kyrias
Copy link
Contributor

kyrias commented Oct 12, 2017

@buhman Have you had any time to work on this?

@buhman
Copy link
Member

buhman commented Oct 12, 2017

Nope :(

@idlesign
Copy link
Member

Nope :(

@buhman, if so maybe discard self-assignment, so that someone could take it?
As an example: a couple of months ago I had some spare time to work on this, but I saw the assignment and thought: "It's already moving. Great!"

@njsmith
Copy link
Member Author

njsmith commented Oct 13, 2017

Probably it's best to leave the assignment field empty, given that we're all volunteers here and stuff comes up. And if you're wondering if something is available or what to work on, then asking in the gitter chat is probably a good idea.

@smurfix
Copy link
Contributor

smurfix commented Jan 30, 2018

I appear to need the low-level messy stuff (i.e. waiting for child processes sanely) for the asyncio test suite.

The disadvantage of running a new thread with a blocking waitpid() per subprocess is that you can't cancel Python threads, so we will have to resort to waiting for the forked process to die (or force-kill it ourselves, or wait until the user force-kills us). That's suboptimal.

The disadvantage of SIGCHLD is that it's a messy signal that only works in the main process. On the positive side, IMHO these days we can assume that libraries don't steal it from us – and, if required, we could periodically check that the handler is still present.

My take: we could implement both, as depending on the situation the users find themselves in, one may work but the other may not (… and what about Windows?). We do this by implementing a singleton which hides the whole mess from our users. A task that wants to wait for a child process should be able to call await trio.wait_for_child(pid) – and that should simply work and return the appropriate exit code / negative signal number.

The next level up is forking a process and talking to it. IMHO the interface should be async with trio.subprocess_exec(…) as proc:. "proc" should be an instance modelled after asyncio.Process, except that there is no "communicate" method. Leaving the block shall close all file descriptors and then wait for the process. If the process dies with something other than exit code zero, raise an exception. stdin/out/err attributes should be normal Trio Send/ReceiveStreams, unless a file descriptor or a stream is passed in as keyword argument (in which case we pass the descriptor to the process / internally start a read/write task).

Unless somebody has better ideas, I'll start working on the first part of this.

@njsmith
Copy link
Member Author

njsmith commented Jan 30, 2018

Is it possible to skip those parts of the asyncio test suite for now? I don't want to stop you making things better :-), but I also don't want you to get stuck on some distraction just because of the test suite – probably it's a higher priority to get trio-asyncio solid for network apps, and no one can blame trio-asyncio for not handling subprocesses when trio itself doesn't. Plus it's an easy to document limitation in the mean time.

Re waitpid, my thought was to run it in a daemon thread, not using run_sync_in_worker_thread, holding a weakref to a trio.subprocess.Popen object. So the user API would be await popen.wait(), which would just wait for a report to come back from the background thread. Then popen.wait would be cancellable from the user point of view, and the thread just keeps running in the background until either the child exits or the main process does. It's ugly as heck but SIGCHLD is worse.

On Windows and MacOS this is much nicer, but they each need their own new low level primitives exposed. For Windows it's trio.hazmat.WaitForSingleObject, and for MacOS it's some kind of generic kqueue notification API. (In fact we have some kind of generic kqueue notification API in there already, but it's kind of a placeholder. For this we need to think about it properly :-).)

We usually model our APIs after the regular synchronous stdlib, not asyncio. So the default here would be that we're implementing an async version of subprocess, not asyncio.Process. Process APIs are pretty complicated though, so I'm certainly open to arguments that we should tweak some things instead of slavishly following the stdlib semantics.

@njsmith
Copy link
Member Author

njsmith commented Jan 30, 2018

We usually model our APIs after the regular synchronous stdlib, not asyncio. So the default here would be that we're implementing an async version of subprocess, not asyncio.Process.

Oh, ok, now that I've checked the asyncio docs I see that in this case they actually did the same thing and asyncio.Process is basically a subprocess.Popen with a few sensible tweaks.

@smurfix
Copy link
Contributor

smurfix commented Jan 31, 2018

OK. I have implemented a rudimentary waitpid implementation and pushed that to the "sigchld" branch. Implementing further refinements, like a thread-based solution or a kqueue or …, should be straightforward. There are some tests, which happen to pass.

I still need to do some refactoring;

  • I need to allow multiple waiters (but not from multiple threads)
  • … but they might be in a different thread than the main thread
  • the way to set up a process watcher from Trio should be a straightforward "async with …"
    plus a heap more tests.

I don't plan to do the the subprocess clone tomorrow, though. ;-)

@njsmith
Copy link
Member Author

njsmith commented Feb 1, 2018

@smurfix I have to admit that looking at https://github.com/python-trio/trio/compare/sigchld in its current state is mostly just reminding me why I don't want to support SIGCHLD. That's a ton of code and complexities exposed in the public API.

@smurfix
Copy link
Contributor

smurfix commented Feb 1, 2018

Well, there's public and "public". We can probably do without the top-level scan and verify methods, they're mostly for circumventing problems you really should fix by using a different child-watcher class in the first place.

The most scratchy part I see is the choice of sync vs. async watcher and the fact that there are different ways of using them. I suspect that that can't really be helped even when we add some other ways of dealing with child processes. The best way to do it under Linux that I can think of, other than threads which have their own messy trade-offs, is by using a signalfd on SIGCHLD, but that still requires most of the scaffolding I added for dealing with SIGCHLD directly.

I don't see the internals of the watcher classes as public interface per se. You don't actually use any of that; all you do is await wait_for_child() and you're done. All that's really public on part of the handlers are their class names.

In fact, we should hide the whole mess (adding Windows will not make it look any nicer) behind the child_watcher() call. With a bit of experience what works when, it should make a reasonable choice. Shall I move the detail stuff into a couple of trio/_wait/*.py files so that child_watcher() can selectively import the one it needs?

@njsmith
Copy link
Member Author

njsmith commented Feb 1, 2018

I'd really rather not have any visible API beyond Process.wait(), and certainly not asking users to understand the tradeoffs between a bunch of Linux-specific hacks. Windows and MacOS/*BSD don't need or benefit from having child watcher classes at all; the obvious way there is to do everything inside the wait call.

I'm not happy with my idea of calling waitpid in a thread on Linux, but all of these options are terrible. What specifically are you worried about that makes you prefer writing hundreds of lines of signal handling variations?

@njsmith
Copy link
Member Author

njsmith commented Feb 1, 2018

Also, signalfd doesn't even work due to weird design decisions: you have to make sure that every thread in the process is masking SIGCHLD, and that's impossible since we can't control what other threads do.

@smurfix
Copy link
Contributor

smurfix commented Feb 1, 2018

Ok. Fair enough. A couple of the comments in asyncio's Windows handling of this stuff led me to believe that some sort of global state-keeper was needed there too. If that's wrong, so much the better.

Well, we can't do it all inside the wait call. At minimum we need a WeakValueDictionary of pid>waiter records, because we can't cancel the waiting thread – and I'd rather not tell people that they must never time out their Process.wait() calls.

I'll implement a thread-based solution instead, and put that in hazmat. OK?

NB, I do wonder whether one thread which simply waits on all processes might not be a good idea.

@njsmith
Copy link
Member Author

njsmith commented Feb 1, 2018

A couple of the comments in asyncio's Windows handling of this stuff led me to believe that some sort of global state-keeper was needed there too.

Asyncio does its best but it's not a terribly reliable source for this kind of thing. On Windows there's the equivalent of waitpid, but it's straightforwardly cancellable, so you can just call it in a thread and then give up if you want to.

At minimum we need a WeakValueDictionary of pid>waiter records, because we can't cancel the waiting thread

Oh, I see, because you want to wait on an arbitrary pid that some other code spawned. If someone starts a process by calling trio.subprocess.Process(...), then the Process object itself can hold the state that the thread uses to coordinate.

But I guess you are thinking in terms of trio-asyncio using some hybrid of asyncio's process handling code and trio's process handling code? I suspect that's not going to work satisfactorily, and we'll eventually want to implement the process protocol handling stuff on top of trio.subprocess so that it handles things in a uniform trio way. I haven't looked at the details of what asyncio is doing here though. (But you see why I think maybe we shouldn't stress about subprocesses in trio-asyncio just yet :-).)

@smurfix
Copy link
Contributor

smurfix commented Feb 2, 2018

Oh well. I found out yesterday that SIGCHD handling is not only fraught with problems, it cannot work reliably at all. The problem is that while the main thread blocks, waiting for the Trio thread, it does not process SIGCHLD. I also cannot replace the blocking queues with Trio-ish ones because that opens the window for a lot of very interesting race conditions within asyncio.

Now, what do we do about cancellations vs. subprocesses we started? (A not-canceled subprocess is easy: close its stdin and wait for it to die on its own.)

@diorcety
Copy link

@njsmith
This maybe old, but there is a simpliest way to open a socket without WSA_FLAG_OVERLAPPED (default behaviour on python 2.7)
https://github.com/python/cpython/blob/19e7d48ce89422091f9af93038b9fee075d46e9e/Modules/socketmodule.c#L5002
Make support_wsa_no_inherit = 0
How?
Open a fake socket in order to force a failure on WSASocketW

    try:
        dummy = socket.socket(0xDEAD, socket.SOCK_STREAM)
    except:
        pass

I made a python socat version which worked on python 2.7 but failed on python 3. With this trick all is working well (use TCP connection as stdin, stdout of a process)

@diorcety
Copy link

@njsmith
Copy link
Member Author

njsmith commented Mar 21, 2018

@diorcety Ha, that's clever!

Am I understanding right though that using this trick makes python stop setting WSA_FLAG_OVERLAPPED on all new sockets afterwards? We usually try not to alter global state like this, in case some other library is depending on the original behavior... Still, it's an interesting option to know about and keep in mind :-).

@diorcety
Copy link

@njsmith Indeed.
You can wrap like this, which will avoid to change the other socket creation

dummy = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
dummy.setsockopt(socket.SOL_SOCKET, SO_OPENTYPE, SO_SYNCHRONOUS_NONALERT)
create_socket_here()
finally:
dummy.setsockopt(socket.SOL_SOCKET, SO_OPENTYPE, SO_SYNCHRONOUS_ALERT)

Regarding the socket.socket(0xDEAD, socket.SOCK_STREAM) trick, you only simulate "Windows 7 or Windows 2008 R2 without SP1 or the hotfix"

@imrn
Copy link

imrn commented Mar 26, 2018

So does trio.subprocess settle with a reliable internal implementation for
linux and/or *bsd?

@njsmith
Copy link
Member Author

njsmith commented Mar 27, 2018

@imrn I'm not sure I understand the question, but as of right now trio doesn't have subprocess support yet (which is why this issue is still open).

@bj0
Copy link

bj0 commented Mar 27, 2018

what about the way curio does it? do they solve the problems that trio is running into?

@smurfix
Copy link
Contributor

smurfix commented Mar 27, 2018

There are no "problems trio is running into". There's an initial subprocess implementation in https://github.com/smurfix/trio/tree/subprocess (I just merged it up to the current Trio master) which works well enough, except that @njsmith would like to see a few changes (which I'd love to have time working on, but …).

@imrn
Copy link

imrn commented Mar 27, 2018 via email

@bj0
Copy link

bj0 commented Mar 27, 2018

@smurfix ah, I hadn't seen the updates after Feb 2, so I had the impression from earlier comments that there was some technical challenges preventing it from being in trio.

good to hear it's working.

@njsmith
Copy link
Member Author

njsmith commented Mar 28, 2018

The only problem trio is running into here is that it's complicated to get this right when you take into account differences between Windows/Linux/kqueue platforms, handling cancellation of wait correctly (curio cuts some corners here), etc., so it won't happen until someone has enough time to work out all these details.


Now, what do we do about cancellations vs. subprocesses we started? (A not-canceled subprocess is easy: close its stdin and wait for it to die on its own.)

Yeah, there are some tricky questions here! One constraint is that we need to make it possible for people to build complicated subprocess control schemes, like sending signals at arbitrary times, starting to wait for them but then cancelling the wait without killing the process, letting the process live on without waiting for it at all, etc.; this is important for making sure trio can solve people's problems. OTOH, we also want to make it easy to not accidentally leave subprocesses running when a process exits, and maybe even treat subprocesses like they were part of our task tree (so propagating cancellations to them, always waiting for them to exit before returning, etc.).

The stdlib behavior is not too unreasonable: you can explicitly call wait or terminate or whatever, you can do a cancellable (or at least timeout-able) wait, and then separately you can also use a Popen object as a context manager, where __exit__ does a special thing: it closes any pipes to the child, and then it waits for the child to exit.

My criticism of this way of handling __exit__ is that if we didn't set up pipes to the child, then it doesn't send any signal at all, it just stops and waits for the child indefinitely without even letting it know that we want to exit. It seems like it would be nice if async with Subprocess(...): ... was more generally applicable?

I suspect we'll also want to make subprocesses implement the AsyncResource interface, so there's a question how aclose should work.

@nosklo
Copy link

nosklo commented May 31, 2018

nursery.start_process_soon()?

@njsmith
Copy link
Member Author

njsmith commented May 31, 2018

@nosklo I think we're more likely to spell it something like nursery.start_soon(run_subprocess, ...). This keeps the API layering cleaner, with nurseries focusing on just concurrency and not knowing about processes, and run_subprocess (or whatever we call it) focusing on processes and not knowing about concurrency. Plus it makes it easier for others to swap in their own version of run_subprocess in case they don't like ours for some reason :-). But yeah, that's basically what I mean by "integrating processes into the task tree", it's just slightly different ways to factor the functionality.

@njsmith
Copy link
Member Author

njsmith commented Jun 21, 2018

There's a lot of chatter in this thread, which is interesting and has good stuff in it, but probably makes it difficult for folks to see the big picture. So here's a quick overview of what needs to be done (and then I'll edit the first comment to link down to this):

Trio is designed in layers, and this will need work across multiple layers. Ultimately, we want something like the subprocess.Popen API. To do that, there are two basic primitives that we need to implement:

This is straightforward in theory, but the I/O part will need special handling on Unix vs. Windows, and the waiting part will need totally different implementations for Linux vs. MacOS/the BSDs vs. Windows. And for extra fun, in order to write these implementations, we'll need to expose some more fundamental primitives inside Trio's core I/O loop.

Here's one possible plan, from the bottom up. This is a pretty complete version; some of these steps can possibly be avoided/deferred with cleverness.

Step 1: WaitForSingleObject (DONE in #575)

Windows has these things called HANDLEs, which are conceptually pretty similar to Unix file descriptors, except they're much more general (e.g. you can have a HANDLE representing a process, or a mutex), and they support a different set of operations. One of the basic operations that lots of different HANDLEs support is to wait for them to become "signaled". We want to implement this operation in Trio – in particular, we'll use it to wait for a process to finish.

Conceptually, the operation we want is essentially WaitForSingleObject, but instead of a timeout argument, we want it to be cancellable using Trio's normal mechanisms. So this is an operation like:

async def WaitForSingleObject(handle):
    ...

And what it should do internally, is create an event object, and then use a thread to call WaitForMultipleObjects on both the handle that was passed in, and also our event object. The idea here is that if the handle becomes signaled, then WaitForMultipleObjects will wake up... or else if our operation gets cancelled, we can wake the thread up earlier using SetEvent.

This will require some additions to trio/_core/_windows_cffi.py to get acces to the underlying primitives like WaitForMultipleObjects and SetEvent. And then some slightly tricky code to push a call off into a thread, and then have the trio task suspend itself while waiting for that thread to finish, and respond to cancellation appropriately. For reference here, you'll want to look at how run_sync_in_worker_thread is implemented in trio/_thread.py.

Actually... since WaitForMultipleObjects has no side-effects, maybe we can use run_sync_in_worker_thread directly, something like:

async def WaitForSingleObject(handle):
    event_handle = ...
    try:
        # cancellable=True means that if this is cancelled, then the call returns immediately, while leaving the thread running in the background
        # We'll want to use a custom limiter so that these threads don't get counted against the
        # global thread limit, b/c this operation can block indefinitely.
        result = await run_sync_in_worker_thread(WaitForMultipleObjects, [handle, event_handle], TRUE, INFINITE, cancellable=True, limiter=...)
    finally:
        # We might have been cancelled, so the thread might be still running in the background.
        # Set the event, to make sure that the thread will return ASAP and free up the memory.
        # To check: is the call to SetEvent necessary, or does CloseEvent alone work?
        # If SetEvent is necessary, is there a race condition if we call CloseEvent too soon?
        SetEvent(event_handle)
        CloseHandle(event_handle)

(Regarding the comment in the finally block: I just asked this question.)

Step 2: ReadFile and WriteFile

Windows provides two primitives, ReadFile and WriteFile, for doing file I/O. In particular, they're what you need if you want to request an async read/write on a named pipe object, and then later get notified when it's finished using IOCP. And named pipe objects are the standard things to use to talk to subprocesses on Windows. So we want to expose ReadFile and WriteFile.

Trio has some stub IOCP support in trio/_core/_io_windows.py, but it isn't really complete or tested, so this would need to be sorted out. Basically we want to implement whatever it is that makes it easiest to implement ReadFile and WriteFile :-). (See the commented-out perform_overlapped method for some idea of how this might look.)

This step could possibly be deferred by using a tricky hack involving sockets, though we'll certainly want ReadFile/WriteFile/IOCP support eventually.

Step 3: arbitrary kevent support (#578)

On MacOS and the BSDs, there's this neat thing called "kqueue", which lets you efficiently wait on all kinds of different events in an async-event-loop-friendly way. Currently Trio just uses this to wait for file descriptors to become readable/writable, but in general kqueue is much more flexible. For example, the EVFILT_PROC event type lets you wait for a subprocess to exit.

trio/_core/_io_kqueue.py should provide the ability to wait on arbitrary event types. (Then we'll use that in our subprocess support to wait for EVFILT_PROC events.)

There's already a stub implementation of this, in the monitor_kevent method. This isn't tested or used currently though, so if we're going to start using it we'd want to make sure it actually works. (And possibly change how it works, if there's some other semantics that make it easier to use.)

Okay, that completes the low-level ground-work. Now we can move up a layer and start implementing actual subprocess functionality.

Step 4: I/O, Unix (DONE in #621)

There are two options here: we can use pipes, or we can use sockets. (If you're on Linux, man 7 pipe and man 7 socket will give you some idea of how these work.)

Pipes are the traditional approach. There's some esoterica you need to look up, but I think the eventual code would be fairly straightforward: you allocate the pipe objects by calling os.pipe, then set the end that we want to read/write from to non-blocking mode using fcntl, something like:

old_flags = fcntl.fcntl(fd, fcntl.F_GETFL, 0)
fcntl.fcntl(fd, fcntl.F_SETFL, old_flags | os.O_NONBLOCK)

And then we can use os.read/os.write to do non-blocking reads/writes to the pipe file descriptor, and Trio's wait_readable/wait_writable primitives can be used to find out when a non-blocking read/write might succeed. So receive_some and send_all would be implemented in pretty much the same way as they are for sockets – in particular, you'd want to look at the recv and send implementations in trio/_socket.py to see the basic code pattern.

The alternative approach would be to use sockets directly, creating them using socketpair. This is kind of weird, but as far as I know it should work fine. The only advantage is that it would let us skip writing the pipe code I just described above, by reusing the socket code instead. OTOH we'll probably end up wanting to write some of this code eventually anywhere, to expose filesystem FIFOs, so eh, really it's just a matter of picking which one you feel like.

Note that if we do use sockets, then we can use SocketStream (in particular for its send_all implementation), but we should not directly use SocketStream as the public API in our Popen implementation – because then people might start assuming they can do things like access the .socket attribute. Instead, we should use some anonymous classes that are a trivial wrapper around SocketStream objects, but only accept the SendStream/ReceiveStream interface, nothing more.

Step 5: I/O, Windows

Once we have ReadFile/WriteFile, we can use anonymous named pipes as our subprocess stdin/stdout/stderr. (Yes, they're called "anonymous named", sorry, Windows is a bit weird sometimes.) And then we'd write some wrapper classes that implement the SendStream and ReceiveStream interfaces, using ReadFile/WriteFile.

Alternatively, if we do the tricky hack involving sockets, then we could re-use trio's socket code here, and then the comments above about SocketStream apply.

Step 6: Waiting for process exit, Linux (DONE IN #622)

This is the super annoying one. There isn't really any better option than calling waitpid in a thread. Here's a first pass to give you the idea:

# global
# dict mapping pid -> WaitpidResult
_pending_waitpids = {}

@attr.s
class WaitpidResult
    event = attr.ib(default=attr.Factory(trio.Event))
    outcome = attr.ib(default=None)

async def _waitpid_system_task(pid):
    # cancellable=True: if this task is cancelled, then we abandon the thread to keep running
    # waitpid in the background. Since this is always run as a system task, this will only happen
    # if the whole call to trio.run is shutting down, and probably the program is about to exit.
    try:
        outcome = await run_sync_in_worker_thread(outcome.capture, os.waitpid, pid, 0, cancellable=True, limiter=UNLIMITED)
    except:
        # This branch handles cancellation... if we're cancelled then the entire call to trio.run is cancelled,
        # so probably no-one is listening for our result! Maybe we don't need this branch at all?
        result = _pending_waitpids.pop(pid)
        result.outcome = outcome.Error(...)
        result.event.set()
        raise
    else:
        result = _pending_waitpids.pop(pid)
        result.outcome = outcome
        result.event.set()

async def waitpid(pid):
    if pid not in _pending_waitpids:
        _pending_waitpids[pid] = WaitpidResult()
        trio.hazmat.spawn_system_task(_waitpid_system_task, pid)
    result = _pending_waitpids[pid]
    await result.event.wait()
    # XX: this isn't quite right because unwrap() will raise an error if multiple tasks try to
    # unwrap it -- need some strategy to handle this that's better than just letting the
    # exception escape. Do we want to support multiple simultaneous calls to waitpid
    # on the same process?
    return result.outcome.unwrap()

As you can see from the comments there are some details to work out, but hopefully that gives the basic idea.

(Some other general semantic questions: do we want to support waitid? Special values for pid (e.g. negative means wait for a process group, etc.)? WUNTRACED etc.? I think for a first version though we should ignore all these details and keep it as an internal API. As long as the only user is trio.subprocess.Popen.wait, none of this stuff matters.)

Step 7: Waiting for process exit, MacOS/BSDs

Implement using our kqueue stuff from above.

Step 8: Waiting for process exit, Windows

Implement using our WaitForSingleObject stuff from above, plus [GetExitCodeProcess](https://msdn.microsoft.com/en-us/library/windows/desktop/ms683189(v=vs.85).aspx) (which will need to be added to our cffi bindings).

Step 9: Wrap all of the above up in a Popen-like interface

Self-explanatory, really. Hopefully this won't be too hard once all the above is in place. There will still be various bits of plumbing to hook up, plus some conveniences like: we'll want to make sure our wait method keeps track of whether the process has already exited, and if so then it should return immediately rather than calling the underlying system's wait primitive.

@thejohnfreeman
Copy link

Would it be acceptable to implement each platform separately? I personally only need Linux support. I'm happy to contribute to Windows and Mac, but I'd like to play with something on Linux as soon as possible. Would you accept a contribution that only works on Linux but has the right cross-platform API?

@thejohnfreeman
Copy link

Regarding Step 1, am I right in thinking we don't really want a new thread waiting for every subprocess? I'm imagining one thread calling WaitForMultipleObjects on H handles where H = P processes + C cancel contexts + 1 control event. The control event wakes the thread to add more subprocess handles and resume waiting. Does that sound feasible?

@njsmith
Copy link
Member Author

njsmith commented Jul 8, 2018

@thejohnfreeman

Would it be acceptable to implement each platform separately? I personally only need Linux support

I want to be a little careful about going too far down the path of supporting one platform at a time, because that's how a lot of projects end up with e.g. Windows as the forever-slightly-broken second-class-citizen. But of course things have to be implemented in some order, and if you want to start with Linux then sure, that's fine.

Regarding Step 1, am I right in thinking we don't really want a new thread waiting for every subprocess? I'm imagining one thread calling WaitForMultipleObjects on H handles where H = P processes + C cancel contexts + 1 control event.

I was trying to keep it simple :-). I think one thread per handle would be a fine first version that's relatively simpler to implement, and then we could always optimize it further in the future. But if someone wanted to go ahead and implement the fancy thing from the beginning, that'd be fine too!

Note that a single WaitForMultipleObjects can only wait for at most 64 handles at a time, so the fancy version would still need to support distributing handles across multiple threads. Probably 63 handles per thread, plus one control handle. (I don't think you need a separate cancel event for each handle. As long as you keep track of which thread is waiting for each handle, cancellation can be handled through the control event.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants