Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IO waits + control-C on Windows #42

Closed
njsmith opened this issue Feb 7, 2017 · 16 comments
Closed

IO waits + control-C on Windows #42

njsmith opened this issue Feb 7, 2017 · 16 comments

Comments

@njsmith
Copy link
Member

njsmith commented Feb 7, 2017

Windows has no concept of EINTR; if you hit control-C while we're blocked in select or GetQueuedCompletionStatusEx, then the C-level signal handler runs, but the blocking call still runs to completion before returning to Python and letting the Python-level signal handler run. So currently if you do trio.run(trio.sleep, 100000) then on Windows you can't interrupt this with control-C. That's unfortunate! We should fix it.

CPython itself avoids this, e.g. time.sleep(10000) is interruptible by hitting control-C. The way they do it is: in the real C-level signal handler, in addition to setting the magic flag saying that a signal arrived, they also (if this is windows and the signal is SIGINT) do

SetEvent(sigint_event);

where sigint_event is a global singleton Event object which can be retrieved using:

PyAPI_FUNC(void*) _PyOS_SigintEvent(void);

So the basic idea is: before entering your blocking call, you (a) ResetEvent(sigint_event), and (b) arrange (somehow) for your blocking call to exit early if this handle becomes signaled. And then Python's regular signal-checking logic resumes and runs the Python-level handler and we're good to go. We already need the machinery for waking up the main thread when an Event becomes signaled, so that part shouldn't be a big deal, though it's not implemented yet. [Edit: and of course we also have to be careful to only turn this logic on if we are running in the main thread. And it might be important to explicitly check for signals after waking up? I'm not sure how often the interpreter checks, and if we wake up and then go back to sleep without checking then that's bad.]

Open question: I have no idea what the equivalent of this is on PyPy. It's possible they simply don't have this machinery at all (i.e. I don't even know if pypy-on-windows allows you to break out of time.sleep(100000)). As of 2017-02-07 there is no version of pypy that can run trio on Windows (currently the 3.5-compatibility branch is linux-only), so the issue may not arise for a bit.

@njsmith
Copy link
Member Author

njsmith commented Feb 7, 2017

For PyPy, see: https://bitbucket.org/cffi/cffi/issues/306/provide-a-wrapper-for-_pyos_sigintevent-or

(tl;dr: pypy 3.5 doesn't yet have any equivalent of _PyOS_SigintEvent. Hopefully it will grow one and CFFI will expose a nice portable interface to it.)

@njsmith
Copy link
Member Author

njsmith commented Feb 7, 2017

I guess the other option would be to implement this functionality ourselves from scratch: it turns out that Windows allows a process to register multiple handlers, with SetConsoleCtrlHandler and calls them all in sequence; you can see this in the implementation of python 2's time.sleep. The advantages would be (a) we wouldn't be dependent on pypy getting things together, and (b) we could use PostQueuedCompletionStatus which might be less of a hassle than going through event → thread-sitting-in-WaitFor*Objectcall_soon. But, the reduced hassle here is likely outweighed by the additional hassle of needing to communicate the current IOCP handle to the C-level signal handler etc.?

@njsmith
Copy link
Member Author

njsmith commented Feb 10, 2017

Further investigation/thoughts:

SetConsoleCtrlHandler callbacks are run in a new thread (ref). It is actually safe to run cffi-defined callbacks in a new thread (and thus, safe to use cffi to create a SetConsoleCtrlHandler callback that runs Python code), except potentially if subinterpreters are in use. Subinterpreter support is not really a high priority, but OTOH I guess if this is the only blocker to subinterpreter support then maybe it's worth working around?

(The problem with subinterpreters is that if a CFFI callback finds itself running in a new thread that has never run any Python code before, then it uses the PyGILState_Ensure family of functions to allocate a threadstate and so forth, and these functions always register the thread with the main interpreter, not a subinterpreter. So even if trio in general is running in a subinterpreter, the callback would run in the main interpreter. I'm... actually not 100% sure that this would break anything, so long as the callback isn't, like, importing modules and stuff. But in theory it would be bad.)

The other option would be to write a separate tiny package that just provides the feature of registering a SetConsoleCtrlHandler callback that writes to a given socket / calls PostQueuedCompletionStatus, with the handler written in C, and then depend on that on Windows. This could be done while keeping the bulk of our windows support code inside trio proper, so it probably wouldn't be that much of a nightmare to keep working (though it might help if we can settle on a design for the Windows event loop first :-)).

@njsmith
Copy link
Member Author

njsmith commented Feb 12, 2017

Testing this will be tricky, because we'll need a real CTRL_C_EVENT, and that's difficult to do from a test suite on Windows.

It might be okay in a restricted way, like: a test that just spawns a subprocess whose only job is to do a single sleep + handle a single event. I'd be inclined to try running it in a new console, + not even trying to get stdout/stderr back, + maybe a thread that just pokes at the console, since doing console IO seems to be correlated with whether GenerateConsoleCtrlEvent actually delivers an event. And set the PID argument to 0.

The commit that removed the elaborate code attempting to do real CTRL_C_EVENTs is here: 9584365

@njsmith njsmith added Windows and removed polish labels Feb 12, 2017
@njsmith
Copy link
Member Author

njsmith commented Feb 12, 2017

There's also a complication with race conditions: suppose a signal wakes us up. How do we make sure we stay awake until the Python handler runs? If we call SetConsoleCtrlHandler ourselves, then our handler runs before the Python C-level handler, so in principle there's nothing stopping us from triggering a wakeup, and then waking up, and then going back to sleep, and then the Python C-level handler runs. And then even if the C-level handler runs, when does the Python-level handler run? etc. This may not be an issue in practice because the race window might turn out to be very small, but in theory it could be.

A nice thing about _PyOS_SigintEvent is that it's signaled at the end of the Python C-level handler. And I think (?) that once the C-level handler runs then we're guaranteed to run the Python-level handler within a few bytecode instructions, but am not 100% sure.

...here's a wacky option. Just hijack CTRL_C_EVENT handling completely. Register our own handler that returns TRUE meaning "stop processing this event, don't call the other handlers". This should (?) pre-empt the CRT-level signal handler that Python uses. (I guess it might make PyPy grumpy if they're still using CTRL_C_EVENT handlers directly for stuff like time.sleep.) Then call Python C level handler (which we can get a handle to through CRT functions), then do our wakeup.

NB CTRL_C_EVENT is the only meaningful signal on Windows, so this is pretty much sufficient. (There are a few others, but in practice the only way to trigger any of them in python is by using some FFI to call raise.)

@njsmith
Copy link
Member Author

njsmith commented Feb 12, 2017

Oh, there's another option – I guess I read through this too quickly before, but it turns out that set_wakeup_fd is written to in addition to the regular signal handler being run, and it's triggered directly from the C-level handler, and on Windows the fd is allowed (required) to be a socket (on 3.5+). So if we have the ability to wake up when a socket becomes readable, we could use that as our wakeup, while still using the regular signal handling machinery otherwise.

@njsmith
Copy link
Member Author

njsmith commented Feb 12, 2017

...so that means that in our current event loop design, with select as the primary waiting function and an wakeup socketpair... all we have to do is call set_wakeup_fd with the wakeup socket and we're done! (Well, and call signal.set_wakeup_fd(-1) when shutting down.)

If we switch to IOCP as the primary waiting function then things are more complicated. Basically we have to keep issuing a read on the wakeup socket. One way to do that would be a system task that just loops doing that, but then there's the risk that during shutdown the system task will go away and then we get wedged somehow... we really should have a read outstanding every time we block. I guess it wouldn't be terribly hard to just allocate a single OVERLAPPED for this inside the IOManager, and then every time we get it back issue a new WSARecv.

@njsmith
Copy link
Member Author

njsmith commented Feb 16, 2017

Mostly fixed in: cc42cbb

Notes:

  • I tried using signal.set_wakeup_fd. It didn't work. I have no idea why -- I'd do trio.run(trio.sleep, 10), and it'd deliver the signal after it wokeup after 10 seconds, but not before. Very weird. As far as I can tell from the CPython source code, the C-level signal handler should be writing a byte to the wakeup pipe. And interrupting time.sleep works, which relies on a SetEvent call inside the same C-level signal handler, so it seems like that's getting called... IDGI. I tried checking what asyncio does, but it turns out that asyncio has no signal handling code at all on Windows, so never mind.

  • Instead I'm manually registering a cffi callback with SetConsoleCtrlHandler. This has several limitations:

    • Because the cffi callback is run in a new thread, it always gets run in the main interpreter. If trio is running in a subinterpreter then this is bad. At least in theory -- in practice it might work for all I know. This is pretty obscure -- I don't think a lot of people are using multiple subinterpreters, on Windows, and relying on control-C? But it's not ideal either.

    • There's a race condition because we write to the pipe before Python gets notified of the signal (and this notification is happening in another thread, remember), so even if we wakeup and start running Python code there's no guarantee that the signal will get delivered before we go back to sleep. Worst case this just means the user has to hit control-C again though.

    • There's no test, because I don't know how to get GenerateConsoleCtrlEvent to work reliably (see test_ki.py for the full saga).

Leaving this open, because it's probably worth revisiting at some point given these caveats + the likelihood of #52 shaking things up. But for now it is mostly working.

@njsmith njsmith changed the title Make IO waits interruptible on Windows IO waits + control-C on Windows Feb 16, 2017
@njsmith njsmith added the polish label Feb 16, 2017
@dabeaz
Copy link

dabeaz commented Mar 14, 2017

This might be a dumb idea, but if the problem on Windows is the event loop being reawakened, why not launch a background task that does nothing more than run forever and sleep on 100ms intervals or something? At the very least, it'd take care of the problem of someone calling sleep(100000) and never being awakened until afterwards.

@njsmith
Copy link
Member Author

njsmith commented Mar 14, 2017

We actually already have an upper bound on how long we're willing to sleep, but it's currently set to 24 hours. There's no technical obstacle to dropping it down to 100 ms or whatever, except, you know. Polling. Ick :-). It's not obviously better than what we have now, but yeah it's an option to consider.

@dabeaz
Copy link

dabeaz commented Mar 14, 2017

I wasn't suggesting changing the upper bound on sleeping. You'd merely have a task that does nothing but wake up every 100ms--in effect forcing an upper limit on the sleep time passed to select() and related functions. On a modern machine, you'd never notice the effect.

@njsmith
Copy link
Member Author

njsmith commented Mar 14, 2017

@dabeaz: _MAX_TIMEOUT is exactly an upper bound on the sleep time passed to select :-).

@dabeaz
Copy link

dabeaz commented Mar 14, 2017

I need more coffee. Or less coffee.

njsmith added a commit to njsmith/trio that referenced this issue Mar 22, 2017
As compared to the CFFI-based code that this replaces, it's (a) less
race-y (the Python C-level signal handler writes to the
wakeup_fd *after* setting the flag to run the Python-level signal
handler, whereas our old code ran before), (b) avoids arcane CFFI
stuff that might be broken in the presence of subinterpreters, (c)
slightly less code.

The downside of set_wakeup_fd is that writing to the wakeup socket
fails with EWOULDBLOCK, then in our context the correct thing to do is
to ignore that error (we just want to guarantee a wakeup, and if the
send buffer is full then we are definitely going to wake up), but the
hardcoded behavior in signalmodule.c is to print a spurious warning to
stderr. IMO this is makes it useless to us on Unix-likes, because they
get signals all the time, and we configure out wakeup socket to use a
tiny send buffer to save on kernel memory. But on Windows the
trade-offs are different: Windows insists on using a gargantuan send
buffer (see comments in _wakeup_socketpair.py), and the only signal is
control-C. It's not that big a deal if someone gets a spurious error
message in an extremely rare case when responding to an explicit
control-C. (Though it still would be nicer to not have this error
message...)

See python-triogh-42 for more details.
@njsmith
Copy link
Member Author

njsmith commented Mar 22, 2017

I figured out why set_wakeup_fd wasn't working – I was passing it the wrong end of the socketpair. doh :-). #108 switches to using set_wakeup_fd instead of the CFFI hacks, and has notes on the trade-offs.

In particular, there's still no test for this, and it would be nice if there were a way to disable the warning when the wakeup fd overflows. (The latter is really much more of an issue on Unix-likes, though, so more relevant to #109)

@njsmith
Copy link
Member Author

njsmith commented Mar 22, 2017

OK, added a test for this, so now the only remaining issue is the warning thing, which is covered by #109 and #103 so I think we can close this.

@njsmith njsmith closed this as completed Mar 22, 2017
@njsmith
Copy link
Member Author

njsmith commented Apr 19, 2017

#119 is related to this – specifically set_wakeup_fd appears to be somewhat less reliable than hoped :-(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants