
Subtle bugs around closure handling in _unix_pipes.py and _windows_pipes.py #661

Closed

njsmith opened this issue Sep 13, 2018 · 10 comments

njsmith commented Sep 13, 2018

This is something I just realized while thinking about a random question someone asked in #twisted:

In _unix_pipes.py, suppose that the following sequence of events happens:

  1. task A calls receive_some
  2. the pipe isn't ready, so task A blocks in wait_readable
  3. some data arrives on the pipe, so task A gets rescheduled
  4. while task A is sitting on the run queue, task B closes the pipe
  5. task A gets scheduled and calls os.read on the closed fd. If you're lucky, this raises an exception (EBADF); if you're unlucky, a new fd got opened and assigned the same value between steps (4) and (5), and we end up reading from that random fd, which probably corrupts the state of some unrelated connection.

send_all has analogous issues.

I guess we need to re-check self._closed every time we use the fd, not just once on entry to the function.
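
A minimal sketch of that fix (hypothetical names, not trio's actual implementation; using the modern trio.lowlevel spelling of wait_readable): loop so that the closed flag is re-checked after every suspension point, before the fd is touched again.

    import os
    import trio

    async def receive_some(self, max_bytes):
        # Sketch of a pipe receive method that re-checks self._closed
        # every time around the loop, not just on entry.
        while True:
            if self._closed:
                raise trio.ClosedResourceError("this pipe was closed")
            try:
                return os.read(self._fd, max_bytes)
            except BlockingIOError:
                # While we're suspended here, another task may close the
                # pipe; the loop re-checks self._closed before the next
                # os.read, so we never read from a recycled fd.
                await trio.lowlevel.wait_readable(self._fd)

Because there is no await between the self._closed check and os.read, no other task can run in that window, which is what makes the re-check sufficient.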

This same issue can't happen with SocketStream, because when a socket is closed then it sets the underlying fd to -1, and so step 5 would call recv(-1, ...), which is always an EBADF, and the SocketStream code knows to convert EBADF into ClosedResourceError.

With trio.socket, the -1 trick means you can't get a wild read from a random fd, but there's no explicit handling of this case, so you will get an OSError(EBADF) instead of a proper ClosedResourceError. I guess that would probably be good to fix, though it's less urgent than the _unix_pipes.py issue.

njsmith commented Dec 28, 2018

I think this could also affect _windows_pipes.py, though given the way Windows handles work, it's probably harder to cause actual corruption.

njsmith commented Jan 22, 2019

There's also a serious bug in _unix_pipes.py where aclose calls notify_fd_closed without checking _closed. This bit @Ninpo.

I think the root cause is back here, where I should have argued for clearing self.fd when we close it – that would have made it impossible to introduce bugs of this class:
#621 (comment)

We should use that strategy going forward.
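
A rough sketch of that strategy (all names hypothetical; notify_closing is today's trio.lowlevel spelling of the notify_fd_closed discussed above): clear the stored fd before anything else, so no later code path can reach a stale or recycled descriptor.

    import os
    import trio

    async def aclose(self):
        if self._fd is not None:
            # Clear the fd *before* closing, so any concurrent or later
            # use fails fast instead of touching a recycled descriptor.
            fd, self._fd = self._fd, None
            # Wake any tasks blocked in wait_readable/wait_writable on
            # this fd, then actually close it.
            trio.lowlevel.notify_closing(fd)
            os.close(fd)
        await trio.lowlevel.checkpoint()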

njsmith added a commit to njsmith/trio that referenced this issue Jan 23, 2019
This addresses a number of issues:

- Fixes a major issue where aclose() called notify_fd_closed()
  unconditionally, even if the fd was closed; if the fd had already
  been recycled this could (and did) affect unrelated file
  descriptors:
    python-trio#661 (comment)
- Fixes a theoretical issue (not yet observed in the wild) where a
  poorly timed close could fail to be noticed by other tasks (python-triogh-661)
- Adds ConflictDetectors to catch attempts to use the same stream from
  multiple tasks simultaneously
- Switches from inheritance to composition (python-triogh-830)

Still todo:

- Tests for these race conditions that snuck through
- Audit _windows_pipes.py and _socket.py for related issues

njsmith commented Jan 25, 2019

It looks like _socket.py is OK, because:

  • The stdlib socket module already uses the trick where as soon as you close the socket, it sets the internal fd/handle value to -1, guaranteeing that any future operations will raise OSError (either EBADF or ENOTSOCK).
  • The lower-level trio socket module is allowed/expected to raise these OSErrors when you perform operations on an already-closed socket. It can also raise ClosedResourceError, depending on its whims.
  • The higher-level SocketStream class normalizes the two cases by converting EBADF/ENOTSOCK into ClosedResourceError (sketched below).
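
That last conversion might look roughly like this (a hypothetical sketch, not SocketStream's exact code):

    import errno
    import trio

    async def receive_some(self, max_bytes):
        try:
            return await self.socket.recv(max_bytes)
        except OSError as exc:
            # After close, the stdlib-style internal fd is -1, so the OS
            # reports EBADF (or ENOTSOCK on some platforms); surface that
            # as the standard "you closed this" exception instead.
            if exc.errno in (errno.EBADF, errno.ENOTSOCK):
                raise trio.ClosedResourceError("socket was closed") from None
            raise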

njsmith commented Jan 25, 2019

...But I'm not sure about _windows_pipes.py. This uses IOCP, and we don't actually have any machinery to notify the IO manager that we're closing a non-socket HANDLE. Maybe we should?

I guess @oremanj might have done some testing of this while implementing it? What happens if you call CloseHandle on a named pipe handle that has an outstanding overlapped ReadFile or WriteFile in progress? Do we need some extra machinery to interrupt readinto_overlapped/write_overlapped, like we have for wait_readable/wait_writable?

njsmith commented Jan 25, 2019

And on a similar note, do you think we should be raising an exception if two tasks are blocked in readinto_overlapped or write_overlapped at the same time on the same handle?
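
For context, the ConflictDetector mentioned in the commit message above could be as simple as this (a hypothetical minimal sketch, not trio's internal implementation):

    import trio

    class ConflictDetector:
        """Guard that raises if two tasks enter the same region at once."""

        def __init__(self, msg):
            self._msg = msg
            self._held = False

        def __enter__(self):
            if self._held:
                # A second task tried to use the resource concurrently.
                raise trio.BusyResourceError(self._msg)
            self._held = True

        def __exit__(self, *exc_info):
            self._held = False

Each send/receive method would then wrap its body in a "with self._conflict_detector:" block so that simultaneous use by two tasks raises immediately.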

oremanj commented Jan 25, 2019

I don't have my Windows VM handy right now, but I'm pretty sure I tested at least the read side of this and it did the right thing (closing the handle caused the ongoing read to fail). Thus this code in _windows_pipes:

            try:
                size = await _core.readinto_overlapped(self._pipe, buffer)
            except BrokenPipeError:
                if self._closed:
                    raise _core.ClosedResourceError(
                        "another task closed this pipe"
                    ) from None

Looks like on the write side we might raise BrokenResourceError instead of ClosedResourceError on a concurrent close. Should be an easy fix if the OS behavior is as I remember it.
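
The fix might look something like this (a hypothetical sketch mirroring the read-side handling quoted above):

            try:
                total_sent += await _core.write_overlapped(self._pipe, chunk)
            except BrokenPipeError as ex:
                if self._closed:
                    # Mirror the read side: a concurrent close by another
                    # task should surface as ClosedResourceError.
                    raise _core.ClosedResourceError(
                        "another task closed this pipe"
                    ) from None
                raise _core.BrokenResourceError from ex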

njsmith commented Jan 26, 2019

It also occurs to me belatedly that the Windows version doesn't involve all the tricky retry loop stuff and the associated subtle timing issues, so it'd be pretty easy to write a test and see if it passes.

njsmith added a commit to njsmith/trio that referenced this issue Jan 26, 2019

njsmith commented Jan 26, 2019

Oh heh, we actually have a test for the Windows send_all thing – we handle it in wait_overlapped.

And we also have a test for the receive side version, because the generic stream tests do that. (The send-side test requires a "clogged stream", and we skip the clogged stream tests on windows pipes because of missing wait_send_all_might_not_block.)

So that's all good! But there is still one issue: in the Windows send_all code, we split the send into multiple calls:

    while total_sent < length:
        # Only send 64K at a time. IOCP buffers can sometimes
        # get pinned in kernel memory and we don't want to use
        # an arbitrary amount.
        with view[total_sent:total_sent + 65536] as chunk:
            try:
                total_sent += await _core.write_overlapped(
                    self._pipe, chunk
                )
            except BrokenPipeError as ex:
                raise _core.BrokenResourceError from ex

I'm actually not 100% clear on whether this is necessary, because the IOCP docs are vague on this. It looks like libuv just checks that it's not sending more than UINT32_MAX bytes at a time ("because WriteFile() won't accept buffers larger than that"), but doesn't otherwise break up the write. But as long as we do it like this, there's a small race condition to think about: what if, just after a write_overlapped call completes but before we loop around to call it again, another task closes the stream? Then the handle will be deallocated. If we're lucky, we then get some invalid-handle exception. If we're unlucky, another handle could be assigned the same handle value (this happens much less often on Windows than on Unix, but AFAIK it can happen), and then we'll write to it instead.

So I think we should either stop splitting up the writes like this, or else do fixes similar to those done for the unix pipes in #874.
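
A hypothetical sketch of the second option, re-checking the closed flag each time around the loop so that a concurrent close is caught before the handle can be reused:

    while total_sent < length:
        if self._closed:
            # Another task closed the stream between chunks; bail out
            # before touching the (possibly recycled) handle again.
            raise _core.ClosedResourceError("another task closed this pipe")
        with view[total_sent:total_sent + 65536] as chunk:
            try:
                total_sent += await _core.write_overlapped(self._pipe, chunk)
            except BrokenPipeError as ex:
                raise _core.BrokenResourceError from ex

As in the unix case, there is no await between the self._closed check and the next write_overlapped call, so no other task can close the stream in that window.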

njsmith changed the title from "Subtle bug in unix_pipes.py, and maybe _socket.py too" to "Subtle bugs around closure handling in _unix_pipes.py and _windows_pipes.py" on Jan 26, 2019

njsmith commented Jan 27, 2019

I opened a new issue to discuss splitting up WriteFile calls on Windows: #883

njsmith commented Jan 28, 2019

OK, this was a complicated tangle of issues, but I think between #874 and #884 we've fixed all the variants of this issue found so far, and put measures in place to make sure we won't see new ones in the future.
