
Subtle bugs around closure handling in _unix_pipes.py and _windows_pipes.py #661

Closed

njsmith opened this issue Sep 13, 2018 · 10 comments

njsmith commented Sep 13, 2018

This is something I just realized while thinking about a random question someone asked in #twisted:

In _unix_pipes.py, suppose that the following sequence of events happens:

  1. task A calls receive_some
  2. the pipe isn't ready, so task A blocks in wait_readable
  3. some data arrives on the pipe, so task A gets rescheduled
  4. while task A is sitting on the run queue, task B closes the pipe
  5. task A gets scheduled and calls os.read on the closed fd. If you're lucky, this raises an exception (EBADF); if you're unlucky, a new fd got opened and assigned the same value between steps (4) and (5), and we end up reading from that random fd, which probably corrupts the state of some unrelated connection.

send_all has analogous issues.

I guess we need to re-check self._closed every time we use the fd, not just once on entry to the function.
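
A minimal sketch of that fix (hypothetical names, not trio's actual implementation; using the modern trio.lowlevel spelling of wait_readable): loop so that the closed flag is re-checked after every suspension point, before the fd is touched again.

    import os
    import trio

    async def receive_some(self, max_bytes):
        # Sketch of a pipe receive method that re-checks self._closed
        # every time around the loop, not just on entry.
        while True:
            if self._closed:
                raise trio.ClosedResourceError("this pipe was closed")
            try:
                return os.read(self._fd, max_bytes)
            except BlockingIOError:
                # While we're suspended here, another task may close the
                # pipe; the loop re-checks self._closed before the next
                # os.read, so we never read from a recycled fd.
                await trio.lowlevel.wait_readable(self._fd)

Because there is no await between the self._closed check and os.read, no other task can run in that window, which is what makes the re-check sufficient.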

This same issue can't happen with SocketStream, because when a socket is closed then it sets the underlying fd to -1, and so step 5 would call recv(-1, ...), which is always an EBADF, and the SocketStream code knows to convert EBADF into ClosedResourceError.

With trio.socket, the -1 trick means you can't get a wild read from a random fd, but there's no explicit handling of this case, so you will get an OSError(EBADF) instead of a proper ClosedResourceError. I guess that would probably be good to fix, though it's less urgent than the _unix_pipes.py issue.

njsmith commented Dec 28, 2018

I think this could also affect _windows_pipes.py, though given the way Windows handles work, it's probably harder to cause actual corruption.

njsmith commented Jan 22, 2019

There's also a serious bug in _unix_pipes.py where aclose calls notify_fd_closed without checking _closed. This bit @Ninpo.

I think the root cause is back here, where I should have argued for clearing self.fd when we close it – that would have made it impossible to introduce bugs of this class:
#621 (comment)

We should use that strategy going forward.
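
A rough sketch of that strategy (all names hypothetical; notify_closing is today's trio.lowlevel spelling of the notify_fd_closed discussed above): clear the stored fd before anything else, so no later code path can reach a stale or recycled descriptor.

    import os
    import trio

    async def aclose(self):
        if self._fd is not None:
            # Clear the fd *before* closing, so any concurrent or later
            # use fails fast instead of touching a recycled descriptor.
            fd, self._fd = self._fd, None
            # Wake any tasks blocked in wait_readable/wait_writable on
            # this fd, then actually close it.
            trio.lowlevel.notify_closing(fd)
            os.close(fd)
        await trio.lowlevel.checkpoint()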

njsmith added a commit to njsmith/trio that referenced this issue Jan 23, 2019
This addresses a number of issues:

- Fixes a major issue where aclose() called notify_fd_closed()
  unconditionally, even if the fd was closed; if the fd had already
  been recycled this could (and did) affect unrelated file
  descriptors:
    python-trio#661 (comment)
- Fixes a theoretical issue (not yet observed in the wild) where a
  poorly timed close could fail to be noticed by other tasks (python-triogh-661)
- Adds ConflictDetectors to catch attempts to use the same stream from
  multiple tasks simultaneously
- Switches from inheritance to composition (python-triogh-830)

Still todo:

- Tests for these race conditions that snuck through
- Audit _windows_pipes.py and _socket.py for related issues

njsmith commented Jan 25, 2019

It looks like _socket.py is OK, because:

  • The stdlib socket module already uses the trick where as soon as you close the socket, it sets the internal fd/handle value to -1, guaranteeing that any future operations will raise OSError (either EBADF or ENOTSOCK).
  • The lower-level trio socket module is allowed/expected to raise these OSErrors when you perform operations on an already-closed socket. It can also raise ClosedResourceError, depending on its whims.
  • The higher-level SocketStream class normalizes the two cases by converting EBADF/ENOTSOCK into ClosedResourceError (sketched below).
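
That last conversion might look roughly like this (a hypothetical sketch, not SocketStream's exact code):

    import errno
    import trio

    async def receive_some(self, max_bytes):
        try:
            return await self.socket.recv(max_bytes)
        except OSError as exc:
            # After close, the stdlib-style internal fd is -1, so the OS
            # reports EBADF (or ENOTSOCK on some platforms); surface that
            # as the standard "you closed this" exception instead.
            if exc.errno in (errno.EBADF, errno.ENOTSOCK):
                raise trio.ClosedResourceError("socket was closed") from None
            raise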

njsmith commented Jan 25, 2019

...But I'm not sure about _windows_pipes.py. This uses IOCP, and we don't actually have any machinery to notify the IO manager that we're closing a non-socket HANDLE. Maybe we should?

I guess @oremanj might have done some testing of this while implementing it? What happens if you call CloseHandle on a named pipe handle that has an outstanding overlapped ReadFile or WriteFile in progress? Do we need some extra machinery to interrupt readinto_overlapped/write_overlapped, like we have for wait_readable/wait_writable?

njsmith commented Jan 25, 2019

And on a similar note, do you think we should be raising an exception if two tasks are blocked in readinto_overlapped or write_overlapped at the same time on the same handle?
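
For context, the ConflictDetector mentioned in the commit message above could be as simple as this (a hypothetical minimal sketch, not trio's internal implementation):

    import trio

    class ConflictDetector:
        """Guard that raises if two tasks enter the same region at once."""

        def __init__(self, msg):
            self._msg = msg
            self._held = False

        def __enter__(self):
            if self._held:
                # A second task tried to use the resource concurrently.
                raise trio.BusyResourceError(self._msg)
            self._held = True

        def __exit__(self, *exc_info):
            self._held = False

Each send/receive method would then wrap its body in a "with self._conflict_detector:" block so that simultaneous use by two tasks raises immediately.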

oremanj commented Jan 25, 2019

I don't have my Windows VM handy right now, but I'm pretty sure I tested at least the read side of this and it did the right thing (closing the handle caused the ongoing read to fail). Thus this code in _windows_pipes:

            try:
                size = await _core.readinto_overlapped(self._pipe, buffer)
            except BrokenPipeError:
                if self._closed:
                    raise _core.ClosedResourceError(
                        "another task closed this pipe"
                    ) from None

Looks like on the write side we might raise BrokenResourceError instead of ClosedResourceError on a concurrent close. Should be an easy fix if the OS behavior is as I remember it.
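
The fix might look something like this (a hypothetical sketch mirroring the read-side handling quoted above):

            try:
                total_sent += await _core.write_overlapped(self._pipe, chunk)
            except BrokenPipeError as ex:
                if self._closed:
                    # Mirror the read side: a concurrent close by another
                    # task should surface as ClosedResourceError.
                    raise _core.ClosedResourceError(
                        "another task closed this pipe"
                    ) from None
                raise _core.BrokenResourceError from ex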

njsmith commented Jan 26, 2019

It also occurs to me belatedly that the Windows version doesn't involve all the tricky retry loop stuff and the associated subtle timing issues, so it'd be pretty easy to write a test and see if it passes.

njsmith added a commit to njsmith/trio that referenced this issue Jan 26, 2019

njsmith commented Jan 26, 2019

Oh heh, we actually have a test for the Windows send_all thing – we handle it in wait_overlapped.

And we also have a test for the receive side version, because the generic stream tests do that. (The send-side test requires a "clogged stream", and we skip the clogged stream tests on windows pipes because of missing wait_send_all_might_not_block.)

So that's all good! But there is still one issue: in the Windows send_all code, we split the send into multiple calls:

    while total_sent < length:
        # Only send 64K at a time. IOCP buffers can sometimes
        # get pinned in kernel memory and we don't want to use
        # an arbitrary amount.
        with view[total_sent:total_sent + 65536] as chunk:
            try:
                total_sent += await _core.write_overlapped(
                    self._pipe, chunk
                )
            except BrokenPipeError as ex:
                raise _core.BrokenResourceError from ex

I'm actually not 100% clear on whether this is necessary, because the IOCP docs are vague on this. It looks like libuv just checks that it's not sending more than UINT32_MAX bytes at a time ("because WriteFile() won't accept buffers larger than that"), but doesn't otherwise break up the write. But as long as we do it like this, there's a small race condition to think about: what if, just after a write_overlapped call completes but before we loop around to call it again, another task closes the stream? Then the handle will be deallocated. If we're lucky, we then get some invalid-handle exception. If we're unlucky, another handle could be assigned the same handle value (this happens much less often on Windows than on Unix, but AFAIK it can happen), and then we'll write to it instead.

So I think we should either stop splitting up the writes like this, or else do fixes similar to those done for the unix pipes in #874.
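
A hypothetical sketch of the second option, re-checking the closed flag each time around the loop so that a concurrent close is caught before the handle can be reused:

    while total_sent < length:
        if self._closed:
            # Another task closed the stream between chunks; bail out
            # before touching the (possibly recycled) handle again.
            raise _core.ClosedResourceError("another task closed this pipe")
        with view[total_sent:total_sent + 65536] as chunk:
            try:
                total_sent += await _core.write_overlapped(self._pipe, chunk)
            except BrokenPipeError as ex:
                raise _core.BrokenResourceError from ex

As in the unix case, there is no await between the self._closed check and the next write_overlapped call, so no other task can close the stream in that window.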

njsmith changed the title from "Subtle bug in unix_pipes.py, and maybe _socket.py too" to "Subtle bugs around closure handling in _unix_pipes.py and _windows_pipes.py" on Jan 26, 2019

njsmith commented Jan 27, 2019

I opened a new issue to discuss splitting up WriteFile calls on Windows: #883

njsmith commented Jan 28, 2019

OK, this was a complicated tangle of issues, but I think between #874 and #884 we've fixed all the variants of this issue found so far, and put measures in place to make sure we won't see new ones in the future.
