Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IO: Detect closed IO when resuming readable/writable event #8707

Merged
merged 5 commits into from
Jan 28, 2020

Conversation

bcardiff
Copy link
Member

This should fix some race conditions detected in the preview_mt build.

I think there are still some race conditions in IO::Evented since there is no guarantee that, for example, a thread closes the IO while another thread has already initiated a read operation, succeed the check_open in IO::Bufferend#read, but haven't reach the unbuffered_read -> evented_read -> wait_readable.

There is currently no mechanism to ensure that the check for open IO holds long enough until the fibers reschedule.

Even without fixing that scenario this PR should improove the current state for MT:

  • If the IO was closed by another thread while waiting we need to check if it's open or not.
  • If it's closed an IO::Error is raised as a result of the read/write operation as if the IO was has been closed from the beginning.
  • The event_close is now performed before actually closing the underlying file descriptor to prevent libevent accessing it in a closed state.
  • The @closed = true in IO::FileDescriptor needs to happen before the system_close so it is available when the awaiting fibers are woken up.

If the IO was closed by other thread while waiting we need to check if it's open or not.

If it's closed an IO::Error is raised as a result of the read/write operation as if the IO was has been closed from the beginning.

The event_close is now performed before actually closing the underlying file descriptor to prevent libevent accessing it in a closed state.

The @closed = true in IO::FileDescriptor needs to happen before the system_close so it is available when the awaiting fibers are woken up.
@bcardiff bcardiff added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:concurrency topic:stdlib:networking labels Jan 21, 2020
@bcardiff bcardiff added this to the 0.33.0 milestone Jan 21, 2020
@jkthorne
Copy link
Contributor

Looks like your mt_preview build timed out. Is that because of this change?

@bcardiff
Copy link
Member Author

No it's something Process.

Running ./bin/crystal spec spec/std/callstack_spec.cr:16 hangs.

And if the changes in private def system_close are reverted it works. But we are back to leaving closed fd in libevent...

@bcardiff bcardiff requested review from RX14 and ysbaddaden January 24, 2020 15:00
@bcardiff
Copy link
Member Author

And the CI is happy 🎉

src/crystal/system/unix/file_descriptor.cr Outdated Show resolved Hide resolved
src/crystal/system/file.cr Outdated Show resolved Hide resolved
@bcardiff bcardiff merged commit 3c9a69b into crystal-lang:master Jan 28, 2020
@@ -250,7 +250,7 @@ class Socket < IO
if closed?
return
elsif Errno.value == Errno::EAGAIN
wait_readable
wait_readable rescue nil
Copy link

@jacobh0 jacobh0 Apr 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bcardiff shouldn't this be rescue return nil ? The loop doesn't break otherwise. accept? no longer times-out trying to accept a connection and just hangs forever

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:concurrency topic:stdlib:networking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants