blocking call in one fiber can cause IO timeouts in others #8480

Closed
rdp opened this issue Nov 16, 2019 · 2 comments

Comments

@rdp
Contributor

rdp commented Nov 16, 2019

From: #8376 (comment)

It seems that if a thread blocks on a call, other IO that is, for instance, waiting for a connect to occur can be marked as having "timed out" even though the socket operation might still be in a healthy state:

ex:

require "socket"
lib C
  fun sleep(value : UInt32) : UInt32
end

spawn do
  loop do
    TCPSocket.new("facebook.com", 80, 0.1, 0.1).close # connect timeout 0.1s
    STDOUT.print "."
  end
end

loop do
  puts "sleeping"
  Fiber.yield
  C.sleep(1) # force the other fiber to exceed its connect timeout; replace with sleep(1) and it works
end

On Linux I get:

sleeping
sleeping
Unhandled exception in spawn: connect: Network is unreachable (Errno)
  from /usr/share/crystal/src/socket/tcp_socket.cr:75:15 in 'initialize'
  from /usr/share/crystal/src/socket/tcp_socket.cr:27:3 in 'new'
  from bad.cr:9:5 in '->'
  from /usr/share/crystal/src/fiber.cr:255:3 in 'run'
  from /usr/share/crystal/src/fiber.cr:48:34 in '->'
  from ???

and on OS X:

sleeping
sleeping
sleeping
Unhandled exception in spawn: connect timed out (IO::Timeout)
  from /usr/local/Cellar/crystal/0.31.1/src/socket/tcp_socket.cr:75:15 in 'initialize'
  from /usr/local/Cellar/crystal/0.31.1/src/socket/tcp_socket.cr:27:3 in 'new'
  from test2.cr:9:5 in '->'
  from /usr/local/Cellar/crystal/0.31.1/src/fiber.cr:255:3 in 'run'
  from /usr/local/Cellar/crystal/0.31.1/src/fiber.cr:48:34 in '->'
sleeping

Just wondering whether there is an easy fix (for example, retrying the non-blocking connect before raising?). I suppose one answer is "don't make long-running C calls", but sometimes that's hard to avoid...

Related to #1454, but wanted to make it a separate issue so it could be discussed on its own merits.

crystal 0.31.1

@ysbaddaden
Contributor

That's because you block the current thread, which prevents every other fiber on it from running. By the time the other fibers are resumed, whatever state they were in, their timeout has already elapsed, so the operation is reported as timed out.
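
For comparison, here is a minimal sketch of the variant the report says works: the same loop, with Crystal's own `sleep` in place of the C binding. It assumes the behaviour described in this thread, namely that Crystal's `sleep` suspends only the calling fiber and lets the scheduler keep servicing socket events, whereas `C.sleep` blocks the whole thread. The host and the 0.1s timeouts are taken from the original example; a connect that genuinely takes longer than 0.1s would still raise.

```crystal
require "socket"

# Same connect loop as in the report.
spawn do
  loop do
    TCPSocket.new("facebook.com", 80, 0.1, 0.1).close
    STDOUT.print "."
  end
end

loop do
  puts "sleeping"
  # Crystal's sleep hands control back to the scheduler, so the connecting
  # fiber can be resumed as soon as its socket is ready (assumption based
  # on the report's note that sleep(1) works where C.sleep(1) does not).
  sleep 1.second
end
```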

@rdp
Contributor Author

rdp commented Nov 16, 2019

OK, I guess with the current implementation an "easy fix" (i.e. retrying the operation non-blocking before raising the timeout) is impossible: we mark the socket as non-blocking, set the connect timeout on it, and then don't return to it before the timeout has elapsed to "check" whether the connect has succeeded, which apparently isn't allowed on Linux. So maybe someday multi-threading could "steal" the fiber away and finish it, or there'll be some other fix. Deferring to #1454, thanks for the feedback :)

@rdp rdp closed this as completed Nov 16, 2019