Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix looping forever at 100% CPU if socket is closed #10658

Merged
merged 2 commits into from
Jun 15, 2021

Conversation

didactic-drunk
Copy link
Contributor

No description provided.

@straight-shoota
Copy link
Member

Can you please provide a more elaborate explanation (ideally with an example use case) to help understand the reason for this change? Thanks.

@didactic-drunk
Copy link
Contributor Author

didactic-drunk commented Apr 26, 2021

It's a bugfix. The logic is wrong. To reproduce:

sock = XXXServer.new
server = HTTP::Server.new ...
server.bind sock
spawn do
  server.listen
end

# Appears to work, but CPU spikes to 100% and process never exits
sock.close

@straight-shoota
Copy link
Member

I still don't follow. Sorry.

The reproduction doesn't help because it's incomplete. And I don't understand how it should work. server.listen blocks until the server is closed. Since there is no other concurrent code that could close the server, it will never return. So sock.close can't be reached.

I'm very skeptical about an idling HTTP server saturating the CPU. It should just be in waiting when there's nothing to do.

@didactic-drunk
Copy link
Contributor Author

I still don't follow. Sorry.

The reproduction doesn't help because it's incomplete. And I don't understand how it should work. server.listen blocks until the server is closed. Since there is no other concurrent code that could close the server, it will never return. So sock.close can't be reached.

I'm very skeptical about an idling HTTP server saturating the CPU. It should just be in waiting when there's nothing to do.

The example was from memory simplified from my real code and missing a spawn call which is corrected above.

The existing HTTP::Server ignores closed socket status and spins forever if the socket is closed.
Simplified HTTP::Server example:

loop do
  # Normally blocks but immediately return nil if the socket is closed.
  io = begin
    server.accept?
  rescue ex
    handle_error ex
    nil
  end

  if io
    ...spawn and handle request...
  else
    # The current code contains no else handler so... busy loop if closed because accept? returns nil when closed.
    # Also the handle_error path sets sock to nil conflating socket is closed with every other error.  That's fixed too.
  end
end

It seems poor form to ignore a public api's (accept?) return value when it indicates it can't continue. Especially bad in a loop.

@straight-shoota
Copy link
Member

straight-shoota commented Apr 28, 2021

I see, yeah it makes sense now. I suppose the culprit is that closed? was expected to return true when the bound socket is closed. But it does not.

The fix looks good overall.

We should probably still check closed? as loop condition to immediately stop accepting new requests. The shutdown procedure iterates through all sockets, closing each one. But closing a socket can block, which would keep the server responding to requests.

@didactic-drunk
Copy link
Contributor Author

Perhaps I was micro optimizing but:

  • For idle servers the fiber waits on accept?. closed? is never checked (because the socket will return nil and loop ends)
  • For VERY busy servers with more than one socket closed? may be called, but only when closing which is a rare event compared to the extra conditional branch ofuntil closed? Is the conditional branch worth it?
  • Also single responsibility principle. The accept? check can't be removed without adverse consequences - to me that makes it responsible for continued accepting rather than closed?.

Socket#close can block on open tcp sockets depending on SO_LINGER. Is the same true for an accepting server socket? I thought it returned immediately.

@straight-shoota
Copy link
Member

Okay, you're probably right, this is fine.

@straight-shoota straight-shoota added the kind:bug A bug in the code. Does not apply to documentation, specs, etc. label May 6, 2021
Copy link
Member

@sdogruyol sdogruyol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @didactic-drunk 🙏

@didactic-drunk
Copy link
Contributor Author

Ping: Will this 3 line fix make it in 1.1.0?

@asterite asterite added this to the 1.1.0 milestone Jun 15, 2021
@asterite asterite merged commit fd4e7ac into crystal-lang:master Jun 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:networking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants