
Fix RequestProcessor connection reuse with unconsumed requests #7055

Merged

Conversation

straight-shoota
Member

Fixes #6924

This removes the skip_to_end call from HTTP::Content#close, which would involuntarily consume a large (or infinite) HTTP body.

Instead, the logic is moved to RequestProcessor, which closes the connection if the body is too large (in this case, > 16 KB).

I'm not entirely sure whether it should automatically skip any body at all. I'm leaning more towards making this the responsibility of the handler. The rule would then simply be: if the handler doesn't entirely consume the request body, the connection can't be reused. It's probably fine to leave some leeway, though. That's why I included this in the PR. But we can easily remove it and break if the body contains any data at all.
I'll leave that up for discussion.

The handler can also tell the request processor to close the connection by setting the Connection: close header.
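A minimal sketch of what such a handler could look like (the server setup and response text are made up for illustration; the point is only the Connection: close header):

```crystal
require "http/server"

# Hypothetical handler: setting the Connection: close header tells the
# request processor not to reuse this connection for further requests.
server = HTTP::Server.new do |context|
  context.response.headers["Connection"] = "close"
  context.response.content_type = "text/plain"
  context.response.print "goodbye"
end
```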

@straight-shoota straight-shoota force-pushed the jm/fix/http-request-skip branch from 485598f to 2f9d571 Compare November 9, 2018 15:22
@asterite
Member

asterite commented Dec 12, 2018

I didn't see this PR's diff yet, but I'm now thinking we should change HTTP::Client to be concurrent by default, kind of similar to what Go does.

I found this article, which mentions this stackoverflow question.

In short, it seems that when you create a client, do a request, but don't read the whole response before closing it, then keep-alive won't be used. But it still works: internally, I guess, it creates a new socket, connects and performs a separate request. If you then neither read nor close the connection, I guess it's closed after some time. So I'm pretty sure there's a connection pool going on.

We must make this change. Right now everything's fine if you use the HTTP client in a single fiber, but if you use it in multiple fibers there's a chance that requests get intermixed. This should work by default; it's not good to have to wait for other responses to be read and closed.

buffer = uninitialized UInt8[4096]

4.times do
  return true if io.read(buffer.to_slice) < 4096
end
Member

read isn't guaranteed to read the full slice. You should use read_fully?
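For illustration, a small sketch of the difference (using IO::Memory; a socket behaves the same way, only less predictably):

```crystal
io = IO::Memory.new("abcdef")
buffer = Bytes.new(4)

# IO#read may return fewer bytes than the slice holds, even if more
# data will arrive later. On IO::Memory it happens to fill the slice.
bytes_read = io.read(buffer)

io.rewind

# IO#read_fully? keeps reading until the slice is completely filled,
# or returns nil if EOF is hit before any byte was read.
filled = io.read_fully?(buffer)
```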

Member Author

I don't think this part should be merged anyway. It's better to be strict and not reuse the connection when even a single byte is unconsumed.

@asterite
Member

Now I read the diff. The change seems fine, except that I'd like HTTP::Client to succeed in performing a request even if another request is currently being processed (its body was not yet consumed). I'm not sure we should merge this because it might break existing code... right? For example, if you perform two consecutive requests and the response body is larger than the maximum skippable size.

@straight-shoota
Member Author

@asterite It took me a while to understand why you're mentioning HTTP::Client so much. Might have been me being tired. But yes, this affects client-side response consuming as well.

I'm now thinking we should change HTTP::Client to be concurrent by default, kind of similar to what Go does.

That's #6011

We might consider using an auto-consuming IO wrapper for client response.
But I think I'd actually prefer to be strict in both server and client. If the request/response is not entirely consumed, the connection can't be reused. That transfers responsibility for deciding what should be consumed entirely to the user - instead of using some guesswork in the general implementation saying that maybe some amount of unconsumed data could be skipped.

@asterite
Member

@straight-shoota Right. For me the logic should be:

  • I make a request and get a response
  • If I don't consume that response and make another request, that opens another connection (but still works)
  • If I did consume the response and make another request, reuse the connection using keep-alive

The only problem here is having to eventually close that connection that never consumed the response to avoid leaks.

@straight-shoota
Member Author

having to eventually close that connection that never consumed the response to avoid leaks.

This could be implemented in HTTP::Client::Response#close. Users would need to call this method when finished. If the request body was consumed, the connection is returned to the pool. Otherwise it is simply closed. Yielding methods would automatically take care of this.
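A hypothetical sketch of how that could look from the caller's side. The host is made up, and the close-or-return-to-pool behaviour described above is the proposal, not the API as it existed at the time of this PR:

```crystal
require "http/client"

# With the yielding form, the client itself could call the proposed
# Response#close when the block returns: connection back to the pool if
# the body was fully consumed, plain close otherwise.
def fetch(client : HTTP::Client) : String
  client.get("/") do |response|
    # Fully consuming the body keeps the connection eligible for reuse.
    response.body_io.gets_to_end
  end
end

# No connection is opened until the first request is performed.
client = HTTP::Client.new("example.com")
```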

@straight-shoota
Member Author

straight-shoota commented Dec 30, 2018

I added a commit that removes skipping a minor amount of bytes and instead always closes the connection if the request was not entirely consumed. This is stricter, but also saner and simpler to understand. If your handler doesn't consume the request body (if present), Connection: keep-alive won't have any effect.
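Under this rule, a handler that wants keep-alive to stay effective has to drain the request body itself. A minimal sketch (the response text is made up for illustration):

```crystal
require "http/server"

# Sketch of a handler that drains any unread request body so the
# connection stays eligible for keep-alive reuse.
server = HTTP::Server.new do |context|
  if body = context.request.body
    body.skip_to_end # consume whatever the handler didn't read
  end
  context.response.content_type = "text/plain"
  context.response.print "ok"
end
```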

Contributor

@RX14 RX14 left a comment

I don't think this leaks FDs, but it's pretty hard to follow the logic here :(

@straight-shoota
Member Author

Yes, the socket is always closed at the end of the loop in the ensure block. It's a bit convoluted; maybe there could be some more refactoring.

@straight-shoota
Member Author

I'll also add docs about this to HTTP::Server::Handler when we're settled on the behaviour.

@RX14
Contributor

RX14 commented Jan 2, 2019

I like the behaviour, personally.

@straight-shoota straight-shoota added this to the 0.28.0 milestone Jan 2, 2019
@straight-shoota
Member Author

I've added a section about reusing connections to the API docs of HTTP::Server. I figured this was the right place, because RequestProcessor is entirely undocumented (and should probably be removed #7248) and most of the documentation about behaviour of server handlers was already in HTTP::Server.

See also #7251 for more changes to the existing docs.

@asterite
Member

asterite commented Jan 2, 2019

Yes, RequestProcessor is just a helper class; it's only public so it can be tested.

@straight-shoota straight-shoota force-pushed the jm/fix/http-request-skip branch from e37f9d2 to 2692d7c Compare February 10, 2019 16:39
@straight-shoota
Member Author

Rebased and ready to ship.

@sdogruyol sdogruyol merged commit eace840 into crystal-lang:master Feb 11, 2019
@straight-shoota straight-shoota deleted the jm/fix/http-request-skip branch February 11, 2019 14:18
Successfully merging this pull request may close these issues.

skip_to_end in HTTP::Content#close: Useful or problematic?