
HTTP.open hanging #339

Closed
ssfrr opened this issue Oct 20, 2018 · 13 comments


@ssfrr
Contributor

ssfrr commented Oct 20, 2018

Julia 1.0.1
HTTP.jl 0.7.1
MbedTLS.jl 0.6.4

I'm trying to explore HTTP.open, and I tried the following just to make sure I was able to connect:

using HTTP
STREAM_URL = "http://doppler.media.mit.edu/impoundment.opus"
HTTP.open("GET", STREAM_URL) do http
end

But it seems to hang on exiting the do block.

The backtrace when I interrupt it is:

ERROR: InterruptException:
Stacktrace:
 [1] process_events at ./libuv.jl:98 [inlined]
 [2] wait() at ./event.jl:246
 [3] wait(::Condition) at ./event.jl:46
 [4] wait_readnb(::Sockets.TCPSocket, ::Int64) at ./stream.jl:301
 [5] eof at ./stream.jl:61 [inlined]
 [6] eof(::HTTP.ConnectionPool.Transaction{Sockets.TCPSocket}) at /home/sfr/Dropbox/projects/active/2018 Fall Member Event/dev/HTTP/src/ConnectionPool.jl:155
 [7] eof at /home/sfr/Dropbox/projects/active/2018 Fall Member Event/dev/HTTP/src/Streams.jl:187 [inlined]
 [8] closeread(::HTTP.Streams.Stream{HTTP.Messages.Response,HTTP.ConnectionPool.Transaction{Sockets.TCPSocket}}) at /home/sfr/Dropbox/projects/active/2018 Fall Member Event/dev/HTTP/src/Streams.jl:293
 [9] macro expansion at ./task.jl:262 [inlined]
...snipped

I tried adding a readavailable(http) and got the error ArgumentError: function requires headerscomplete(http.message). Looking at the docs more closely it looks like maybe I need to call startread(http), but that also hangs.

I'm not super clear on what level of abstraction to expect from HTTP.open. What I'm looking for is a way to open the given URL (which gives chunked binary data), read some data from it for a while, and then stop reading from it. Maybe the fact that I'm never hitting the end of the stream (because it doesn't end) is confusing HTTP.jl?

@samoconnor
Contributor

I tried adding a readavailable(http) and got the error ...

Did you try a while !eof(io) readavailable(io) loop as suggested here?: #337 (comment)
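The loop pattern suggested there works with any Julia IO that implements eof and readavailable. A minimal sketch, using an in-memory IOBuffer as a stand-in for the HTTP.Stream (the IOBuffer and its contents are illustrative assumptions, not part of the thread):

```julia
# Stand-in for an HTTP.Stream: any Julia IO supporting eof/readavailable works.
io = IOBuffer("chunk1chunk2chunk3")

received = UInt8[]
while !eof(io)
    # On a real HTTP.Stream, eof() also performs startread if needed.
    append!(received, readavailable(io))
end
```

After the loop, `received` holds all bytes drained from the stream.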

Looking at the docs more closely it looks like maybe I need to call startread(http), but that also hangs

eof() calls startread if needed, so in the common while !eof() ... loop pattern there is no need to call startread explicitly. We could modify readavailable to call startread too.

The hang is not related to startread; it occurs because the do function has not consumed the whole Response Body.

Maybe the fact that I'm never hitting the end of the stream (because it doesn't end) is confusing HTTP.jl?

Yes.
When your do function exits without having consumed the Response Body, closeread attempts to read (and discard) it to ensure that the connection is ready for the next Request.

HTTP.jl/src/Streams.jl

Lines 291 to 296 in a1e92ca

function IOExtras.closeread(http::Stream{Response})
    # Discard body bytes that were not read...
    while !eof(http)
        readavailable(http)
    end

You could do close(http.stream) in your do function to close the underlying HTTP.ConnectionPool.Transaction. This would cause the HTTP.open to throw EOFError().

It would be possible to add Base.close(http::HTTP.Stream) to do a clean shutdown in your use case of an infinite stream.

I'm not super clear on what level of abstraction to expect from HTTP.open. What I'm looking for is a way to open the given URL (which gives chunked binary data), read some data from it for a while, and then stop reading from it.

That is pretty much exactly what it is intended to provide. But currently the "stop reading" part is not handled nicely.

In #338 I'm trying to make this work more smoothly with readbytes!, unsafe_read etc.
See this commit note for an example of two different read loops: 6414e99

With #388 calling readbytes! in a loop should be efficient and minimise layers of buffering. If the buffer passed to readbytes! is a bit bigger than the underlying chunk size, then you should end up with smooth streaming of one chunk per read call.
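That readbytes! loop can be sketched as follows; an in-memory IOBuffer stands in for the HTTP.Stream, and the function name and 4096-byte buffer size are illustrative assumptions:

```julia
# Sketch of the readbytes!-based read loop: one preallocated buffer is
# reused across reads, so there is no per-read allocation.
function drain(io::IO; bufsize::Int = 4096)
    buf = Vector{UInt8}(undef, bufsize)
    total = 0
    while !eof(io)
        n = readbytes!(io, buf)   # reads up to bufsize bytes, returns the count
        total += n
        # process view(buf, 1:n) here
    end
    return total
end

drain(IOBuffer(repeat("x", 10_000)))
```

With a buffer a bit larger than the server's chunk size, each readbytes! call should consume roughly one chunk.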

I will try adding a close(::HTTP.Stream) method in #388 also.

@ssfrr
Contributor Author

ssfrr commented Oct 25, 2018

Did you try a while !eof(io) readavailable(io) loop as suggested here?

I filed this more or less concurrently with that issue, so hadn't tried it yet. I didn't think that eof would be relevant because I knew a priori that I wasn't at the end of the stream (I know for real production use I'd also want to handle the case where the stream is dropped, but here I was mostly just exploring).

So as I understand it there are 2 issues at play:

  1. when using HTTP.open you have to call startread to tell HTTP that you've finished writing the request, which I wasn't doing. From the examples my mental model was that HTTP.open gave you a stream that was ready for reading, and it wasn't clear to me that eof had the side-effect of transitioning from writing to reading. I think adding startread to the read* functions would have resolved the issue here. In general my preference is to have this sort of thing be more explicit and just have the user call startread directly (and make the state change explicit in the examples), but I'm sure not all share my opinion there. finishrequest could be another name for that.
  2. HTTP tries to read to the end of the stream after the do block completes, which is problematic if the stream doesn't end

I will try adding a close(::HTTP.Stream) method in #388 also.

It seems weird that you'd need to call close on the stream within the do block - usually the reason I use a do block is that all the resource cleanup happens automatically. Would it cause problems if closeread (or the code that calls closeread) closed the stream before reading from it? I don't have a clear enough understanding of the stream/transaction layering and use-cases to know what's expected to be re-used, or where the right trade-off between explicitness, ease-of-use, and performance is.

Thanks for the work on #338! I think that's orthogonal to this issue, so I'll try it out and let you know in that thread.

@samoconnor
Contributor

when using HTTP.open you have to call startread to tell HTTP that you've finished writing the request,

Kind of, but no. You have to call startread to tell HTTP that you're ready to read the Response Headers and that you'd like to start reading the Response Body. But, HTTP is bi-directional, so being ready to read the Response is not necessarily related to being done with writing the request. The transfer of request and response data is asynchronous and both may well go on forever (see chat client/server example below).
At the Stream layer, startread means "read the Response Headers"; and closeread means "we're done with the Response Body".
At the Transaction layer, startread means wait for other Transactions on a shared Connection to be done reading and obtain a read lock; and closeread means release the read lock so that the next Transaction can read its Response. Note: at this point closeread must read and discard any left-over Response Body bytes to ensure that the stream is up to the Response Headers of the next Transaction. This is where you were seeing a "hang" (your point 2 above).

From the examples my mental model was that HTTP.open gave you a stream that was ready for reading, and it wasn't clear to me that eof had the side-effect of transitioning from writing to reading.

Your mental model is what I intended. Not doing startread in the read* functions is a bug. I thought eof was enough, because in all the code I've written there is an eof loop (or you call some base function that has an internal eof loop). I was just wrong on this. My intention is that the user should be able to just read or write without thinking about the Transaction layer as much as possible.

It seems weird that you'd need to call close on the stream within the do block - usually the reason I use a do block is that all the resource cleanup happens automatically.

Yes. I guess the problem here is that there are two different types of resource cleanup. There is "done with the transaction" and then there is "done with the connection". The do block cleans up closeread and closes the transaction. It can only do one or the other.

Options I can see for an API to handle your use case are:

  • Use the reuse_limit=0 option. I could then skip the read-until-end-of-body in closeread knowing that this is the last transaction on the connection.

    HTTP.jl/src/HTTP.jl

    Lines 92 to 97 in df13276

    Connection Pool options
    - `connection_limit = 8`, number of concurrent connections to each host:port.
    - `pipeline_limit = 16`, number of concurrent requests per connection.
    - `reuse_limit = nolimit`, number of times a connection is reused after the
    first request.
  • Base.close(::HTTP.Stream) (implemented in #388)

I think it makes sense to support both.

Chat example:

HTTP.jl/src/Servers.jl

Lines 170 to 177 in df13276

chat_server() = HTTP.listen("127.0.0.1", 8087) do io
    write(io, "HTTP.jl Chat Server. Welcome!")
    chat(io)
end

chat_client() = HTTP.open("POST", "http://127.0.0.1:8087") do io
    chat(io)
end

HTTP.jl/src/Servers.jl

Lines 161 to 169 in df13276

function chat(io::HTTP.Stream)
    @async while !eof(io)
        write(stdout, readavailable(io), "\n")
    end
    while isopen(io)
        write(io, readline(stdin))
    end
end

@ssfrr
Contributor Author

ssfrr commented Oct 26, 2018

Ah, OK. thanks for taking the time to explain all this. I'm definitely used to thinking of HTTP in more of a request/response scenario, I didn't know exchanging interleaved data like the chat example was valid HTTP.

What if closeread had a timeout or limit on how many times readavailable is called? Then if it times out it just closes the connection. It seems like in that situation the performance gains from connection pooling would be outweighed by the time you're waiting to read the rest of the data.

@samoconnor
Contributor

I just noticed looking at closeread again that we check for hasheader(http.message, "Connection", "close") after the # Discard body bytes that were not read... step.
If we checked for "close" before trying to read the left over body data then at least for well behaved servers, we wouldn't have a problem.

A server that knows that it does not want the client to reuse the connection is supposed to send a "Connection: close" header. It seems that the Icecast server in your case is not setting this header, but it probably should. If the response body is intended to be infinite, then it makes no sense to tell the client to reuse the connection for the next request. (In HTTP/1.0 the default was one-request-per-connection and a keep-alive header was used to signal an intent to reuse the connection. In HTTP/1.1 it is assumed that reuse is always wanted unless "Connection: close" is sent.)
It might be worth filing a bug against Icecast to ask them to add the "close" header to infinite streams.
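These version-dependent reuse defaults can be captured in a small helper. The function name and the plain Dict representation of headers are hypothetical illustrations, not HTTP.jl API:

```julia
# Hypothetical helper (not HTTP.jl code) encoding the reuse rules above:
# HTTP/1.0 defaults to one request per connection unless "keep-alive" is sent;
# HTTP/1.1 assumes reuse unless "Connection: close" is sent.
function connection_reusable(version::VersionNumber, headers::Dict{String,String})
    conn = lowercase(get(headers, "Connection", ""))
    if version >= v"1.1"
        return conn != "close"
    else
        return conn == "keep-alive"
    end
end

connection_reusable(v"1.1", Dict("Connection" => "close"))
```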

What if closeread had a timeout or limit on how many times readavailable is called? Then if it times out it just closes the connection. It seems like in that situation the performance gains from connection pooling would be outweighed by the time you're waiting to read the rest of the data.

I'd like to try to find a solution within the intent of the protocol before resorting to timeout heuristics.

I just tried this with curl...

$ curl -v -H "Connection: close" http://doppler.media.mit.edu/impoundment.opus
*   Trying 18.85.59.166...
* TCP_NODELAY set
* Connected to doppler.media.mit.edu (18.85.59.166) port 80 (#0)
> GET /impoundment.opus HTTP/1.1
> Host: doppler.media.mit.edu
> User-Agent: curl/7.54.0
> Accept: */*
> Connection: close
>
< HTTP/1.1 200 OK
< Date: Sat, 27 Oct 2018 00:18:00 GMT
< Server: Icecast 2.4.0
< Content-Type: application/ogg
< Cache-Control: no-cache
< Expires: Sat, 26 Jul 1997 05:00:00 GMT
< Pragma: no-cache
< icy-name: no name
< icy-pub: 0
< Connection: close
< Transfer-Encoding: chunked
<
OggS���Bs��3OpusHead�����
...

i.e. setting "Connection: close" on the request causes the server to respond with "Connection: close".
So, if I fixed the "close" check order, and you send a "Connection: close" header, it should work without any need to call close(::HTTP.Stream). (This is about the same as using the reuse_limit=0 option).

@ssfrr
Contributor Author

ssfrr commented Oct 29, 2018

I'd like to try to find a solution within the intent of the protocol before resorting to timeout heuristics.

Yeah, I agree this is definitely the right way to go about it if possible.

I guess one issue here is that I'm not sure if Icecast makes the distinction between a stream that's "infinite", vs. one that's just really long or of indeterminate length (likely longer than the receiver wants, so they'll probably disconnect before it's over), and whether they couple that to the Connection: close header. Do you know if it's common for a server to indicate that a stream is (effectively) infinite with Connection: close?

I just sent an email to the icecast-dev list to get their perspective.

Another idea - what if HTTP.jl just didn't re-use connections if the client disconnected before reading the full response? I assume most of the time people read all the way to the end of a response, so the efficiencies of connection pooling would be retained. If they stopped early, you don't know how long you're going to spend reading out the rest of the data, just to discard it. For example, they could have started downloading some large (but finite) file and decided to cancel the transfer. It might be surprising if HTTP's default behavior is to download the rest of the file before it lets you continue. Is reading out the remainder of the response a common thing for client-side HTTP libraries (or HTTP proxies) to do?

OK, I promise I'm done making armchair design suggestions. 🙂 Thanks for the help in figuring out the right way to use HTTP.jl for this use-case.

@samoconnor
Contributor

Do you know if it's common for a server to indicate that a stream is (effectively) infinite with Connection: close?

No I don't.

I just sent an email to the icecast-dev list to get their perspective.

👍

Another idea - what if HTTP.jl just didn't re-use connections if the client disconnected before reading the full response?

That is definitely worth considering.

It might be surprising if HTTP's default behavior is to download the rest of the file before it lets you continue.

I agree.

Is reading out the remainder of the response a common thing for client-side HTTP libraries (or HTTP proxies) to do?

I haven't looked into that.

There are a couple of (I think) common use-cases where you might want to reuse the connection but not read complete Response Bodies:

  • Non-200 status responses (3xx redirects, 404 not found, auth failures, etc.) often have a human-readable HTML Response Body that the client has no interest in reading, because all of the relevant semantic information needed to decide what to do next is present in the headers.
  • REST API Response Bodies are often not needed. e.g. if you've submitted an item to a queue and got a Response with status 200, you might not care about reading an XML Response Body that contains meta-data about the billable resources consumed, or the estimated response time, or the server version.

However, in both of these cases, the size of the Response Body to be ignored should be quite small.
So, perhaps we could have a rule that says "if we know there is a small Response Body of known size, flush it through, otherwise close the connection".
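That rule could look something like the following hypothetical sketch; the function name, signature, and 16 KiB threshold are all assumptions, not HTTP.jl code:

```julia
# Hypothetical sketch of the proposed closeread rule: flush (read and discard)
# a left-over body only when its size is known and small; otherwise the
# connection should be closed rather than drained.
const FLUSH_LIMIT = 16 * 1024   # assumed threshold; the real value is a design choice

should_flush_body(content_length::Union{Int,Nothing}) =
    content_length !== nothing && content_length <= FLUSH_LIMIT

should_flush_body(nothing)   # unknown size, e.g. an endless chunked stream
```

A `nothing` content length covers chunked responses with no Content-Length header, which is exactly the infinite-stream case from this issue.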

@ssfrr
Contributor Author

ssfrr commented Oct 30, 2018

I checked a little more into this, and it turns out that icecast is actually using HTTP/1.0 (non-persistent connection by default), but it's sitting behind a reverse proxy that is "upgrading" to HTTP/1.1 but not adding the Connection: close header.

So I do think that a big part of the issue here is a misbehaving server.

Up to you whether you still want to make some of the proposed changes to be more tolerant to this sort of situation (I suppose and also handle the big cancelled download case), or call it a day and close the issue.

Thanks again for looking into it and helping me learn more about HTTP

@samoconnor
Contributor

Let's leave this open for now.

@Nosferican

Close issue?

@yohan-pg

yohan-pg commented Oct 6, 2020

Hey guys, I'm trying to read from an infinite stream line-by-line, and this is super confusing. So to recap:

  • reuse_limit should be 0 in the open call
  • startread(http) and endread(http) must be called within the do body (where http is the Stream object)
  • close(http.stream) should be called at the end of the do body (possibly? it seems to work fine without it)

Otherwise the connection never terminates and pollutes the connection pool, blocking all future requests forever. Since the use-case of reading from an infinite stream is not that weird, I feel like this should at least be documented in the HTTP.open method. Especially, having to make sure that startread and endread are called is very unintuitive.
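Putting that recap together, a sketch of reading briefly from an endless stream might look like this. It assumes HTTP.jl is installed and an HTTP.jl 0.8-era API, uses the URL from this thread, and is untested against a live server:

```julia
using HTTP

# Sketch per the recap above: reuse_limit=0 marks the connection as
# single-use, so closeread need not drain the endless body afterwards.
HTTP.open("GET", "http://doppler.media.mit.edu/impoundment.opus";
          reuse_limit = 0) do http
    HTTP.startread(http)       # read the Response Headers
    for _ in 1:10              # read a bounded number of chunks
        eof(http) && break
        chunk = readavailable(http)
        # process `chunk` here
    end
    close(http.stream)         # drop the underlying connection
end
```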

@quinnj
Member

quinnj commented Oct 9, 2020

@yohan-pg, would you mind making a PR to the HTTP.open docs with your suggested improvements? I'd appreciate it and I really like your suggestions.

@quinnj
Member

quinnj commented May 26, 2022

Closing due to staleness

@quinnj quinnj closed this as completed May 26, 2022