
HTTPS request gets blocked while using proxy #562

Closed
vdayanand opened this issue Jul 27, 2020 · 5 comments · Fixed by #580

Comments

@vdayanand (Contributor) commented Jul 27, 2020

http_proxy=http://127.0.0.1:3128 https_proxy=http://127.0.0.1:3128 julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.1 (2020-04-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
julia> using HTTP
julia> count = 0
0
julia> using HTTP

julia> while true
           global count
           count+=1
           @info "Request $count"
           HTTP.get("https://example.com")
       end
[ Info: Request 1
[ Info: Request 2
[ Info: Request 3
[ Info: Request 4
[ Info: Request 5
[ Info: Request 6
[ Info: Request 7
[ Info: Request 8 ## This gets blocked when the default connection limit is 8

This happens only for HTTPS connections.

Julia 1.4.1
HTTP.jl 0.8.16
MbedTLS.jl 1.0.2
Proxy: https://hub.docker.com/r/datadog/squid/

@Teo-ShaoWei

I'm hitting the same issue on the same setup as @vdayanand.

I also managed to isolate the issue to HTTP.request not releasing the connection once the response is received, regardless of the response status and of which HTTP.request variant is used. Here are some tests I ran, using the code above but with HTTP.request("GET", "https://example.com", connection_limit=4) instead (see the sketch after this list):

  1. In the REPL, the 5th request hangs. So the issue is affected by connection_limit.
  2. If I interrupt and re-run within the same REPL session, the 1st request hangs. So the issue persists across separate HTTP.request calls.
  3. If I end the session and start a new REPL, the 5th request hangs.
  4. If I run the above as a background script that hangs at the 5th request but does not terminate, a second run of the script also hangs at its own 5th request. So the problem is likely in HTTP.jl's own connection counting rather than on the proxy side.
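
For reference, here is roughly the loop I used for the tests above (a minimal sketch, assuming the same http_proxy/https_proxy environment variables as in the original report):

using HTTP

count = 0
while true
    global count
    count += 1
    @info "Request $count"
    HTTP.request("GET", "https://example.com"; connection_limit=4)
end
# With connection_limit=4, the 5th request hangs (test 1 above).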

@cflee commented Jul 29, 2020

I noticed that #529, which landed in v0.8.14, changed some of the error handling in the ConnectionPoolLayer, and the result looks a bit odd for HTTPS requests.

Previously:

if proxy !== nothing && target_url.scheme == "https"
    target_url = merge(target_url, port=443)
    return tunnel_request(Next, io, target_url, req, body; kw...)
end
return request(Next, io, req, body; kw...)
catch e
    @debug 1 "❗️ ConnectionLayer $e. Closing: $io"
    close(io)
    rethrow(isioerror(e) ? IOError(e, "during request($url)") : e)
finally
    if (io.sequence >= reuse_limit
        || (proxy !== nothing && target_url.scheme == "https"))
        close(io)
    end
end

Current:

if proxy !== nothing && target_url.scheme == "https"
    target_url = merge(target_url, port=443)
    return tunnel_request(Next, io, target_url, req, body; kw...)
end
r = request(Next, io, req, body; kw...)
if (io.sequence >= reuse_limit
    || (proxy !== nothing && target_url.scheme == "https"))
    close(io)
end

It seems that previously the io from getconnection() would be closed after tunnel_request() returned, thanks to the finally block, but now the early return means the condition on line 92 is never reached on the tunnel path, so the connection is only closed if an exception is caught.
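
In other words, a return inside a try still runs the finally block, whereas the early return tunnel_request(...) in the current code skips the conditional close entirely. A minimal, self-contained illustration (hypothetical functions, not HTTP.jl code):

# Sketch only: old_style mirrors the old try/finally shape,
# new_style mirrors the current early-return shape.
function old_style(io, tunneling::Bool)
    try
        tunneling && return :tunneled   # early return...
        return :plain
    finally
        close(io)                       # ...still runs: finally always executes
    end
end

function new_style(io, tunneling::Bool)
    tunneling && return :tunneled       # early return skips everything below
    r = :plain
    close(io)                           # never reached on the tunnel path
    return r
end

io1 = IOBuffer(); old_style(io1, true); isopen(io1)   # false: finally closed it
io2 = IOBuffer(); new_style(io2, true); isopen(io2)   # true: close was skipped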

@Teo-ShaoWei commented Jul 29, 2020

Oh my... it worked! 🎉

I changed lines 84-89 to the following:

if proxy !== nothing && target_url.scheme == "https"
    target_url = merge(target_url, port=443)
    r = tunnel_request(Next, io, target_url, req, body; kw...)
else
    r = request(Next, io, req, body; kw...)
end

I believe this part might benefit from some refactoring to express the intent more clearly.

For lines 84-93, if what we want is for proxied HTTPS connections to always be closed, then we might consider grouping by whether an HTTPS proxy is in use:

if proxy !== nothing && target_url.scheme == "https"
    target_url = merge(target_url, port=443)
    r = tunnel_request(Next, io, target_url, req, body; kw...)
    close(io)
else
    r = request(Next, io, req, body; kw...)
    if io.sequence >= reuse_limit
        close(io)
    end
end

If, on the other hand, (proxy !== nothing && target_url.scheme == "https") was only added to the second condition as part of a bugfix (it was originally error-handling code), and we don't actually want to always close proxied HTTPS connections, then we might consider:

if proxy !== nothing && target_url.scheme == "https"
    target_url = merge(target_url, port=443)
    r = tunnel_request(Next, io, target_url, req, body; kw...)
else
    r = request(Next, io, req, body; kw...)
end

if io.sequence >= reuse_limit
    close(io)
end

@Teo-ShaoWei

On a closer look, it may be that the extra close was a best-effort workaround for io not getting closed for some unexplained reason, rather than a deliberate design. I believe that reclaiming resources should be non-throwing and should be implemented as such elsewhere. So we could also consider the following:

try
    if proxy !== nothing && target_url.scheme == "https"
        target_url = merge(target_url, port=443)
        return tunnel_request(Next, io, target_url, req, body; kw...)
    else
        return request(Next, io, req, body; kw...)
    end
finally
    if io.sequence >= reuse_limit
        close(io)
    end
end

This makes the intent clearer, and it pushes us to fix close elsewhere if it can throw because of some bug.

@Teo-ShaoWei

Sorry, I was being careless; I should have read the history first to better understand the rationale. Having now read it:

  1. We close IO on error (5aa1b0b)
  2. We do not want to reuse the connection to the HTTPS proxy (proxy= option #219 (comment))
  3. We separately want to close connections above the reuse_limit (Better connection management #292 (comment), HTTP.open hanging #339, reuse_limit=0 should send "Connection: close" header #343, fix #344 and #343 #346)
  4. The error handling got rearranged (Re-arrange error handling a bit #529)

This is from #529:

If we did throw an error and tried to close the connection,
there's no reason to try to do so again during the finally block.
Also, if we are in the error block, close(io) has quite a high chance
for failing, so wrap it in try to suppress such errors and instead
rethrow the original error.

Let's see... The problem is valid: double-closing in both the catch and the finally blocks is a code smell. However, I disagree with the solution. As I mentioned above (and as discussed in #529 (comment)), close is resource destruction and shouldn't throw an exception. If close errors out, that is a bug, and we should fix our close implementation rather than make concessions to it and compromise correctness here. I think a better approach is to challenge the practice of closing the IO inside the catch block (5aa1b0b); we might take the chance to close the IO only inside a finally block.

I would thus recommend not closing in catch and closing only in finally:

try
    if proxy !== nothing && target_url.scheme == "https"
        target_url = merge(target_url, port=443)
        return tunnel_request(Next, io, target_url, req, body; kw...)
    else
        return request(Next, io, req, body; kw...)
    end
finally
    if (io.sequence >= reuse_limit
    || (proxy !== nothing && target_url.scheme == "https"))
        close(io)
    end
end

I may have under-considered the problem, so together with @vdayanand I would like to invite the original contributors to discuss how best to resolve this bug. @samoconnor @Keno

quinnj added a commit that referenced this issue Sep 16, 2020
Fixes #562. Despite lots of back and forth on this issue, it turns out
there's a simpler diagnosis & fix than worrying too much about when
sockets will throw when `close`ing (which we should certainly get all
cleaned up one day).

The _real_ issue here is in `ConnectionPool.sslupgrade` which was
effectively swapping our `Connection` that we formally "acquired" from
the connection pool with a new one that had an https-ssl-upgraded
`SSLContext` to do the request tunneling. The issue with just swapping
the `Connection` is that the original connection was never properly
"released" to the connection pool, so connections were acquired, but
never released, and eventually we hit the `connection_limit`. The
proposed solution here (which works locally to avoid getting hung up on
the connection pool clogging) is to introduce a new
`kill!(c::Connection)` method that "releases" a connection without
returning it to the connection pool. So we signal to the pool that a
connection has finished, but it's not put back in the queue to be
re-used, which is what we want. I should note that this does mean that
proxied https requests then don't really respect the `connection_limit`,
but also note that they never have, due to this "swapping".
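
A simplified model of the accounting (hypothetical names, not HTTP.jl's actual ConnectionPool internals) shows why an acquired-but-never-released connection eventually clogs the pool, and why a kill!-style release frees the slot without putting the connection back in the reuse queue:

mutable struct Pool
    limit::Int            # maximum simultaneously acquired connections
    active::Int           # connections currently acquired
    idle::Vector{Symbol}  # connections parked for reuse
end

function acquire!(p::Pool)
    p.active >= p.limit &&
        error("connection_limit reached: earlier connections were never released")
    p.active += 1
    return isempty(p.idle) ? gensym("conn") : pop!(p.idle)
end

# Normal release: the slot is freed and the connection is offered for reuse.
release!(p::Pool, c) = (p.active -= 1; push!(p.idle, c); nothing)

# kill!-style release: the slot is freed, but the connection is NOT re-queued,
# which is what a tunneled (ssl-upgraded) connection needs.
kill!(p::Pool, c) = (p.active -= 1; nothing)

pool = Pool(2, 0, Symbol[])
c1 = acquire!(pool); kill!(pool, c1)          # slot freed, connection discarded
c2 = acquire!(pool); c3 = acquire!(pool)      # fine, the count stayed correct
# If sslupgrade swaps c2/c3 for new upgraded connections without ever calling
# release!/kill!, `active` never drops and the next acquire! hits the limit.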