
asyncmap never ends with HTTP.get #909

Closed
lucasmsoares96 opened this issue Aug 20, 2022 · 7 comments · Fixed by #910

Comments

@lucasmsoares96

Hello guys. I believe I found a bug in the asyncmap or the HTTP package. My goal is to test a list of 4881 proxies and find the ones that are working. But asyncmap never seems to finish.

To compare performance, I ran asyncmap with two functions: one that just calls sleep(5), and one that makes an HTTP.get request with readtimeout=5, retry=false.

The sleep(5) version finished in 248.891672 seconds. The HTTP.get version never finishes; it gets slower and slower until it almost stops.

Here is my versioninfo():

Julia Version 1.8.0
Commit 5544a0fab76 (2022-08-17 13:38 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, haswell)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_NUM_THREADS = auto

Here is my list of proxies:
https://paste.debian.net/plain/1251148

Here is my test code:

using Base.Threads
using HTTP
using CSV
using Tables

const conf = (readtimeout=5, retry=false)
const url = "https://www.google.com/"
const proxies = CSV.File("proxies.csv") |> Tables.matrix |> vec

function testSleep(proxy)
    try
        sleep(5)  # stand-in for the request: always "succeeds"
        println(threadid(), " ok   ", proxy)
        return true
    catch
        println(threadid(), " fail ", proxy)
        return false
    end
end

function testHTTP(proxy)
    try
        resp = HTTP.get(url, proxy="http://$proxy"; conf...)
        if resp.status == 200
            println(threadid(), " ok   ", proxy)
            return true
        else
            println(threadid(), " bad  ", proxy)
            return false
        end
    catch
        println(threadid(), " fail ", proxy)
        return false
    end
end

@spawn @time asyncmap(testHTTP, proxies)  # never ends
@spawn @time asyncmap(testSleep, proxies) # 248.891672 seconds (19.84 M allocations: 1.270 GiB, 0.23% gc time, 4.33% compilation time: 1% of which was recompilation)
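As an aside on the script itself: asyncmap with no bound starts one task per element (4881 tasks here), so capping in-flight tasks with asyncmap's `ntasks` keyword is one way to keep a run like this from overwhelming the scheduler. A standalone sketch with a stub check standing in for testHTTP (names `checkproxy` and `candidates` are illustrative, not from the script above):

```julia
# Cap concurrency with asyncmap's `ntasks` keyword: at most 4 tasks
# run at once instead of one task per element.
checkproxy(p) = isodd(p)                 # stub standing in for testHTTP

candidates = collect(1:10)
results = asyncmap(checkproxy, candidates; ntasks=4)

# asyncmap preserves input order, so the mask lines up with candidates.
working = candidates[results]
```

In the real script, `checkproxy` would be `testHTTP` and `ntasks` would be tuned to how many simultaneous connections the machine and network tolerate.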
@quinnj
Member

quinnj commented Aug 20, 2022

Thanks for the report! Can you confirm the versions of HTTP and MbedTLS packages please?

@lucasmsoares96
Author

HTTP v1.2.1
MbedTLS v1.1.3

@quinnj
Member

quinnj commented Aug 23, 2022

I've made some progress on this, but haven't gotten it working all the way.

quinnj added a commit that referenced this issue Aug 23, 2022
Fixes #909. Great issue that stretches our current system and exposes
a bunch of places where we weren't accounting for timeouts. The core
issues here are when we have a proxy involved, we had a couple more
places where we were connecting/writing/reading, but weren't applying
a user-provided `connect_timeout` or `readtimeout`. Also for our
ssl upgrade path. In the OP's example, we're hitting thousands of mostly
non-functional proxies, so it's basically a huge stress test on all the
error paths of our request code. This PR proposes:
  * a new `try_with_timeout` function to provide a consistent way to run
    code while waiting for a timeout
  * use the new function in a number of places where we might be stuck waiting

I'm able to get the original script OP provided to finish in ~150 seconds, and
I tried out a multithreaded version that finished in about ~17 seconds w/ 8 threads.
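The `try_with_timeout` idea described in the commit message can be sketched in a few lines; this is a simplified illustration of the pattern, not HTTP.jl's actual implementation (errors thrown by `f` surface as a timeout in this stripped-down version):

```julia
# Run `f` on a task and give up if it hasn't finished within `timeout` seconds.
function try_with_timeout(f, timeout)
    ch = Channel{Any}(1)
    task = @async put!(ch, f())
    timer = Timer(timeout) do _
        isopen(ch) && close(ch)   # closing the channel wakes the take! below
    end
    try
        return take!(ch)          # returns f()'s result if it arrived in time
    catch
        error("operation timed out after $timeout seconds")
    finally
        close(timer)              # cancel the timer if the result came first
    end
end
```

Buffered items on a closed Channel can still be taken, so a result that lands just as the timer fires is not lost.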
quinnj added a commit that referenced this issue Aug 24, 2022
* Cleanup all timeout code and places where we might timeout

* fixes
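The multithreaded version mentioned in the commit message isn't shown in the thread; one plausible shape (a sketch with a stub check, not the script actually run) spawns a task per item and fetches the results:

```julia
using Base.Threads

checkproxy(p) = isodd(p)                       # stub standing in for testHTTP

candidates = collect(1:10)
tasks = [Threads.@spawn checkproxy(p) for p in candidates]
results = fetch.(tasks)                        # one Bool per task, in input order
working = candidates[results]
```

Unlike asyncmap's cooperative tasks, `Threads.@spawn` lets the checks run on all available threads, which matches the ~17-second figure reported with 8 threads.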
@lucasmsoares96
Author

@quinnj Thanks a lot for the support, but the problem persists in version 1.3.0. Now, in addition to never ending, this error is shown from time to time:

Error in Timer:
MethodError: no method matching shouldtimeout(::HTTP.ConnectionPool.Connection, ::Int64, ::HTTP.ConnectionRequest.var"#5#11")
Closest candidates are:
  shouldtimeout(::HTTP.ConnectionPool.Connection, ::Any) at ~/.julia/packages/HTTP/GjUWZ/src/ConnectionPool.jl:102
Stacktrace:
 [1] (::HTTP.ConnectionRequest.var"#4#10"{Int64})()
   @ HTTP.ConnectionRequest ~/.julia/packages/HTTP/GjUWZ/src/clientlayers/ConnectionRequest.jl:92
 [2] (::HTTP.Exceptions.var"#5#8"{HTTP.ConnectionRequest.var"#4#10"{Int64}, Int64, HTTP.Exceptions.var"#3#6", Condition})(tm::Timer)
   @ HTTP.Exceptions ~/.julia/packages/HTTP/GjUWZ/src/Exceptions.jl:40
 [3] macro expansion
   @ ./asyncevent.jl:281 [inlined]
 [4] (::Base.var"#665#666"{HTTP.Exceptions.var"#5#8"{HTTP.ConnectionRequest.var"#4#10"{Int64}, Int64, HTTP.Exceptions.var"#3#6", Condition}, Timer})()
   @ Base ./task.jl:134
2 fail 213.6.145.33:4153
2 fail 141.105.86.130:44660

@quinnj
Member

quinnj commented Aug 24, 2022

Ah shoot, sorry about that. I was moving the code around a bit at the last minute and forgot to update one place, but I must have still had the working code loaded in the session where it was passing. PR to fix: #911

@quinnj
Member

quinnj commented Aug 24, 2022

Ok, the new release (v1.3.1) is out now, so hopefully all is good.

@lucasmsoares96
Author

Worked perfectly!! Thank you very much
