-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ConnectionPool connect_timeout and idle_timeout #215
Conversation
src/ConnectionPool.jl
Outdated
isdeletable(c) = (!isopen(c.io) || (c.idle_timeout > 0 && | ||
!c.readbusy && | ||
!c.writebusy && | ||
time()-c.timestamp > c.idle_timeout)) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to call close(c.io) here!
Do you think we could somehow keep our own list of known host timeouts and use those defaults per-host? I'm just thinking about the excellent sleuthing you did in #214, like, we could have a |
src/ConnectionPool.jl
Outdated
# FIXME this currently causes the Julia runtime to hang on exit... | ||
result = Ref{TCPSocket}() | ||
error = Ref{Any}() | ||
@schedule try |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vtjnash I'd like your opinion on this.
Should we include a best-attempt connection timeout feature. Or should we just say for now that underlying infrastructure currently makes this hard to implement, and that users should do their own @async timeout thing if they really need a connect timeout?
JuliaCloud/AWSCore.jl#24 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems reasonable to add, I just recommended using close
to do it "properly" (I don't know why the libuv developers claimed in the linked issue that you couldn't cause ECANCELED, since I clearly can):
julia> s = Sockets.TCPSocket()
Main.Sockets.TCPSocket(RawFD(0xffffffff) init, 0 bytes waiting)
julia> Sockets.connect!(s, Sockets.ip"192.168.101.1", 80)
julia> @async @show Sockets.wait_connected(s)
julia> close(s)
julia>
ERROR (unhandled task failure): connect: operation canceled (ECANCELED)
Around this, you just need to add a bit of logic to coordinate closing the socket if the timeout was reached and the socket is still s.state == StatusConnecting
. And possible also to loop until the timeout is reached (as mentioned in the linked issue, the kernel has its own idea of what the default timeout should be), if you think 2 minutes is too short.
This also uncovers a minor bug in Base, since this assert is wrong:
diff --git a/stdlib/Sockets/src/Sockets.jl b/stdlib/Sockets/src/Sockets.jl
index ec7876a917..e947da8fd4 100644
--- a/stdlib/Sockets/src/Sockets.jl
+++ b/stdlib/Sockets/src/Sockets.jl
@@ -371,12 +371,13 @@ end
function uv_connectcb(conn::Ptr{Cvoid}, status::Cint)
hand = ccall(:jl_uv_connect_handle, Ptr{Cvoid}, (Ptr{Cvoid},), conn)
sock = @handle_as hand LibuvStream
- @assert sock.status == StatusConnecting
if status >= 0
- sock.status = StatusOpen
+ if !(sock.status == StatusClosed || sock.status == StatusClosing)
+ sock.status = StatusOpen
+ end
notify(sock.connectnotify)
else
- sock.status = StatusInit
+ ccall(:jl_forceclose_uv, Cvoid, (Ptr{Cvoid},), hand)
err = UVError("connect", status)
notify_error(sock.connectnotify, err)
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See new commit.
When I tried to use just close(tcp)
the close call didn't return. If I did @schedule close(tcp)
the REPL hung on exit. So, for now I'm calling ccall(:jl_forceclose_uv
, which seems to work.
Is is worth making a PR for stdlib/Socket for this? (It seems like it might be easier to make it robust without being confined to the official public interface) or should it stay in HTTP.jl for now?
Doesn't feel right to me. What I've read about TCP level keepalive is suggests that in practice many CDNs, ISP NATs, firewalls, proxies etc either don't forward keepalives, or don't count them as activity for timeout purposes. The general idea of a metadata directory for HTTP endpoints is an interesting one. I don't think building the database into the Julia HTTP library is right, but I can imagine a web service where you do |
Codecov Report
@@ Coverage Diff @@
## master #215 +/- ##
=========================================
Coverage ? 72.33%
=========================================
Files ? 35
Lines ? 1836
Branches ? 0
=========================================
Hits ? 1328
Misses ? 508
Partials ? 0
Continue to review full report at Codecov.
|
src/ConnectionPool.jl
Outdated
yield() | ||
|
||
# sum(delays) ~= connect_timeout | ||
delays = ExponentialBackOff(n=connect_timeout + 10, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you busy-polling this? We have a callback for it (wait_connected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't see how to simultaneously wait for connection success and a timeout.
I can call wait_connected in a background task, and notify a condition when it is done, but there is no timeout parameter wait(::Condition). Or I could wait for the task to be done, but fetch(::Task)
has no timeout parameter.
Are you thinking along the lines of wait_connected
in the main task and a background task that calls close after the timeout to wake up the wait_connected
with an error? I used that pattern over here: https://github.com/JuliaWeb/HTTP.jl/blob/master/src/TimeoutRequest.jl#L20
Don't know why it didn't occur to me in this case :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you answered your own question...yes.
src/ConnectionPool.jl
Outdated
if tcp.status == Base.StatusConnecting | ||
timeout[] = true | ||
ccall(:jl_forceclose_uv, Void, (Ptr{Void},), tcp.handle) | ||
#close(tcp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vtjnash close didn't work here (0.6.2).
But I assume that's some small detail of handling status flags in Base.close because closing while opening is not a common operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, maybe fixed on master? This is right, but should also set the state to StatusClosing
Base.wait_connected(tcp) | ||
catch e | ||
if timeout[] | ||
throw(ConnectTimeout(host, port)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you want to make sure it throws this exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I don't think UVError
should ever be thrown from a user-level API unless it means there is a bug. See JuliaLang/julia#15879 etc.
I think application level code will want to handle "I set a short timeout and it wasn't met" differently from "something bad happened deep in the network stack.
Elsewhere in HTTP.jl we wrap things like UVError
and ArgumentError "stream is closed or unusable"
in an IOError
type:
https://github.com/JuliaWeb/HTTP.jl/blob/master/src/ConnectionRequest.jl#L36
In higher level code it is important to handle some specific errors by retrying but make sure everything else finds its way up to create a crash and a stack trace and an issue report and a fix. e.g. https://github.com/JuliaCloud/AWSCore.jl/blob/master/src/http.jl#L47-L51
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArgumentError "stream is closed or unusable"
seems pretty wrong because it looks like a precondition error, but there is now way to check the precondition before calling the API because the stream can be isopen()
before the call but be found to be "closed or unusable" while the call is running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @vtjnash, I just noticed that poll_fd
has a timeout_s
option. Would that work here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how that's relevant here, since you don't have a fd
. But yes, that does show another pattern that can be used to implement timeouts (with some more difficulty ... in fact, you've just inspired me to rewrite that file better ... you may see a new PR to base tomorrow sometime 🙂)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was assuming that I could get the fd
out of the socket with uv_fileno
or something like that.
Is there a gotcha along the lines of windows socket handles won't work with poll_fd
?
new PR
👍🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(a) the socket isn't open, so you can't poll it (b) it's invalid to pass libuv-derived fd handles to libuv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx for taking the time to explain :)
src/ConnectionPool.jl
Outdated
if tcp.status == Base.StatusConnecting | ||
timeout[] = true | ||
ccall(:jl_forceclose_uv, Void, (Ptr{Void},), tcp.handle) | ||
#close(tcp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, maybe fixed on master? This is right, but should also set the state to StatusClosing
@quinnj would you mind doing a quick review of this and the merging? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Sorry it took me a while to review.
No problem. And thanks! |
No description provided.