-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try to fix deadlock #124
Try to fix deadlock #124
Conversation
This call: nb_available(::SSLContext)->mbedtls_ssl_read->f_recv->readbytes!() can block holding the datalock and can stop calls write(::SSLContext) calls (in other tasks) from getting the lock and writing. If a request is not writen, the response may never come, and the read may block forever. The forum post below says that the zero-size mbedtls_ssl_read call used to find out how many bytes are available must be done in non-blocking mode. This commit addes a nonblocking flag to try to implement that. This seems to work to avoid a deadlock in HTTP.jl where: - A request header has been written. - An async task has been schedueled to write the body. - The main task starts reading the response (in some cases the response headers arrive before the request body is sent). The expected behaviour is that the reading task will soon block to wait for data, thereby yeilding to the async task, which will finish writing the body. Before this commit, it would get stuck. See: https://esp32.com/viewtopic.php?t=1101#p4945 "After the correct implementation of non-blocking sockets, the things seem to be working fine"
- Add a txcount field to the Request struct - Increment txcount after the body is written - Modify RetryRequest.isrecoverable to allow retry of non-idempotent requests if txcount is zero. - Modify StreamRequest to Wait for other pipelined reads to complete before sending a non-idempotent request body. This should decrease the chance of sending a POST body that ends up failing and can't be retried. - Remove yeild() after @async writebody. This should increase the chance that the startread() realises that the connection is dead before the body is written. Depends on: JuliaLang/MbedTLS.jl#124
Codecov Report
@@ Coverage Diff @@
## master #124 +/- ##
==========================================
- Coverage 68.63% 64.88% -3.75%
==========================================
Files 10 10
Lines 424 430 +6
==========================================
- Hits 291 279 -12
- Misses 133 151 +18
Continue to review full report at Codecov.
|
Is that just in |
ccall((:mbedtls_ssl_read, MBED_TLS), | ||
Cint, (Ptr{Cvoid}, Ptr{Cvoid}, Csize_t), ctx.data, C_NULL, 0) | ||
ctx.nonblocking = true | ||
try |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this try-finally really needed? Can ccall
s actually throw errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure.
But it's never good for a block of code to appear to restore the old state, but have a lurking possibility of exiting early.
(& It isn't a try/catch, its a try/finally)
I'm happy to leave a FIXME in the code suggesting that someone could take it out in the future if everything is working nicely...
No, it is quite common for the server to immediately respond with error response headers if it can tell it isn't happy based on the request headers. e.g. unsupported request, or bad credentials. |
In fact there is a specific part in the RFC about how if the server says Connection: Close you must close before sending the remainder of the body. There is a test for this in test/loopback.jl |
@@ -112,19 +114,23 @@ end | |||
function f_send(c_ctx, c_msg, sz) | |||
jl_ctx = unsafe_pointer_to_objref(c_ctx) | |||
jl_msg = unsafe_wrap(Array, c_msg, sz) | |||
return Cint(write(jl_ctx, jl_msg)) | |||
return Cint(write(jl_ctx.bio, jl_msg)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this right? we're calling write
on the TCPSocket now instead of on the SSLContext
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before the void* passed to the library was the socket.
Now it is the SSLContext. This is needed so we can see the nonblocking flag in the callback.
So, the .bio
is needed now to get back to the socket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that makes sense. 👍
return Cint(n) | ||
end | ||
|
||
function set_bio!(ssl_ctx::SSLContext, jl_ctx::T) where {T<:IO} | ||
ssl_ctx.bio = jl_ctx | ||
set_bio!(ssl_ctx, pointer_from_objref(jl_ctx), c_send[], c_recv[]) | ||
set_bio!(ssl_ctx, pointer_from_objref(ssl_ctx), c_send[], c_recv[]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what effect does this change end up having?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above
Looks good. Thanks @samoconnor ! |
- Add a txcount field to the Request struct - Increment txcount after the body is written - Modify RetryRequest.isrecoverable to allow retry of non-idempotent requests if txcount is zero. - Modify StreamRequest to Wait for other pipelined reads to complete before sending a non-idempotent request body. This should decrease the chance of sending a POST body that ends up failing and can't be retried. - Remove yeild() after @async writebody. This should increase the chance that the startread() realises that the connection is dead before the body is written. Depends on: JuliaLang/MbedTLS.jl#124
This call:
nb_available(::SSLContext)->mbedtls_ssl_read->f_recv->readbytes!()
can block holding the datalock and can stop calls write(::SSLContext) calls
(in other tasks) from getting the lock and writing. If a request is not
writen, the response may never come, and the read may block forever.
The forum post below says that the zero-size mbedtls_ssl_read call
used to find out how many bytes are available must be done in
non-blocking mode. This commit addes a nonblocking flag to try to
implement that.
This seems to work to avoid a deadlock in HTTP.jl where:
response headers arrive before the request body is sent).
The expected behaviour is that the reading task will soon block to wait
for data, thereby yeilding to the async task, which will finish writing
the body. Before this commit, it would get stuck.
See: https://esp32.com/viewtopic.php?t=1101#p4945
"After the correct implementation of non-blocking sockets, the things
seem to be working fine"