shouldDrainReadBuffer followed up by readDisable(true) and readDisable(false) strange behaviour #12304
I think I see the bug in ConnectionImpl::readDisable(false): the " && read_buffer_.length() > 0" should be removed from the resumption condition.
But this should be an older bug. The issue is that with SSL we can't assume that resetting the fd mask by calling file_event_->setEnabled(Event::FileReadyType::Read | Event::FileReadyType::Write); will result in correct resumption in all cases, since there may still be bytes to read in SSL's internal buffers. A rough sketch of this path is below.
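For context, here is a rough paraphrase of the resumption path under discussion, assuming the v1.14-era shape of ConnectionImpl::readDisable; member names only approximate Envoy's code and this is not literal source:

```cpp
// Paraphrased read-resumption path (not literal v1.14.4 source).
void ConnectionImpl::readDisable(bool disable) {
  if (disable) {
    ++read_disable_count_;
    file_event_->setEnabled(Event::FileReadyType::Write);
    return;
  }
  --read_disable_count_;
  // Re-arm the fd for read and write events.
  file_event_->setEnabled(Event::FileReadyType::Read | Event::FileReadyType::Write);
  // Resumption condition in question: a read event is only forced when
  // Envoy's own buffer still holds data. Bytes buffered inside the SSL
  // library are invisible here, so if read_buffer_ is empty and the fd is
  // already drained, read never resumes.
  if (read_buffer_.length() > 0) {
    file_event_->activate(Event::FileReadyType::Read);
  }
}
```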
Commit 77cca6b is not in v1.14.4, so this should be the older bug.
/assign @antoniovicente
/backport
Thinking about this one some more, I think that we could consider this a bug in SslSocket::doRead instead of ConnectionImpl::readDisable. SslSocket::doRead does not check for additional bytes in SSL_read's internal buffers by calling SSL_pending and doing additional SSL_read calls after callbacks_->shouldDrainReadBuffer() returns true. The consequence is that the SSL connection is left in a state where it is able to generate additional bytes during a future SslSocket::doRead call even if the underlying fd's read buffer is fully drained. The other two publicly available implementations of TransportSocket (i.e. raw sockets and ALTS) have the property that doRead can't make progress without additional read bytes from the socket. Fixing SslSocket::doRead should be straightforward; a minimal sketch of that direction follows. It is less clear whether readDisable(false) should activate(Read) in cases where the read buffer is empty as a way to simplify the transport socket contract. I don't know what non-OSS, private transport socket implementations exist.
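A minimal sketch of that direction, assuming a simplified SslSocket::doRead; the member names (ssl_, callbacks_), the chunked SSL_read loop, and the elided error handling are approximations rather than the actual Envoy patch:

```cpp
// Sketch: keep draining decrypted bytes that the SSL object already holds
// (SSL_pending) before returning, so read resumption never depends on
// another fd-level read event. Not the actual Envoy change.
Network::IoResult SslSocket::doRead(Buffer::Instance& read_buffer) {
  uint64_t bytes_read = 0;
  bool keep_reading = true;
  while (keep_reading) {
    uint8_t chunk[16384];
    const int rc = SSL_read(ssl_.get(), chunk, sizeof(chunk));
    if (rc > 0) {
      read_buffer.add(chunk, static_cast<uint64_t>(rc));
      bytes_read += static_cast<uint64_t>(rc);
      if (callbacks_->shouldDrainReadBuffer()) {
        // The connection buffer hit its limit: request a synthetic read event
        // for later, but only stop once nothing decrypted is left inside the
        // SSL object; otherwise those bytes would be stranded there.
        callbacks_->setReadBufferReady();
        if (SSL_pending(ssl_.get()) == 0) {
          keep_reading = false;
        }
      }
    } else {
      // SSL_ERROR_WANT_READ / shutdown handling elided from this sketch.
      keep_reading = false;
    }
  }
  return {Network::PostIoAction::KeepOpen, bytes_read, false};
}
```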
I cherry-picked commit 77cca6b into our 1.14.4-based repo and it failed to fix the issue. But after removing the " && read_buffer_.length() > 0" condition,
as suggested by @antoniovicente, it seems to be working for now. I'll test this a bit more. Just wanted to add some details from our testing so far.
Thanks for confirming that more eagerly calling setReadBufferReady() helps. I'm going to try to continue working on a fix, with focus on making the SslSocket drain internal buffers on read so that resumption based on fd re-registration works correctly.
Sent out a PR with tests that repro the timeout while processing the last SSL record, and changes to SslSocket to ensure that bytes in internal buffers are drained out before returning from doRead. Sorry for the delays, too many other things to pay attention to. Hopefully this fix will be merged in time for 1.16, and backported to earlier releases if appropriate.
…on requests and replay them when re-enabling read. (#13772) (#14017)
Fixes SslSocket read resumption after readDisable when processing the SSL record that contains the last bytes of the HTTP message.
Risk Level: low
Testing: new unit and integration tests
Docs Changes: n/a
Release Notes: added
Platform Specific Features: n/a
Fixes #12304
Signed-off-by: Antonio Vicente <[email protected]>
Signed-off-by: Christoph Pakulski <[email protected]>
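A rough sketch of what "replay them when re-enabling read" could look like; the flag name transport_wants_read_, the method chosen, and the call sites are assumptions for illustration, not the literal patch:

```cpp
// Assumed shape of replaying a pending read request when read is re-enabled
// (names and call sites are approximations of ConnectionImpl).
void ConnectionImpl::setReadBufferReady() {
  // The transport socket still has readable data buffered internally;
  // remember that fact instead of relying on a one-shot activation.
  transport_wants_read_ = true;
  file_event_->activate(Event::FileReadyType::Read);
}

// ...and in readDisable(false), after re-enabling the fd events:
if (read_buffer_.length() > 0 || transport_wants_read_) {
  // Replay the resumption request even though Envoy's own buffer is empty.
  file_event_->activate(Event::FileReadyType::Read);
}
```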
Backported #13772 to releases 1.16, 1.15, 1.14 and 1.13. Removing backport/review label.
The behaviour I'm about to describe requires quite a lot of setup in order to reproduce.
To begin with I'm using v1.14.4.
My listeners use a reduced buffer limit.
The default value of 1MB just seemed way too big and opened up lots of opportunities for OOM, so I ended up using 16k.
Then I have a local SSL client in which data is sent in chunks, e.g. 400 bytes followed by 16500 bytes (e.g. HTTP headers and body); a hypothetical sketch of such a client follows.
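The following OpenSSL client only illustrates that write pattern; the address/port, the payload contents, and the omitted error handling are placeholders rather than the reporter's actual client, and a real reproduction would need a request matching the listener's configuration:

```cpp
// Hypothetical repro client: writes a ~400 byte "headers" chunk, pauses
// briefly, then writes a ~16500 byte "body" chunk over TLS.
#include <openssl/ssl.h>

#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>

#include <chrono>
#include <string>
#include <thread>

int main() {
  SSL_library_init();
  SSL_CTX* ctx = SSL_CTX_new(TLS_client_method());

  // Plain TCP connect to a local Envoy listener (address/port are made up).
  addrinfo hints{}, *res = nullptr;
  hints.ai_socktype = SOCK_STREAM;
  getaddrinfo("127.0.0.1", "10000", &hints, &res);
  const int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
  connect(fd, res->ai_addr, res->ai_addrlen);

  SSL* ssl = SSL_new(ctx);
  SSL_set_fd(ssl, fd);
  SSL_connect(ssl);

  // Chunk 1: ~400 bytes standing in for the HTTP headers.
  const std::string headers(400, 'H');
  SSL_write(ssl, headers.data(), static_cast<int>(headers.size()));
  std::this_thread::sleep_for(std::chrono::milliseconds(10));

  // Chunk 2: ~16500 bytes standing in for the body, just over the 16k limit,
  // so the last SSL record straddles the shouldDrainReadBuffer() threshold.
  const std::string body(16500, 'B');
  SSL_write(ssl, body.data(), static_cast<int>(body.size()));

  SSL_shutdown(ssl);
  SSL_free(ssl);
  close(fd);
  SSL_CTX_free(ctx);
  freeaddrinfo(res);
  return 0;
}
```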
What can be seen in the trace logs is that at some point Envoy stops reading from the downstream.
At that point either the client or the server times out: according to the client the entire request has been sent, while according to the server the entire request was not received.
To me it seems like the problem is the following sequence. When shouldDrainReadBuffer() returns true in SslSocket::doRead (in my case it returns true because the read buffer has hit the 16k limit), doRead calls setReadBufferReady(), which activates the read file event. That activation seems to immediately get eaten by the first readDisable(true), which re-registers the fd with read disabled. When the second readDisable(false) happens, the read buffer has already been drained, so the resumption condition is not met and the read event is not activated. A paraphrased sketch of the pieces involved follows.
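Here is a paraphrased sketch of setReadBufferReady() and the readDisable(true) half of that sequence, assuming the v1.14-era shape of ConnectionImpl; member names are approximations, not literal source, and the readDisable(false) side is in the sketch earlier in the thread:

```cpp
// setReadBufferReady() injects a synthetic read event; readDisable(true)
// re-registers the fd with only Write enabled, which discards that injected
// event. Member names approximate ConnectionImpl, not literal v1.14.4 code.
void ConnectionImpl::setReadBufferReady() {
  file_event_->activate(Event::FileReadyType::Read);
}

void ConnectionImpl::readDisable(bool disable) {
  if (disable) {
    ++read_disable_count_;
    // Only Write stays enabled from here on; the pending activate(Read)
    // queued by setReadBufferReady() never gets delivered.
    file_event_->setEnabled(Event::FileReadyType::Write);
    return;
  }
  // readDisable(false): see the earlier resumption-condition sketch. With
  // read_buffer_ already drained and the remaining bytes sitting inside the
  // SSL object, no read event is ever activated again.
}
```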
If readDisable(true/false) isn't triggered or the setReadBufferReady() code path is not executed, all is well.
I tried plain TCP and wasn't able to reproduce the issue (the raw buffer transport socket has a similar shouldDrainReadBuffer check in its doRead).
I wasn't able to reproduce the issue with an external connection either (maybe my network is slow).
The code snippets that I shared may be a few lines off from v1.14.4, as I have a few ERR_clear_error() calls added because I'm using OpenSSL, but looking at the behaviour I would not blame them or OpenSSL.
Unfortunately, I can't share either the client or the server code.
I can reliably reproduce the issue once every 5 minutes or so, so if anyone has a patch in mind I'll be willing to try it out. In the meantime, I'll see if I can come up with something.