SslConnection.DecryptedEndPoint.flush eternal busy loop #4217
Comments
Thanks for the detailed analysis. I'm looking into this.
@baldersheim I cannot see how to test the real case, where a TLS read causes the buffer expansion and a subsequent wrap() then fails with BUFFER_OVERFLOW. However, I have a test that verifies that we release the buffer in case of BUFFER_OVERFLOW. Have you been able to test your suggested solution and verified that it works? Out of curiosity, what is the client that uses large TLS fragments?
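For context, one way to put a JSSE SSLEngine into the large-fragment configuration in a test is the jsse.SSLEngine.acceptLargeFragments system property; whether that reproduces the production sequence (a read that expands the buffer followed by a failing wrap) is an assumption here, not something verified. A minimal probe:

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// A minimal probe, not a reproducer: it only shows the buffer-size effect
// of the JSSE property, assuming the default SSLContext is available.
public class LargeFragmentsProbe
{
    public static void main(String[] args) throws Exception
    {
        // Must be set before the JSSE classes read their configuration.
        System.setProperty("jsse.SSLEngine.acceptLargeFragments", "true");

        SSLEngine engine = SSLContext.getDefault().createSSLEngine();

        // With the property set, the pre-handshake session sizes its packet
        // buffer for large TLS fragments (roughly 33 KB instead of ~16 KB),
        // matching the acceptLargeFragments = true seen in the heap dump below.
        System.out.println("packet buffer size = " + engine.getSession().getPacketBufferSize());
    }
}
```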
PR #4218. @baldersheim can you test the fix (build the PR branch and try it out)?
Yes, I will do that, but probably not until Monday. I think the client in this case is an nginx server running as a proxy. We recently rebuilt it to use OpenSSL 1.1.1 instead of the older 1.0.1e; I wonder if that might be part of the equation.
Releasing the decrypted input buffer so that it can be re-acquired with an expanded capacity. Looping around only if the buffer size has changed. Signed-off-by: Simone Bordet <[email protected]>
Updates after review. Added test case. Signed-off-by: Simone Bordet <[email protected]>
…er_overflow_busy_loop Fixes #4217 - SslConnection.DecryptedEnpoint.flush eternal busy loop.
Signed-off-by: Joakim Erdfelt <[email protected]>
+ Small cleanup of duplicate if statements Signed-off-by: Joakim Erdfelt <[email protected]>
@baldersheim were you able to confirm that it was nginx compiled against OpenSSL 1.1.1 that was the source of the problematic SSL/TLS client?
No, the source has not been found yet. I was not able to reproduce it in our production system.
@baldersheim out of curiosity, since you mentioned a custom nginx, is your nginx compiled with the dynamic TLS record patch from Cloudflare? https://blog.cloudflare.com/optimizing-tls-over-tcp-to-reduce-latency/ If your nginx configuration enables the options that patch adds, that would be useful to know.
+ Cleanup from review Signed-off-by: Joakim Erdfelt <[email protected]>
Signed-off-by: Joakim Erdfelt <[email protected]>
+ Flush on BUFFER_OVERFLOW Signed-off-by: Joakim Erdfelt <[email protected]>
Signed-off-by: Joakim Erdfelt <[email protected]>
Signed-off-by: Joakim Erdfelt <[email protected]>
Issue #4217 - (9.2.x) SslConnection DecryptedEndpoint flush eternal busy loop
…flush-loop Issue #4217 - (9.3.x) SslConnection DecryptedEndpoint flush eternal busy loop
This fix has been backported to 9.2.x and 9.3.x.
This morning one of our servers was caught red-handed with a Jetty thread in a very tight busy loop in org.eclipse.jetty.io.ssl.SslConnection.DecryptedEndPoint.flush.
OS: RHEL 6.10 Docker container running on a RHEL 7.7 host
JVM:
$ java -version
openjdk version "11.0.4" 2019-07-16
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.4+11)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.4+11, mixed mode)
Jetty: 9.4.20.v20190813
Here is an exploded view of the interesting call graph, taken with the perf tool on Linux:
0.55% Lsun/security/ssl/SSLEngineImpl;.writeRecord
We took a heap dump, analysed it alongside the code, and observed the following:
1 - The connection is in handshake: org.eclipse.jetty.io.ssl.SslConnection$Handshake = INITIAL
2 - _encryptedOutput buffer is empty with a capacity of 17408.
3 - _sslEngine.conContext.conSession.acceptLargeFragments = true
These values will lead to
891 wrapResult = _sslEngine.wrap(appOuts, _encryptedOutput);
returning a status of BUFFER_OVERFLOW if _encryptedOutput does not have room for at least getPacketBufferSize() bytes.
The reason is that getPacketBufferSize() will return 33093 when acceptLargeFragments is true, as opposed to 16709 normally, unless maxPacketSize has been set or negotiation has completed and reached a maxNegotiatedPacketSize; neither was set in this heap dump.
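To make the size relationship concrete, here is a hedged sketch using plain JSSE (not Jetty code) of how a wrap() against an empty but undersized destination buffer keeps reporting BUFFER_OVERFLOW until the buffer is re-sized to the session's current getPacketBufferSize():

```java
import java.nio.ByteBuffer;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLEngineResult;
import javax.net.ssl.SSLException;

// Hypothetical helper for illustration only: grows the destination buffer when
// wrap() reports BUFFER_OVERFLOW because getPacketBufferSize() has increased.
final class WrapHelper
{
    static ByteBuffer wrapWithResize(SSLEngine engine, ByteBuffer appOut, ByteBuffer netOut)
        throws SSLException
    {
        while (true)
        {
            SSLEngineResult result = engine.wrap(appOut, netOut);
            if (result.getStatus() != SSLEngineResult.Status.BUFFER_OVERFLOW)
                return netOut;

            // The session's packet buffer size can grow during the handshake
            // (e.g. 16709 -> 33093 when large fragments are accepted), so an
            // empty buffer sized earlier may now be too small.
            int packetSize = engine.getSession().getPacketBufferSize();
            if (netOut.capacity() >= packetSize)
                return netOut; // not a sizing problem; let the caller flush and retry

            ByteBuffer bigger = ByteBuffer.allocate(packetSize);
            netOut.flip();   // preserve any bytes already wrapped
            bigger.put(netOut);
            netOut = bigger;
        }
    }
}
```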
This means that if you end up in this state, with a too-small but empty _encryptedOutput buffer, you will have a tight loop:
852 while (true) {
......
931 case BUFFER_OVERFLOW:
932 if (!flushed)
933 return result = false;
934 -------> continue;
Since flushed is true (because _encryptedOutput is empty), the tight loop is sealed by the continue statement on line 934.
The clue here is that SSLEngine.getSession().getPacketBufferSize() might change during the handshake phase of the connection.
We made the same mistake in our own use of the SSLEngine before discovering that getPacketBufferSize() can change mid-handshake.
We suggest adding a call to releaseEncryptedOutputBuffer() before the 'continue' statement on line 934. That should trigger the generation of a new buffer with the correct size.
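For illustration only, here is a minimal, self-contained sketch of that idea, combined with the "loop only if the buffer size changed" guard described in the fix commits quoted earlier in this thread. The acquireBuffer()/releaseBuffer() helpers are hypothetical stand-ins for Jetty's buffer pool; this is not the actual Jetty patch.

```java
import java.nio.ByteBuffer;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLEngineResult;
import javax.net.ssl.SSLException;

// Sketch of the proposed pattern, not Jetty's real flush():
// on BUFFER_OVERFLOW with nothing pending to write, release the encrypted
// output buffer and re-acquire it at the engine's current packet size,
// looping only if that size actually changed.
final class FlushSketch
{
    static boolean flush(SSLEngine engine, ByteBuffer appOut) throws SSLException
    {
        ByteBuffer encryptedOutput = acquireBuffer(engine.getSession().getPacketBufferSize());
        while (true)
        {
            SSLEngineResult wrapResult = engine.wrap(appOut, encryptedOutput);
            switch (wrapResult.getStatus())
            {
                case BUFFER_OVERFLOW:
                    // "flushed" here simply means there is nothing left in the
                    // buffer to write out; in Jetty it also tracks network writes.
                    boolean flushed = encryptedOutput.position() == 0;
                    if (!flushed)
                        return false;

                    int oldCapacity = encryptedOutput.capacity();
                    releaseBuffer(encryptedOutput);

                    int newCapacity = engine.getSession().getPacketBufferSize();
                    if (newCapacity == oldCapacity)
                        return false; // nothing changed; avoid spinning forever

                    encryptedOutput = acquireBuffer(newCapacity);
                    continue;

                default:
                    // OK, CLOSED, BUFFER_UNDERFLOW: in real code the wrapped bytes
                    // in encryptedOutput would now be written to the network.
                    return true;
            }
        }
    }

    // Hypothetical stand-ins for a real buffer pool.
    static ByteBuffer acquireBuffer(int capacity) { return ByteBuffer.allocate(capacity); }
    static void releaseBuffer(ByteBuffer buffer) { /* would return the buffer to a pool */ }
}
```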
I hope there is enough information here for you to figure out how the connection could end up in this state, and how it can be reproduced. My knowledge of Jetty and SSL is limited, and since this has only been observed once I am unsure how to bring it back into this state.
Feel free to ask for more details. I hope you have something to go by. I have a heap dump and a server in a bad state, but the server will be restarted and put back into operation soon.