SslConnection.DecryptedEnpoint.flush eternal busy loop #4217

Closed
baldersheim opened this issue Oct 17, 2019 · 8 comments

@baldersheim

This morning one of our servers was caught red-handed with a Jetty thread in a very tight busy loop in org.eclipse.jetty.io.ssl.SslConnection.DecryptedEndPoint.flush.

OS: RHEL 6.10 Docker container running on a RHEL 7.7 host
JVM:
$ java -version
openjdk version "11.0.4" 2019-07-16
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.4+11)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.4+11, mixed mode)
Jetty: 9.4.20.v20190813

Here is an expanded view of the interesting call graph, captured with the 'perf' tool on Linux.

  • 16.10% 2.37% ... Lorg/eclipse/jetty/io/ssl/SslConnection$DecryptedEndPoint;.flush
  • 85.31% Lorg/eclipse/jetty/io/ssl/SslConnection$DecryptedEndPoint;.flush
    • 98.10% Lsun/security/ssl/SSLEngineImpl;.wrap
      • 98.74% Lsun/security/ssl/SSLEngineImpl;.writeRecord
        • 78.41% _complete_monitor_locking_Java
        • 15.99% Lsun/security/ssl/TransportContext;.getHandshakeStatus
        • 2.44% 0xffffffff9658defa
        • 2.25% SharedRuntime::complete_monitor_unlocking_C
      • 1.12% Lsun/security/ssl/TransportContext;.getHandshakeStatus
    • 1.22% Lsun/security/ssl/TransportContext;.getHandshakeStatus
      0.55% Lsun/security/ssl/SSLEngineImpl;.writeRecord

We took a heap dump, analysed it alongside the code, and observed the following:
1. The connection is in handshake: org.eclipse.jetty.io.ssl.SslConnection$Handshake = INITIAL
2. The _encryptedOutput buffer is empty, with a capacity of 17408.
3. _sslEngine.conContext.conSession.acceptLargeFragments = true

These values lead to
891 wrapResult = _sslEngine.wrap(appOuts, _encryptedOutput);
returning a status of BUFFER_OVERFLOW, because _encryptedOutput does not have room for at least getPacketBufferSize() bytes.
The reason is that getPacketBufferSize() returns 33093 when acceptLargeFragments is true, as opposed to the usual 16709, unless maxPacketSize has been set or negotiation has completed and a maxNegotiatedPacketSize has been reached; neither was the case in this heap dump. With _encryptedOutput's capacity of only 17408, the wrap() call therefore keeps reporting BUFFER_OVERFLOW.
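
For reference, the standard JSSE pattern for driving SSLEngine.wrap() handles exactly this case: on BUFFER_OVERFLOW the destination buffer is re-sized to the current getPacketBufferSize() and the wrap is retried, because that value can grow mid-handshake. A minimal, self-contained sketch (generic SSLEngine usage, not Jetty code; method and variable names are illustrative only):

    import java.nio.ByteBuffer;
    import javax.net.ssl.SSLEngine;
    import javax.net.ssl.SSLEngineResult;

    class WrapSketch
    {
        // Wrap application data, growing the network buffer whenever the engine
        // reports BUFFER_OVERFLOW. getPacketBufferSize() is re-queried on every
        // overflow because it can change during the handshake (e.g. 33093 with
        // large fragments vs 16709 otherwise). Handshake-status handling
        // (NEED_UNWRAP, NEED_TASK) is omitted for brevity.
        static ByteBuffer wrapFully(SSLEngine engine, ByteBuffer appData, ByteBuffer netBuffer) throws Exception
        {
            while (appData.hasRemaining())
            {
                SSLEngineResult result = engine.wrap(appData, netBuffer);
                switch (result.getStatus())
                {
                    case OK:
                        break;
                    case BUFFER_OVERFLOW:
                        // Allocate a buffer sized to the *current* packet size and
                        // carry over anything already written.
                        ByteBuffer larger = ByteBuffer.allocate(
                            netBuffer.position() + engine.getSession().getPacketBufferSize());
                        netBuffer.flip();
                        larger.put(netBuffer);
                        netBuffer = larger;
                        break;
                    case CLOSED:
                        return netBuffer;
                    default:
                        throw new IllegalStateException("Unexpected status: " + result.getStatus());
                }
            }
            return netBuffer;
        }
    }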

This means that once you end up in this state, with an empty _encryptedOutput buffer that is too small, you get a tight loop:
852 while (true)
......
931 case BUFFER_OVERFLOW:
932 if (!flushed)
933 return result = false;
934 -------> continue;

Since flushed is true (because _encryptedOutput is empty), the continue statement on line 934 seals the tight loop.

The key point here is that SSLEngine.getSession().getPacketBufferSize() can change during the handshake phase of the connection.

We made the same mistake in our own use of the SSLEngine and discovered that getPacketBufferSize() can change during the handshake phase.

We suggest adding a call to releaseEncryptedOutputBuffer() before the 'continue' statement on line 934. That should trigger allocation of a new buffer with the correct size, as sketched below.
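
For concreteness, a rough sketch of what that change would look like in the case arm quoted above (paraphrased from the excerpt, not an actual Jetty patch; the surrounding flush() body is omitted):

    case BUFFER_OVERFLOW:
        if (!flushed)
            return result = false;
        // Release the under-sized buffer so the next loop iteration re-acquires
        // one sized to the current (possibly larger) getPacketBufferSize().
        releaseEncryptedOutputBuffer();
        continue;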

I hope there is enough information here for you to figure out how the connection could end up in this state, and how it can be reproduced. My knowledge of Jetty and SSL is limited, and since this has only been observed once, I am unsure how to get it back into this state.

Feel free to ask for more details. I hope you have something to go by. I have a heap dump and a server in a bad state, but the server will soon be restarted and put back into operation.


sbordet commented Oct 17, 2019

Thanks for the detailed analysis. I'm looking into this.

sbordet self-assigned this Oct 17, 2019

sbordet commented Oct 17, 2019

@baldersheim I cannot see how to test the real case, where a TLS read causes the buffer expansion, and a subsequent wrap() causes BUFFER_OVERFLOW.

However, I have a test that verifies that we release the buffer in case of BUFFER_OVERFLOW as you suggest.

Have you been able to test your suggested solution and verified that it works?

Out of curiosity, what is the client that uses large TLS fragments?


sbordet commented Oct 17, 2019

PR #4218.

@baldersheim can you test the fix (build the PR branch and try it out)?

@baldersheim

Yes, I will do that. But probably not until Monday.

I think the client in this case is an nginx server running as a proxy. We recently rebuilt it to use OpenSSL 1.1.1 instead of the older 1.0.1e. I wonder if that might be part of the equation.
Will keep you posted with our findings.

sbordet added a commit that referenced this issue Oct 19, 2019
Releasing the decrypted input buffer so that it can be re-acquired
with an expanded capacity.
Looping around only if the buffer size has changed.

Signed-off-by: Simone Bordet <[email protected]>
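
A generic illustration of the guard that commit message describes (not the actual diff; releaseBuffer() and acquireBuffer() are hypothetical helpers standing in for Jetty's buffer pooling):

    int oldCapacity = buffer.capacity();
    releaseBuffer(buffer);            // hypothetical: return the too-small buffer to the pool
    buffer = acquireBuffer();         // hypothetical: re-acquire at the current session buffer size
    if (buffer.capacity() > oldCapacity)
        continue;                     // the buffer actually grew, so retrying can make progress
    return false;                     // no growth: fail the operation instead of busy-looping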
sbordet added a commit that referenced this issue Oct 19, 2019
Updates after review.
Added test case.

Signed-off-by: Simone Bordet <[email protected]>
sbordet added a commit that referenced this issue Oct 21, 2019
…er_overflow_busy_loop

Fixes #4217 - SslConnection.DecryptedEnpoint.flush eternal busy loop.
joakime added a commit that referenced this issue Oct 24, 2019
joakime added a commit that referenced this issue Oct 25, 2019
+ Small cleanup of duplicate if statements

Signed-off-by: Joakim Erdfelt <[email protected]>

joakime commented Oct 29, 2019

@baldersheim were you able to confirm that it was nginx compiled against OpenSSL 1.1.1 that was the source of the problematic SSL/TLS client?

@baldersheim

No, the source has not been found yet. I was not able to reproduce it in our production system.
The suspicion is that this was our internal "paranoids" security team doing port scans and testing various vulnerabilities. That is a background activity outside of our control that runs very frequently. They have caused issues before on protocols that were less robust, but since enabling TLS everywhere this is the first incident.


joakime commented Oct 29, 2019

@baldersheim out of curiosity, since you mentioned a custom nginx: is your nginx compiled with the dynamic TLS record patch from Cloudflare?

https://blog.cloudflare.com/optimizing-tls-over-tcp-to-reduce-latency/

If your nginx configuration has options like ssl_dyn_rec_size_hi, then you probably have it compiled in.

joakime added a commit that referenced this issue Oct 30, 2019
+ Cleanup from review

Signed-off-by: Joakim Erdfelt <[email protected]>
joakime added a commit that referenced this issue Oct 30, 2019
joakime added a commit that referenced this issue Oct 30, 2019
+ Flush on BUFFER_OVERFLOW

Signed-off-by: Joakim Erdfelt <[email protected]>
joakime added a commit that referenced this issue Oct 30, 2019
joakime added a commit that referenced this issue Oct 30, 2019
joakime added a commit that referenced this issue Oct 31, 2019
Issue #4217 - (9.2.x) SslConnection DecryptedEndpoint flush eternal busy loop
joakime added a commit that referenced this issue Oct 31, 2019
…flush-loop

Issue #4217 - (9.3.x) SslConnection DecryptedEndpoint flush eternal busy loop

joakime commented Nov 6, 2019

This fix has been backported to jetty-9.2.x and jetty-9.3.x as well.
Releases are planned for both of those versions of Jetty.
