Improve read publisher cancel handling to avoid connections in CLOSE_WAIT state with WebSocket on Tomcat #30393
Comments
That seems to be a pretty low-level issue. Have you identified anything that leads you to think this issue is on the Spring side and not on the Tomcat side?
Just that Spring's Specifically this
TBH, I'm not an expert at the interaction between Spring and Tomcat. I'd love it if someone more familiar with it could take a look at the example project and the reproduction steps I provided.
This is related to the resuming and suspending of the Tomcat WebSocket session, and not reading the final close frame (I think) if the session is closed while in the suspended state. The session is closed while handling an incoming message. Tomcat calls
I can make changes so that cancel handling is similar to completion handling when in the That said I am also wondering if Tomcat should be handling this differently, i.e. when the session is closed, should the session be resumed to allow the closing to be completed? @markt-asf any thoughts on that?
Thanks as always @rstoyanchev!
Late to the party but I'll be looking at this today on the Tomcat side. I was on the fence as to whether whatever called Regardless, it looks like Tomcat needs to (better) handle the case where the WebSocket session is closed by the server but the client confirmation of the close is never received. That is what I am planning on looking at.
There are a couple of cases where using WebSockets with WebFlux on Tomcat can leave connections in a CLOSE_WAIT state after closing the WebSocket session. These connections stick around and will eventually cause Tomcat to reach its connection limit (if set). This prevents Tomcat from accepting new connections, and thus leads to the server becoming unresponsive (except for previously established connections).
When running the same test cases with WebFlux on Netty or Undertow, the connections are closed properly.
I have provided an example project (ws-close-waiting.zip) that shows the cases where the connection gets stuck in CLOSE_WAIT on Tomcat after the WebSocket session is closed.
The project has three WebSocket endpoints, each showing a different case (only two of them fail). In each case, the server closes the WebSocket session (but in different ways) after receiving a message from the client.
/closeZip
- Calls session.close(...) while processing the input stream. The input and output streams are merged with the zip operator. This case leaves the connection in CLOSE_WAIT on Tomcat.
/closeZipDelayError
- Calls session.close(...) while processing the input stream. The input and output streams are merged with the zipDelayError operator. This case properly closes the connection. I included this case for comparison with the first case. I'm not sure what the downsides of using zipDelayError would be though. Advice appreciated.
/exceptionZipDelayError
- Propagates an exception on the input stream, but handles that exception with onErrorResume by calling session.close(...). The input and output streams are merged with the zipDelayError operator. This case leaves the connection in CLOSE_WAIT on Tomcat. I included this case to show that the zipDelayError operator will "fix" some cases (2), but not every case.
I have enabled the following logging:
In the failing cases (1 and 3), the read publisher logs a cancel message, and I see the following log lines:
In the successful case (2), the read publisher does not log a cancel message. I think the cancelling is the underlying problem. It prevents the server from noticing that the client has closed the connection.
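The difference between the zip and zipDelayError cases can be illustrated outside of WebFlux. The following is only a minimal sketch, assuming reactor-core is on the classpath; the class name ZipCancelDemo and the stand-in input/output Monos are hypothetical and are not taken from the example project. It relies on the documented behavior that Mono.zip cancels the remaining source as soon as one source completes empty or errors, while Mono.zipDelayError lets the remaining source run to completion.

```java
import java.util.concurrent.atomic.AtomicBoolean;

import reactor.core.publisher.Mono;

// Hypothetical demo class: not part of ws-close-waiting.zip.
class ZipCancelDemo {

    /**
     * Zips a never-ending "input" with an immediately completing "output"
     * and reports whether the input side received a cancel signal.
     */
    static boolean inputCancelled(boolean delayError) {
        AtomicBoolean cancelled = new AtomicBoolean(false);

        // Stand-in for the inbound stream: never completes on its own.
        Mono<Void> input = Mono.<Void>never().doOnCancel(() -> cancelled.set(true));

        // Stand-in for the outbound side: completes empty right away.
        Mono<Void> output = Mono.empty();

        Mono<?> zipped = delayError
                ? Mono.zipDelayError(input, output)  // waits for the input to finish
                : Mono.zip(input, output);           // cancels the input on empty completion

        zipped.subscribe();
        return cancelled.get();
    }

    public static void main(String[] args) {
        System.out.println("zip cancels input:           " + inputCancelled(false));
        System.out.println("zipDelayError cancels input: " + inputCancelled(true));
    }
}
```

If the read side of a real handler is cancelled in the same way, that would be consistent with the cancel message logged by the read publisher in cases 1 and 3, although case 3 shows that the operator choice alone does not explain every failure.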
To test each use case, I used netstat to observe connections, and websocat as the websocket client. Specifically...
I started netstat in a loop to observe connections every second...
Then I used websocat in another terminal as follows:
e.g.
websocat -v -v ws://localhost:8080/closeZip
(or closeZipNoDelay or exceptionZipNoDelay)
netstat will show something like...
For the successful cases, the connections will disappear from netstat.
For the failure cases, netstat will show something like...
Again, when running WebFlux on Netty or Undertow, the connections always go away in all three cases.
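For background on the CLOSE_WAIT state itself, here is a small plain-socket sketch using only the JDK. It is an illustration under assumptions, not Tomcat's or Spring's code, and the class name CloseWaitDemo is made up. It shows that a server only learns that the peer has closed by reading from the socket (the read returns -1), and that the server-side socket sits in CLOSE_WAIT from the moment the client's FIN arrives until the server closes its own end. If nothing reads and nothing closes, the connection stays in that state.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class: plain TCP, no WebSocket involved.
class CloseWaitDemo {

    /**
     * Accepts one loopback connection, lets the client close first, and
     * returns what the server-side read reports afterwards.
     */
    static int readAfterPeerClose() {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket client = new Socket(listener.getInetAddress(), listener.getLocalPort());
            try (Socket server = listener.accept()) {
                client.close();                       // client sends FIN
                InputStream in = server.getInputStream();
                // Blocks until the FIN arrives, then reports end of stream.
                // From that point the server socket is in CLOSE_WAIT.
                return in.read();
            }                                         // closing the server side leaves CLOSE_WAIT
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read() after peer close: " + readAfterPeerClose());
    }
}
```

In the sketch the try-with-resources block closes the server socket right after the read. Skipping both the read and the close is the situation that would leave the connection visible in netstat as CLOSE_WAIT.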