-
Notifications
You must be signed in to change notification settings - Fork 38.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve client and server handling of STOMP heartbeat failures #32195
Comments
I found a scenario although not reproducible
My handleTransportError implementation is called which tries to disconnect a StompSession if it is connected. Unsubscribe and disconnect both fail with java.lang.IllegalStateException: The remote endpoint was in state [TEXT_PARTIAL_WRITING] which is an invalid state for called method
As a consequence of this, the inactivityTasks (read and write) are not cancelled and they keep popping up in the log file I didn't have these issues with spring boot 2.7.4 Not sure if not cancelling these tasks keeps somehow the connection alive (as the server disconnect did not happen) |
Managed to reproduce the "not cancelling the inactivityTasks" from the WebSocketStompClient.WebSocketTcpConnectionHandlerAdapter. With this scenario, the cancelInactivityTasks() method is not called and so I get every 30 seconds (heart-beats are sent every 10 seconds) the exception The stack is this
If you set a breakpoint in the WsSession.checkState() method and manually set the status to CLOSED, you get this exception and the inactivityTasks run further even though the session is closed and the StandardWebSocketSession.sendTextMessage() throws the above exception. Can anyone help here? |
@funnelspider, mixing analysis without fully describing the issue results in a report that is difficult to read and leaves a lot to be unsure about. For future reference, please focus on describing the scenario and the outcome in as much details as you can before moving on to analysis and assumptions. Overall I am uncertain what the actual issue you're reporting is. Whether it is a client issue or a server issue for example, or both. There are however a few things I can answer. I have checked 6.0.9 to 6.0.14 and there is nothing related to this. There could be other upgrades like WebSocket or HTTP client versions, and others including environment changes. "No messages received after..." indicates a WebSocket handshake without a STOMP CONNECT frame within 60 seconds, usually because a proxy does not understand or does not allow WebSocket traffic. If CONNECT and CONNECTED cannot be exchanged, there are no heartbeats and server WebSocket sessions can linger. This is what this check guards against. The log messages are consistent with the SockJS request sequence, which indicates in the cloud it fails to communicate over WebSocket, then tries HTTP streaming (xhr_streaming), and finally falls back on polling (xhr). A proxy is likely precluding WebSocket and streaming. Check what is required for WebSocket and/or streaming in your cloud configuration, or whether those requirements have changed. I also noticed the
We could ensure |
I realize mixing the issue description with my own analysis is not the best approach. I was basically trying to give you information when I was discovering them. Regarding the 'No messages received after' message. Most of them are closed after > 10 minutes. (No messages received after 1122979 ms. Closing WebSocketServerSockJsSession[id=eef41fbe669a4b1a9bb54c429e5d5453) Why not have a scheduled job which clears these after 60 seconds (as the method comment says). Regarding the environment configuration, I was assured by our OPS team that nothing had changed in that regard. The proxyInterceptor is used to set some custom headers in the beforeHandshake method. This is unchanged since 2019. Regarding handleTransportError. I'm not so sure that the connection is closed in all cases because I added a check for isConnected() and it also returns true sometimes. Maybe the WSSession is closed but not the StompSession.
It would be great if Yes, I will add the DEBUG logging for org.springframework.web, org.springframework.http, and org.springframework.messaging to the ticket. I already have for org.springframework.messaging ws.messaging.snippet.log After a while an exception comes (you can't see it in this log but it's the one from above I will come back with logs from the org.springframework.web, org.springframework.http. LATER EDIT
Can you shed some light into what they show ? Especially the sockJsScheduler and if you see something wrong here One more mention here, I've seen on other systems X closed abnormally. (
|
spring.20240210.log |
The idea is to use a natural trigger vs managing yet another scheduled task. Thinking about it again, it might make more sense to check when new connections are established instead of when messages arrive. I'll make a change along those lines.
Nevertheless, the "No server messages received..." indicates clearly WebSocket connectivity issues. WebSocket connections are established, but messages do not start flowing, and STOMP negotiation fails to complete. Maybe this issue happens for some clients only and the proxy issue is on their side if they are external, or perhaps there was some other less obvious or indirect environment or network change (some upgrade somewhere).
That's what I expect, correct. At the level of
The SockJS protocol requires the server to send heartbeat messages, by default every 25 seconds but it's configurable. When STOMP is used on top, we disable those from I've looked at the logs. I was hoping they would help me to see if there is any issue with SockJS transport fallbacks and the close notification, but I did not see any evidence of falling back such as requests to "/xhr_streaming" or "/xhr" as you had indicated earlier, and likewise no |
Thank you for your answer and insight From the log that I've attached spring.20240210.log it is visible that the session is closed at 03:51:43
Afterwards, a new websocket connection is established at 03.51.49 (6 seconds later)
and afterwards the old session is sending exceptions So maybe two things can be done here,
|
"Sending DISCONNECT" is not present in the log file you provided. I do see the following:
This indicates something called the |
It makes more sense to call this from afterConnectionEstablished as it relates to the creation of new sessions. See gh-32195
You are right. The logs I was mentioning are on level TRACE. |
Okay, so for this:
Any idea what triggers the unubscribe and disconnect? If it is to match the scenarios described in #32195 (comment), then there should be an |
That scenario I could only reproduce in debug mode. (by manually setting the state to CLOSED I will leave the DEBUG logs in the application and if I see it again I'll check the logs. |
ConnectionLostException was applies only after the WebSocket library notified us of a session close. However, read inactivity and heartbeat send failures are also cases of the connection being lost rather than closed intentionally. This commit also ensures resetConnection is called after a heartbeat write failure, consistent with other places where a transport error is handled that implies the connection is lost. See gh-32195
I've made improvements on the client side as well that should better ensure the cancellation of heartbeat tasks. If you could give |
On the client side I'm using spring boot version 3.2.0. Can I get a test version for this ? |
You don't need a whole new Spring Boot release. You can update just the dependency version for Spring Framework, e.g. for Gradle. For Maven it's just setting a similarly named property "spring-framework.version". |
I have a spring framework application and a swing application which communicates with the spring app (tomcat) via websockets.
Locally I cannot reproduce this issue, but in a cloud environment it is reproducible.
The swing application has an auto-reconnect job which checks if the websocket is disconnected, and if it is it reconnects.
My assumption is that when the swing client disconnects (because it doesn't receive heartbeats from the server), the server does not notice this and keeps the sessions alive.
The log in tomcat looks like this
And client connects like this:
Is there a way to check on the client that there is a websocket connection active or not ?
I also found this comment where the log above comes from:
What exactly is a 'session that is to slow to send or receive messages' ?
How would i be able to simulate this?
It's worth mentioning that when the websocket request in the CLOUD is made the following http request are performed:
Locally I only get
I'm using spring framework 6.0.14
Did not have this many issues with 6.0.8
Did something change between the 2 versions?
Thanks
The text was updated successfully, but these errors were encountered: