-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Low traffic or publish-only connections are closed with "too many missed heartbeats" #106
Comments
We're also seeing the client generated exception: |
According to https://www.rabbitmq.com/heartbeats.html#heartbeats-interval any incoming message (heartbeat or not) from the broker can be treated as a heartbeat. This is the approach that this client also implements. In addition, the client sets up a timer (interval = half the heartbeat timeout) which is used to send out periodic heartbeat frames to the broker to prevent exactly the scenario you describe... To help me diagnose the issue, can you add some logging statements in writeHeartbeat (specifically, L64 before the return, L71 before returning and L76 inside the catch block) and let me know if the client fails to write the heartbeat frame? |
So working with @rajmaniar we have found what i think is causing this issue. It looks like reading the spec the rabbitmq server only sends at least one heartbeat every heartbeat interval the client sets:
So since dart_amqp also sets a timer to the same interval it can race the rabbitmq server heartbeat and close the connection just before it reads the heart beat. That is shown in the following wireshark trace: Suggestions on how to fix:
|
The 3rd option (counting heartbeats in flight and disconnecting once a threshold is exceeded) sounds like the more robust approach to me. I will propose a PR to implement this mechanism. |
This commit implements a suggestion from #106 (comment). The client-side heartbeat implementation is modified via the introduction of a new tuning parameter called `maxMissedHeartbeats` (default: 3). If heartbeats are enabled and the client sends `maxMissedHeartbeats` consecutive heartbeat messages at the interval negotiated with the broker without receiving any message (heartbeat or regular traffic) back, the client assumes that the server is not available and will raise a HeartbeatFailedException.
@mchandler-plato Can you please check if #111 rectifies the issues you are having? |
This commit implements a suggestion from #106 (comment). The client-side heartbeat implementation is modified via the introduction of a new tuning parameter called `maxMissedHeartbeats` (default: 3). If heartbeats are enabled and the client sends `maxMissedHeartbeats` consecutive heartbeat messages at the interval negotiated with the broker without receiving any message (heartbeat or regular traffic) back, the client assumes that the server is not available and will raise a HeartbeatFailedException.
@achilleasa We work together with @mchandler-plato. I have a branch in my forked repo where I skip every second sent heartbeat, which helps catch broken heartbeat state quite fast. I just tried your PR with this scenario, and it looks to be fixed - I do not see "broken heartbeats" restarts anymore. So 1st check is good. But the main check will be when we deeply your changes to prod, I'll discuss it with the team soon. Right now, we use in prod our own "2x check timer duration" fix, which also helps for my test scenario with skipped heartbeats. |
btw since in your PR missInterval became heartbeatPeriod x3 with default value it is very close to our current fix with hardcoded heartbeatPeriod x2 =) |
When we have a connection to rabbit with very low consumer traffic or a connection with not consumers (only used for publishing) we get an occasion close with the message
too many missed heartbeats
I'll add more details to this ticket as I investigate this a bit more so it's a placeholder for now.
The text was updated successfully, but these errors were encountered: