Potential heartbeat misconfiguration #493

Closed
boncea opened this issue Jan 27, 2021 · 3 comments · Fixed by rabbitmq/amqp091-go#5

boncea commented Jan 27, 2021

Problem (occurs intermittently):

Go library error: Exception (504) Reason: "channel/connection is not open"
RabbitMQ log: missed heartbeats from client, timeout: 10s

Relevant background from the RabbitMQ documentation:

Heartbeat frames are sent about every heartbeat timeout / 2 seconds. This value is sometimes referred to as the heartbeat interval. After two missed heartbeats, the peer is considered to be unreachable.
[...]
It is important to not confuse the timeout value with the interval one. RabbitMQ configuration exposes the timeout value, so do the officially supported client libraries. However some clients might expose the interval, potentially causing confusion.

Source: RabbitMQ documentation

Investigation details

According to the documentation and the RabbitMQ log message (timeout: 10s), the client should send a heartbeat every 5 seconds. Based on my investigation, that is not what happens. My guess is that the library sends a heartbeat only once per timeout period. When the client's send time happens to line up with the server's heartbeat timeout, a heartbeat can arrive just too late and the server closes the connection. Please check my reasoning below:

Suggestions:

I have not tested this theory, so my assumptions might be wrong. Please correct me if that is the case.


Neal commented Mar 21, 2021

@boncea thanks for the detailed report. I wish I had searched here before debugging on my own: I am seeing similar behavior and came to the same conclusion after some investigation.


boncea commented Mar 21, 2021

@Neal glad that at least one more person confirms my theory. I could make the fix myself, since we just have to send heartbeats at half of the heartbeat value, but considering that there was no response in two months, I'm wondering if this library is still maintained.


Neal commented Mar 21, 2021

considering that there was no response in two months, I'm wondering if this library is still maintained.

This recent #497 (comment) suggests that, unfortunately, it is not. However, this #497 (comment) suggests that Team RabbitMQ plans to take over soon, so hopefully these fixes (and the other open PRs) will start landing.

That said, I am only seeing this particular issue in one of our environments. As far as I can tell, it is configured the same way as the others, except that it is a single-node cluster while the others have 3 nodes.
