Issue with setting connection timeout lower than heartbeat? #366

Open
nbertram opened this issue Sep 27, 2021 · 1 comment
@nbertram

Hi,

We've seen some odd behaviour with AmazonMQ where the connection suddenly closes, seemingly when there's incoming subscription data, like this:

DEBUG:stomp.py:socket read error
DEBUG:stomp.py:nothing received, raising CCE
INFO:stomp.py:Receiver loop ended

The socket read error is "The read operation timed out".

I can't be 100% certain, but I suspect the transport doesn't expect socket.read() to time out after an idle period. We have the timeout set to 10 seconds but heartbeats at 30 seconds, so read() does time out between heartbeats when there's no other traffic, and then we get disconnected. For some reason this only seems to happen when connected over TLS. I can't figure out quite why, except that the read() semantics are slightly different.
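To illustrate the suspected mechanism outside stomp.py, here is a minimal sketch using only the standard library. It is not the transport's code, and the helper name `read_idle_socket` is made up for illustration. It shows that once a timeout is set on a connected socket, `recv()` raises `socket.timeout` after that much idle time even though the peer is still alive.

```python
import socket


def read_idle_socket(timeout_s):
    """Try to read from a healthy but silent peer.

    Returns "timed out" if recv() gives up after timeout_s seconds,
    or "data" if something was actually received.
    (Hypothetical helper, for illustration only.)
    """
    # socketpair() gives two connected sockets; b never sends anything,
    # which mimics an idle period between heartbeats.
    a, b = socket.socketpair()
    try:
        a.settimeout(timeout_s)
        try:
            a.recv(1024)
        except socket.timeout:
            # The connection is fine; the read simply waited too long.
            return "timed out"
        return "data"
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(read_idle_socket(0.2))
```

If the receiver loop treats this exception as a dead connection, any timeout shorter than the heartbeat interval would cause a disconnect during quiet periods, which matches what we see.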

I think the introduction of socket.settimeout() in #55 might've inadvertently affected the read semantics?

A workaround seems to be setting the heartbeat interval lower than the timeout, which prevents the issue from manifesting, though in normal operation we'd prefer to keep the connect timeout quite short. Should the transport perhaps unset the global socket timeout once it has successfully connected?
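As a rough sketch of what I mean by unsetting the timeout after connecting (plain sockets only, not stomp.py's actual transport; `connect_then_block` is a hypothetical name): use the short timeout for the connect attempt, then switch the socket back to blocking mode so idle reads no longer time out.

```python
import socket


def connect_then_block(host, port, connect_timeout_s):
    """Connect with a short timeout, then clear it for subsequent reads.

    Hypothetical helper, for illustration only.
    """
    # The timeout passed here bounds the connection attempt.
    sock = socket.create_connection((host, port), timeout=connect_timeout_s)
    # settimeout(None) puts the socket back into blocking mode, so a
    # later recv() waits indefinitely instead of raising socket.timeout.
    sock.settimeout(None)
    return sock


if __name__ == "__main__":
    # Throwaway local listener so the sketch runs without a broker.
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    host, port = server.getsockname()

    client = connect_then_block(host, port, 10)
    print(client.gettimeout())

    client.close()
    server.close()
```

With a fully blocking read, detecting a dead peer would presumably be left to the heartbeat mechanism rather than to the socket timeout.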

Thanks

@juhap

juhap commented Nov 6, 2021

I noticed the same problem when running on Amazon EKS (Python 3.8.10) and connecting to Amazon ActiveMQ, but could not replicate it on my own computer. Heartbeat send/receive was 15s. With a 10s timeout I saw the problem; with a 60s timeout I did not notice anything.

Not sure what is really happening there. Transport.receive() raises InterruptedException if socket.recv(...) fails with either an EAGAIN or an EINTR error. That exception is caught and ignored in transport.__read(). If this were an issue with socket.recv getting interrupted due to the timeout, I would not expect to see the log messages nbertram posted.

Also the Python documentation for socket.recv says "Changed in version 3.5: If the system call is interrupted and the signal handler does not raise an exception, the method now retries the system call instead of raising an InterruptedError exception (see PEP 475 for the rationale)."
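For what it's worth, here is a small standard-library sketch (not stomp.py code; `describe_recv_timeout` is a made-up helper) that checks how a plain-socket read timeout presents itself. My assumption is that it surfaces as `socket.timeout` rather than `InterruptedError`, and without an EAGAIN or EINTR errno, so an errno check like the one described above would not treat it as an interruption. I have not verified that a TLS socket behaves identically.

```python
import errno
import socket


def describe_recv_timeout(timeout_s=0.1):
    """Report how a recv() timeout on an idle socket is raised.

    Hypothetical helper, for illustration only.
    """
    a, b = socket.socketpair()
    try:
        a.settimeout(timeout_s)
        try:
            a.recv(1024)  # b never sends, so this waits until the timeout
        except OSError as exc:
            return {
                "is_timeout": isinstance(exc, socket.timeout),
                "is_interrupted": isinstance(exc, InterruptedError),
                "errno_is_eagain_or_eintr": exc.errno
                in (errno.EAGAIN, errno.EINTR),
            }
        return None  # data arrived unexpectedly
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(describe_recv_timeout())
```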
