Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heartbeats stop after disconnect/reconnect #351

Open
bgardner8008 opened this issue Mar 26, 2021 · 10 comments
Open

Heartbeats stop after disconnect/reconnect #351

bgardner8008 opened this issue Mar 26, 2021 · 10 comments

Comments

@bgardner8008
Copy link

I'm following the example code for handling disconnections: reconnect in the on_disconnect handler. However I noticed that after reconnecting, outgoing heartbeats were not being sent. I tracked this down to a very subtle issue, the name of the listener. Our listener is called 'default'. Internally the stomp.py code uses a listener 'protocol-listener' used for internal bookkeeping. Notifications are sent to listeners in alphabetical order, so the disconnect was first sent to our listener, which reconnected, and after our on_disconnect returned to stomp, the internal disconnect was sent to the HeartbeatListener, which proceeded to terminate the heartbeat loop. The solution was to name our listener 'z-default' to force it to run after the internal listener. It seems that the internal listener should always run first? Alternatively, the importance of the name should be pointed out in the documentation. Thanks.

@achaiah
Copy link

achaiah commented Apr 22, 2021

I see the same problem. Heartbeats stop being published after a reconnect. I have also confirmed that the above solution works but it's definitely a bug in the implementation.

@achaiah
Copy link

achaiah commented Apr 22, 2021

Actually, this only solves the problem if the stomp server does not restart. Upon reboot, even with the 'z-default' name, the server stops receiving heartbeats.

@CorvetteCole
Copy link

I'm still seeing this issue and for me, changing the listener name doesn't fix it at all. Upon reconnect heartbeats do not work with the same connection object. re-initializing the connection is a valid workaround but clunky and shouldn't be needed. Any thoughts on where this might be @jasonrbriggs ? Thanks for your time!

@CorvetteCole
Copy link

in particular, the device with the stomp server running on it is rudely disconnected by unplugging the ethernet cable. doing that repeatedly causes heartbeats to stop working after the 2nd reconnect.

@bgardner8008
Copy link
Author

I haven't looked at this stuff since my post above. If memory serves, I was able to debug by adding a bunch more logging to the transport code and then forcing disconnects by unplugging cables and then studying logs.

@hugobrilhante
Copy link

I spent several hours trying to understand this problem. I noticed that when disconnecting the heartbeat_terminate_event and running attributes remain with the same state defined by the on_disconnect, preventing the heartbeat from being sent.

I made a modification for testing and was able to send the heartbeats after reconnecting.

 # listener.py L.268
 self.heartbeat_terminate_event = threading.Event()
 while True:

@jasonrbriggs what do you think about this approach?

@gconklin
Copy link

I spent several hours trying to understand this problem. I noticed that when disconnecting the heartbeat_terminate_event and running attributes remain with the same state defined by the on_disconnect, preventing the heartbeat from being sent.

I just ran into this on stomp.py 8.0.0. I believe the primary issue is that the new thread/loop is started before the old thread/loop exits. Which means that the event is still set at new loop start.

@tyjcobb94
Copy link

I just ran into this on stomp.py 8.0.0. I believe the primary issue is that the new thread/loop is started before the old thread/loop exits. Which means that the event is still set at new loop start.

I also ran into this race condition on 8.0.0 and had to implement a sleep within the on_disconnected handler for my listener to wait for some time before reconnecting. Otherwise the new heartbeat thread tries to start up before the heartbeat_terminate_event is cleared from the previous disconnect event and no new heartbeats get sent.

@jasonrbriggs is there a way to synchronously shutdown the heartbeat loop in the on_disconnected instead of using the terminate event? That would prevent us from reconnecting before disconnect is complete.

@gconklin
Copy link

My final solution was to just let the thread die (on_disconnected is empty) and monitor it from the creating thread (main in my case) to start a brand new one if it dies.

        while running_event.wait(5) is False:
            if not stomp_conn.is_connected():
                logger.warning("Starting new stomp connection")
                stomp_conn = stomp.Connection(...)
                stomp_conn.set_listener("", MyListener(...))
                stomp_conn.connect_and_subscribe()

It's not an ideal solution, but has been stable for the last couple weeks. I spent a fair amount of time thinking how the code could be reworked to track the Thread instead of the event, but the thread object isn't kept anywhere after it's started so you can't thread.join() it. I toyed with the idea of making my own default_create_thread to track the created thread since you can pass that in, but it was just a workaround and not an actual fix for the race condition and connection status tracking.

@jasonrbriggs
Copy link
Owner

@jasonrbriggs is there a way to synchronously shutdown the heartbeat loop in the on_disconnected instead of using the
terminate event? That would prevent us from reconnecting before disconnect is complete.

Not easily, given the fact that the heartbeat listener is pretty decoupled from the main connection. I'll give it some further thought...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants