Adding ping-pong disconnect to node router #489
Merged
Overview
The node server running on Heroku hasn't been able to detect disconnects the way it does when running locally. This delays marking tasks as disconnected and expiring them, and more generally causes a mismatch between `Agent` and `Unit` status on MTurk and in Mephisto. This PR records ping times on the router so it can tell whether an agent has disconnected without relying on a hard disconnect. An agent is currently considered disconnected if no heartbeat has been received in 15 seconds.
Resolves #485.
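A minimal TypeScript sketch of the heartbeat-timeout idea follows. It assumes a hypothetical `LocalAgentState` shape. Only `last_ping`, `STATUS_DISCONNECT` and the 15-second window come from this PR. `handleHeartbeat`, `checkForDisconnects`, the other fields and the status string values are illustrative and are not the router's actual code. Agents that have never sent a heartbeat are skipped. This is an assumption, but it is consistent with the static React task staying connected in the testing notes.

```typescript
// Hypothetical per-agent state; only `last_ping` is named in the PR.
interface LocalAgentState {
  agent_id: string;
  status: string;
  last_ping: number | null; // ms timestamp of the latest HEARTBEAT; null if none received yet
}

// Placeholder status values (assumption: the real constants may differ).
const STATUS_CONNECTED = "connected";
const STATUS_DISCONNECT = "disconnect";

// Timeout from the PR description: no heartbeat within 15 seconds.
const HEARTBEAT_TIMEOUT_MS = 15000;

// Called whenever a HEARTBEAT packet arrives from an agent.
function handleHeartbeat(state: LocalAgentState, now: number): void {
  state.last_ping = now;
}

// Marks agents whose heartbeats have lapsed as disconnected and returns their ids.
function checkForDisconnects(agents: LocalAgentState[], now: number): string[] {
  const newlyDisconnected: string[] = [];
  for (const state of agents) {
    // Agents that never sent a heartbeat (e.g. static tasks) are never timed out.
    if (state.last_ping === null) continue;
    // Don't report the same agent twice.
    if (state.status === STATUS_DISCONNECT) continue;
    if (now - state.last_ping > HEARTBEAT_TIMEOUT_MS) {
      state.status = STATUS_DISCONNECT;
      newlyDisconnected.push(state.agent_id);
    }
  }
  return newlyDisconnected;
}

// Usage: one agent that sends heartbeats, one static agent that never does.
const agents: LocalAgentState[] = [
  { agent_id: "chat_agent", status: STATUS_CONNECTED, last_ping: null },
  { agent_id: "static_agent", status: STATUS_CONNECTED, last_ping: null },
];
handleHeartbeat(agents[0], 0);
console.log(checkForDisconnects(agents, 10000)); // []
console.log(checkForDisconnects(agents, 16000)); // [ 'chat_agent' ]
```

In a real router the check would presumably run on a timer, such as `setInterval`. Here `now` is passed explicitly to keep the sketch deterministic.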
Implementation details
- Add `last_ping` to the `LocalAgentState` of the node router, and update it whenever a `HEARTBEAT` is received.
- Use `last_ping` to detect agents whose heartbeats have stopped, and report `STATUS_DISCONNECT` when relevant.
Additionally, found and resolved the following bugs:
- Once a `Unit` transitions to `EXPIRED`, it can no longer move back to `ASSIGNED`.
Testing
Launched a server on Heroku with the ParlAI chat task, connected to it as two different agents, and then disconnected one. Confirmed that this disconnect, which previously went undetected, is now detected.
Launched a static React task, which sends no heartbeats, to ensure its connection behavior hadn't changed. Connected, and confirmed I was still connected after 15 seconds.
Ensured both tasks shut down properly when completed.
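The `EXPIRED` fix listed under Implementation details can be read as a one-way guard on status transitions. The TypeScript sketch below is illustrative only and is not Mephisto's code. The `Unit` shape and `setUnitStatus` are hypothetical, and the status union is trimmed to the two statuses this PR mentions.

```typescript
// Hypothetical subset of unit statuses; only ASSIGNED and EXPIRED appear in the PR.
type UnitStatus = "ASSIGNED" | "EXPIRED";

interface Unit {
  unit_id: string;
  status: UnitStatus;
}

// Applies a status change unless it would move an expired unit back to ASSIGNED.
// Returns true if the status was updated, false if the transition was rejected.
function setUnitStatus(unit: Unit, next: UnitStatus): boolean {
  if (unit.status === "EXPIRED" && next === "ASSIGNED") {
    return false; // EXPIRED -> ASSIGNED is disallowed
  }
  unit.status = next;
  return true;
}

// Usage
const unit: Unit = { unit_id: "unit_1", status: "ASSIGNED" };
console.log(setUnitStatus(unit, "EXPIRED")); // true
console.log(setUnitStatus(unit, "ASSIGNED")); // false
console.log(unit.status); // EXPIRED
```

Returning a boolean rather than throwing lets a caller ignore a stale update quietly. This is a design choice of the sketch, not something the PR specifies.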