Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(app,api): Add opt-in ping/pong monitoring to RPC websocket #2083

Merged
merged 1 commit into from
Aug 20, 2018

Conversation

mcous
Copy link
Contributor

@mcous mcous commented Aug 17, 2018

overview

This PR adds opt-in ping/pong monitoring to the RPC websocket. It also improves/adds WebSocket closure handling to surface socket closure to the UI more reliably.

Closes #2052

changelog

  • feat(app,api): Add opt-in ping/pong monitoring to RPC websocket

review requests

Checking on the currently connected robot via polling GET /health has proven unreliable (socket can fail even when the health endpoint is fine) and is kind of a mess in terms of app state. Also, now that the app can combine multiple IP addresses into one robot (if $OT_APP_DISCOVERY__ENABLED is set), the existing health-check implementation doesn't know what to do if the currently connected IP goes down but the robot is still accessible via another IP. For the purposes of switching to new discovery by default, this is blocking.

Rather than getting crazy with the existing health-check stuff, this PR implements a very simple (and battle-tested) ping/pong routine where the client sends a ping and the server responds with a pong on an interval. If too many pings go by without a pong, the client will close the socket and signal the UI.

If the RPC server the app connects to supports ping/pong, the app will not spin up a GET /health poll routine for that robot. We can build auto-reconnect attempts on top of this, and eventually remove the http-api-client/health-check module entirely in favor of the /health polling in discovery-client and the socket ping/pong here.

I didn't even attempt to build unit tests for this because the server-side RPC tests are completely unworkable (it seems like raises in the code cause indefinite hangs rather than test failures) and the client-side code isn't much better. I'm sorry.

test plan

Tests can be run with real robot or virtual smoothie, provided at least a few people test on real robots.

regression testing

On a robot without ping/pong RPC:

  • App starts GET /health poll on connect
  • App stops GET /health poll on disconnect
  • Websocket close triggers LostConnectionAlert
    • Simulate this by killing the server process
  • Health poll failures trigger LostConnectionAlert
    • I'm not sure, but I think you can simluate this by disabling your WiFi / pulling the ethernet cable

new feature testing

On a robot with ping/pong RPC (sunset has been loaded with this branch):

  • App does not start GET /health poll on connect
  • Ping/pong routine works through / does not interfere with protocol run
  • Websocket close triggers LostConnectionAlert
    • Simulate this by killing the server process
  • Missed pongs trigger LostConnectionAlert
    • Simulate this by killing your laptop's WiFi and/or:
    • Editing lines 312-317 in api/opentrons/server/rpc.py
      • This will cause the server to never send a pong
      • Client should close out the socket itself after ~9 seconds (3 pings * 3 seconds per ping)
      # before
      def send_pong(self):
          self.send({
              '$': {
                  'type': PONG_MESSAGE
              }
          })
      
      # after
      def send_pong(self):
          pass

@mcous mcous added feature Ticket is a feature request / PR introduces a feature app Affects the `app` project api Affects the `api` project ready for review labels Aug 17, 2018
@codecov
Copy link

codecov bot commented Aug 17, 2018

Codecov Report

Merging #2083 into edge will increase coverage by 0.07%.
The diff coverage is 66.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##             edge    #2083      +/-   ##
==========================================
+ Coverage   33.03%   33.11%   +0.07%     
==========================================
  Files         452      452              
  Lines        7196     7224      +28     
==========================================
+ Hits         2377     2392      +15     
- Misses       4819     4832      +13
Impacted Files Coverage Δ
app/src/robot/actions.js 92.3% <0%> (-1.2%) ⬇️
app/src/robot/reducer/connection.js 90.9% <0%> (-4.33%) ⬇️
app/src/components/LostConnectionAlert/index.js 0% <0%> (ø) ⬆️
app/src/health-check/index.js 98.36% <100%> (ø) ⬆️
app/src/robot/selectors.js 82.38% <100%> (+0.22%) ⬆️
app/src/rpc/message-types.js 100% <100%> (ø) ⬆️
app/src/rpc/client.js 89.18% <50%> (-5.47%) ⬇️
app/src/robot/api-client/client.js 86.5% <87.5%> (-0.17%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bd9502a...02c1037. Read the comment docs.

Copy link
Contributor

@Kadee80 Kadee80 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested as best I could on wandering-👩 and VS. Could only confirm via calibration screens not run as wandering-👩can not simulate a run properly.

@mcous
Copy link
Contributor Author

mcous commented Aug 20, 2018

In my testing on Sunset, the ping/pong routine seems to work just fine during a protocol run. Additionally and most importantly, if a robot exists as both an ethernet and a wifi discovered robot, you connect to RPC over ethernet, and then you yank the cable, ping/pong will notify you that the socket has gone down and prompt you to reconnect.

There's still some additional polish to be done around this experience (e.g. what do we do if connected to a robot over wifi and ethernet becomes available?), but could use UX guidance and additional ticket(s)

@mcous mcous requested a review from Laura-Danielle August 20, 2018 16:02
Copy link
Contributor

@b-cooper b-cooper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐙 Looking forward to sunsetting GET /health poll!

Copy link
Contributor

@btmorr btmorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏋️ BE code looks good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Affects the `api` project app Affects the `app` project feature Ticket is a feature request / PR introduces a feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RPC: WebSocket connection should be monitored with a ping/pong
4 participants