fix(robot-server): Fix indefinite protocol cancel state #14428
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             edge   #14428      +/-   ##
==========================================
- Coverage   68.14%   68.13%   -0.01%
==========================================
  Files        2522     2522
  Lines       72033    72729     +696
  Branches     9219     9571     +352
==========================================
+ Hits        49087    49557     +470
- Misses      20756    20895     +139
- Partials     2190     2277      +87
```

Flags with carried forward coverage won't be shown.
Yep, makes sense, thanks.
Have you given any thought to how we can catch this kind of thing with automated tests? It seems like we need some kind of integration test; unit tests would have been insufficient because the robot-server code was doing the "intended thing," it's just that what was intended was not sufficient for correct end-to-end behavior.
robot-server's current integration tests are doing a bit of polling themselves, for things like "wait until the run has completed." We can write a notification client in Python and replace that polling with subscriptions. That won't give us coverage of every field. But maybe it's the best bang for buck nonetheless?
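Here's a rough sketch of what that Python notification client could look like, assuming the server publishes notifications over MQTT; the broker host, port, topic name, and helper function below are illustrative assumptions, not the server's actual values:

```python
import threading

import paho.mqtt.client as mqtt  # assumes an MQTT-based notification transport


def wait_for_runs_notification(host: str = "localhost", timeout_s: float = 30.0) -> bool:
    """Block until the server publishes on the (assumed) runs topic.

    Returns True if a notification arrived before the timeout, else False.
    """
    got_message = threading.Event()

    def on_message(client, userdata, message):
        got_message.set()  # any publish on the topic counts as "run updated"

    client = mqtt.Client()
    client.on_message = on_message
    client.connect(host, 1883)  # assumed broker port
    client.subscribe("robot-server/runs")  # assumed topic name
    client.loop_start()
    try:
        return got_message.wait(timeout=timeout_s)
    finally:
        client.loop_stop()
        client.disconnect()
```

A test could then await this instead of re-fetching the run over HTTP in a loop.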
```
!([
  RUN_STATUS_STOP_REQUESTED,
```
still confused about how this worked lol
Good question. Totally agree that we need integration testing here - working that out is my next ticket item. Thanks for pointing me towards the existing logic - I might see if I can repurpose it instead of reinventing the wheel entirely.
I haven't gone through all the details of the notifications work done so far, so I might be missing something, but this lgtm. Just a question about possible duplicate notifications.
```python
# Publish only when the status read from PE actually changed since last poll.
if (
    current_state_summary_status is not None
    and self._previous_state_summary_status != current_state_summary_status
):
    await self._publish_runs_async(run_id=run_id)
    self._previous_state_summary_status = current_state_summary_status
```
There will be certain status transitions (like IDLE -> RUNNING -> PAUSED) that are handled by the robot server and hence (I'm guessing) will already have a notification sent out. How does the publisher differentiate between such a transition and one that requires polling? Wouldn't it send out duplicate notifications otherwise? Or does that not matter, since it's just a tiny redundancy?
Good question (and sorry to have missed this before the merge) - it's a redundancy here. As you note, it's relatively tiny compared to what the app does now.
The idea is that this polling is pretty temporary - we're planning on implementing the callback strategy we discussed yesterday as a more permanent solution to this problem, and we should revisit any redundancy and clean it up then. I think it's sufficient for now, though!
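For concreteness, here's a minimal sketch of what that callback strategy could look like, assuming PE can accept a status-change callback; the bridge class, hook, and wiring here are hypothetical, not existing API:

```python
from typing import Awaitable, Callable

PublishRunsAsync = Callable[..., Awaitable[None]]


class EngineStatusBridge:
    """Hypothetical bridge: PE invokes on_status_change on every EngineStatus
    transition, so robot-server publishes exactly once per change, with no
    polling loop and no duplicate emits for server-driven transitions."""

    def __init__(self, publish_runs_async: PublishRunsAsync, run_id: str) -> None:
        self._publish_runs_async = publish_runs_async
        self._run_id = run_id

    async def on_status_change(self, new_status: str) -> None:
        # Fires for internal transitions too, e.g. stop-requested -> stopped.
        await self._publish_runs_async(run_id=self._run_id)
```

PE would invoke on_status_change wherever it transitions state, which would let the poller (and its redundant publishes) be removed.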
Got it. Thanks for the clarification!
Here's one example of where we're doing that, FYI: opentrons/robot-server/tests/integration/http_api/runs/test_protocol_run.tavern.yaml, line 133 in d81e1c3.
Closes RQA-2286
Overview
This PR fixes a bug in which cancelling a protocol from the desktop app/ODD results in an indefinite stop-requested state that does not resolve unless the user navigates away from and back to the protocol in the desktop app. Certain EngineStatus states are internal to PE and not handled by the robot-server, including stop-requested. This means that when the app requests a STOP, robot-server doesn't internally manage the transition from stop-requested -> stopped, so simply emitting refetch flags whenever the robot-server updates the state is insufficient. The current solution is to do what we do for current_commands: poll PE and emit to the app if there's an update. In the future, we should bubble this event up to robot-server from PE (the TODO is already there).
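As a rough illustration of that polling approach (the class, helper names, and interval below are a hypothetical sketch of the idea, not the actual robot-server code):

```python
import asyncio
from typing import Awaitable, Callable, Optional


class RunsStatusPoller:
    """Polls PE's run status and publishes a runs notification on change."""

    def __init__(
        self,
        get_status: Callable[[], Optional[str]],  # reads EngineStatus from PE
        publish_runs_async: Callable[..., Awaitable[None]],
        run_id: str,
        interval_s: float = 1.0,
    ) -> None:
        self._get_status = get_status
        self._publish_runs_async = publish_runs_async
        self._run_id = run_id
        self._interval_s = interval_s
        self._previous_status: Optional[str] = None

    async def poll_forever(self) -> None:
        while True:
            current = self._get_status()
            if current is not None and current != self._previous_status:
                # Catches transitions robot-server never drives itself,
                # e.g. stop-requested -> stopped after an app-issued STOP.
                await self._publish_runs_async(run_id=self._run_id)
                self._previous_status = current
            await asyncio.sleep(self._interval_s)
```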
There's a cheeky removal of RUN_STATUS_STOP_REQUESTED from the list of run statuses that currently disable refetching on the app side. After spending some time looking at it with @shlokamin, we were unable to determine why this conditional is here, or why things appear to work even when it is here. Let's just get rid of it.
Test Plan
Changelog
Risk assessment
Low. This PR just adds another notification push. Worst-case scenario, the app refetches more often than it should.