fix(engine): pause hardware API when engine is paused #10882

Merged: 4 commits merged on Jun 30, 2022

@mcous mcous commented Jun 23, 2022

Overview

This PR adds calls to hardware_api.pause and hardware_api.resume when engine.pause and engine.resume are called, respectively.
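As a rough illustration of the change described above, the sketch below shows an engine forwarding pause and resume to a hardware object. `SketchEngine` and `FakeHardwareAPI` are hypothetical stand-ins, not the real ProtocolEngine or hardware API classes; only the `pause`/`resume` method names and `HardwarePauseType.PAUSE` appear in this thread, and their exact signatures are assumed.

```python
from enum import Enum
from typing import List, Tuple


class HardwarePauseType(Enum):
    """Stand-in enum; only the PAUSE member is mentioned in this PR."""

    PAUSE = "pause"


class FakeHardwareAPI:
    """Hypothetical hardware API double that records pause/resume calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, HardwarePauseType]] = []

    def pause(self, pause_type: HardwarePauseType) -> None:
        self.calls.append(("pause", pause_type))

    def resume(self, pause_type: HardwarePauseType) -> None:
        self.calls.append(("resume", pause_type))


class SketchEngine:
    """Hypothetical engine: every pause/play is mirrored onto the hardware."""

    def __init__(self, hardware_api: FakeHardwareAPI) -> None:
        self._hardware_api = hardware_api
        self.status = "idle"

    def play(self) -> None:
        # Update run state first, then let the hardware move again.
        self.status = "running"
        self._hardware_api.resume(HardwarePauseType.PAUSE)

    def pause(self) -> None:
        # Update run state first, then halt motion at the hardware layer.
        self.status = "paused"
        self._hardware_api.pause(HardwarePauseType.PAUSE)
```

A recording double like this is one way to assert, in a unit test, that the engine and the hardware never disagree about whether the run is paused.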

Closes #7923

Changelog

  • fix(engine): pause hardware API when engine is paused

Review requests

This PR needs to be tested on actual hardware rather than the simulator or emulator.

We need to check these acceptance criteria with both PAPIv2 protocols and JSONv6 protocols, because some functionality that was scoped specifically to PAPIv2 protocols has now moved into the ProtocolEngine proper.

  • When a protocol is paused, it can stop in the middle of a command
    • The command should remain in a running state
    • The run should shift to a paused state
    • Writing a protocol with big moves can make this easier to test and observe
  • When the protocol is resumed, the command picks up where it left off

Pause when door is opened

This ticket shouldn't affect "pause on door open" behavior for PAPIv2 protocols. In advance of #9125, we should check that this PR does not negatively affect our expected PAPIv2 behavior. These tests are not expected to work on JSONv6 protocols until #9125 is completed.

  • Before the run is started
    • Door state does not affect the ability to run LPC (intent: setup) commands
  • Starting the run
    • Starting the run (issuing a play action) starts the run in a paused state
    • Trying to resume the run (issuing another play action) will reject if the door is still open
    • If the door is closed and then another play action is issued, the run starts
  • During the run
    • If the door is opened, the run becomes paused and the robot stops moving
    • Trying to resume the run (issuing another play action) will reject if the door is still open
    • If the door is closed and then another play action is issued, the run resumes
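The door-related acceptance criteria above can be read as a small state machine. The sketch below is a simplified model of those bullets only (it ignores setup commands and real hardware); `RunSketch`, `DoorOpenError`, and `set_door_open` are hypothetical names for illustration, not part of the ProtocolEngine.

```python
class DoorOpenError(Exception):
    """Raised when a play action is rejected because the door is open."""


class RunSketch:
    """Hypothetical model of the run status rules listed above."""

    def __init__(self, door_open: bool) -> None:
        self.door_open = door_open
        self.status = "idle"

    def play(self) -> None:
        if self.status == "idle":
            # First play action: the run starts, but paused if the door is open.
            self.status = "paused" if self.door_open else "running"
            return
        if self.door_open:
            # Any later play action is rejected while the door is still open.
            raise DoorOpenError("close the door before resuming")
        self.status = "running"

    def set_door_open(self, is_open: bool) -> None:
        self.door_open = is_open
        # Opening the door pauses a running run; closing it does not
        # resume by itself, a new play action is still required.
        if is_open and self.status == "running":
            self.status = "paused"
```

Note that closing the door never resumes the run on its own in this model, which matches the "another play action is issued" wording in the criteria.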

Risk assessment

I started today pretty nervous about this, but once I got past the surface and looked at the ProtocolEngine state behaviors we've already implemented around door state, I'm feeling pretty confident about this!

@mcous mcous added the robot-svcs Falls under the purview of the Robot Services squad (formerly CPX, Core Platform Experience). label Jun 23, 2022
@mcous mcous requested a review from a team as a code owner June 23, 2022 19:16
@mcous mcous requested a review from SyntaxColoring June 23, 2022 19:17

codecov bot commented Jun 23, 2022

Codecov Report

Merging #10882 (14e295f) into edge (02c7c79) will not change coverage.
The diff coverage is 97.77%.

@@           Coverage Diff           @@
##             edge   #10882   +/-   ##
=======================================
  Coverage   73.73%   73.73%           
=======================================
  Files        2076     2076           
  Lines       57321    57321           
  Branches     5724     5724           
=======================================
  Hits        42266    42266           
  Misses      13820    13820           
  Partials     1235     1235           
Flag            Coverage Δ
notify-server   89.17% <ø> (ø)

Impacted Files                                            Coverage Δ
api/src/opentrons/hardware_control/types.py               96.51% <ø> (ø)
api/src/opentrons/protocol_engine/__init__.py             100.00% <ø> (ø)
...c/opentrons/protocol_engine/execution/equipment.py     100.00% <ø> (ø)
...ice/session/session_types/live_protocol/session.py     92.00% <85.71%> (ø)
api/src/opentrons/hardware_control/api.py                 82.32% <100.00%> (ø)
api/src/opentrons/hardware_control/ot3api.py              80.66% <100.00%> (ø)
...pi/src/opentrons/hardware_control/pause_manager.py     96.55% <100.00%> (ø)
...pentrons/protocol_engine/create_protocol_engine.py     100.00% <100.00%> (ø)
...opentrons/protocol_engine/execution/run_control.py     100.00% <100.00%> (ø)
...i/src/opentrons/protocol_engine/protocol_engine.py     100.00% <100.00%> (ø)
... and 21 more

@SyntaxColoring SyntaxColoring left a comment

I ended up not having a chance to test this on hardware today—sorry!

I've been staring at this for a little bit, but I'm having a pretty hard time following how our play/pause door-opened/door-closed actions flow through all the different layers—runner, engine, plugin, and hardware.

One thing on my mind is this behavior you described in the comment about a run that's played for the first time immediately becoming blocked by the door before any movement happens. It's not clear to me how we're implementing that.

Actually, it sort of looks to me like the legacy plugin is competing with ProtocolEngine.play()?

  1. ProtocolEngine.play() will do self._action_dispatcher.dispatch(), which will give the legacy plugin a chance to run first (I think?)
  2. The plugin will see that the door is blocking, and pause the hardware API
  3. ProtocolEngine.play() will start the queue worker
  4. ProtocolEngine.play() will cause self._hardware_api.resume(), undoing what the legacy plugin did?
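The suspected ordering problem in the four steps above can be modeled in a few lines. This is only a sketch of the reviewer's hypothesis, not the actual dispatch code: `HardwareFlagSketch`, `LegacyPluginSketch`, and `RacyEngineSketch` are hypothetical, and the queue worker (step 3) is left out because it does not touch the pause flag.

```python
from typing import List


class HardwareFlagSketch:
    """Hypothetical hardware double that tracks a single pause flag."""

    def __init__(self) -> None:
        self.paused = False
        self.log: List[str] = []

    def pause(self) -> None:
        self.paused = True
        self.log.append("pause")

    def resume(self) -> None:
        self.paused = False
        self.log.append("resume")


class LegacyPluginSketch:
    """Hypothetical plugin that pauses hardware when the door blocks a play."""

    def __init__(self, hardware: HardwareFlagSketch, door_open: bool) -> None:
        self._hardware = hardware
        self._door_open = door_open

    def handle_action(self, action: str) -> None:
        if action == "play" and self._door_open:
            self._hardware.pause()  # step 2


class RacyEngineSketch:
    """Hypothetical engine whose play() resumes hardware unconditionally."""

    def __init__(
        self, hardware: HardwareFlagSketch, plugins: List[LegacyPluginSketch]
    ) -> None:
        self._hardware = hardware
        self._plugins = plugins

    def play(self) -> None:
        for plugin in self._plugins:
            plugin.handle_action("play")  # step 1: plugins see the action first
        self._hardware.resume()  # step 4: undoes the plugin's pause
```

With the door open, a single `play()` leaves the log as `["pause", "resume"]` and the hardware unpaused, which is the conflict being described.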

@mcous mcous marked this pull request as draft June 27, 2022 16:40

mcous commented Jun 27, 2022

I think @SyntaxColoring is right: this PR leaves the LegacyContextPlugin fighting with the ProtocolEngine for HW pause/resume when the door is opened. Marking as a draft for the time being.

mcous commented Jun 28, 2022

I tested 75cea85 on an OT-2 and found the door switch stuff to still be working with PAPIv2. I'm going to mark this as ready for review so pausing the hardware API in the engine natively can happen with #9125

@mcous mcous marked this pull request as ready for review June 28, 2022 13:53
@@ -12,32 +10,18 @@ class PauseManager:
timer runs out) and the pause resume (trigged by user via the app).
"""

-    def __init__(self, door_state: DoorState) -> None:
+    def __init__(self) -> None:
Contributor Author

I checked in with @sfoster1 on this one: the HW API would rather simply report door open and close events. This lines up nicely with a desire on our end to stop deeply nesting access to the config singleton.

Given that this PR gives the ProtocolEngine enough information to reject an invalid resume while the door is open, I removed the unnecessary door logic from PauseManager

Contributor

My opinion isn't a super informed one, but I'm a big fan of this. It seems way easier to reason about the HW API if we're always controlling it and it's never controlling itself independently.

@@ -291,7 +291,7 @@ class ModuleView(HasState[ModuleState]):

_state: ModuleState

def __init__(self, state: ModuleState, virtualize_modules: bool) -> None:
Contributor Author

Argument was completely unused

@mcous mcous requested a review from a team June 28, 2022 14:02

@SyntaxColoring SyntaxColoring left a comment

Code makes sense to me, with the exception of the removal of queue_worker.start(). I'll also test this on a robot and report results.

@@ -107,14 +109,19 @@ def play(self) -> None:
PlayAction(requested_at=requested_at)
)
self._action_dispatcher.dispatch(action)
-        self._queue_worker.start()
Contributor

Is this an intentional removal of self._queue_worker.start()?

Contributor

@SyntaxColoring @mcous I saw in the tests that we are now calling hardware_api.resume(HardwarePauseType.PAUSE) and hardware_api.pause(HardwarePauseType.PAUSE) instead?

Contributor Author

This is intentional; queue_worker.start is called in __init__, so all subsequent calls will no-op.
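To make the no-op argument concrete, here is a minimal sketch of an idempotent `start()`. It assumes the worker guards against repeated starts with a flag; the real queue worker's internals are not shown in this thread, and `QueueWorkerSketch` and `EngineWithWorkerSketch` are hypothetical.

```python
class QueueWorkerSketch:
    """Hypothetical worker whose start() only takes effect once."""

    def __init__(self) -> None:
        self._started = False
        self.start_count = 0  # how many times work was actually kicked off

    def start(self) -> None:
        if self._started:
            return  # already running: repeated calls do nothing
        self._started = True
        self.start_count += 1


class EngineWithWorkerSketch:
    """Hypothetical engine that starts its worker once, at construction."""

    def __init__(self, queue_worker: QueueWorkerSketch) -> None:
        self._queue_worker = queue_worker
        self._queue_worker.start()

    def play(self) -> None:
        # No start() call needed here; it would be a no-op anyway.
        pass
```

Under that assumption, dropping the call from `play()` changes nothing observable, so the removal is safe.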

@@ -29,7 +34,9 @@ async def create(
return LiveProtocolSession(
configuration=configuration,
instance_meta=instance_meta,
-            protocol_engine=await create_protocol_engine(configuration.hardware),
+            protocol_engine=await create_protocol_engine(
Contributor

@mcous you probably noticed codecov

@TamarZanzouri TamarZanzouri self-requested a review June 29, 2022 14:08

@TamarZanzouri TamarZanzouri left a comment

Changes look good to me, and I love the amount of code we got to remove! I would love to see which race condition you were talking about before you made these changes.

@sanni-t sanni-t left a comment

Reviewed in a call w/ @mcous and the changes look great!

SyntaxColoring commented Jun 29, 2022

Tested on a robot with this protocol.

metadata = {"apiLevel": "2.12"}

def run(protocol):
    pipette = protocol.load_instrument("p300_multi_gen2", mount="left")
    lw1 = protocol.load_labware("opentrons_96_tiprack_300ul", 1)
    lw2 = protocol.load_labware("opentrons_96_tiprack_300ul", 3)
    for _ in range(10):
        # Each cycle waits for a resume, then makes long moves that are easy to interrupt.
        protocol.pause()
        for _ in range(2):
            pipette.move_to(lw1["A1"].top())
            pipette.move_to(lw2["A12"].top())
  • The run can still be paused and played, and appears to still handle the door correctly.
  • The run state appears to track running/paused/blocked-by-door-open correctly.
  • When the run is on a protocol.pause() and is waiting for a resume, the commands list (and run log) will show the command following the pause command as current. I'm not sure why.
        {
            "id": "command.PAUSE-1",
            "key": "command.PAUSE-1",
            "commandType": "waitForResume",
            "createdAt": "2022-06-29T16:18:20.701854+00:00",
            "startedAt": "2022-06-29T16:18:20.701854+00:00",
            "completedAt": "2022-06-29T16:18:20.703251+00:00",
            "status": "succeeded",
            "params": {}
        },
        {
            "id": "command.MOVE_TO-4",
            "key": "command.MOVE_TO-4",
            "commandType": "custom",
            "createdAt": "2022-06-29T16:18:20.711125+00:00",
            "startedAt": "2022-06-29T16:18:20.711125+00:00",
            "status": "running",
            "params": {
                "legacyCommandType": "command.MOVE_TO",
                "legacyCommandText": "Moving to A1 of Opentrons 96 Tip Rack 300 µL on 1"
            }
        }

@sfoster1 sfoster1 left a comment

Looks great from an embedded point of view - glad to just be providing events and letting someone else worry about it.

mcous commented Jun 30, 2022

When the run is on a protocol.pause() and is waiting for a resume, the commands list (and run log) will show the command following the pause command as current. I'm not sure why.

This is expected behavior with PAPIv2, given how pauses work: ctx.pause sets the HW pause flag and returns, allowing the protocol to enter the next ctx method (dispatching the message broker event) before the protocol hits a HW call that pauses motion.

This is not the case with JSONv6 protocols, where waitForResume will hold in a running state until the protocol is resumed.
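A toy model of that PAPIv2 ordering is sketched below. It assumes the pause is a flag checked only inside hardware calls; a held move is represented by a `False` return where the real hardware call would block. `PausableHardwareSketch` and `LegacyContextSketch` are hypothetical names, and the command labels are simplified.

```python
from typing import List, Tuple


class PausableHardwareSketch:
    """Hypothetical hardware double: moves are held while the flag is set."""

    def __init__(self) -> None:
        self.pause_requested = False
        self.completed_moves: List[str] = []

    def pause(self) -> None:
        self.pause_requested = True

    def resume(self) -> None:
        self.pause_requested = False

    def move_to(self, target: str) -> bool:
        # Real code would block here; the sketch reports a held move instead.
        if self.pause_requested:
            return False
        self.completed_moves.append(target)
        return True


class LegacyContextSketch:
    """Hypothetical PAPIv2-style context that logs commands as they begin."""

    def __init__(self, hardware: PausableHardwareSketch) -> None:
        self._hardware = hardware
        self.command_log: List[Tuple[str, str]] = []

    def pause(self) -> None:
        # Sets the flag and returns immediately, so the command "succeeds".
        self._hardware.pause()
        self.command_log.append(("waitForResume", "succeeded"))

    def move_to(self, target: str) -> None:
        # The "running" event is published before the hardware call is made.
        self.command_log.append((f"moveTo {target}", "running"))
        if self._hardware.move_to(target):
            self.command_log[-1] = (f"moveTo {target}", "succeeded")
```

Calling `pause()` and then `move_to("A1")` leaves the pause logged as succeeded and the move logged as running with no motion performed, which mirrors the command list reported in the test results.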

@mcous mcous merged commit 78658f6 into edge Jun 30, 2022
@mcous mcous deleted the pe_hardware-pause-resume branch June 30, 2022 14:53
Successfully merging this pull request may close these issues.

refactor(engine): pause HardwareAPI when engine is paused
5 participants