feat(api): Pause when pick_up_tip() errors in a Python protocol #14753

Merged: 25 commits into edge from papi_pause_on_error, Apr 9, 2024

Conversation

@SyntaxColoring (Contributor) commented Mar 28, 2024

Overview

Closes EXEC-346.

Test Plan

Setup

Enable the hidden error recovery feature flag by manually issuing a POST /settings request with {"id": "enableErrorRecoveryExperiments", "value": true}.
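
A minimal sketch of that request using Python's requests library, assuming the robot's HTTP API listens on port 31950 and accepts an Opentrons-Version header (both details are assumptions here, not taken from this PR):

import requests

ROBOT_IP = "192.168.1.100"  # hypothetical robot address; substitute your robot's IP
BASE_URL = f"http://{ROBOT_IP}:31950"  # assumed default robot-server port
HEADERS = {"Opentrons-Version": "*"}   # assumed: the robot server expects a version header

resp = requests.post(
    f"{BASE_URL}/settings",
    json={"id": "enableErrorRecoveryExperiments", "value": True},
    headers=HEADERS,
)
resp.raise_for_status()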

Test

Upload and run this protocol.

from opentrons import protocol_api

requirements = {"robotType": "Flex", "apiLevel": "2.17"}


def run(protocol: protocol_api.ProtocolContext) -> None:
    protocol.load_trash_bin("A3")

    tip_rack = protocol.load_labware("opentrons_flex_96_tiprack_50ul", "C2")
    reservoir = protocol.load_labware(
        "opentrons_96_wellplate_200ul_pcr_full_skirt", "C3"
    )
    pipette = protocol.load_instrument(
        "flex_1channel_50", mount="left", tip_racks=[tip_rack]
    )

    for source, dest in zip(reservoir.columns()[0], reservoir.columns()[1]):
        pipette.pick_up_tip()
        pipette.move_to(source.top())
        pipette.move_to(dest.top())
        pipette.drop_tip()

  • When it hits a missing spot in the tip rack, the run should pause. From there, you should be able to issue a stop or resume-from-recovery action via POST /runs/{id}/actions (sketched below).
  • If you do resume-from-recovery, the pick_up_tip() calls later in the protocol should continue from the next wells in the tip rack, not repeatedly try to pick up tips from the same well.
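
A hedged sketch of issuing those actions over HTTP. The endpoint comes from the bullet above; the {"data": {"actionType": ...}} body shape, base URL, and header are my assumptions and may need adjusting to the robot server's actual API:

import requests

BASE_URL = "http://192.168.1.100:31950"  # hypothetical robot address and assumed port
HEADERS = {"Opentrons-Version": "*"}     # assumed version header
run_id = "replace-with-the-paused-run-id"

# Resume from the recoverable pick_up_tip() failure; use "stop" instead to end the run.
resp = requests.post(
    f"{BASE_URL}/runs/{run_id}/actions",
    json={"data": {"actionType": "resume-from-recovery"}},
    headers=HEADERS,
)
resp.raise_for_status()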

Cleanup

POST /settings with {"id": "enableErrorRecoveryExperiments", "value": false}.

Changelog

  • We currently have some machinery to send a command to ProtocolEngine and block until the command finishes executing. The main part of this PR adds variants of that machinery that also wait for error recovery to finish (see the sketch after this list). We reimplement ProtocolContext.pick_up_tip(), specifically, with that new variant; other commands will follow later.
  • The tip handler in ProtocolEngine is modified so that when a pickUpTip fails recoverably, it marks the requested tips as used. This is one way of getting the next PAPI pick_up_tip() to automatically move on to the next tip. See discussion below.
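
A rough sketch of the control flow the first bullet describes, reusing names that appear in this PR's diffs (add_command, wait_for_command, wait_for_not, get_recovery_in_progress_for_command); the method body and the wait_for_not signature are simplified assumptions, not the merged implementation:

async def add_and_execute_command_wait_for_recovery(self, request):
    """Queue a command, wait for it to finish, then wait out any error recovery."""
    queued_command = self.add_command(request)
    # Returns once the command has succeeded or failed.
    await self.wait_for_command(command_id=queued_command.id)
    # If the failure was recoverable, the run is now AWAITING_RECOVERY.
    # Block until the operator resumes or stops the run.
    await self._state_store.wait_for_not(
        self._state_store.commands.get_recovery_in_progress_for_command,
        queued_command.id,
    )
    return self._state_store.commands.get(queued_command.id)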

Review requests

See my review comments.

Risk assessment

Low-risk when the enableErrorRecoveryExperiments feature flag is off. Medium-risk if it's on, since this ventures further into territory where Protocol Engine's understanding of the protocol history and state can diverge from PAPI's.

codecov bot commented Mar 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.24%. Comparing base (65885b2) to head (6b79625).
Report is 24 commits behind head on edge.

❗ Current head 6b79625 differs from pull request most recent head cf1e936. Consider uploading reports for the commit cf1e936 to get more accurate results

Additional details and impacted files


@@            Coverage Diff             @@
##             edge   #14753      +/-   ##
==========================================
- Coverage   67.24%   67.24%   -0.01%     
==========================================
  Files        2495     2495              
  Lines       71254    71253       -1     
  Branches     8937     8937              
==========================================
- Hits        47918    47917       -1     
  Misses      21235    21235              
  Partials     2101     2101              
Flag Coverage Δ
g-code-testing 92.43% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files Coverage Δ
...i/src/opentrons/protocol_engine/actions/actions.py 100.00% <ø> (ø)
...c/opentrons/protocol_engine/clients/sync_client.py 100.00% <ø> (ø)
...rc/opentrons/protocol_engine/clients/transports.py 100.00% <ø> (ø)
...rons/protocol_engine/execution/command_executor.py 100.00% <ø> (ø)
...pentrons/protocol_engine/execution/queue_worker.py 100.00% <ø> (ø)
...i/src/opentrons/protocol_engine/protocol_engine.py 100.00% <ø> (ø)
...pi/src/opentrons/protocol_engine/state/commands.py 99.18% <ø> (ø)
api/src/opentrons/protocol_engine/state/state.py 100.00% <ø> (ø)
...opentrons/protocol_runner/legacy_command_mapper.py 98.21% <ø> (ø)

... and 4 files with indirect coverage changes

@DerekMaggio (Contributor) commented:

@SyntaxColoring, #14748 contains some error recovery protocols if you need any test cases.

Comment on lines 245 to 247
    async def add_and_execute_command_wait_for_recovery(
        self, request: commands.CommandCreate
    ) -> None:

@SyntaxColoring (Contributor, Author) commented Apr 1, 2024

I'm open to ideas for better names for this.

Or maybe we should make it a wait_for_recovery: bool argument on the existing add_and_execute() method.

@SyntaxColoring changed the title from "feat(api): Pause-on-error for Python protocols" to "feat(api): Pause when pick_up_tip() errors in a Python protocol" on Apr 1, 2024
api/src/opentrons/protocol_engine/state/commands.py (review thread outdated, resolved)
api/src/opentrons/protocol_api/core/engine/instrument.py (review thread outdated, resolved)

# FIX BEFORE MERGE: We should probably only set_last_location() if the

A Member commented:

I disagree. I think there are three answers, each of which is honestly reasonable, but one of which is the best (and also the hardest). As a preamble, we should probably push last-location down to the engine.

  1. The best and hardest: base the decision of setting the last location on when the error occurred (during which action) and what the error is. For instance, if we got a stall while moving to the tip rack, we should clear the last location, since the position is now indeterminate; but if we failed our tip-state consistency check after the pickup, we should set the last location to the rack. (A sketch of this option follows the list.)
  2. The simplest and most robust: always clear the last location on any error.
  3. A slightly more optimistic version: always set it on any error, since the most likely causes of failure will in fact leave the gantry in the specified location.
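
A purely illustrative sketch of option 1's decision logic, with hypothetical error classes standing in for whatever the engine actually raises (none of these names come from the codebase):

from typing import Optional

class StallError(Exception):
    """Hypothetical: the gantry stalled, so its position is indeterminate."""

class TipStateConsistencyError(Exception):
    """Hypothetical: the pickup motion finished but the tip-presence check failed."""

def last_location_after_failed_pickup(
    error: Exception, tip_rack_location: object
) -> Optional[object]:
    """Decide what to record as the last location after a failed pick-up."""
    if isinstance(error, StallError):
        return None  # position unknown: clear the last location (option 2's behavior)
    if isinstance(error, TipStateConsistencyError):
        return tip_rack_location  # the gantry is still over the rack
    return tip_rack_location  # optimistic default (option 3's behavior)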

@SyntaxColoring (Contributor, Author) commented Apr 8, 2024

After looking at this more closely, I think it's appropriate to call set_last_location() unconditionally. See this new comment:

self._engine_client.pick_up_tip_wait_for_recovery(
    pipette_id=self._pipette_id,
    labware_id=labware_id,
    well_name=well_name,
    well_location=well_location,
)
# Set the "last location" unconditionally, even if the command failed
# and was recovered from and we don't know if the pipette is physically here.
# This isn't used for path planning, but rather for implicit destination
# selection like in `pipette.aspirate(location=None)`.
self._protocol_core.set_last_location(location=location, mount=self.get_mount())
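
As a small usage note on what that comment means by implicit destination selection (the well names here are arbitrary, and the exact semantics are documented by the Python API, not by this PR):

pipette.move_to(reservoir["A1"].top())  # records this well as the pipette's last location
pipette.aspirate(10)                    # no location given, so the last location is used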

Comment on lines +311 to +312
# FIXME(mm, 2024-04-02): As of https://github.com/Opentrons/opentrons/pull/14726,
# this action is only legal if the command is running, not queued.

@SyntaxColoring (Contributor, Author) commented Apr 2, 2024

This is my fault; in #14726, I neglected to account for what this estop() method was doing. I'm going to have to fix this in another PR (tracked as EXEC-382).

@SyntaxColoring (Contributor, Author) commented Apr 2, 2024

I think a lot of this estop() method is due for a rethink. Like, I don't really get why it needs to be so different from stop(), and why it needs to be messing with things like FailCommandActions itself.

@SyntaxColoring marked this pull request as ready for review on April 8, 2024 at 15:57
@SyntaxColoring requested a review from a team as a code owner on April 8, 2024 at 15:57

@sfoster1 (Member) left a comment

This looks good overall, but a couple of points:

  • I'm a bit concerned about the "last command to fail" logic in the command store. Can we instead save the command ID when it fails recoverably and we enter AWAITING_RECOVERY, and make that state more explicit?
  • Something feels slightly off about the checks in ProtocolEngine.add_and_execute_command_wait_for_recovery, since they check two different pieces of state in tight sequence.

# which remains `queued` because of the pause.
# 3. The engine is stopped. The returned command will be `queued`
# and won't have a result.
raise ProtocolCommandFailedError(

A Member commented:

Let's make this its own unique exception type and code so that we can programmatically differentiate it from other errors.
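
For illustration only, a generic shape for such an exception; the class actually added in 97b0ada may be named and structured differently (the name and code below are made up):

class RunStoppedBeforeCommandError(Exception):
    """Hypothetical: the engine stopped before the queued command could complete."""

    error_code = "HYPOTHETICAL_CODE"  # a distinct code enables programmatic matching

    def __init__(self, command_id: str) -> None:
        super().__init__(f"Command {command_id} was not completed before the run ended.")
        self.command_id = command_id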

@SyntaxColoring (Contributor, Author) commented Apr 8, 2024

Done in 97b0ada.

api/src/opentrons/protocol_engine/protocol_engine.py (review thread outdated, resolved)
@@ -708,6 +708,15 @@ def get_all_commands_final(self) -> bool:

        return no_command_running and no_command_to_execute

    def get_recovery_in_progress_for_command(self, command_id: str) -> bool:
        """Return whether the given command failed and its error recovery is in progress."""
        command_is_most_recent_to_fail = (

A Member commented:

This is a little gross to me; could we save the command we're currently recovering from explicitly? For instance, if a fixit command fails, wouldn't that instantly make this False?

@SyntaxColoring (Contributor, Author) replied:

Changed in a880096.

api/src/opentrons/protocol_engine/state/tips.py (review thread outdated, resolved)
queued_command = self.add_command(request)
await self.wait_for_command(command_id=queued_command.id)
completed_command = self._state_store.commands.get(queued_command.id)
await self._state_store.wait_for_not(

A Member commented:

It kind of feels like we want to gate this on a synchronous check of whether we're now in recovery for the command, but this in general feels a little race-condition-y because of the separation between command state and run state. Specifically:

  • wait_for_command returns on the asyncio spin after a FailCommandAction or a SucceedCommandAction for this command (we can neglect the queued-and-stopping part for now)
  • but the get_recovery_in_progress_for_command predicate is based on the queue status being awaiting-recovery

Do we guarantee mechanically that the queue status will be set before the asyncio spin after the command terminal action is dispatched? Are we sure this won't occasionally race and return early?

@SyntaxColoring (Contributor, Author) replied:

Yes, if I understand your concern correctly:

When await self.wait_for_command(command_id=queued_command.id) returns, we are guaranteed that the action that finalized the command has already been fully processed, and that get_recovery_in_progress_for_command() will see its results on the state.

When we handle an action, we send it to each of the substores in a loop. Only after that's done do we notify subscribers like this one.
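
A minimal sketch of that dispatch ordering, with simplified names (this is not the real StateStore code):

class SketchStateStore:
    """Illustrates the ordering guarantee described above."""

    def __init__(self, substores, subscribers):
        self._substores = substores
        self._subscribers = subscribers

    def handle_action(self, action) -> None:
        # 1. Every substore updates its state first...
        for substore in self._substores:
            substore.handle_action(action)
        # 2. ...and only then are waiters notified, so any predicate they check
        # (e.g. get_recovery_in_progress_for_command) already sees the updated state.
        for notify in self._subscribers:
            notify()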

@@ -167,6 +167,7 @@ async def execute(self, command_id: str) -> None:
FailCommandAction(
    error=error,
    command_id=running_command.id,
    running_command=running_command,

A Contributor commented:

I guess I am wondering why we need this in the action. Don't we have the failed command stored in PE already?

@SyntaxColoring (Contributor, Author) replied:

It's a hack; we probably don't need it. See this comment:

# This is a quick hack so FailCommandAction handlers can get the params of the
# command that failed. We probably want this to be a new "failure details"
# object instead, similar to how succeeded commands can send a "private result"
# to Protocol Engine internals.
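
A hedged sketch of what such a "failure details" object might look like as a dataclass (entirely hypothetical; the PR keeps the running_command field instead):

from dataclasses import dataclass
from typing import Any, Dict

@dataclass(frozen=True)
class FailureDetails:
    """Hypothetical payload for FailCommandAction handlers, instead of the whole command."""
    command_id: str
    command_type: str
    params: Dict[str, Any]  # the failed command's params, which recovery handlers need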

@TamarZanzouri (Contributor) left a comment

Overall this looks good. I added a question, and I'm a bit confused about the wait_for_not call: I get what it's doing, but the logic around it is a bit confusing to me. Besides that, looks great!

Comment on lines +181 to +183
    recovery_target_command_id: Optional[str]
    """If we're currently recovering from a command failure, which command it was."""

@SyntaxColoring (Contributor, Author) commented:

Adding this as an orthogonal attribute is quick and dirty. Ideally, this ID would not exist at all when QueueStatus is anything other than AWAITING_RECOVERY. We could do that by restructuring this state so it's closer to a union of dataclasses instead of a dataclass of unions (rough sketch below).
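
A rough sketch of that union-of-dataclasses shape (names invented; the real command state is much larger):

from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Running:
    pass

@dataclass(frozen=True)
class Paused:
    pass

@dataclass(frozen=True)
class AwaitingRecovery:
    # The target command ID exists only in the one state where it is meaningful.
    recovery_target_command_id: str

QueueState = Union[Running, Paused, AwaitingRecovery]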

@SyntaxColoring requested a review from sfoster1 on April 9, 2024 at 15:26

@sfoster1 (Member) left a comment

Looks great, thank you for the changes!

@SyntaxColoring merged commit 1819b8c into edge on Apr 9, 2024
20 checks passed
@SyntaxColoring deleted the papi_pause_on_error branch on April 9, 2024 at 15:58