feat(api): Allow treating errors as false-positives (ignore them and continue with the run) #16556

SyntaxColoring · 2024-10-21T20:04:11Z

Overview

This does the interesting part of EXEC-676. See there for background.

Changelog

When a command encounters a defined error, it now enacts two state updates:

The first one is what we had before. It represents all the changes that the command made—up to, and including, the point of failure. This is applied as soon as the command fails, as part of the FailCommandAction.
The second one is new. It represents additional changes to make if this error turns out to be a false positive. This is not applied as part of the FailCommandAction. It's merely stored, and applied later, when (or if) a client tells us that the error was a false positive. I think of it as a deferred state update.
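A minimal sketch of the two-update flow, assuming a drastically simplified engine state (a flat dict). `StateUpdate`, `DefinedErrorData`, and the `reconcile_false_positive` flag echo names from this PR. Everything else is hypothetical and is not Protocol Engine's real API: the field name `state_update_if_false_positive` on the error data, the `EngineState` class and its `handle_*` methods, and the choice to drop the stored update on resume.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StateUpdate:
    """Hypothetical stand-in: a flat mapping of state keys to new values."""

    changes: Dict[str, object] = field(default_factory=dict)


@dataclass
class DefinedErrorData:
    """Hypothetical simplified shape; the last field name is assumed."""

    public: str  # Stand-in for the public error object.
    state_update: StateUpdate  # Applied as soon as the command fails.
    state_update_if_false_positive: StateUpdate  # Deferred.


class EngineState:
    def __init__(self) -> None:
        self.values: Dict[str, object] = {}
        self._deferred: Optional[StateUpdate] = None

    def _apply(self, update: StateUpdate) -> None:
        self.values.update(update.changes)

    def handle_fail_command(self, error: DefinedErrorData) -> None:
        # Everything the command did up to the point of failure lands now.
        self._apply(error.state_update)
        # The false-positive update is only stored for later.
        self._deferred = error.state_update_if_false_positive

    def handle_resume_from_recovery(self, reconcile_false_positive: bool) -> None:
        if reconcile_false_positive and self._deferred is not None:
            self._apply(self._deferred)
        # Assumption of this sketch: the stored update is dropped once the run resumes.
        self._deferred = None
```

Failing with `StateUpdate({"rack/A1": "used"})` and a deferred `StateUpdate({"pipette/has_tip": True})` leaves only the first key in `values` until `handle_resume_from_recovery(reconcile_false_positive=True)` is called.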

For example, if a pickUpTip fails with a tipPhysicallyMissing error:

A tip is logically consumed from the tip rack, but no tip is logically placed on the pipette. (These are the documented semantics of the tipPhysicallyMissing error.) This happens via state_update.
But if a client selects "Ignore error and skip to next step," a tip is then logically placed on the pipette. You can then continue on to the rest of the protocol, and it should just work.
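The pickUpTip example above, sketched with hypothetical, simplified tip bookkeeping. `TipState`, `pick_up_tip_failed_missing`, and `pick_up_tip_false_positive` are illustrative names, not Protocol Engine code. The first function is the update applied as soon as the command fails; the second is the deferred one, applied only if the client chooses "Ignore error and skip to next step."

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class TipState:
    """Hypothetical, simplified tip bookkeeping."""

    used_wells: Set[str] = field(default_factory=set)
    # Pipette ID -> well its tip came from, or None when no tip is attached.
    attached: Dict[str, Optional[str]] = field(default_factory=dict)


def pick_up_tip_failed_missing(state: TipState, pipette_id: str, well: str) -> None:
    """Update applied immediately when pickUpTip fails with tipPhysicallyMissing."""
    state.used_wells.add(well)  # The tip is logically consumed from the rack...
    state.attached[pipette_id] = None  # ...but not placed on the pipette.


def pick_up_tip_false_positive(state: TipState, pipette_id: str, well: str) -> None:
    """Deferred update, applied only if the error turns out to be a false positive."""
    state.attached[pipette_id] = well


state = TipState()
pick_up_tip_failed_missing(state, "pipette-1", "A1")
print(state.attached)  # {'pipette-1': None}
pick_up_tip_false_positive(state, "pipette-1", "A1")
print(state.attached)  # {'pipette-1': 'A1'}
```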

We unfortunately need to go out of our way to keep the hardware API's state in sync with Protocol Engine's state as we do this.

(All of the above is EXEC-785.)
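One way to picture the hardware-sync problem: after a false-positive resume, Protocol Engine believes a tip is attached (or gone), but the hardware API was never told. The sketch below assumes the deferred update can be summarized as per-pipette tip changes and that the hardware layer accepts `add_tip`/`remove_tip`-style calls. `HardwareTips`, `ResumeFromRecoveryAction`, `tip_changes`, and `RecordingHardware` are hypothetical; only the `handle_action(self, action)` shape mirrors the synchronizer excerpt quoted later in this review.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol, Union


class HardwareTips(Protocol):
    """Hypothetical subset of a hardware controller's tip tracking."""

    def add_tip(self, pipette_id: str, tip_length: float) -> None: ...

    def remove_tip(self, pipette_id: str) -> None: ...


@dataclass(frozen=True)
class ResumeFromRecoveryAction:
    reconcile_false_positive: bool
    # Hypothetical: pipette ID -> new tip length, or None if the tip is now gone.
    tip_changes: Dict[str, Optional[float]]


@dataclass(frozen=True)
class PauseAction:
    """Some unrelated action that the synchronizer should ignore."""


Action = Union[ResumeFromRecoveryAction, PauseAction]


class HardwareStateSynchronizer:
    def __init__(self, hardware: HardwareTips) -> None:
        self._hardware = hardware

    def handle_action(self, action: Action) -> None:
        # Only a false-positive resume changes engine state without a matching
        # hardware operation, so everything else is ignored.
        if not isinstance(action, ResumeFromRecoveryAction):
            return
        if not action.reconcile_false_positive:
            return
        for pipette_id, tip_length in action.tip_changes.items():
            if tip_length is None:
                self._hardware.remove_tip(pipette_id)
            else:
                self._hardware.add_tip(pipette_id, tip_length)


class RecordingHardware:
    """Test double that just remembers which pipettes have tips."""

    def __init__(self) -> None:
        self.tips: Dict[str, float] = {}

    def add_tip(self, pipette_id: str, tip_length: float) -> None:
        self.tips[pipette_id] = tip_length

    def remove_tip(self, pipette_id: str) -> None:
        self.tips.pop(pipette_id, None)
```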

We can use this to allow continuing from tipPhysicallyMissing and tipPhysicallyAttached errors in pickUpTip, dropTip, and dropTipInPlace commands. (EXEC-778, EXEC-779.)

Test plan

Most easily tested with #16601.

Play around with inducing and recovering from errors. In particular, try continuing from failed tip pickups and tip drops.
Pay attention to path planning. For example, if you ignore and continue from a failed tip pickup, the pipette should move as if it does have a tip attached.
Make sure the "ignore all errors of this type" button works as advertised.
Make sure the older error recovery options ("retry", "cancel") still work.

Review requests

In general, any thoughts on the fundamental approach?
In general, any thoughts on naming?
See comments below for specific stuff.

Risk assessment

Medium or maybe high. The blast radius is confined to error recovery mode, but this does do weird stuff in bad ways.

Implementing this as Protocol Engine actions instead of Protocol Engine commands bypasses some of our usual safeties. We discussed this a little in #16564 (comment).

sfoster1

I think the hardware synchronizer is the only real way to do it and therefore is fine. In the long term I think we should pull more state out of the hardware controller; it's stuck around in the engine era for the sake of supporting older APIs and direct hardware controller usage, but it is probably time to split that stuff out into a compat layer.

I think a nice improvement would probably be to centralize not only the synchronizing code but also where it gets called, by having a HardwareState or something that eats the resume from recovery action instead of it being a side effect of the top level engine call. Those are always awful to deal with.
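A minimal sketch of the centralization suggested here, assuming a simple dispatcher that fans every action out to a list of handlers. The engine's `resume_from_recovery()` then only dispatches an action, and anything that needs to react (state stores, a hardware synchronizer) does so as a handler instead of as a side effect of the top-level engine call. `ActionDispatcher`, `Engine`, and `Handler` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class ResumeFromRecoveryAction:
    reconcile_false_positive: bool


Handler = Callable[[object], None]


class ActionDispatcher:
    """Hands every action to every registered handler, in order."""

    def __init__(self, handlers: Sequence[Handler]) -> None:
        self._handlers: List[Handler] = list(handlers)

    def dispatch(self, action: object) -> None:
        for handle in self._handlers:
            handle(action)


class Engine:
    def __init__(self, dispatcher: ActionDispatcher) -> None:
        self._dispatcher = dispatcher

    def resume_from_recovery(self, reconcile_false_positive: bool) -> None:
        # No hardware calls here. The engine only reports what happened;
        # state stores and any hardware synchronizer react as handlers.
        self._dispatcher.dispatch(ResumeFromRecoveryAction(reconcile_false_positive))


seen: List[object] = []
engine = Engine(ActionDispatcher([seen.append]))
engine.resume_from_recovery(reconcile_false_positive=True)
print(seen)  # [ResumeFromRecoveryAction(reconcile_false_positive=True)]
```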

TamarZanzouri · 2024-10-23T16:13:49Z

api/src/opentrons/protocol_engine/commands/drop_tip.py

@@ -146,7 +146,15 @@ async def execute(self, params: DropTipParams) -> _ExecuteReturn:
                    )
                ],
            )
-            return DefinedErrorData(public=error, state_update=state_update)
+            state_update_if_false_positive = update_types.StateUpdate()


I don't know why "false positive" is so confusing to me, but can we change it? Maybe state_update_if_command_failed? If it's only me, we can leave it as is.

Discussed a bit in-person: It sounds like we're keeping the "false positive" terminology for now, but I'm definitely open to better ideas. We can change the terminology relatively easily before end of day Monday.

SyntaxColoring · 2024-10-24T20:33:16Z

api/src/opentrons/protocol_engine/error_recovery_policy.py

+    CONTINUE_WITH_ERROR = enum.auto()
+    """Continue without interruption, carrying on from whatever error state the failed
+    command left the engine in.
+
+    This is like `ProtocolEngine.resume_from_recovery(reconcile_false_positive=False)`.
+    """


Any better names than CONTINUE_WITH_ERROR?

Would ASSUME_TRUE_POSITIVE_AND_CONTINUE be too silly?

i would say CONTINUE_FROM_RECOVERY maybe? I think CONTINUE_WITH_ERROR is ~fine though.

CONTINUE_WITH_ERROR makes more sense to me
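For context on the naming question, a sketch of how the recovery types might map onto `resume_from_recovery()`. Only `CONTINUE_WITH_ERROR` and `ASSUME_FALSE_POSITIVE_AND_CONTINUE` come from this PR; `FAIL_RUN` and `WAIT_FOR_RECOVERY` are assumed members, and the rule table, its default, and both functions are hypothetical. The `CONTINUE_WITH_ERROR` mapping follows the docstring quoted above; the `ASSUME_FALSE_POSITIVE_AND_CONTINUE` mapping is assumed by symmetry.

```python
import enum
from typing import Dict, Optional, Tuple


class ErrorRecoveryType(enum.Enum):
    FAIL_RUN = enum.auto()  # Assumed member, not shown in this PR.
    WAIT_FOR_RECOVERY = enum.auto()  # Assumed member, not shown in this PR.
    CONTINUE_WITH_ERROR = enum.auto()
    ASSUME_FALSE_POSITIVE_AND_CONTINUE = enum.auto()


# Hypothetical rules, e.g. built from "ignore all errors of this type":
# (command type, error type) -> how to recover.
Rules = Dict[Tuple[str, str], ErrorRecoveryType]


def choose_recovery(rules: Rules, command_type: str, error_type: str) -> ErrorRecoveryType:
    # Hypothetical default: hand the decision to a client.
    return rules.get((command_type, error_type), ErrorRecoveryType.WAIT_FOR_RECOVERY)


def reconcile_flag(recovery: ErrorRecoveryType) -> Optional[bool]:
    """Value for reconcile_false_positive when auto-continuing, else None."""
    if recovery is ErrorRecoveryType.CONTINUE_WITH_ERROR:
        return False
    if recovery is ErrorRecoveryType.ASSUME_FALSE_POSITIVE_AND_CONTINUE:
        return True
    return None  # Not an auto-continue: fail the run or wait for a client.
```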

SyntaxColoring · 2024-10-24T20:36:21Z

api/src/opentrons/protocol_engine/execution/tip_handler.py

+    def remove_tip(self, pipette_id: str) -> None:
+        """See documentation on abstract base class.
+
+        This should not be called when using virtual pipettes.
+        """
+        assert False, "TipHandler.remove_tip should not be used with virtual pipettes"
+


I'm not fully sure if this assert is the right thing to do, but this is what we do for TipHandler.cache_tip(), so 🤷
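A sketch of the abstract-base-class arrangement the docstring refers to, with a trimmed-down, hypothetical `TipHandler` that only declares `remove_tip()`. The dict standing in for hardware API state is an assumption. The virtual implementation here raises explicitly instead of using `assert False`; one reason to consider that is that Python strips `assert` statements when run with `-O`, so an assert-based guard can silently do nothing.

```python
from abc import ABC, abstractmethod
from typing import Dict


class TipHandler(ABC):
    """Hypothetical, trimmed-down interface."""

    @abstractmethod
    def remove_tip(self, pipette_id: str) -> None:
        """Record that the pipette has no tip, without moving anything."""


class HardwareTipHandler(TipHandler):
    def __init__(self) -> None:
        # Stand-in for the hardware API's own record of attached tips.
        self.tip_lengths: Dict[str, float] = {}

    def remove_tip(self, pipette_id: str) -> None:
        self.tip_lengths.pop(pipette_id, None)


class VirtualTipHandler(TipHandler):
    def remove_tip(self, pipette_id: str) -> None:
        # Unlike `assert False`, this still fails when Python runs with -O.
        raise RuntimeError(
            "TipHandler.remove_tip should not be used with virtual pipettes"
        )
```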

SyntaxColoring · 2024-10-24T21:19:14Z

api/src/opentrons/protocol_engine/create_protocol_engine.py

This is kind of an unrelated refactor.

Protocol Engine has a bunch of injected dependencies that, themselves, need to be wired up to each other. This is usually straightforward, but it is, you know, code, and we can make mistakes, especially when the dependencies' constructors apply defaults that let us forget to wire something up.

The wire-up was split between this create_protocol_engine() helper and ProtocolEngine.__init__(), which was making me a little nervous. This moves most of it to create_protocol_engine().
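A minimal sketch of keeping all wire-up in one factory, assuming simplified, hypothetical constructors; the `create_protocol_engine(hardware)` signature here is not the real one. The point is that `ProtocolEngine.__init__()` does no defaulting, so forgetting a dependency is a `TypeError` at construction time rather than a silently mis-wired engine.

```python
class HardwareAPI:
    """Stand-in for the hardware controller."""


class StateStore:
    def __init__(self, hardware: HardwareAPI) -> None:
        self.hardware = hardware


class HardwareStateSynchronizer:
    def __init__(self, hardware: HardwareAPI, state: StateStore) -> None:
        self.hardware = hardware
        self.state = state


class ProtocolEngine:
    def __init__(self, state: StateStore, synchronizer: HardwareStateSynchronizer) -> None:
        # No defaulting here: every collaborator must be passed in explicitly.
        self.state = state
        self.synchronizer = synchronizer


def create_protocol_engine(hardware: HardwareAPI) -> ProtocolEngine:
    # All wiring happens in one place, so it's easy to check that the
    # synchronizer and the engine share the same state store.
    state = StateStore(hardware)
    synchronizer = HardwareStateSynchronizer(hardware, state)
    return ProtocolEngine(state=state, synchronizer=synchronizer)
```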

sfoster1

Looks good to me!

sfoster1 · 2024-10-28T13:52:06Z

api/src/opentrons/protocol_engine/error_recovery_policy.py

i would say CONTINUE_FROM_RECOVERY maybe? I think CONTINUE_WITH_ERROR is ~fine though.

TamarZanzouri · 2024-10-28T14:11:35Z

api/src/opentrons/protocol_engine/execution/error_recovery_hardware_state_synchronizer.py

+        self._hardware_api = hardware_api
+        self._state_view = state_view
+
+    def handle_action(self, action: Action) -> None:


TamarZanzouri

Very nice work! I love the refactor for the hardware synchronizer!

… action (#16601)

Closes EXEC-791. See #16556 and the linked ticket for details concerning the reasoning for implementing these changes.

When skipping a step in Error Recovery, the FE essentially calls two commands: handleIgnoringErrorKind, which deals with updating the recovery policy, and skipFailedCommand, which does the skipping. By using isAssumeFalsePositiveResumeKind, we can conditionally determine which policy ifMatch criteria to use, and we can determine which error recovery resume kind to dispatch to the server.
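A Python sketch of the two-step skip flow described above. The real logic lives in the TypeScript app, so this mirror is only illustrative. `FakeClient`, `skip_failed_step`, and its parameters are hypothetical, and the string values for the policy rule and the resume action are assumptions rather than confirmed HTTP API values.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FakeClient:
    """Records calls instead of talking to a real robot server."""

    calls: List[Tuple[str, str]] = field(default_factory=list)

    def set_policy_rule(self, if_match: str) -> None:
        self.calls.append(("policy", if_match))

    def resume(self, kind: str) -> None:
        self.calls.append(("resume", kind))


def skip_failed_step(
    client: FakeClient, *, ignore_all_of_kind: bool, assume_false_positive: bool
) -> None:
    # Counterpart of handleIgnoringErrorKind: only touches the policy if asked.
    if ignore_all_of_kind:
        client.set_policy_rule(
            "assumeFalsePositiveAndContinue" if assume_false_positive else "ignoreAndContinue"
        )
    # Counterpart of skipFailedCommand: pick which resume kind to dispatch.
    client.resume(
        "resume-from-recovery-assuming-false-positive"
        if assume_false_positive
        else "resume-from-recovery"
    )
```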

SyntaxColoring added 9 commits October 21, 2024 15:39

Make TipHandler.add_tip() non-async.

a99b012

Todo comment for unifying TipHandler.pick_up_tip() and TipHandler.add_tip().

3cf5acf

Add TipHandler.remove_tip().

783d6af

Store state updates for recovering from false-positive errors.

4c4191c

Fix up state upon ProtocolEngine.resume_from_recovery().

36d54c3

Keep hardware API state in sync with Protocol Engine state.

4c3334f

Recover from dropTip false-positives.

a2b7628

Recover from pickUpTip false-positives.

b59f51c

Missing tests.

Todo comments.

da72776

sfoster1 reviewed Oct 22, 2024

SyntaxColoring mentioned this pull request Oct 22, 2024

feat(robot-server): HTTP API for "Ignore error and skip to next step" #16564

Merged

SyntaxColoring added 3 commits October 22, 2024 17:08

Add ErrorRecoveryType.ASSUME_FALSE_POSITIVE_AND_CONTINUE.

6e46203

Allow error recovery policies to auto-continue from false-positives.

d872bc2

Rename the existing internal ErrorRecoveryType value IGNORE_AND_CONTINUE to CONTINUE_WITH_ERROR to try to make room for the new value. (Leave the public HTTP API values alone.)

Test fixup.

6803390

TamarZanzouri reviewed Oct 23, 2024

SyntaxColoring added 9 commits October 23, 2024 15:27

Merge branch 'edge' into false_positive_state_update

4ccc92b

Refactor ProtocolEngine.__init__() for consistent dependency injection.

e37a589

Do hardware state fixups via action dispatch.

e64eded

Merge commit '80189200081610abd1d507f2b357da54747769df' into false_positive_state_update

d9d237c

Merge fixup.

7fc9d08

Merge commit 'eb710c036b9576fabee093765f72c40681719976' into false_positive_state_update

aafc99d

Various test fixups.

a3c0c35

Update command tests.

db850fa

Recover from false-positive errors in dropTipInPlace.

d3c28c9

mjhuff mentioned this pull request Oct 24, 2024

refactor(app): Support "resume from recovery assuming false positive" action #16601

Merged

Mucho linto.

e88c8c6

SyntaxColoring commented Oct 24, 2024

SyntaxColoring added 2 commits October 24, 2024 16:40

Slight docstring improvement.

28650c5

Add missing logs.

40bbcc5

Add missing test for get_state_update_for_false_positive().

2a4d086

SyntaxColoring marked this pull request as ready for review October 24, 2024 21:08

SyntaxColoring requested a review from a team as a code owner October 24, 2024 21:08

SyntaxColoring commented Oct 24, 2024

SyntaxColoring mentioned this pull request Oct 25, 2024

feat(api): Attach error recovery debug notes to commands #16608

Merged

sfoster1 approved these changes Oct 28, 2024

TamarZanzouri reviewed Oct 28, 2024

TamarZanzouri approved these changes Oct 28, 2024

SyntaxColoring merged commit 24fcc0d into edge Oct 28, 2024
22 checks passed

SyntaxColoring deleted the false_positive_state_update branch October 30, 2024 16:05
