feat(api): Allow treating errors as false-positives (ignore them and continue with the run) #16556

Merged (25 commits) on Oct 28, 2024

Conversation

Contributor

@SyntaxColoring SyntaxColoring commented Oct 21, 2024

Overview

This does the interesting part of EXEC-676. See there for background.

Changelog

When a command encounters a defined error, it now enacts two state updates:

  1. The first one is what we had before. It represents all the changes that the command made—up to, and including, the point of failure. This is applied as soon as the command fails, as part of the FailCommandAction.
  2. The second one is new. It represents additional changes to make if this error turns out to be a false positive. This is not applied as part of the FailCommandAction. It's merely stored, and applied later, when (or if) a client tells us that the error was a false positive. I think of it as a deferred state update.

For example, if a pickUpTip fails with a tipPhysicallyMissing error:

  1. A tip is logically consumed from the tip rack, but no tip is logically placed on the pipette. (These are the documented semantics of the tipPhysicallyMissing error.) This happens via state_update.
  2. But if a client selects "Ignore error and skip to next step," a tip is then logically placed on the pipette. You can then continue on to the rest of the protocol, and it should just work.
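
The two-update flow described above can be sketched roughly as follows. This is a simplified illustration, not the actual Protocol Engine code: `ToyState`, its fields, and the command ID are hypothetical stand-ins, and only the names `state_update`, `state_update_if_false_positive`, and `reconcile_false_positive` come from this PR.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class StateUpdate:
    """Hypothetical, simplified stand-in for a Protocol Engine state update."""

    tips_used: Set[str] = field(default_factory=set)
    pipette_has_tip: Optional[bool] = None  # None means "leave unchanged"


@dataclass
class ToyState:
    """Hypothetical toy state store; the real engine state is far richer."""

    used_tips: Set[str] = field(default_factory=set)
    pipette_has_tip: bool = False
    deferred: Dict[str, StateUpdate] = field(default_factory=dict)

    def apply(self, update: StateUpdate) -> None:
        self.used_tips |= update.tips_used
        if update.pipette_has_tip is not None:
            self.pipette_has_tip = update.pipette_has_tip

    def fail_command(
        self,
        command_id: str,
        state_update: StateUpdate,
        state_update_if_false_positive: StateUpdate,
    ) -> None:
        # The first update is applied as soon as the command fails.
        self.apply(state_update)
        # The second is only stored, in case the error is later ignored.
        self.deferred[command_id] = state_update_if_false_positive

    def resume_from_recovery(
        self, command_id: str, reconcile_false_positive: bool
    ) -> None:
        deferred = self.deferred.pop(command_id, None)
        if reconcile_false_positive and deferred is not None:
            self.apply(deferred)


state = ToyState()
# pickUpTip fails with tipPhysicallyMissing: the tip is consumed from the
# rack, but nothing is placed on the pipette yet.
state.fail_command(
    "pick-up-1",
    state_update=StateUpdate(tips_used={"A1"}),
    state_update_if_false_positive=StateUpdate(pipette_has_tip=True),
)
tip_on_pipette_after_failure = state.pipette_has_tip
# The client chooses to ignore the error, so the deferred update is applied.
state.resume_from_recovery("pick-up-1", reconcile_false_positive=True)
```

If the client instead resumes with `reconcile_false_positive=False`, the deferred update in this sketch is simply discarded and the pipette stays logically tipless.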

We unfortunately need to go out of our way to keep the hardware API's state in sync with Protocol Engine's state as we do this.

(All of the above is EXEC-785.)

We can use this to allow continuing from tipPhysicallyMissing and tipPhysicallyAttached errors in pickUpTip, dropTip, and dropTipInPlace commands. (EXEC-778, EXEC-779.)

Test plan

Most easily tested with #16601.

  • Play around with inducing and recovering from errors. In particular, try continuing from failed tip pickups and tip drops.
  • Pay attention to path planning. For example, if you ignore and continue from a failed tip pickup, the pipette should move as if it does have a tip attached.
  • Make sure the "ignore all errors of this type" button works as advertised.
  • Make sure the older error recovery options ("retry", "cancel") still work.

Review requests

  • In general, any thoughts on the fundamental approach?
  • In general, any thoughts on naming?
  • See comments below for specific stuff.

Risk assessment

Medium or maybe high. The blast radius is confined to error recovery mode, but this does do weird stuff in bad ways.

Implementing this as Protocol Engine actions instead of Protocol Engine commands bypasses some of our usual safeties. We discussed this a little in #16564 (comment).

Member

@sfoster1 sfoster1 left a comment

I think the hardware synchronizer is the only real way to do it and therefore is fine. In the long term I think we should pull more state out of the hardware controller; it's stuck around in the engine era for the sake of supporting older APIs and direct hardware controller usage, but it is probably time to split that stuff out into a compat layer.

I think a nice improvement would probably be to centralize not only the synchronizing code but also where it gets called, by having a HardwareState or something that eats the resume from recovery action instead of it being a side effect of the top level engine call. Those are always awful to deal with.
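
One way to picture that suggestion is a small action handler that owns the hardware synchronization. This is only a sketch under assumptions: `FakeHardware`, `HardwareStateSynchronizer`, `PlayAction`, and the dict of per-pipette tip state are hypothetical names for illustration, not the real hardware API or engine classes.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class ResumeFromRecoveryAction:
    """Hypothetical, simplified version of the resume-from-recovery action."""

    reconcile_false_positive: bool


@dataclass(frozen=True)
class PlayAction:
    """Hypothetical unrelated action, to show that it is ignored."""


Action = Union[ResumeFromRecoveryAction, PlayAction]


class FakeHardware:
    """Hypothetical stand-in for the hardware API's own tip tracking."""

    def __init__(self) -> None:
        self.tip_attached: Dict[str, bool] = {}

    def add_tip(self, pipette_id: str) -> None:
        self.tip_attached[pipette_id] = True

    def remove_tip(self, pipette_id: str) -> None:
        self.tip_attached[pipette_id] = False


class HardwareStateSynchronizer:
    """Pushes the engine's tip state to hardware when an error is ignored."""

    def __init__(
        self, hardware: FakeHardware, engine_tip_state: Dict[str, bool]
    ) -> None:
        self._hardware = hardware
        # Assumed to be a live view of the engine's per-pipette tip state.
        self._engine_tip_state = engine_tip_state

    def handle_action(self, action: Action) -> None:
        if (
            isinstance(action, ResumeFromRecoveryAction)
            and action.reconcile_false_positive
        ):
            self._synchronize_tips()

    def _synchronize_tips(self) -> None:
        for pipette_id, has_tip in self._engine_tip_state.items():
            if has_tip:
                self._hardware.add_tip(pipette_id)
            else:
                self._hardware.remove_tip(pipette_id)


engine_tips = {"left": True}  # engine state after the deferred update
hardware = FakeHardware()
synchronizer = HardwareStateSynchronizer(hardware, engine_tips)
synchronizer.handle_action(PlayAction())  # no effect
synchronizer.handle_action(ResumeFromRecoveryAction(reconcile_false_positive=True))
```

Keeping the synchronization inside a handler like this means the call site is the action dispatch itself, rather than a side effect of a top-level engine method.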

Rename the existing internal ErrorRecoveryType value IGNORE_AND_CONTINUE to CONTINUE_WITH_ERROR to try to make room for the new value. (Leave the public HTTP API values alone.)
@@ -146,7 +146,15 @@ async def execute(self, params: DropTipParams) -> _ExecuteReturn:
)
],
)
return DefinedErrorData(public=error, state_update=state_update)
state_update_if_false_positive = update_types.StateUpdate()
Contributor

@TamarZanzouri TamarZanzouri Oct 23, 2024

I don't know why "false positive" is so confusing to me, but can we change it? Maybe state_update_if_command_failed? If it's only me, we can leave it as is.

Contributor Author

Discussed a bit in-person: It sounds like we're keeping the "false positive" terminology for now, but I'm definitely open to better ideas. We can change the terminology relatively easily before end of day Monday.

Comment on lines +31 to +36
CONTINUE_WITH_ERROR = enum.auto()
"""Continue without interruption, carrying on from whatever error state the failed
command left the engine in.

This is like `ProtocolEngine.resume_from_recovery(reconcile_false_positive=False)`.
"""
Contributor Author

Any better names than CONTINUE_WITH_ERROR?

Would ASSUME_TRUE_POSITIVE_AND_CONTINUE be too silly?

Member

I would say CONTINUE_FROM_RECOVERY maybe? I think CONTINUE_WITH_ERROR is ~fine though.

Contributor

CONTINUE_WITH_ERROR makes more sense to me

Comment on lines +446 to +452
def remove_tip(self, pipette_id: str) -> None:
    """See documentation on abstract base class.

    This should not be called when using virtual pipettes.
    """
    assert False, "TipHandler.remove_tip should not be used with virtual pipettes"

Contributor Author

I'm not fully sure if this assert is the right thing to do, but this is what we do for TipHandler.cache_tip(), so 🤷

@SyntaxColoring SyntaxColoring marked this pull request as ready for review October 24, 2024 21:08
@SyntaxColoring SyntaxColoring requested a review from a team as a code owner October 24, 2024 21:08
Contributor Author

This is kind of an unrelated refactor.

Protocol Engine has a bunch of injected dependencies that, themselves, need to be wired up to each other. This is usually straightforward, but it is, you know, code, and we can make mistakes. Especially when the dependencies' constructors do defaulting that can permit us to forget to wire something up.

The wire-up was split between this create_protocol_engine() helper and ProtocolEngine.__init__(), which was making me a little nervous. This moves most of it to create_protocol_engine().
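
As a rough illustration of the concern, here is a minimal sketch of doing all the wiring in one factory and giving the constructor no defaults. The class names and `create_engine()` below are hypothetical and much simpler than the real `create_protocol_engine()` and `ProtocolEngine.__init__()`.

```python
class StateStore:
    """Hypothetical stand-in for the engine's state store."""


class HardwareSynchronizer:
    """Hypothetical dependency that must share the engine's state store."""

    def __init__(self, state_store: StateStore) -> None:
        self.state_store = state_store


class Engine:
    """Hypothetical engine whose constructor does no defaulting.

    Every dependency is a required keyword argument, so forgetting to wire
    one up is a TypeError instead of a silently created, unshared default.
    """

    def __init__(
        self, *, state_store: StateStore, synchronizer: HardwareSynchronizer
    ) -> None:
        self.state_store = state_store
        self.synchronizer = synchronizer


def create_engine() -> Engine:
    """Single place where the dependencies are wired to each other."""
    state_store = StateStore()
    synchronizer = HardwareSynchronizer(state_store)
    return Engine(state_store=state_store, synchronizer=synchronizer)


engine = create_engine()
```

With this shape, the synchronizer and the engine are guaranteed to see the same `StateStore` instance, which is the kind of invariant that defaulting constructors can quietly break.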

Member

@sfoster1 sfoster1 left a comment

Looks good to me!

self._hardware_api = hardware_api
self._state_view = state_view

def handle_action(self, action: Action) -> None:
Contributor

LOVELY!

Contributor

@TamarZanzouri TamarZanzouri left a comment

Very nice work! I love the refactor for the hardware synchronizer!

@SyntaxColoring SyntaxColoring merged commit 24fcc0d into edge Oct 28, 2024
22 checks passed
mjhuff added a commit that referenced this pull request Oct 28, 2024
… action (#16601)

Closes EXEC-791

See #16556 and the linked ticket for details concerning the reasoning for implementing these changes.

When skipping a step in Error Recovery, the FE essentially calls two commands: handleIgnoringErrorKind, which deals with updating the recovery policy, and skipFailedCommand, which does the skipping. By using isAssumeFalsePositiveResumeKind, we can conditionally determine which policy ifMatch criteria to use, and we can determine which error recovery resume kind to dispatch to the server.
@SyntaxColoring SyntaxColoring deleted the false_positive_state_update branch October 30, 2024 16:05