feat(api): Allow treating errors as false-positives (ignore them and continue with the run) #16556
Conversation
Missing tests.
I think the hardware synchronizer is the only real way to do it, and therefore is fine. In the long term, I think we should pull more state out of the hardware controller; it's stuck around from the engine era for the sake of supporting older APIs and direct hardware controller usage, but it's probably time to split that stuff out into a compat layer.
I think a nice improvement would be to centralize not only the synchronizing code but also where it gets called, by having a `HardwareState` or something that eats the resume-from-recovery action, instead of the sync being a side effect of the top-level engine call. Side effects like that are always awful to deal with.
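The suggestion above could be sketched roughly like this. Everything here is illustrative, not the actual Opentrons source: `ResumeFromRecoveryAction` and `HardwareStateSynchronizer` are hypothetical stand-ins for whatever the real action and handler would be named.

```python
from dataclasses import dataclass


@dataclass
class ResumeFromRecoveryAction:
    """Stand-in for the engine action dispatched when a client resumes a run."""

    reconcile_false_positive: bool


class HardwareStateSynchronizer:
    """Owns the hardware-sync logic and reacts to the action directly,
    rather than the sync being a side effect of the top-level engine call."""

    def __init__(self, hardware_api: object, state_view: object) -> None:
        self._hardware_api = hardware_api
        self._state_view = state_view
        self.sync_count = 0  # exposed only to make the sketch observable

    def handle_action(self, action: object) -> None:
        # React only to the action this handler cares about; ignore the rest.
        if isinstance(action, ResumeFromRecoveryAction):
            self._synchronize()

    def _synchronize(self) -> None:
        # Placeholder for copying Protocol Engine state (tips, positions)
        # back into the hardware controller.
        self.sync_count += 1
```

The point of the shape is that the synchronizer subscribes to the action stream like any other handler, so the sync happens exactly once per resume, in one discoverable place.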
Rename the existing internal `ErrorRecoveryType` value `IGNORE_AND_CONTINUE` to `CONTINUE_WITH_ERROR` to make room for the new value. (Leave the public HTTP API values alone.)
```diff
@@ -146,7 +146,15 @@ async def execute(self, params: DropTipParams) -> _ExecuteReturn:
             )
         ],
     )
-    return DefinedErrorData(public=error, state_update=state_update)
+    state_update_if_false_positive = update_types.StateUpdate()
```
I don't know why "false positive" is so confusing to me, but can we change it? Maybe `state_update_if_command_failed`? If it's only me, we can leave it as-is.
Discussed a bit in-person: It sounds like we're keeping the "false positive" terminology for now, but I'm definitely open to better ideas. We can change the terminology relatively easily before end of day Monday.
```python
CONTINUE_WITH_ERROR = enum.auto()
"""Continue without interruption, carrying on from whatever error state the failed
command left the engine in.

This is like `ProtocolEngine.resume_from_recovery(reconcile_false_positive=False)`.
"""
```
Any better names than `CONTINUE_WITH_ERROR`? Would `ASSUME_TRUE_POSITIVE_AND_CONTINUE` be too silly?
I would say `CONTINUE_FROM_RECOVERY`, maybe? I think `CONTINUE_WITH_ERROR` is ~fine, though.
`CONTINUE_WITH_ERROR` makes more sense to me.
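To make the naming debate concrete, here is a minimal sketch of how the two recovery types relate to the engine call mentioned in the docstring above. The enum member `ASSUME_FALSE_POSITIVE_AND_CONTINUE` and the helper function are assumptions for illustration; only `CONTINUE_WITH_ERROR` and `resume_from_recovery(reconcile_false_positive=...)` come from the discussion itself.

```python
import enum


class ErrorRecoveryType(enum.Enum):
    """Sketch of the internal enum under discussion (member names other than
    CONTINUE_WITH_ERROR are hypothetical)."""

    # Treat the error as a false positive: reconcile its state effects, then continue.
    ASSUME_FALSE_POSITIVE_AND_CONTINUE = enum.auto()
    # Continue from whatever error state the failed command left the engine in.
    CONTINUE_WITH_ERROR = enum.auto()


def reconcile_false_positive(recovery_type: ErrorRecoveryType) -> bool:
    """The flag this recovery type would imply for
    `ProtocolEngine.resume_from_recovery()`, per the docstring above."""
    return recovery_type is ErrorRecoveryType.ASSUME_FALSE_POSITIVE_AND_CONTINUE
```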
```python
    def remove_tip(self, pipette_id: str) -> None:
        """See documentation on abstract base class.

        This should not be called when using virtual pipettes.
        """
        assert False, "TipHandler.remove_tip should not be used with virtual pipettes"
```
I'm not fully sure that `assert` is the right thing to do here, but it's what we do for `TipHandler.cache_tip()`, so 🤷
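One alternative to the bare `assert` worth noting: raising an explicit exception keeps the guard active even when Python runs with `-O`, which strips `assert` statements entirely. A sketch, with a hypothetical `VirtualTipHandler` class rather than the actual `TipHandler`:

```python
class VirtualTipHandler:
    """Hypothetical stand-in for the virtual-pipette tip handler."""

    def remove_tip(self, pipette_id: str) -> None:
        # Unlike `assert False, ...`, this still fires under `python -O`,
        # where assert statements are compiled away.
        raise NotImplementedError(
            "TipHandler.remove_tip should not be used with virtual pipettes"
        )
```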
This is kind of an unrelated refactor.
Protocol Engine has a bunch of injected dependencies that themselves need to be wired up to each other. This is usually straightforward, but it is, you know, code, and we can make mistakes, especially when the dependencies' constructors do defaulting that can let us forget to wire something up.
The wire-up was split between this `create_protocol_engine()` helper and `ProtocolEngine.__init__()`, which was making me a little nervous. This moves most of it into `create_protocol_engine()`.
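The pattern being described might look like this tiny sketch. The class and factory names are simplified stand-ins, not the actual `create_protocol_engine()` signature; the point is that all wiring happens in one factory, with no constructor defaults to paper over a missing dependency.

```python
from dataclasses import dataclass


@dataclass
class StateStore:
    runs: list


@dataclass
class TipHandler:
    state: StateStore  # no default: forgetting it is a loud TypeError


@dataclass
class Engine:
    state: StateStore
    tips: TipHandler


def create_engine() -> Engine:
    # Every dependency is constructed and connected here, in one place.
    # A forgotten connection fails at construction time instead of being
    # silently replaced by a constructor default.
    state = StateStore(runs=[])
    tips = TipHandler(state=state)
    return Engine(state=state, tips=tips)
```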
Looks good to me!
```python
        self._hardware_api = hardware_api
        self._state_view = state_view

    def handle_action(self, action: Action) -> None:
```
LOVELY!
Very nice work! I love the refactor for the hardware synchronizer!
… action (#16601) Closes EXEC-791 See #16556 and the linked ticket for details concerning the reasoning for implementing these changes. When skipping a step in Error Recovery, the FE essentially calls two commands: handleIgnoringErrorKind, which deals with updating the recovery policy, and skipFailedCommand, which does the skipping. By using isAssumeFalsePositiveResumeKind, we can conditionally determine which policy ifMatch criteria to use, and we can determine which error recovery resume kind to dispatch to the server.
Overview
This does the interesting part of EXEC-676. See there for background.
Changelog
When a command encounters a defined error, it now enacts two state updates:

1. The usual state update, applied immediately as part of handling the `FailCommandAction`.
2. A second state update that is *not* applied as part of the `FailCommandAction`. It's merely stored, and applied later, when (or if) a client tells us that the error was a false positive. I think of it as a deferred state update.

For example, if a `pickUpTip` fails with a `tipPhysicallyMissing` error, the immediate update records the result of the failure (the `tipPhysicallyMissing` error); this happens via `state_update`. The deferred update, held in `state_update_if_false_positive`, describes what the state should become if a client later declares the error a false positive.
.We unfortunately need to go out of our way to keep the hardware API's state in sync with Protocol Engine's state as we do this.
(All of the above is EXEC-785.)
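The immediate-plus-deferred mechanism above can be sketched as follows. The field names `state_update` and `state_update_if_false_positive` mirror the diff in this PR; the surrounding `StateUpdate`, `DefinedErrorData`, and `EngineState` types are simplified stand-ins for illustration, not the real Protocol Engine classes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StateUpdate:
    changes: dict = field(default_factory=dict)


@dataclass
class DefinedErrorData:
    public: str  # the error reported to clients, simplified to a string here
    state_update: StateUpdate  # applied immediately with the FailCommandAction
    state_update_if_false_positive: StateUpdate  # stored, applied later (or never)


class EngineState:
    def __init__(self) -> None:
        self.data: dict = {}
        self._deferred: Optional[StateUpdate] = None

    def fail_command(self, error: DefinedErrorData) -> None:
        # 1. Apply the normal state update right away.
        self.data.update(error.state_update.changes)
        # 2. Stash the deferred update without applying it.
        self._deferred = error.state_update_if_false_positive

    def resume_from_recovery(self, reconcile_false_positive: bool) -> None:
        # Apply the deferred update only if the client says the error was a
        # false positive; either way, it is discarded afterward.
        if reconcile_false_positive and self._deferred is not None:
            self.data.update(self._deferred.changes)
        self._deferred = None
```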
We can use this to allow continuing from `tipPhysicallyMissing` and `tipPhysicallyAttached` errors in `pickUpTip`, `dropTip`, and `dropTipInPlace` commands. (EXEC-778, EXEC-779.)

Test plan
Most easily tested with #16601.
Review requests
Risk assessment
Medium or maybe high. The blast radius is confined to error recovery mode, but this does do weird stuff in bad ways.
Implementing this as Protocol Engine actions instead of Protocol Engine commands bypasses some of our usual safeties. We discussed this a little in #16564 (comment).