-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(hardware): Hardware error codes #13009
Conversation
Codecov Report
@@ Coverage Diff @@
## edge #13009 +/- ##
==========================================
+ Coverage 72.51% 72.66% +0.14%
==========================================
Files 2377 1526 -851
Lines 65495 49826 -15669
Branches 7269 3099 -4170
==========================================
- Hits 47495 36205 -11290
+ Misses 16269 13120 -3149
+ Partials 1731 501 -1230
Flags with carried forward coverage won't be shown. Click here to find out more.
|
bee9789
to
0fb2e36
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some general questions and i think we need to raise an extra TimeourError in the MoveScheduler
@@ -174,7 +176,14 @@ async def capacitive_probe( | |||
messenger, | |||
) | |||
if not threshold: | |||
raise RuntimeError("Could not set threshold for probe") | |||
raise CanbusCommunicationError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a general question: We are raising the CommandTimedOutError
in the send threshold listener if the sensor scheduler didn't get a response back. Are we expecting to see both the CommandTimedOutError
and CanbusCommunicationError
if the listener does time out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you'd only see the timeout error since we're not catching and ignoring it. I think the only time you'd see this getting raised is if the microcontroller responded with threshold=0
raise MoveConditionNotMet | ||
elif self._error is not None: | ||
log.warning(f"Recoverable firmware error during {group_id}: {self._error}") | ||
raise MoveConditionNotMetError(detail={"group-id": str(group_id)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we'll need to raise MoveConditionNotMetError
there as we have already appended it toself._errors
in _handle_move_completed()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then this code won't run, right? because if self._errors
will run? I think this clause might not necessarily have to exist or at least be a weird generic catchall though, maybe it should just be a general error
@@ -528,9 +548,6 @@ async def run(self, can_messenger: CanMessenger) -> _Completions: | |||
log.warning( | |||
f"Expected nodes in group {str(group_id)}: {str(self._get_nodes_in_move_group(group_id))}" | |||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TimeoutError should be raised now (I got it in internal-release_0.13.0 but not edge)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that I don't want to add that in this PR - I want your commit in its entirety. So I think that either we can merge this now as is and re-add it after yours merges (and, we're going to have a lot of fixups for errors added since 0.13.0 diverged from edge, so it won't be unique) or wait on this PR until 0.13.0 gets merged back.
if self._errors: | ||
if len(self._errors) > 1: | ||
raise MotionFailedError( | ||
"Motion failed with multiple errors", wrapping=self._errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMAZING
== MessageId.error_message | ||
): | ||
await self._handle_error(build) | ||
handled = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that we could get in a state where a listener accepts an error message but for whatever reason, it doesn't have an error handling method and hence not raise it?
In that case, would this error message just be thrown away since the can messenger thinks it was handled by a listener?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. All I can really say is, like, "don't do that"
# Log this separately if it's some unknown error | ||
log.exception(f"Unexpected error in CAN read task: {e}") | ||
continue | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What restarts this task now if we have an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh good catch I guess this PR was old enough that it missed those changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fixed @vegano1
@@ -60,7 +68,15 @@ async def find_dfu_device(pid: str, expected_device_count: int) -> str: | |||
if stdout is None and stderr is None: | |||
continue | |||
if stderr: | |||
raise RuntimeError(f"Error finding dfu device: {stderr.decode()}") | |||
raise BootloaderNotReady( | |||
USBTarget.rear_panel, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this guaranteed to always be the rear panel?
maybe so, since its the only USB update target for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now yeah... I should really rework it to not do this though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great!
Is there a follow-up pr on the robot-server side to handle any of these errors getting propagated?
Nope, that one was already merged. All the structures to forward these errors should already be in place in robot server and protocol engine. |
7cd8a6d
to
dded2ad
Compare
This doesn't, like, do anything. It just blew up the reader task and nothing else, and we don't even do that anymore. It didn't go anywhere. Remove it.
5be5b16
to
38741fb
Compare
Now that we can express error codes in all our client interfaces, let's start raising them.
This PR makes the
opentrons_hardware
module use the new exceptions that have the new error codes. There's a couple new error codes, but mostly all this does is raise different exceptions in the same places.The two exception is in the can messenger and the movegrouprunner.
CAN Messenger
The CAN messenger has this asyncio task that reads stuff asynchronously from the bus and passes it to registered listeners. In addition, if a message is an error message it would raise an exception. But, it's a task, so that exception doesn't go anywhere. #12963 made this exception just logged and swallowed but, like... why does it exist? So we removed it.
MoveGroupRunner
A similar issue happens in MoveGroupRunner, where a background task accumulates responses and then pops them on over to the caller. This works the same now, but we can also move the move stop condition checking inside that loop. We can also now properly handle having multiple errors in a move group!
Note: You need to run
make setup-py
again since now hardware depends on shared-data.Review Requests
Testing