feat(hardware): Hardware error codes #13009

sfoster1 · 2023-06-28T20:58:40Z

Now that we can express error codes in all our client interfaces, let's start raising them.

This PR makes the opentrons_hardware module use the new exceptions that have the new error codes. There are a couple of new error codes, but mostly all this does is raise different exceptions in the same places.

The two exceptions are in the CAN messenger and the MoveGroupRunner.

CAN Messenger

The CAN messenger has this asyncio task that reads stuff asynchronously from the bus and passes it to registered listeners. In addition, if a message was an error message, the task would raise an exception. But it's a task, so that exception doesn't go anywhere. #12963 made this exception just get logged and swallowed but, like... why does it exist? So we removed it.
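For readers less familiar with asyncio, here is a minimal sketch of why an exception raised inside a background task "doesn't go anywhere". It is not the CanMessenger code: `FirmwareError`, `read_task`, and `caller` are hypothetical names made up for the illustration.

```python
import asyncio


class FirmwareError(Exception):
    """Hypothetical stand-in for an exception raised on an error message."""


async def read_task() -> None:
    # Stand-in for the reader loop: pretend an error message just arrived.
    await asyncio.sleep(0)
    raise FirmwareError("error message received")


async def caller() -> str:
    task = asyncio.create_task(read_task())
    # The caller keeps doing its own work and never awaits the task,
    # so the exception is not raised anywhere in this coroutine.
    await asyncio.sleep(0.01)
    # By now the task has died; the exception is only stored on the task object.
    assert task.done()
    return type(task.exception()).__name__


result = asyncio.run(caller())
print(result)
```

The caller runs to completion without seeing the failure; the only effect of the raise is that the reader task stops.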

MoveGroupRunner

A similar issue happens in MoveGroupRunner, where a background task accumulates responses and then pops them on over to the caller. This works the same now, but we can also move the move stop condition checking inside that loop. We can also now properly handle having multiple errors in a move group!
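A rough sketch of the "multiple errors" handling described above, assuming errors are collected into a list while the move group runs and raised once at the end. The classes here are simplified stand-ins (only the `MotionFailedError` name and the `wrapping` argument appear in this PR), and the single-error branch is an assumption.

```python
from typing import List, Optional, Sequence


class HardwareError(Exception):
    """Hypothetical base error that can wrap other errors."""

    def __init__(
        self, message: str, wrapping: Optional[Sequence[Exception]] = None
    ) -> None:
        super().__init__(message)
        self.wrapping = list(wrapping or [])


class MotionFailedError(HardwareError):
    """Stand-in named after the exception used in this PR; not the real class."""


def raise_collected(errors: List[HardwareError]) -> None:
    """Raise nothing, the single error, or one error wrapping all of them."""
    if not errors:
        return
    if len(errors) > 1:
        raise MotionFailedError("Motion failed with multiple errors", wrapping=errors)
    # Assumption: a lone error is re-raised as-is.
    raise errors[0]


# Made-up errors, purely for illustration.
errors = [HardwareError("first axis failed"), HardwareError("second axis failed")]
try:
    raise_collected(errors)
except MotionFailedError as e:
    wrapped = [str(w) for w in e.wrapping]

try:
    raise_collected(errors[:1])
except HardwareError as e:
    single = str(e)
```

Wrapping keeps every underlying error available to whoever catches the top-level exception, rather than reporting only the first one.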

Note: You need to run make setup-py again since now hardware depends on shared-data.

Review Requests

Give a skim over most stuff
Look more in-depth at MoveGroupRunner and at CANMessenger

Testing

Cause some errors and make sure it doesn't break in new ways.

codecov · 2023-06-28T21:01:01Z

Codecov Report

Merging #13009 (38741fb) into edge (ce24501) will increase coverage by 0.14%.
The diff coverage is 75.00%.

@@            Coverage Diff             @@
##             edge   #13009      +/-   ##
==========================================
+ Coverage   72.51%   72.66%   +0.14%     
==========================================
  Files        2377     1526     -851     
  Lines       65495    49826   -15669     
  Branches     7269     3099    -4170     
==========================================
- Hits        47495    36205   -11290     
+ Misses      16269    13120    -3149     
+ Partials     1731      501    -1230

Flag	Coverage Δ
app	`43.55% <ø> (-27.75%)`	⬇️
protocol-designer	`45.84% <ø> (ø)`
shared-data	`77.34% <75.00%> (+0.93%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
api/src/opentrons/hardware_control/ot3api.py	`80.10% <ø> (ø)`
...pentrons_hardware/drivers/binary_usb/bin_serial.py	`63.52% <ø> (-1.31%)`	⬇️
...ns_hardware/drivers/binary_usb/binary_messenger.py	`82.40% <ø> (+6.87%)`	⬆️
...pentrons_hardware/drivers/can_bus/can_messenger.py	`88.29% <ø> (+3.36%)`	⬆️
...dware/opentrons_hardware/drivers/can_bus/driver.py	`80.43% <ø> (-0.82%)`	⬇️
...are/opentrons_hardware/drivers/can_bus/settings.py	`100.00% <ø> (+4.28%)`	⬆️
...pentrons_hardware/drivers/can_bus/socket_driver.py	`48.57% <ø> (+1.20%)`	⬆️
...ns_hardware/firmware_bindings/messages/payloads.py	`96.09% <ø> (+0.66%)`	⬆️
...are/firmware_bindings/utils/binary_serializable.py	`98.64% <ø> (+2.26%)`	⬆️
...e/opentrons_hardware/firmware_update/downloader.py	`100.00% <ø> (ø)`
... and 18 more

... and 863 files with indirect coverage changes

ahiuchingau

Some general questions, and I think we need to raise an extra TimeoutError in the MoveScheduler

ahiuchingau · 2023-06-30T14:32:05Z

hardware/opentrons_hardware/hardware_control/tool_sensors.py

@@ -174,7 +176,14 @@ async def capacitive_probe(
        messenger,
    )
    if not threshold:
-        raise RuntimeError("Could not set threshold for probe")
+        raise CanbusCommunicationError(


Just a general question: We are raising the CommandTimedOutError in the send threshold listener if the sensor scheduler didn't get a response back. Are we expecting to see both the CommandTimedOutError and CanbusCommunicationError if the listener does time out?

No, you'd only see the timeout error since we're not catching and ignoring it. I think the only time you'd see this getting raised is if the microcontroller responded with threshold=0
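To make that answer concrete, here is a small sketch of the control flow, assuming the listener raises on timeout and nothing catches it. The exception classes are local stand-ins that reuse the names from this thread, and `send_threshold` / `set_probe_threshold` are hypothetical helpers, not the real functions.

```python
import asyncio


class CommandTimedOutError(Exception):
    """Stand-in for the timeout error named in this thread."""


class CanbusCommunicationError(Exception):
    """Stand-in for the communication error named in this thread."""


async def send_threshold(respond: bool, value: int) -> int:
    # Hypothetical listener: raises if no response arrives in time.
    if not respond:
        raise CommandTimedOutError("no response to set-threshold")
    return value


async def set_probe_threshold(respond: bool, value: int) -> int:
    # A timeout propagates straight out of this await.
    threshold = await send_threshold(respond, value)
    if not threshold:
        # Only reachable when a response arrived but reported threshold=0.
        raise CanbusCommunicationError("Could not set threshold for probe")
    return threshold


def outcome(respond: bool, value: int) -> str:
    try:
        return str(asyncio.run(set_probe_threshold(respond, value)))
    except (CommandTimedOutError, CanbusCommunicationError) as e:
        return type(e).__name__


print(outcome(False, 5), outcome(True, 0), outcome(True, 5))
```

The two errors are mutually exclusive in this sketch: the timeout leaves the function before the threshold check is ever reached.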

ahiuchingau · 2023-06-30T14:37:48Z

hardware/opentrons_hardware/hardware_control/move_group_runner.py

-                raise MoveConditionNotMet
-        elif self._error is not None:
-            log.warning(f"Recoverable firmware error during {group_id}: {self._error}")
+                raise MoveConditionNotMetError(detail={"group-id": str(group_id)})


I don't think we'll need to raise MoveConditionNotMetError there as we have already appended it to self._errors in _handle_move_completed()

Then this code won't run, right? Because the if self._errors branch will run? I think this clause might not necessarily have to exist, or at least it's a weird generic catchall. Maybe it should just be a general error

ahiuchingau · 2023-06-30T14:40:12Z

hardware/opentrons_hardware/hardware_control/move_group_runner.py

@@ -528,9 +548,6 @@ async def run(self, can_messenger: CanMessenger) -> _Completions:
                log.warning(
                    f"Expected nodes in group {str(group_id)}: {str(self._get_nodes_in_move_group(group_id))}"
                )


This TimeoutError should be raised now (I got it in internal-release_0.13.0 but not edge)

I think that I don't want to add that in this PR - I want your commit in its entirety. So I think that either we can merge this now as is and re-add it after yours merges (and, we're going to have a lot of fixups for errors added since 0.13.0 diverged from edge, so it won't be unique) or wait on this PR until 0.13.0 gets merged back.

ahiuchingau · 2023-06-30T14:46:00Z

hardware/opentrons_hardware/hardware_control/move_group_runner.py

+            if self._errors:
+                if len(self._errors) > 1:
+                    raise MotionFailedError(
+                        "Motion failed with multiple errors", wrapping=self._errors


ahiuchingau · 2023-06-30T15:04:52Z

hardware/opentrons_hardware/drivers/can_bus/can_messenger.py

-                        == MessageId.error_message
-                    ):
-                        await self._handle_error(build)
+                        handled = True


Is it possible that we could get in a state where a listener accepts an error message but for whatever reason, it doesn't have an error handling method and hence not raise it?

In that case, would this error message just be thrown away since the can messenger thinks it was handled by a listener?

Yup. All I can really say is, like, "don't do that"
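A sketch of the failure mode being discussed, assuming the messenger marks a message as handled as soon as any listener's filter accepts it and only logs error messages that nobody accepted. The message shape, `dispatch`, and the listener below are hypothetical and much simpler than the real CanMessenger.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (message id, payload); hypothetical shape
Listener = Callable[[Message], None]
Filter = Callable[[Message], bool]


def dispatch(
    message: Message, listeners: List[Tuple[Listener, Filter]], log: List[str]
) -> bool:
    """Deliver a message; log error messages that no listener accepted."""
    handled = False
    for listener, accepts in listeners:
        if accepts(message):
            listener(message)
            handled = True
    if not handled and message[0] == "error_message":
        log.append(f"unhandled error: {message[1]}")
    return handled


seen: List[Message] = []


def careless_listener(message: Message) -> None:
    # Accepts everything through its filter but ignores error messages.
    if message[0] != "error_message":
        seen.append(message)


error = ("error_message", "made-up error payload")

log_with_listener: List[str] = []
handled = dispatch(error, [(careless_listener, lambda m: True)], log_with_listener)

log_without_listener: List[str] = []
unhandled = dispatch(error, [], log_without_listener)
```

With the careless listener registered, the error counts as handled, nothing is logged, and nothing is raised, which is the "don't do that" case.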

vegano1 · 2023-07-05T13:32:38Z

hardware/opentrons_hardware/drivers/can_bus/can_messenger.py

-                # Log this separately if it's some unknown error
-                log.exception(f"Unexpected error in CAN read task: {e}")
-                continue
+        try:


What restarts this task now if we have an exception?

Huh, good catch. I guess this PR was old enough that it missed those changes

Should be fixed @vegano1
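The fix itself is not shown in this thread. As one possible shape, here is a sketch of a read loop that catches per-frame exceptions, logs them, and keeps going, so the task does not need restarting. The queue, the `None` shutdown sentinel, and `handle` are all assumptions for the example.

```python
import asyncio
import logging
from typing import List, Optional

log = logging.getLogger(__name__)


def handle(frame: str, delivered: List[str]) -> None:
    # Hypothetical per-frame handler that fails on a bad frame.
    if frame == "bad":
        raise ValueError("cannot parse frame")
    delivered.append(frame)


async def read_loop(
    frames: "asyncio.Queue[Optional[str]]", delivered: List[str]
) -> None:
    while True:
        frame = await frames.get()
        if frame is None:  # hypothetical shutdown sentinel
            return
        try:
            handle(frame, delivered)
        except Exception:
            # Log and keep reading, so one bad frame does not end the task.
            log.exception("Unexpected error in read task")
            continue


async def demo() -> List[str]:
    frames: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    delivered: List[str] = []
    for frame in ("a", "bad", "b", None):
        frames.put_nowait(frame)
    await read_loop(frames, delivered)
    return delivered


delivered = asyncio.run(demo())
print(delivered)
```

Catching inside the loop body, rather than around the whole loop, is what lets the frames after the bad one still get delivered.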

vegano1 · 2023-07-05T14:07:06Z

hardware/opentrons_hardware/firmware_update/run.py

@@ -60,7 +68,15 @@ async def find_dfu_device(pid: str, expected_device_count: int) -> str:
        if stdout is None and stderr is None:
            continue
        if stderr:
-            raise RuntimeError(f"Error finding dfu device: {stderr.decode()}")
+            raise BootloaderNotReady(
+                USBTarget.rear_panel,


Is this guaranteed to always be the rear panel?
Maybe so, since it's the only USB update target for now

Right now yeah... I should really rework it to not do this though

vegano1

This is great!
Is there a follow-up pr on the robot-server side to handle any of these errors getting propagated?

sfoster1 · 2023-07-06T16:59:21Z

This is great! Is there a follow-up pr on the robot-server side to handle any of these errors getting propagated?

Nope, that one was already merged. All the structures to forward these errors should already be in place in robot server and protocol engine.

sfoster1 requested review from a team as code owners June 28, 2023 20:58

sfoster1 force-pushed the hardware-error-codes branch from bee9789 to 0fb2e36 Compare June 28, 2023 21:02

sfoster1 requested a review from a team June 29, 2023 15:30

ahiuchingau requested changes Jun 30, 2023

vegano1 reviewed Jul 5, 2023

vegano1 approved these changes Jul 5, 2023

sfoster1 force-pushed the hardware-error-codes branch from 7cd8a6d to dded2ad Compare July 6, 2023 18:29

sfoster1 added 19 commits July 10, 2023 14:00

feat(hardware): add shared-data for errors

3575767

add internal message format error

69c6488

use some errors in hardware

d6e045e

fix message format

5f2673f

add configuration error

7eb83c5

use configuration error

389d221

add a bus error for e.g. error frames

125c28b

use bus error instead of hw error

3ff490e

all can errors except async hardware

6cdd153

add some new lovely error codes

66ba34d

use some more error codes

96b141c

add a move condition error

6d455d7

add handling for error messages from hardware

cf88046

remove asynchardwareerror

51c460b

This doesn't, like, do anything. It just blew up the reader task and nothing else, and we don't even do that anymore. It didn't go anywhere. Remove it.

fix tests

127ac55

shared-data format

32b5dc8

fix up api

c6b828d

fix accidental change to can messenger

c82248a

lint

06502ed

rebase fixups

38741fb

sfoster1 force-pushed the hardware-error-codes branch from 5be5b16 to 38741fb Compare July 10, 2023 18:24

ahiuchingau approved these changes Jul 10, 2023

sfoster1 merged commit b411304 into edge Jul 10, 2023

sfoster1 deleted the hardware-error-codes branch July 10, 2023 21:09
