feat(api, app): Check Robot Deck Transform #5845
Conversation
```diff
@@ -36,10 +37,16 @@ const CALIBRATE_DECK_DESCRIPTION =

const CHECK_ROBOT_CAL_DESCRIPTION = "Check the robot's calibration state"

const ROBOT_CAL_ERROR = 'Bad robot calibration data detected'
const ROBOT_CAL_ERROR_DESCRIPTION =
  'Do not run any protocols or check calibration as robot is likely to experience a crash.'
```
My understanding was that the robot calibration check looks at the tip pickup position and makes sure it's sane, specifically to guard against insane-matrix crashes like this? But maybe that's not good enough to stop singular or identity matrices from causing crashes?
There are a few different checkpoints for determining whether your calibration is bad or not. This checkpoint is meant to tell a user that their deck calibration is so bad, they might not be able to even complete a calibration check without fixing their robot deck calibration (i.e. limit switches triggering and such).
or out of the bounds greater than 3 for XY

3 mm seems too tight to me. From my notes on #5022, one recent robot's clean-slate, healthy, good calibration had a y-offset of 4.99. As things are, I don't know how we would find a sensible range for that check. We can see wild robots' calibrations through logging telemetry, but we can't trust that those are good.
Sorry, I don't know why GitHub wouldn't let me comment directly above. 5 mm is pretty large for a translation across the deck (which is purely meant to encapsulate tolerance ranges for the physical deck). I wonder if we need to use a different nominal value moving forward, or whether something else weird is happening.
I'm also happy to take this check out for now until we come up with a better way to determine something being crazy out of bounds (aside from singularity or identity)
There are a few different checkpoints for determining whether your calibration is bad or not. This checkpoint is meant to tell a user that their deck calibration is so bad, they might not be able to even complete a calibration check without fixing their robot deck calibration (i.e. limit switches triggering and such).
Dumb question, sorry, but where do these messages show up? Is it:
- on a robot's page
- and, intended to prevent people from clicking the Check calibration button?
The message for this PR shows up on the robot's page (currently under the check robot calibration button)
The tip rack pick up check happens during calibration check after a user has successfully picked up a tip. If a user goes into cal check w/ the use-cases from this PR they might experience a weird error (such as limit switch triggering) before ever seeing the message around them having bad deck calibration data.
I spoke to hardware about the values that you saw for a "working" deck calibration, and they are currently investigating. They should be able to give us a good idea of what the tolerancing should be for the deck and whether the nominal values we're currently using as a measurement should be changed.
The message for this PR shows up...currently under the check robot calibration button
If a user goes into cal check w/ the use-cases from this PR they might experience a weird error...before ever seeing the message around them having bad deck calibration data.
Ah, that makes sense.
In that case, I think we should think about a different UX.
Checking deck calibration should always be safe. We shouldn't have an inviting button, and then also have an error message competing with that button trying to convince you that the button is dangerous. We should just make the button not dangerous.
For example, we could duplicate this check at the beginning of the calibration check flow.
(Semantically, I don't see this condition as an error that prevents the robot from checking its calibration. I see it as the robot having already successfully diagnosed a problem with its calibration.)
I spoke to hardware about the values that you saw for a "working" deck calibration, and they are currently investigating. They should be able to give us a good idea of what the tolerancing should be for the deck and whether the nominal values we're currently using as a measurement should be changed.
Sounds great. Thanks!
@SyntaxColoring I think we should just disable cal check when the error is there.
@SyntaxColoring I think we should just disable cal check when the error is there.
We could do that. I think I softly prefer my suggestion, though.
If the calibration check is available unconditionally, it's simpler to point users to it. Otherwise, we might elicit confusion like, "My robot is moving wrong, and you Support people told me to click Check calibration to troubleshoot, but the software is telling me it can't do that...because it might make my robot move wrong?"
My thoughts below! TL;DR: I don't think this should be in the health endpoint.
Not sure if we should even include this feature when we release deck calibration check.
If we don't have clear UX goals with this feature, should we consider removing the UI from this PR, and instead trigger an analytics event if the data is bad? That way, we could have better data to decide if UI is necessary
```python
    def valid_transform(self) -> bool:
        return self._valid_transform

    def validate_calibration(self):
```
Why doesn't this method return a boolean? It seems a little complicated to have a call-site understand that it needs to trigger an update to a class property and then read that class property instead of just asking for the value
Yeah, we should fuse `valid_transform` and `validate_transform` into one method that returns its outcome (which I think should maybe be an enum that encompasses "singular", "identity", "bad", and "ok") and `lru_cache` it, with invalidation on altering deck calibration.
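Roughly what I have in mind, as a sketch only (the names, enum values, and the hypothetical setter below are illustrative, not the actual API):

```python
from enum import Enum
from functools import lru_cache


class DeckTransformState(Enum):
    OK = "OK"
    IDENTITY = "IDENTITY"
    SINGULARITY = "SINGULARITY"
    BAD_CALIBRATION = "BAD_CALIBRATION"


class API:
    @lru_cache(maxsize=1)
    def validate_calibration(self) -> DeckTransformState:
        # Classify the current deck transform; the result is cached so that
        # frequent callers (e.g. the health poll) do no matrix math.
        # (classification logic elided in this sketch)
        return DeckTransformState.OK

    def set_deck_calibration(self, matrix) -> None:
        # Hypothetical setter: whenever deck calibration changes, clear the
        # cached classification so the next call recomputes it.
        self.validate_calibration.cache_clear()
```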
```jsx
</LabeledButton>

{health && !health.valid_calibration && (
  <div className={styles.cal_check_error_wrapper}>
```
This is a lot of presentational concerns to put into a component (`ControlsCard`) that is not currently presentational, and this logic isn't covered in `ControlsCard.test.js`.

I would create a new component for displaying the calibration warning text (and tests to ensure those warnings are displayed properly) and use that component here. If I were implementing it from the top down, I would do something like:

- Write a test for `ControlsCard` that says "if there is a problem with the deck calibration data, a `DeckCalibrationWarning` (or whatever name) component is shown"
- Make a dummy `DeckCalibrationWarning` component and add the logic to `ControlsCard` to get that test to pass
- Add test(s) to check that `DeckCalibrationWarning`:
  - Displays an "alert-circle" `Icon`
  - Has some title that mentions deck calibration
  - Has some description that also mentions deck calibration
- Implement a purely presentational `DeckCalibrationWarning` component that passes those tests
```diff
@@ -94,3 +94,25 @@
    margin-right: auto;
  }
}

.cal_check_error {
```
Mostly for demo purposes for our meeting on Wednesday: instead of `<div className={styles.cal_check_error}>`, you could do:

```js
// note: COLOR_ERROR would need to be added to components/src/styles/colors.js
import { Box, FONT_SIZE_BODY_1, COLOR_ERROR, SPACING_AUTO, SPACING_1 } from '@opentrons/components'
// ...
<Box
  color={COLOR_ERROR}
  fontSize={FONT_SIZE_BODY_1}
  marginRight={SPACING_AUTO}
  paddingRight={SPACING_1}
>
```
```css
}

.cal_check_error_wrapper {
```
Instead of `<div className={styles.cal_check_error_wrapper}>`, this could be:

```js
import { Flex, ALIGN_CENTER } from '@opentrons/components'
// ...
<Flex alignItems={ALIGN_CENTER}>
```
```diff
@@ -37,7 +37,9 @@ async def get_health(
    else:
        max_supported = APIVersion(1, 0)

    hardware.validate_calibration()
```
Do we have to add this to the health endpoint? Health endpoints are for checking "is the server up" and should be as lightweight as possible. This introduces matrix math into the health endpoint, which is called a lot.
IMO this belongs in a calibration-specific GET endpoint
I think that the `/health` endpoint is and should be about the health of the robot, of which "the server is up" is an inescapable part.
That said, you're right about this being too much work to do in a frequently-polled endpoint - we should `lru_cache` this function and invalidate it when the deck calibration changes.
A bad deck calibration doesn't mean the robot is unhealthy; it means the calibration data needs to be changed. The robot can still be interacted with, and you could still run a protocol (it's a bad idea, but you could).
You could contrast the calibration data with the version numbers reported in the health endpoint, which are important in determining how / whether the robot can be communicated with successfully by a given client.
This data is really specific to the implementation of the calibration system. I strongly feel like it should be in a calibration-specific endpoint/resource. The health endpoint is part of the connectivity system, and I would like to avoid overloading it with other concerns.
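For illustration, a calibration-specific resource could look something like this. This is a rough sketch assuming the same pydantic/FastAPI-style models as the Health model in this PR; the route matches the /calibration/status endpoint mentioned in the follow-up commit below, but the field name and enum values are placeholders:

```python
from enum import Enum

from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()


class DeckTransformState(str, Enum):
    OK = "OK"
    IDENTITY = "IDENTITY"
    SINGULARITY = "SINGULARITY"
    BAD_CALIBRATION = "BAD_CALIBRATION"


class CalibrationStatus(BaseModel):
    deckTransform: DeckTransformState = Field(
        ...,
        description="Validity of the robot's deck transform")


@app.get("/calibration/status", response_model=CalibrationStatus)
async def get_calibration_status() -> CalibrationStatus:
    # The real handler would ask the hardware controller for its cached
    # validate_calibration() result; hard-coded here for the sketch.
    return CalibrationStatus(deckTransform=DeckTransformState.OK)
```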
Maybe obvious, but: If we do this, I suggest including the robot's calibration data—ideally everything, but at least the deck calibration matrix—in the analytics. Otherwise, we just see a bunch of binary pass/fails, which is kind of unactionable.
Inline change requests, minor ones though
```diff
@@ -209,6 +211,53 @@ def is_simulator(self):
        """ `True` if this is a simulator; `False` otherwise. """
        return isinstance(self._backend, Simulator)

    def validate_calibration(self):
```
definitely need a return type annotation here
```python
        return self._calculate_valid_calibration()

    @lru_cache(maxsize=1)
    def _calculate_valid_calibration(self):
```
need a return type annotation
```python
        z = abs(curr_cal[2][-1])

        outofrange = z < 20 or z > 30
```
this tolerance is a little tight for my comfort, especially since we can't really test this on as many robots as normal - could we bump it to (16, 34) or something? ±9 from the -25 nominal should still be within our 10 mm safety bound for moving to the tiprack
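Concretely, something like this (the matrix below is made up purely to show the relaxed bound; the -25 nominal is the one mentioned above):

```python
import numpy as np

# Hypothetical 4x4 deck transform whose z translation sits near the -25 nominal
curr_cal = np.array([
    [1.0, 0.0, 0.0,   0.5],
    [0.0, 1.0, 0.0,  -1.2],
    [0.0, 0.0, 1.0, -25.3],
    [0.0, 0.0, 0.0,   1.0],
])

z = abs(curr_cal[2][-1])
# Relaxed bound suggested above: roughly ±9 mm around the -25 nominal
outofrange = z < 16 or z > 34   # False for this example matrix
```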
```diff
@@ -253,6 +254,9 @@ def fw_version(self) -> Optional[str]:
    async def update_fw_version(self):
        pass

    def validate_calibration(self):
        return DeckTransformState.IDENTITY
```
What's this function for? It doesn't appear to be called
```diff
@@ -22,6 +22,11 @@ class Health(BaseModel):
            description="The robot's name. In most cases the same as its "
                        "mDNS advertisement domain name but this can get out"
                        " of sync. Mostly useful for user-facing titles.")
    calibration: str = \
```
this really should be an enum type brought up from hardware so it shows up as such in the api docs, and the same in the app-side flow types
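Roughly like this sketch, if the field stays on the health model at all (the enum would ideally be imported from the hardware package rather than redefined here, and the values and field name are placeholders):

```python
from enum import Enum

from pydantic import BaseModel, Field


class DeckTransformState(str, Enum):
    # Subclassing str keeps the JSON value a plain string while letting the
    # generated API docs (and app-side Flow types) enumerate the options.
    OK = "OK"
    IDENTITY = "IDENTITY"
    SINGULARITY = "SINGULARITY"
    BAD_CALIBRATION = "BAD_CALIBRATION"


class Health(BaseModel):
    # ...existing fields elided...
    calibration: DeckTransformState = Field(
        ...,
        description="Whether the robot's deck transform is usable")
```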
It does work great though!
Force-pushed from 0d5bdec to 625f433
LGTM!
I want to re-iterate my concerns with adding deck calibration data to the health endpoint. In addition to overloading the health endpoint with non-server-health concerns, it has a functional problem: by design, the app UI does not control when the health endpoint of a given robot is hit, because it is part of the discovery-client's connectivity check. If you rely on the health endpoint being updated out of band (rather than using a specific calibration endpoint that can be hit when the UI needs that information), you run the risk of displaying stale data.
lgtm!
* add /calibration/status endpoint + tavern test
* fill in instrumentCalibration and data matrix
* slight model changes
overview
This PR adds a check during the health poll to determine whether a robot has a valid deck transformation or not.
The method checks whether the matrix is singular, is the identity matrix, or is out of bounds: greater than 3 for the X/Y offsets, or outside [-20, -30] for the z value. All of these cases are rare/improbable, but if a user somehow has this data on their robot then they will definitely be experiencing major movement issues.
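Roughly the shape of the check (an illustrative sketch only; the real method lives in `api.py` and returns an enum rather than a string, and the bounds are the ones described above):

```python
import numpy as np


def classify_deck_transform(deck_cal: np.ndarray) -> str:
    """Classify a 4x4 affine deck-calibration matrix.

    Returns a label; the real code would return the DeckTransformState enum
    discussed in the review comments.
    """
    if np.array_equal(deck_cal, np.identity(4)):
        return "identity"      # never calibrated / wiped calibration
    if np.isclose(np.linalg.det(deck_cal), 0.0):
        return "singular"      # not invertible; moves would be nonsense
    x, y, z = deck_cal[0][-1], deck_cal[1][-1], deck_cal[2][-1]
    if abs(x) > 3 or abs(y) > 3:
        return "bad"           # X/Y translation beyond the 3 mm bound
    if not (20 <= abs(z) <= 30):
        return "bad"           # z translation outside the [-20, -30] window
    return "ok"
```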
changelog
- `valid_calibration` (open to other names)
- A check in `api.py` that checks a few different properties/values of the matrix (see overview section)
- An error shown on the robot's page when `valid_calibration` is false

Note
review requests
Put this on your robot, turn your deck calibration into an identity matrix (easiest check), restart your robot, then see the screen displayed here.
risk assessment
Medium? Not sure if we should even include this feature when we release deck calibration check. It might scare our users, even though it's checking only rare cases.