
feat(api, app): Check Robot Deck Transform #5845

Merged: 14 commits into edge on Jun 18, 2020

Conversation

Laura-Danielle (Contributor)

overview

This PR adds in a check during the health poll to determine whether a robot has a valid deck transformation or not.

The method checks whether the matrix is singular, is the identity matrix, or has out-of-bounds offsets: greater than 3 mm for X/Y, or a Z value outside [-30, -20]. All of these cases are rare/improbable, but if a user somehow has this data on their robot, they will definitely be experiencing major movement issues.
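As a rough sketch of the checks described above (not the exact implementation in this PR; the function name and numpy usage are mine, and the thresholds follow the numbers quoted here):

```python
import numpy as np

def classify_deck_transform(transform: np.ndarray) -> str:
    """Illustrative only: classify a 4x4 affine deck transform."""
    # Identity matrix: a placeholder, i.e. no real calibration was ever run.
    if np.allclose(transform, np.identity(transform.shape[0])):
        return "identity"
    # Singular matrix: not invertible, so deck positions can't be mapped.
    if np.isclose(np.linalg.det(transform), 0):
        return "singular"
    # Out-of-bounds offsets: > 3 mm in X/Y, or Z outside [-30, -20] mm.
    x_off, y_off, z_off = transform[:3, -1]
    if abs(x_off) > 3 or abs(y_off) > 3 or not (-30 <= z_off <= -20):
        return "bad calibration"
    return "ok"
```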

changelog

  • Add in a new entry for the health endpoint called valid_calibration (open to other names)
  • Add in a method in api.py that checks a few different properties/values of the matrix (see overview section)
  • Edit the app to display an error message iff valid_calibration is false

Note

  • Dev server will always display bad deck calibration because the identity matrix is currently used as a placeholder during simulation. This is also why the tavern test currently has to expect false: the identity matrix is not valid. Generating a valid deck transform for simulation will be fixed in a subsequent PR.

review requests

Put this on your robot, set your deck calibration to the identity matrix (the easiest check), restart your robot, and you should see the screen displayed here.

risk assessment

Medium? Not sure if we should even include this feature when we release deck calibration check. It might scare our users, even though it's checking only rare cases.

@Laura-Danielle Laura-Danielle added app Affects the `app` project api Affects the `api` project labels Jun 8, 2020
@Laura-Danielle Laura-Danielle requested review from a team June 8, 2020 22:01
@Laura-Danielle Laura-Danielle requested review from a team as code owners June 8, 2020 22:01
@Laura-Danielle Laura-Danielle requested review from ahiuchingau, mcous and amitlissack and removed request for a team June 8, 2020 22:01
@SyntaxColoring (Contributor) commented Jun 8, 2020

> or out of the bounds greater than 3 for XY

3 mm seems too tight to me. From my notes on #5022, one recent robot's clean-slate, healthy, good calibration had a y-offset of 4.99.

As things are, I don't know how we would find a sensible range for that check. We can see wild robots' calibrations through logging telemetry, but we can't trust that those are good.

@@ -36,10 +37,16 @@ const CALIBRATE_DECK_DESCRIPTION =

const CHECK_ROBOT_CAL_DESCRIPTION = "Check the robot's calibration state"

const ROBOT_CAL_ERROR = 'Bad robot calibration data detected'
const ROBOT_CAL_ERROR_DESCRIPTION =
  'Do not run any protocols or check calibration as robot is likely to experience a crash.'
Contributor:

My understanding was that the robot calibration check looks at the tip pickup position and makes sure it's sane, specifically to guard against insane-matrix crashes like this? But maybe that's not good enough to stop singular or identity matrices from causing crashes?

Contributor (Author):

There are a few different checkpoints for determining whether your calibration is bad or not. This checkpoint is meant to tell a user that their deck calibration is so bad, they might not even be able to complete a calibration check without fixing their robot deck calibration (e.g. limit switches triggering and such).

@Laura-Danielle (Contributor, Author) commented Jun 8, 2020

> or out of the bounds greater than 3 for XY
>
> 3 mm seems too tight to me. From my notes on #5022, one recent robot's clean-slate, healthy, good calibration had a y-offset of 4.99.
>
> As things are, I don't know how we would find a sensible range for that check. We can see wild robots' calibrations through logging telemetry, but we can't trust that those are good.

Sorry, don't know why GitHub wouldn't let me comment directly above. 5 mm is pretty large for a translation across the deck (which is purely meant to encapsulate tolerance ranges for the physical deck). I wonder if we need to use a different nominal value moving forward, or if something else weird is happening.

I'm also happy to take this check out for now until we come up with a better way to determine something being crazy out of bounds (aside from singularity or identity)

Contributor:

> There are a few different checkpoints for determining whether your calibration is bad or not. This checkpoint is meant to tell a user that their deck calibration is so bad, they might not even be able to complete a calibration check without fixing their robot deck calibration (e.g. limit switches triggering and such).

Dumb question, sorry, but where do these messages show up? Is it:

  1. on a robot's page
  2. and intended to prevent people from clicking the Check calibration button?

Contributor (Author):

The message for this PR shows up on the robot's page (currently under the check robot calibration button).

The tip rack pick up check happens during calibration check after a user has successfully picked up a tip. If a user goes into cal check with the use cases from this PR, they might experience a weird error (such as a limit switch triggering) before ever seeing the message about their bad deck calibration data.

I spoke to hardware about the values that you saw for a "working" deck calibration, and they are currently investigating. They should be able to give us a good idea of what the tolerancing should be for the deck and whether the nominal values we're currently using as a measurement should be changed.

Contributor:

> The message for this PR shows up...currently under the check robot calibration button

> If a user goes into cal check with the use cases from this PR, they might experience a weird error...before ever seeing the message about their bad deck calibration data.

Ah, that makes sense.

In that case, I think we should think about a different UX.

Checking deck calibration should always be safe. We shouldn't have an inviting button, and then also have an error message competing with that button trying to convince you that the button is dangerous. We should just make the button not dangerous.

For example, we could duplicate this check at the beginning of the calibration check flow.

(Semantically, I don't see this condition as an error that prevents the robot from checking its calibration. I see it as the robot having already successfully diagnosed a problem with its calibration.)


> I spoke to hardware about the values that you saw for a "working" deck calibration, and they are currently investigating. They should be able to give us a good idea of what the tolerancing should be for the deck and whether the nominal values we're currently using as a measurement should be changed.

Sounds great. Thanks!

Member:

@SyntaxColoring I think we should just disable cal check when the error is there.

@SyntaxColoring (Contributor) commented Jun 9, 2020

> @SyntaxColoring I think we should just disable cal check when the error is there.

We could do that. I think I softly prefer my suggestion, though.

If the calibration check is available unconditionally, it's simpler to point users to it. Otherwise, we might elicit confusion like, "My robot is moving wrong, and you Support people told me to click Check calibration to troubleshoot, but the software is telling me it can't do that...because it might make my robot move wrong?"

@mcous (Contributor) left a comment

My thoughts below! TL;DR: I don't think this should be in the health endpoint.

> Not sure if we should even include this feature when we release deck calibration check.

If we don't have clear UX goals with this feature, should we consider removing the UI from this PR, and instead trigger an analytics event if the data is bad? That way, we could have better data to decide if UI is necessary.

def valid_transform(self) -> bool:
    return self._valid_transform

def validate_calibration(self):
Contributor:

Why doesn't this method return a boolean? It seems a little complicated to have a call-site understand that it needs to trigger an update to a class property and then read that class property instead of just asking for the value

Member:

Yeah, we should fuse valid_transform and validate_calibration into one method that returns its outcome (which I think should maybe be an enum that encompasses "singular", "identity", "bad", and "ok"), and lru_cache it with invalidation on altering deck calibration.
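A hedged sketch of that shape (the enum and _calculate_valid_calibration names follow the diff hunks in this PR; the set_gantry_calibration mutation hook where the cache gets cleared is hypothetical):

```python
from enum import Enum
from functools import lru_cache


class DeckTransformState(Enum):
    OK = "ok"
    IDENTITY = "identity"
    SINGULARITY = "singular"
    BAD_CALIBRATION = "bad"


class API:
    def validate_calibration(self) -> DeckTransformState:
        return self._calculate_valid_calibration()

    @lru_cache(maxsize=1)
    def _calculate_valid_calibration(self) -> DeckTransformState:
        # run the singular / identity / bounds checks over the current
        # deck transform; cached because /health may poll this frequently
        return DeckTransformState.OK

    def set_gantry_calibration(self, new_transform) -> None:
        # hypothetical mutation point: whenever deck calibration changes,
        # drop the cached result so the next check recomputes it
        self._calculate_valid_calibration.cache_clear()
```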

</LabeledButton>

{health && !health.valid_calibration && (
  <div className={styles.cal_check_error_wrapper}>
Contributor:

This is a lot of presentational concerns to put into a component (ControlsCard) that is not currently presentational, and this logic isn't covered in ControlsCard.test.js.

I would create a new component for displaying the calibration warning text (and tests to ensure those warnings are displayed properly) and use that component here. If I was implementing it from the top down, I would do something like:

  1. Write a test for the ControlsCard that says "if there is a problem with the deck calibration data, a DeckCalibrationWarning (or whatever name) component is shown"
  2. Make a dummy DeckCalibrationWarning component and add the logic to ControlsCard to get that test to pass
  3. Add test(s) to check that DeckCalibrationWarning
    • Displays an "alert-circle" Icon
    • Has some title that mentions deck calibration
    • Has some description that also mentions deck calibration
  4. Implement a purely presentational DeckCalibrationWarning component that passes those tests

@@ -94,3 +94,25 @@
margin-right: auto;
}
}

.cal_check_error {
Contributor:

Mostly for demo purposes for our meeting on Wednesday, instead of <div className={styles.cal_check_error}>, you could do:

// note: COLOR_ERROR would need to be added to components/src/styles/colors.js
import { Box, FONT_SIZE_BODY_1, COLOR_ERROR, SPACING_AUTO, SPACING_1 } from '@opentrons/components'

// ...
<Box
  color={COLOR_ERROR}
  fontSize={FONT_SIZE_BODY_1}
  marginRight={SPACING_AUTO}
  paddingRight={SPACING_1}
>

}


.cal_check_error_wrapper {
@mcous (Contributor) commented Jun 8, 2020

Instead of <div className={styles.cal_check_error_wrapper}>, this could be

import { Flex, ALIGN_CENTER } from '@opentrons/components'

// ...
<Flex alignItems={ALIGN_CENTER}>

@@ -37,7 +37,9 @@ async def get_health(
    else:
        max_supported = APIVersion(1, 0)

    hardware.validate_calibration()
Contributor:

Do we have to add this to the health endpoint? Health endpoints are for checking "is the server up" and should be as lightweight as possible. This introduces matrix math into the health endpoint, which is called a lot.

IMO this belongs in a calibration-specific GET endpoint
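A minimal sketch of what such a calibration-specific endpoint could look like (FastAPI-style, matching the /calibration/status path added in later commits on this PR; the response model and field names here are assumed for illustration):

```python
from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter()


class CalibrationStatus(BaseModel):
    # field name assumed for illustration; the real model carries more data
    deckTransformState: str


@router.get("/calibration/status", response_model=CalibrationStatus)
async def get_calibration_status() -> CalibrationStatus:
    # in the real server this would come from the hardware controller's
    # validate_calibration(); hard-coded here so the sketch stands alone
    return CalibrationStatus(deckTransformState="OK")
```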

Member:

I think that the /health endpoint is and should be about health of the robot, of which "the server is up" is an inescapable part.

That said, you're right about this being too much work to do in a frequently-polled endpoint - we should lru_cache this function and invalidate it when the deck calibration changes.

Contributor:

A bad deck calibration doesn't mean the robot is unhealthy, it means the calibration data needs to be changed. The robot can still be interacted with, and you could still run a protocol (it's a bad idea, but you could).

You could contrast the calibration data with the version numbers reported in the health endpoint, which are important in determining how / if a given client can communicate successfully with the robot.

This data is really specific to the implementation of the calibration system. I strongly feel like it should be in a calibration-specific endpoint/resource. The health endpoint is part of the connectivity system, and I would like to avoid overloading it with other concerns.

@SyntaxColoring (Contributor)

> If we don't have clear UX goals with this feature, should we consider removing the UI from this PR, and instead trigger an analytics event if the data is bad? That way, we could have better data to decide if UI is necessary.

Maybe obvious, but: If we do this, I suggest including the robot's calibration data—ideally everything, but at least the deck calibration matrix—in the analytics. Otherwise, we just see a bunch of binary pass/fails, which is kind of unactionable.


@sfoster1 (Member) left a comment

Inline change requests, minor ones though

@@ -209,6 +211,53 @@ def is_simulator(self):
    """ `True` if this is a simulator; `False` otherwise. """
    return isinstance(self._backend, Simulator)

def validate_calibration(self):
Member:

definitely need a return type annotation here

    return self._calculate_valid_calibration()

@lru_cache(maxsize=1)
def _calculate_valid_calibration(self):
Member:

need a return type annotation


z = abs(curr_cal[2][-1])

outofrange = z < 20 or z > 30
Member:

this tolerance is a little tight for my comfort, especially since we can't really test this on as many robots as normal - could we bump it to (16, 34) or something? ±9 from the -25 nominal should still be within our 10 mm safety bound for moving to the tiprack
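A self-contained sketch of the suggested relaxation, keeping the variable names from the diff above (the example matrix is made up):

```python
import numpy as np

# made-up deck transform with a Z offset near the -25 mm nominal
curr_cal = np.array([
    [1.0, 0.0, 0.0,   0.5],
    [0.0, 1.0, 0.0,  -1.2],
    [0.0, 0.0, 1.0, -25.0],
    [0.0, 0.0, 0.0,   1.0],
])

z = abs(curr_cal[2][-1])
# suggested bounds: +-9 mm around the -25 mm nominal, i.e. abs(z) in [16, 34]
outofrange = z < 16 or z > 34   # previously: z < 20 or z > 30
print(outofrange)  # False for this example
```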

@@ -253,6 +254,9 @@ def fw_version(self) -> Optional[str]:
async def update_fw_version(self):
    pass

def validate_calibration(self):
    return DeckTransformState.IDENTITY
Member:

What's this function for? It doesn't appear to be called

@@ -22,6 +22,11 @@ class Health(BaseModel):
        description="The robot's name. In most cases the same as its "
                    "mDNS advertisement domain name but this can get out"
                    " of sync. Mostly useful for user-facing titles.")
    calibration: str = \
Member:

this really should be an enum type brought up from hardware so it shows up as such in the api docs, and the same in the app-side flow types
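A hedged sketch of what that could look like in the Pydantic Health model (DeckTransformState is the hardware-side type referenced elsewhere in this diff; the member names and description text here are illustrative):

```python
from enum import Enum
from pydantic import BaseModel, Field


class DeckTransformState(str, Enum):
    # illustrative members; mirror whatever the hardware layer exposes
    OK = "OK"
    IDENTITY = "IDENTITY"
    SINGULARITY = "SINGULARITY"
    BAD_CALIBRATION = "BAD_CALIBRATION"


class Health(BaseModel):
    # ...existing fields elided...
    calibration: DeckTransformState = Field(
        ...,
        description="Whether the robot's deck calibration transform is valid")
```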

@sfoster1 (Member) left a comment

The tooltip is centered on the box containing the check button rather than the button itself
[Screenshot: Screen Shot 2020-06-18 at 9.55.30 AM]

@sfoster1 (Member) left a comment

At the default viewport size of 1024x768 the deck cal warning wraps extremely weirdly:
[Screenshot: Screen Shot 2020-06-18 at 9.57.03 AM]

Could we maybe make the "Bad deck calibration!" text its own <p> or something?

@sfoster1 (Member)

It does work great though!

@sfoster1 (Member) left a comment

LGTM!

@mcous (Contributor) commented Jun 18, 2020

I want to reiterate my concerns with adding deck calibration data to the health endpoint. In addition to overloading the health endpoint with non-server-health concerns, it has a functional problem: by design, the app UI does not control when the health endpoint of a given robot is hit, because it is part of the discovery-client's connectivity check.

If you rely on the health endpoint being updated out of band (rather than using a specific calibration endpoint that can be hit when the UI needs that information), you run the risk of displaying stale data.

@sfoster1 (Member) left a comment

lgtm!

* add /calibration/status endpoint + tavern test

* fill in instrumentCalibration and data matrix

* slight model changes
@Laura-Danielle Laura-Danielle merged commit ed67383 into edge Jun 18, 2020
@Laura-Danielle Laura-Danielle deleted the add_bad_deck_cal_status branch June 18, 2020 19:22