fix(api,update-server): Synchronize robot name more tightly between api and update-server #11175
Codecov Report
| | edge | #11175 | +/- |
|---|---:|---:|---:|
| Coverage | 73.81% | 73.81% | |
| Files | 2086 | 2086 | |
| Lines | 57722 | 57725 | +3 |
| Branches | 5855 | 5855 | |
| Hits | 42609 | 42612 | +3 |
| Misses | 13825 | 13825 | |
| Partials | 1288 | 1288 | |

Flags with carried forward coverage won't be shown.
# Strip the trailing newline, since it's not part of the actual name value.
# TODO(mm, 2022-07-18): When we upgrade to systemd 249, use
# `hostnamectl --json` for CLI output that we can parse more robustly.
assert len(result) >= 1 and result[-1] == "\n"
Why do we need to assert that the last character is a newline? Can we just trim whitespace?
I don't think so, because as far as we're concerned in this part of the code, these are valid and distinct names:

- `"foo"`
- `" foo"`
- `"foo "`
Are these valid names?
Sort of!

There are two components to whether or not a string is valid here:

1. Input restrictions and output guarantees made by the 3rd-party commands `hostnamectl set-hostname --pretty` and `hostnamectl status --pretty`.
2. Input restrictions and output guarantees that we add ourselves, in our own Python code.

But (1) is not well documented by the 3rd-party tools. And (2) happens in `update-server`, far away from this code. So I think the most maintainable way to write this code is for it to assume that those strings can be valid, and to avoid chopping them up unnecessarily.
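A minimal sketch of reading the pretty hostname without chopping it up, assuming a synchronous `subprocess.run` call (the real `update-server` code may be structured differently, for example using asyncio). The helper names `parse_pretty_hostname_output` and `get_pretty_hostname` are hypothetical. The parser removes only the final newline, mirroring the assertion under review, and a failing `hostnamectl` raises instead of producing a fallback name.

```python
import subprocess


def parse_pretty_hostname_output(result: str) -> str:
    """Remove exactly one trailing newline from hostnamectl's output.

    Leading and trailing spaces are kept, because "foo", " foo", and
    "foo " are treated as valid, distinct names.
    """
    # The CLI output is expected to end with a single newline that is not
    # part of the name itself.
    assert len(result) >= 1 and result[-1] == "\n"
    return result[:-1]


def get_pretty_hostname() -> str:
    """Shell out to hostnamectl; raise instead of using a fallback name."""
    completed = subprocess.run(
        ["hostnamectl", "--pretty", "status"],
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit raises subprocess.CalledProcessError
    )
    return parse_pretty_hostname_output(completed.stdout)


# Why not str.strip()? It would collapse the distinct names into one.
outputs = ["foo\n", " foo\n", "foo \n"]
assert len({o.strip() for o in outputs}) == 1
assert len({parse_pretty_hostname_output(o) for o in outputs}) == 3
```

The two assertions at the bottom show the reviewer's point: trimming all whitespace maps three different names to `"foo"`, while removing only the newline keeps them apart.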
update-server/otupdate/common/name_management/pretty_hostname.py
Changes look good to me! I added a few remarks, but other than that I feel good about approving this. Should I test this on a robot?
Testing this with the Opentrons App and my robot connected over Wi-Fi, I'm noticing weirdness where, after a rename, the robot disappears from the devices page and never reappears. I'm trying to discern whether that's purely the existing issue #11101, or whether this PR somehow makes it worse. Edit: Based on further testing, I don't believe this PR makes it worse. I think I was just seeing #11101.
If a Discovery Client poll happens to line up with this period of inaccessibility, the robot will be marked as "unconnectable" (as opposed to "connectable" or "unreachable") while polls return 500 errors. Given that this will resolve itself, and that the renamed robot will look like a brand-new robot to the DC anyway, this feels acceptable to me, but it will likely need closer scrutiny once we track robots more reliably.
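A rough sketch of the three states mentioned above, as a hypothetical simplification: `classify_poll` is not the Discovery Client's actual logic, and the real client may weigh several endpoints and poll history. It only illustrates why a poll that lands on a temporary 500 reads as "unconnectable" rather than "unreachable".

```python
from typing import Optional


def classify_poll(status_code: Optional[int]) -> str:
    """Hypothetical mapping from one health poll result to a robot state.

    status_code is None when nothing answered the poll at all.
    """
    if status_code is None:
        return "unreachable"  # no HTTP response was received
    if 200 <= status_code < 300:
        return "connectable"  # the server answered successfully
    return "unconnectable"  # the server answered, but with an error such as 500
```

Under this model, a rename produces a brief `connectable` → `unconnectable` → `connectable` sequence rather than the robot vanishing.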
Overview
This PR fixes bugs that could cause a robot's name to be different depending on whether you were looking at `GET /server/update/health` or `GET /health`.

Closes #10413.
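One way to check for the mismatch this PR fixes is to fetch both endpoints and compare their names exactly. This sketch assumes both responses are JSON objects with a `name` field; `fetch_name` and `compare_names` are hypothetical helpers, and some robot-server endpoints may additionally require request headers not shown here.

```python
import json
import urllib.request
from typing import Tuple


def fetch_name(url: str, timeout: float = 5.0) -> str:
    """GET a health endpoint and return the "name" field of its JSON body."""
    # Assumption: the response body is a JSON object containing "name".
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)["name"]


def compare_names(api_name: str, update_server_name: str) -> Tuple[bool, str]:
    """Compare two names exactly; "foo" and "foo " count as different."""
    if api_name == update_server_name:
        return True, f"both endpoints report {api_name!r}"
    return False, (
        f"GET /health reports {api_name!r}, "
        f"GET /server/update/health reports {update_server_name!r}"
    )
```

For example, with a robot at a hypothetical base URL `base = "http://ROBOT_IP:31950"`, call `compare_names(fetch_name(base + "/health"), fetch_name(base + "/server/update/health"))`. The comparison is deliberately exact so that whitespace-only differences are reported.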
Changelog
- In `update-server`, when retrieving the robot's pretty hostname, do not manually parse it from `/etc/machine-info`. Instead, shell out to `hostnamectl --pretty status`. This is to match how `api` has been doing it.
- When parsing the output of `hostnamectl --pretty status`, do not unnecessarily strip leading and trailing whitespace.
- In `update-server`, whenever we rewrite `/etc/machine-info` to store a new name, also restart `systemd-hostnamed`. This is necessary for `hostnamectl --pretty status` calls in `update-server` and `api` to reliably pick up the new value.
- Do not fall back to `"opentrons-develop"` if shelling out to `hostnamectl` fails. Instead, just let the endpoint return `500`.
- … `systemd-hostnamed` restart described above, and it seemed unused.

Review requests
- … `GET /health` and `GET /server/health` disagree on the robot's name #10413.
- Do we expect removing the `"opentrons-develop"` fallback to cause any problems on dev machines?
  - … `make -C robot-server test` and `make -C robot-server dev`. I'm pretty sure we're okay.
- `GET /health`, `GET /server/update/health`, and `GET /server/name` may now temporarily return a `500` error if they're called within a few seconds of a rename. This is due to implementation limitations that I don't think are feasibly avoidable right now; see the comments in the code. Do we expect this to cause any robot discovery problems?

Risk assessment
Medium.
There's a risk that the temporary 500 after a name change will confuse `discovery-client` and screw up robot connectivity.

There's also a risk that something not covered by CI is relying on the fallback name `opentrons-develop`. In that case, `robot-server`'s `GET /health` call will start failing with a 500 error.
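If a client wants to ride out the temporary 500 after a rename instead of treating it as a failure, a bounded retry is one option. This is a hypothetical client-side sketch, not what `discovery-client` does; the helper names, attempt count, and delay are arbitrary assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def poll_until_healthy(
    get_status: Callable[[], int],
    attempts: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once get_status() reports a 2xx code, else False."""
    for attempt in range(attempts):
        if 200 <= get_status() < 300:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # wait for the post-rename error window to pass
    return False


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET, including 4xx/5xx responses.

    Connection failures (urllib.error.URLError) are not caught here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
```

Usage might look like `poll_until_healthy(lambda: http_status(base + "/health"))`. Injecting `sleep` keeps the retry loop testable without real delays.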