-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(robot-server): Implement storage for "enable/disable error recovery" setting #16333
feat(robot-server): Implement storage for "enable/disable error recovery" setting #16333
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should change the table to robot_setting table and add a settings table with an ER option. for now there will still be one row with a FK to the settings table but that will be easier to scale
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noice
The 6->7 migration works based on copying whole files, so we don't have a trivial opportunity to change table names.
# It should have no protocols. | ||
# We really want to check that the server's persistence directory has been wiped | ||
# clean, but testing that directly would rely on internal implementation details | ||
# of file layout and tend to be brittle. | ||
# As an approximation, just check that there are no protocols or runs left. | ||
assert (await robot_client.get_protocols()).json()["data"] == [] | ||
|
||
# It should have no runs. | ||
assert (await robot_client.get_runs()).json()["data"] == [] | ||
|
||
# There should be no files except for robot_server.db | ||
# and an empty protocols/ directory. | ||
all_files_and_directories = set(persistence_directory.glob("**/*")) | ||
expected_files_and_directories = { | ||
persistence_directory / "robot_server.db", | ||
persistence_directory / "7", | ||
persistence_directory / "7" / "protocols", | ||
persistence_directory / "7" / "robot_server.db", | ||
} | ||
assert all_files_and_directories == expected_files_and_directories | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the tests that check the filesystem to make sure that the protocol files, etc., have been removed. This test breaks every time we add a new persistence subdirectory or adjust the files inside it, and I don't personally find the test helpful these days.
Is this a good idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No in general, maybe yes for this PR. We really want this testing to exist because we have messed up integrations before. What I'd recommend is to figure out a way to not make it rely on brittle implementation details, by maybe presenting a contract about what files go where.
# Subdirectory "7" was previously used on our edge branch for an in-dev | ||
# schema that was never released to the public. It may be present on | ||
# internal robots. | ||
v6_to_v7.Migration6to7(subdirectory=LATEST_VERSION_DIRECTORY), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we're abandoning subdirectory 7 and replacing the 6->7 migration with 6->7.1. We've talked about this before, but this is the first time we're actually doing it. The downside is that we'll lose the past few weeks of run history on the in-office robots that are on the edge
branch.
Alternatives:
- Add some kind of special step to robot server startup that modifies
7/robot_server.db
in-place to add the new table. I didn't do this because I think it would set up a surprising precedent of modifying the database outside of a migration. - Keep subdirectory 7 as-is, add the new table to a new schema 8, and make users go through a 6->7->8 chain when we release this. I didn't do this because it would add ongoing performance and maintenance cost. Also, I don't want to set a precedent of
edge
-only DB changes like this requiring migrations, because I want to keep them as not-annoying as possible so people aren't scared to make improvements onedge
.
Assuming we use the /settings HTTP endpoints for this, this matches the existing convention of the default, `null`, being equivalent to `false`. That convention is sort of built into the HTTP API: there is otherwise no way for a client to tell whether a value of `null` means the UI switch should be rendered as engaged or disengaged.
I've flipped this from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me modulo figuring out a way to do the migration testing.
Overview
This implements persistent storage for an "enable/disable error recovery" robot setting, as part of EXEC-719.
Test Plan and Hands on Testing
Use a dev server to double-check that our migration system will actually let me make this modification in-place.Changelog
This adds the setting as a boolean enable/disable flag. Alternatively, we could have added persistent storage for a whole error recovery policy structure (as in, we could have stored the whole request structure of
POST /runs/{id}/errorRecoveryPolicy
), but that seemed like overkill to me.I've implemented this as a single-purpose table in our SQL database, containing one row with one boolean column. This feels to me like a misuse of a relational database, but other people apparently think it's a normal thing to do sometimes? 🤷 The alternative would be to add a new JSON file to robot-server's persistence directory.After discussion with @TamarZanzouri, I've implemented this as a table mapping setting-name string keys to boolean values. The only key right now is"enable_error_recovery"
, and in the future we might add more keys for more settings / feature-flags.Review requests
Any thoughts on the single-row table thing?See comments below.Risk assessment
Some risk of breaking robots in the office currently running
edge
—see test plan.