Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(robot-server): Implement storage for "enable/disable error recovery" setting #16333

Merged
merged 9 commits into from
Sep 30, 2024

Conversation

SyntaxColoring
Copy link
Contributor

@SyntaxColoring SyntaxColoring commented Sep 23, 2024

Overview

This implements persistent storage for an "enable/disable error recovery" robot setting, as part of EXEC-719.

Test Plan and Hands on Testing

  • Use a dev server to double-check that our migration system will actually let me make this modification in-place.

Changelog

This adds the setting as a boolean enable/disable flag. Alternatively, we could have added persistent storage for a whole error recovery policy structure (as in, we could have stored the whole request structure of POST /runs/{id}/errorRecoveryPolicy), but that seemed like overkill to me.

I've implemented this as a single-purpose table in our SQL database, containing one row with one boolean column. This feels to me like a misuse of a relational database, but other people apparently think it's a normal thing to do sometimes? 🤷 The alternative would be to add a new JSON file to robot-server's persistence directory. After discussion with @TamarZanzouri, I've implemented this as a table mapping setting-name string keys to boolean values. The only key right now is "enable_error_recovery", and in the future we might add more keys for more settings / feature-flags.

Review requests

Any thoughts on the single-row table thing? See comments below.

Risk assessment

Some risk of breaking robots in the office currently running edge—see test plan.

@SyntaxColoring SyntaxColoring requested a review from a team as a code owner September 23, 2024 20:36
@SyntaxColoring SyntaxColoring requested a review from a team September 23, 2024 20:36
@SyntaxColoring SyntaxColoring changed the title Implement SQL storage for an "enable error recovery" scalar. feat(robot-server): Implement storage for "enable/disable error recovery" setting Sep 23, 2024
Copy link
Contributor

@TamarZanzouri TamarZanzouri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should change the table to robot_setting table and add a settings table with an ER option. for now there will still be one row with a FK to the settings table but that will be easier to scale

robot-server/robot_server/persistence/tables/schema_7.py Outdated Show resolved Hide resolved
Copy link
Member

@sfoster1 sfoster1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noice

@SyntaxColoring SyntaxColoring marked this pull request as draft September 24, 2024 14:39
@SyntaxColoring SyntaxColoring marked this pull request as ready for review September 24, 2024 15:51
Comment on lines -33 to -49
# It should have no protocols.
# We really want to check that the server's persistence directory has been wiped
# clean, but testing that directly would rely on internal implementation details
# of file layout and tend to be brittle.
# As an approximation, just check that there are no protocols or runs left.
assert (await robot_client.get_protocols()).json()["data"] == []

# It should have no runs.
assert (await robot_client.get_runs()).json()["data"] == []

# There should be no files except for robot_server.db
# and an empty protocols/ directory.
all_files_and_directories = set(persistence_directory.glob("**/*"))
expected_files_and_directories = {
persistence_directory / "robot_server.db",
persistence_directory / "7",
persistence_directory / "7" / "protocols",
persistence_directory / "7" / "robot_server.db",
}
assert all_files_and_directories == expected_files_and_directories

Copy link
Contributor Author

@SyntaxColoring SyntaxColoring Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the tests that check the filesystem to make sure that the protocol files, etc., have been removed. This test breaks every time we add a new persistence subdirectory or adjust the files inside it, and I don't personally find the test helpful these days.

Is this a good idea?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No in general, maybe yes for this PR. We really want this testing to exist because we have messed up integrations before. What I'd recommend is to figure out a way to not make it rely on brittle implementation details, by maybe presenting a contract about what files go where.

Comment on lines +55 to 58
# Subdirectory "7" was previously used on our edge branch for an in-dev
# schema that was never released to the public. It may be present on
# internal robots.
v6_to_v7.Migration6to7(subdirectory=LATEST_VERSION_DIRECTORY),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we're abandoning subdirectory 7 and replacing the 6->7 migration with 6->7.1. We've talked about this before, but this is the first time we're actually doing it. The downside is that we'll lose the past few weeks of run history on the in-office robots that are on the edge branch.

Alternatives:

  • Add some kind of special step to robot server startup that modifies 7/robot_server.db in-place to add the new table. I didn't do this because I think it would set up a surprising precedent of modifying the database outside of a migration.
  • Keep subdirectory 7 as-is, add the new table to a new schema 8, and make users go through a 6->7->8 chain when we release this. I didn't do this because it would add ongoing performance and maintenance cost. Also, I don't want to set a precedent of edge-only DB changes like this requiring migrations, because I want to keep them as not-annoying as possible so people aren't scared to make improvements on edge.

Assuming we use the /settings HTTP endpoints for this, this matches the existing convention of the default, `null`, being equivalent to `false`. That convention is sort of built into the HTTP API: there is otherwise no way for a client to tell whether a value of `null` means the UI switch should be rendered as engaged or disengaged.
@SyntaxColoring
Copy link
Contributor Author

I've flipped this from enableErrorRecovery to disableErrorRecovery to match how all of the existing stuff in /settings works, where the default is null and the client interprets null as equivalent to false.

Copy link
Member

@sfoster1 sfoster1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me modulo figuring out a way to do the migration testing.

@SyntaxColoring SyntaxColoring merged commit 021d28e into edge Sep 30, 2024
7 checks passed
@SyntaxColoring SyntaxColoring deleted the enable_error_recovery_policy_setting_storage branch September 30, 2024 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants