fix(robot-server,system-server): Make SQL transactions behave sanely #13424
Codecov Report
Additional details and impacted files
@@                   Coverage Diff                   @@
##           chore_release-7.0.0   #13424      +/-   ##
=======================================================
+ Coverage                71.56%   71.68%   +0.12%
=======================================================
  Files                     2430     2427       -3
  Lines                    67751    67569     -182
  Branches                  7846     7783      -63
=======================================================
- Hits                     48486    48437      -49
+ Misses                   17426    17306     -120
+ Partials                  1839     1826      -13
Lmao gross. Looks like a good fix to me
Really nice
Discussed on a call with the RSS team, got 👍s.
…13424)
* Use SQLAlchemy's workarounds to fix transactions.
* Deduplicate with system-server.
Overview
This fixes a bug (unticketed) where SQL transactions would not actually be transactional.👌
One effect of this was that if the process were killed in the middle of a long database migration, it would leave the database in a broken half-migrated state.
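The failure mode can be sketched with Python's standard-library sqlite3 driver alone, without SQLAlchemy. In this minimal illustration (the table and column names are made up, not taken from the robot-server schema), a schema change made through a default connection survives a rollback, which is the same kind of leftover a killed migration would leave behind.

```python
import sqlite3


def column_names(conn: sqlite3.Connection, table: str) -> list:
    """Return the column names of `table`, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# A default connection: the driver decides when to emit BEGIN,
# and it does not emit one before DDL statements like ALTER TABLE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE run (id INTEGER PRIMARY KEY)")  # hypothetical table
conn.commit()

# First step of a pretend migration.
conn.execute("ALTER TABLE run ADD COLUMN new_column TEXT")

# Pretend the migration failed here and we tried to undo it.
conn.rollback()

# The column is still there: the ALTER TABLE was never inside a transaction,
# so there was nothing for rollback() to undo.
columns_after_rollback = column_names(conn, "run")
print(columns_after_rollback)
```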
Technical details
Between SQLAlchemy and the underlying SQLite database, there is a middle layer: Python's built-in sqlite3 driver, aka pysqlite. According to this note from the SQLAlchemy docs, there are known issues with it: in its default mode of operation, SQLite features such as SERIALIZABLE isolation, transactional DDL, and SAVEPOINT support do not work unless workarounds are applied.
SERIALIZABLE isolation and transactional DDL are especially important for us. SERIALIZABLE isolation is just the sane transaction behavior that everybody intuitively expects. Transactional DDL is the ability to do database migrations as one long transaction, rolling back the whole thing in case any part of it fails.
Fortunately, the good folks at SQLAlchemy have provided some magic code snippets to fix the problems. So:
* Apply those snippets to both robot-server and system-server. We do this through a new module in server_utils: server_utils.sql_utils.
Test Plan
The original bug is annoying to trigger artificially. I've reproduced it in another PR by:
* Starting from a migration that contains an ALTER TABLE statement
* Adding a time.sleep(120) right after that ALTER TABLE statement
* Running make -C robot-server dev OT_ROBOT_SERVER_persistence_directory=tests/integration/persistence_snapshots/v6.2.0_large/
* Running pkill make in another terminal
* Opening the database with the sqlite3 CLI and noticing that the new column was there, even though the migration was left incomplete
I've confirmed that the bug doesn't happen with this fix applied.
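A rough stand-in for that check, using only the standard-library driver: the sketch below takes over transaction control (isolation_level=None plus an explicit BEGIN, which is the idea behind the SQLAlchemy-documented workaround) and interrupts a migration with an exception instead of pkill. The transaction helper, the table, and the column names are hypothetical; they are not the functions in server_utils.sql_utils.

```python
import contextlib
import sqlite3


@contextlib.contextmanager
def transaction(conn: sqlite3.Connection):
    """Run the enclosed statements as one explicit transaction (hypothetical helper)."""
    conn.execute("BEGIN")
    try:
        yield conn
    except BaseException:
        # Any failure undoes everything since BEGIN, including DDL.
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


# isolation_level=None stops the driver from managing BEGIN/COMMIT itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE run (id INTEGER PRIMARY KEY)")  # hypothetical table

try:
    with transaction(conn):
        conn.execute("ALTER TABLE run ADD COLUMN new_column TEXT")
        # Stand-in for the process dying partway through the migration.
        raise RuntimeError("interrupted mid-migration")
except RuntimeError:
    pass

# With the explicit transaction, the half-finished schema change is gone.
columns = [row[1] for row in conn.execute("PRAGMA table_info(run)")]
print(columns)
```

A real kill never reaches the ROLLBACK statement. In that case SQLite discards the uncommitted transaction the next time the database file is opened, which only helps if the ALTER TABLE was inside a transaction to begin with.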
This is our second SQLite fixup—the first is that we have to go out of our way to enable foreign key constraints. I think these have grown to the point where they need their own tests. I’m imagining we make some temporary databases with dummy schemas, apply these fixup functions to them, and test that when we access the tables, they behave the way we want. I've omitted those tests from this PR for expedience, but I'm happy to do them in a follow-up after some other high-priority stuff. RSS-331
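One possible shape for those follow-up tests, sketched against the standard-library driver so it stays self-contained. The apply_fixups function is a hypothetical stand-in for the real fixup functions (which operate on SQLAlchemy engines), and the parent/child schema is a dummy one.

```python
import os
import sqlite3
import tempfile


def apply_fixups(conn: sqlite3.Connection) -> None:
    """Hypothetical stand-in for the fixups under test."""
    conn.isolation_level = None  # we emit BEGIN/COMMIT/ROLLBACK ourselves
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite


def make_dummy_db(path: str) -> sqlite3.Connection:
    """Create a temporary database with a dummy two-table schema."""
    conn = sqlite3.connect(path)
    apply_fixups(conn)
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child (id INTEGER PRIMARY KEY,"
        " parent_id INTEGER NOT NULL REFERENCES parent(id))"
    )
    return conn


def check_foreign_keys_enforced(conn: sqlite3.Connection) -> bool:
    """Inserting a child that points at a missing parent should fail."""
    try:
        conn.execute("INSERT INTO child (id, parent_id) VALUES (1, 999)")
    except sqlite3.IntegrityError:
        return True
    return False


def check_ddl_rolls_back(conn: sqlite3.Connection) -> bool:
    """A schema change inside an explicit transaction should be undoable."""
    conn.execute("BEGIN")
    conn.execute("ALTER TABLE parent ADD COLUMN extra TEXT")
    conn.execute("ROLLBACK")
    names = [row[1] for row in conn.execute("PRAGMA table_info(parent)")]
    return names == ["id"]


with tempfile.TemporaryDirectory() as tmp:
    conn = make_dummy_db(os.path.join(tmp, "dummy.db"))
    fk_ok = check_foreign_keys_enforced(conn)
    ddl_ok = check_ddl_rolls_back(conn)
    conn.close()  # close before the temporary directory is removed
```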
Review requests
Are these functions documented sufficiently so it's clear how to use them and why, even if it's not clear how they work at a low level?
Risk assessment
There's low risk that this will break anything, but there is some risk that it won't work properly to fix the bug, given how non-obvious it is. See my note in the test plan about adding unit tests later.