-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(key-value): use flush instead of commit #29286
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #29286 +/- ##
===========================================
+ Coverage 60.48% 83.73% +23.24%
===========================================
Files 1931 518 -1413
Lines 76236 37566 -38670
Branches 8568 0 -8568
===========================================
- Hits 46114 31456 -14658
+ Misses 28017 6110 -21907
+ Partials 2105 0 -2105
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
97e5834
to
23f1d91
Compare
@@ -62,6 +63,7 @@ def run(self) -> str: | |||
codec=self.codec, | |||
).run() | |||
assert key.id # for type checks | |||
db.session.commit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the UpsertKeyValueCommand
not commit? Note that hopefully (if merged) that #24969 should remove the need to have to commit in various places given it violates the "unit of work" philosophy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I'm proposing to not commit in these commands, as we have cases where we want to chain multiple commands together. But if we consider each Key Value command a unit of work, then we should naturally commit there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think there's a case to be made for both. Maybe we can have default behavior be to commit but provide an option to skip for use cases where there's a command chaining multiple sub-commands? It would be inconsistent for some commands to commit while others requiring the caller to commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this has previously been considered an antipattern (=having commit: bool
flag in these types of methods or similar), I'm personally also kind of leaning in that direction. This would make it possible to use all existing commands both as full units of work, or as part of a bigger chain, with minimal code duplication. Thoughts @john-bodley ? Also pinging @michael-s-molina as you've worked on these components in the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SIP-99B will handle the chaining of commands (if necessary) via nested sessions where only the outermost session commits.
I think this logic is fine for now, though @villebro some of it maybe updated if/when I get my SIP-99B through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would make it possible to use all existing commands both as full units of work, or as part of a bigger chain, with minimal code duplication
We'll achieve that with nested transactions using begin_nested. Check #24969 for reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll achieve that with nested transactions using being_nested. Check #24969 for reference.
@michael-s-molina I read up on the docs I could find, and I agree, this should be an elegant solution to this dual use case.
@john-bodley it would be great if you could add a description to #24969 so we can start reviewing/testing it. I'm keen on getting this important refactor in, as it will have a profound impact on the general quality and perormance of the backend.
a6aaa61
to
1d12eff
Compare
SUMMARY
In the key value commands, we are often returning the keys from newly generated ORM objects, which are persisted in the
key_value
table. As we're currently usingcommit()
to persist the objects prior to reading the keys, the session is reset, requiring a new session for fetching the keys. This can cause issues when distributing metastore reads to read replicas, as the data may sometimes not be available, causing anObjectDeletedError
exception duringCreateKeyValueCommand.run()
:A simple fix is to use
flush()
, which will ensure the same session is used, as stated in SIP-99A:As we're now proposing to replace
commit()
withflush()
in the key value commands, the session will need to be explicitly committed once the full unit of work is done. For example, for the create/upsert commands, we will now only commit after the last task finishes, ensuring atomicity:However, there are instances where we explicitly need commits to happen during a single request lifecycle. For instance, the Key Value Distributed Lock needs to commit the lock to the metastore so other workers can observe it for the duration of the lock event. The same applies to the Metastore cache, where every set/add/delete operation should commit to the cache. For this reason we're adding explicit
commit()
calls both in the Metastore Cache and Lock context manager to ensure the values are persisted in the db during execution flow.ADDITIONAL INFORMATION