fix(key-value): use flush instead of commit #29286

villebro · 2024-06-18T07:51:07Z

SUMMARY

In the key value commands, we often return the keys of newly generated ORM objects that are persisted in the key_value table. Because we currently call commit() to persist the objects before reading the keys, the session is reset, and a new session is required to fetch the keys. This can cause issues when metastore reads are distributed to read replicas, as the data may not yet be available there, causing an ObjectDeletedError exception during CreateKeyValueCommand.run():

sqlalchemy.orm.exc.ObjectDeletedError: Instance '<KeyValueEntry at 0x7fe508293280>' has been deleted, or its row is otherwise not present.

A simple fix is to use flush(), which will ensure the same session is used, as stated in SIP-99A:

It is worth noting that objects within the session, in either pending* or persisted state, can be queried albeit not having been committed to the database. We have a tendency to over commit (be that in the code or tests) resulting in fractured/partial atomic units. Typically working with objects in a persisted state is sufficient.
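The visibility rule quoted above can be illustrated without SQLAlchemy. The following is a minimal sketch using the standard library's sqlite3 module as a stand-in: an uncommitted INSERT plays the role of a flushed object, and a second connection plays the role of another session or a read replica. The table layout, file name and variable names are assumptions made for this illustration, not Superset code.

```python
import os
import sqlite3
import tempfile

# File-backed database so that two independent connections can share it.
db_path = os.path.join(tempfile.mkdtemp(), "kv_demo.db")
writer = sqlite3.connect(db_path)  # plays the session that creates the entry
reader = sqlite3.connect(db_path)  # plays a different session or connection

writer.execute("CREATE TABLE key_value (id INTEGER PRIMARY KEY, value TEXT)")
writer.commit()

# Roughly what add() + flush() does: the INSERT is emitted, nothing is committed.
cursor = writer.execute("INSERT INTO key_value (value) VALUES (?)", ("state",))
new_id = cursor.lastrowid  # generated key, known without any further read

query = "SELECT value FROM key_value WHERE id = ?"
seen_by_writer = writer.execute(query, (new_id,)).fetchall()
seen_by_reader_before = reader.execute(query, (new_id,)).fetchall()

writer.commit()  # end of the unit of work
seen_by_reader_after = reader.execute(query, (new_id,)).fetchall()

writer.close()
reader.close()
```

The writer can read the generated key and the pending row inside its own transaction, while the second connection only sees the row once the writer commits. This mirrors why reading the key from the same session after flush() avoids a round trip that might land on a replica that does not have the row yet.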

As we're now proposing to replace commit() with flush() in the key value commands, the session will need to be explicitly committed once the full unit of work is done. For example, for the create/upsert commands, we will now only commit after the last task finishes, ensuring atomicity:

delete expired entries
insert new entry
commit
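The three steps above form a single unit of work. As a rough sketch of why the single trailing commit matters, the example below again uses sqlite3 rather than the SQLAlchemy session; the function name create_entry, the expires_on column and the integer timestamps are hypothetical and chosen only for the illustration.

```python
import sqlite3


def create_entry(conn, key, value, now, expires_on):
    """Delete expired entries, insert the new one, then commit once."""
    try:
        conn.execute("DELETE FROM key_value WHERE expires_on <= ?", (now,))
        conn.execute(
            "INSERT INTO key_value (key, value, expires_on) VALUES (?, ?, ?)",
            (key, value, expires_on),
        )
        conn.commit()  # single commit after the last task
    except Exception:
        conn.rollback()  # undo the delete as well if the insert failed
        raise


def stored_keys(conn):
    rows = conn.execute("SELECT key FROM key_value ORDER BY key").fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE key_value (key TEXT PRIMARY KEY, value TEXT, expires_on INTEGER)"
)
conn.executemany(
    "INSERT INTO key_value VALUES (?, ?, ?)",
    [("old", "x", 10), ("live", "y", 100)],
)
conn.commit()

# The insert collides with the existing "live" key, so the whole unit is
# rolled back and the expired "old" entry is still present afterwards.
try:
    create_entry(conn, "live", "dup", now=50, expires_on=200)
except sqlite3.IntegrityError:
    pass
keys_after_failure = stored_keys(conn)

# A successful run removes the expired entry and adds the new one together.
create_entry(conn, "new", "z", now=50, expires_on=200)
keys_after_success = stored_keys(conn)
```

Had the delete been committed on its own, the failed insert would have left the table in a half-updated state. With one commit at the end, either both changes land or neither does.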

However, there are instances where we explicitly need commits to happen during a single request lifecycle. For instance, the Key Value Distributed Lock needs to commit the lock to the metastore so other workers can observe it for the duration of the lock event. The same applies to the Metastore cache, where every set/add/delete operation should commit to the cache. For this reason we're adding explicit commit() calls both in the Metastore Cache and Lock context manager to ensure the values are persisted in the db during execution flow.
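To make the lock case concrete, here is a hedged sketch of a context manager that commits when the lock is taken and again when it is released. It is not the implementation in superset/utils/lock.py; the names metastore_lock and LockNotAcquired and the locks table are hypothetical, and sqlite3 stands in for the metastore.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


class LockNotAcquired(Exception):
    """Raised when another worker already holds the lock."""


@contextmanager
def metastore_lock(conn, name):
    try:
        conn.execute("INSERT INTO locks (name) VALUES (?)", (name,))
        conn.commit()  # commit right away so other workers can see the lock
    except sqlite3.IntegrityError as ex:
        conn.rollback()
        raise LockNotAcquired(name) from ex
    try:
        yield
    finally:
        conn.execute("DELETE FROM locks WHERE name = ?", (name,))
        conn.commit()  # commit the release as well


db_path = os.path.join(tempfile.mkdtemp(), "lock_demo.db")
worker_a = sqlite3.connect(db_path)
worker_b = sqlite3.connect(db_path)
worker_a.execute("CREATE TABLE locks (name TEXT PRIMARY KEY)")
worker_a.commit()

blocked_while_held = False
acquired_after_release = False

with metastore_lock(worker_a, "refresh"):
    try:
        with metastore_lock(worker_b, "refresh"):
            pass
    except LockNotAcquired:
        blocked_while_held = True  # worker_b saw the committed lock row

with metastore_lock(worker_b, "refresh"):
    acquired_after_release = True

worker_a.close()
worker_b.close()
```

If the lock row were only flushed and not committed, the second worker would not see it and both workers could enter the critical section. That is the reason this path keeps an explicit commit even though the key value commands themselves no longer commit.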

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

codecov · 2024-06-18T08:30:04Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.73%. Comparing base (76d897e) to head (01333c7).
Report is 1094 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #29286       +/-   ##
===========================================
+ Coverage   60.48%   83.73%   +23.24%     
===========================================
  Files        1931      518     -1413     
  Lines       76236    37566    -38670     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    31456    -14658     
+ Misses      28017     6110    -21907     
+ Partials     2105        0     -2105

Flag	Coverage Δ
hive	`48.93% <20.00%> (-0.23%)`	⬇️
javascript	`?`
mysql	`77.23% <90.00%> (?)`
postgres	`77.34% <90.00%> (?)`
presto	`53.54% <20.00%> (-0.26%)`	⬇️
python	`83.73% <100.00%> (+20.24%)`	⬆️
sqlite	`76.79% <90.00%> (?)`
unit	`59.25% <50.00%> (+1.62%)`	⬆️


superset/utils/lock.py

john-bodley · 2024-06-18T17:23:41Z

superset/commands/dashboard/permalink/create.py

@@ -62,6 +63,7 @@ def run(self) -> str:
                codec=self.codec,
            ).run()
            assert key.id  # for type checks
+            db.session.commit()


Does the UpsertKeyValueCommand not commit? Note that #24969, if merged, should hopefully remove the need to commit in various places, given that doing so violates the "unit of work" philosophy.

Here I'm proposing to not commit in these commands, as we have cases where we want to chain multiple commands together. But if we consider each Key Value command a unit of work, then we should naturally commit there.

I think there's a case to be made for both. Maybe the default behavior could be to commit, with an option to skip the commit for use cases where a command chains multiple sub-commands? It would be inconsistent for some commands to commit while others require the caller to commit.

While this (having a commit: bool flag in these types of methods, or similar) has previously been considered an antipattern, I'm personally also leaning in that direction. This would make it possible to use all existing commands both as full units of work, or as part of a bigger chain, with minimal code duplication. Thoughts @john-bodley ? Also pinging @michael-s-molina as you've worked on these components in the past.

SIP-99B will handle the chaining of commands (if necessary) via nested sessions where only the outermost session commits.

I think this logic is fine for now, though @villebro some of it may be updated if/when I get my SIP-99B through.

This would make it possible to use all existing commands both as full units of work, or as part of a bigger chain, with minimal code duplication

We'll achieve that with nested transactions using begin_nested. Check #24969 for reference.

We'll achieve that with nested transactions using begin_nested. Check #24969 for reference.

@michael-s-molina I read up on the docs I could find, and I agree, this should be an elegant solution to this dual use case.
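For readers unfamiliar with the idea: SQLAlchemy's Session.begin_nested() emits a SAVEPOINT, so an inner command can roll back or release its own work while only the outermost transaction commits. The thread does not show how #24969 applies this, so the following is only a sketch of the general pattern using sqlite3 savepoints; unit_of_work, create_entry and create_many are hypothetical names.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def unit_of_work(conn):
    """Outermost call uses BEGIN/COMMIT; nested calls use a SAVEPOINT."""
    if not conn.in_transaction:
        conn.execute("BEGIN")
        try:
            yield
        except Exception:
            conn.execute("ROLLBACK")
            raise
        else:
            conn.execute("COMMIT")  # only the outermost level commits
    else:
        conn.execute("SAVEPOINT nested")
        try:
            yield
        except Exception:
            conn.execute("ROLLBACK TO SAVEPOINT nested")
            conn.execute("RELEASE SAVEPOINT nested")
            raise
        else:
            conn.execute("RELEASE SAVEPOINT nested")


def create_entry(conn, key, value):
    # Usable on its own (commits) or inside a bigger chain (savepoint only).
    with unit_of_work(conn):
        conn.execute(
            "INSERT INTO key_value (key, value) VALUES (?, ?)", (key, value)
        )


def create_many(conn, items):
    with unit_of_work(conn):
        for key, value in items:
            create_entry(conn, key, value)


# isolation_level=None lets the code issue BEGIN/COMMIT/SAVEPOINT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE key_value (key TEXT PRIMARY KEY, value TEXT)")

create_entry(conn, "a", "1")  # standalone: a full unit of work

try:
    # The second item collides with "a", so the whole chain is rolled back,
    # including the already inserted "b".
    create_many(conn, [("b", "2"), ("a", "dup")])
except sqlite3.IntegrityError:
    pass

keys = [row[0] for row in conn.execute("SELECT key FROM key_value ORDER BY key")]
```

The same command serves both use cases discussed above without a commit flag: called directly it commits, and called inside another unit of work it defers to the outer transaction.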

@john-bodley it would be great if you could add a description to #24969 so we can start reviewing/testing it. I'm keen on getting this important refactor in, as it will have a profound impact on the general quality and performance of the backend.

villebro added 3 commits June 18, 2024 09:30

fix(key-value): use flush instead of commit

ff87b86

fix lock tests

3f6eb8d

refactor lock tests

3271579

pull-request-size bot added the size/L label Jun 18, 2024

use separate session for lock tests

966e2e4

villebro added 3 commits June 18, 2024 11:34

lint

8ff97c7

refactor lock test

d1a88de

clean up commits from key value tests

e409f69

villebro requested review from betodealmeida and john-bodley June 18, 2024 10:05

refactor expiry logic

23f1d91

villebro force-pushed the villebro/key-value-flush branch from 97e5834 to 23f1d91 Compare June 18, 2024 12:29

add explicit flushes/commits

3db50ab

villebro requested a review from nytai June 18, 2024 13:05

villebro commented Jun 18, 2024

john-bodley reviewed Jun 18, 2024

villebro added 3 commits June 19, 2024 10:08

add description to test

9c697a6

remove metastore changes

a17c661

Merge branch 'master' into villebro/key-value-flush

c6ce18a

pull-request-size bot added size/M and removed size/L labels Jun 19, 2024

remove lock changes

1d12eff

villebro force-pushed the villebro/key-value-flush branch from a6aaa61 to 1d12eff Compare June 19, 2024 08:00

villebro added 2 commits June 19, 2024 12:19

fix metastore

30fff7c

fix test

8b2a38f

john-bodley approved these changes Jun 19, 2024

villebro added 2 commits June 20, 2024 15:56

Merge branch 'master' into villebro/key-value-flush

b3d96f9

Merge branch 'master' into villebro/key-value-flush

01333c7

villebro merged commit 1770f8b into apache:master Jun 20, 2024
37 checks passed

villebro deleted the villebro/key-value-flush branch June 20, 2024 13:19

eschutho pushed a commit that referenced this pull request Jul 24, 2024

fix(key-value): use flush instead of commit (#29286)

346ae6c

mistercrunch added the 🏷️ bot and 🚢 4.1.0 labels Nov 27, 2024
