Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIP-99A] Primer on managing SQLAlchemy sessions #25107

Closed
john-bodley opened this issue Aug 28, 2023 · 3 comments
Closed

[SIP-99A] Primer on managing SQLAlchemy sessions #25107

john-bodley opened this issue Aug 28, 2023 · 3 comments
Assignees
Labels
sip Superset Improvement Proposal

Comments

@john-bodley
Copy link
Member

john-bodley commented Aug 28, 2023

[SIP-99A] Primer on managing SQLAlchemy sessions

This SIP is part of the [SIP-99] Proposal for correctly handling business logic series. Specifically, it serves as a primer—reiterating best practices—for working with SQLAlchemy sessions as these were typically mismanaged. The topics outlined here are highly coupled with [SIP-99B] Proposal for (re)defining a "unit of work" as they both relate to managing database transactions.

Session Management

The SQLAlchemy session—which establishes all conversations with the database—is managed by Flask-SQLAlchemy which also handles cleaning up connections and sessions after each request.

The preconfigured scoped session (called db.session) should be used for the lifetime of a request to ensure there is only one “virtual” transaction—guaranteeing that the “unit of work” is indeed atomic (SIP-99B).

The SQLAlchemy Managing Transactions documentation states:

The Session tracks the state of a single “virtual" transaction at a time. This “virtual" transaction is created automatically when needed, or can alternatively be started using the Session.begin() method. Session.commit() ends the transaction.

The transaction is typically framed as follows,

try:
    db.session.add(Dashboard(slug="foo"))
    db.session.commit()
except SQLAlchemyError:
    db.session.rollback()
    raise

though, via leveraging a context manager, the Session.begin() method serves as a demarcated transaction and provides the same sequence of operations:

with db.session.begin():
    db.session.add(Dashboard(slug="foo"))

Session States

There are five states which an object can have within a session:

  • Transient
  • Pending
  • Persisted
  • Deleted
  • Detached

The session tracks objects that are either new, dirty, or deleted.

The Session.flush() method communicates a series of operations to the database (INSERT, UPDATE, DELETE). The database maintains them as pending operations in a transaction. The changes are not persisted permanently to disk, or visible to other transactions until the database receives a COMMIT statement via the Session.commit() method—thus ending the transaction. If Session.commit() is called, then Session.flush() is called automatically.

The Session.add() method adds a object into the session whereas the Session.merge() convenience method copies the state of a given instance into a corresponding instance within the session.

Session.merge() examines the primary key attributes of the source instance, and attempts to reconcile it with an instance of the same primary key in the session. If not found locally, it attempts to load the object from the database based on the primary key, and if none can be located, creates a new instance. The state of each attribute on the source instance is then copied to the target instance.

The following examples illustrate how Session.merge() works:

db.session.merge(Dashboard(slug="foo"))
db.session.flush()  # INSERT INTO dashboards ...
db.session.merge(Dashboard(id=1, slug="foo"))  # SELECT * FROM dashboards WHERE id = ?
db.session.flush()  # UPDATE dashboards SET ...

Note the use of the Session.merge() method is often misinterpreted, i.e., a persistent object does not need to be (re)merged. If the object is dirty the pending changes will be persisted during a subsequent flush.

Transient

An object that is not in the session.

>>> dashboard = Dashboard(slug="foo")
>>> inspect(dashboard).transient
True
>>> db.session.new
IdentitySet([])
>>> db.session.add(dashboard)
>>> db.session.new
IdentitySet([Dashboard<foo>])
>>> inspect(dashboard).transient
False
>>> db.session.remove()

Pending

An object which has been added to the session but has not yet been flushed to the database.

>>> dashboard = Dashboard(slug="foo")
>>> db.session.add(dashboard)
>>> inspect(dashboard).pending
True
>>> db.session.new
IdentitySet([Dashboard<foo>])
>>> db.session.flush()  # INSERT INTO dashboards ...
>>> db.session.new
IdentitySet([])
>>> inspect(dashboard).pending
False
>>> db.session.remove()

Note that constraints will only be checked when the object is flushed.

>>> db.session.add_all([Dashboard(slug="bar"), Dashboard(slug="bar")])
>>> db.session.flush()  # INSERT INTO dashboards ...
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: dashboards.slug
>>> db.session.remove()

Persistent

An object which is present in the session and has a corresponding record in the database. The persisted state occurs either during flushing (which is invoked by Session.merge()) or querying.

>>> dashboard = Dashboard(slug="foo")
>>> db.session.add(dashboard)
>>> db.session.flush()  # INSERT INTO dashboards ...
>>> inspect(dashboard).persistent
True
>>> db.session.dirty
IdentitySet([])
>>> dashboard.slug = "bar"  
>>> db.session.dirty
IdentitySet([Dashboard<1>])
>>> db.session.flush()  # UPDATE dashboards SET ...
>>> db.session.dirty
IdentitySet([])
>>> db.session.remove()

It is worth noting that objects within the session, in either pending* or persisted state, can be queried albeit not having been committed to the database. We have a tendency to over commit (be that in the code or tests) resulting in fractured/partial atomic units. Typically working with objects in a persisted state is sufficient.

>>> dashboard = Dashboard(slug="foo")
>>> db.session.add(dashboard)
>>> inspect(dashboard).pending
True
>>> dashboard is db.session.query(Dashboard).filter(Dashboard.slug == "foo").one()
>>> True
>>> inspect(dashboard).persistent
True
>>> db.session.remove()

* The Session.query() method invoked a Session.flush() which meant that the pending object became persisted.

Deleted

An object which has been deleted within a flush, but the transaction has not yet completed.

>>> dashboard = Dashboard(slug="foo")
>>> db.session.add(dashboard)
>>> db.session.flush()  # INSERT INTO dashboards ...
>>> inspect(dashboard).deleted
False
>>> db.session.deleted
IdentitySet([])
>>> db.session.delete(dashboard)
>>> db.session.deleted
IdentitySet([Dashboard<foo>])
>>> db.session.flush()  # DELETE FROM dashboards WHERE id = ?
>>> inspect(dashboard).deleted
True
>>> db.session.remove()

Detached

An object that corresponds (or previously corresponded) to a record in the database, but is not currently in any session.

>>> dashboard = Dashboard(slug="foo")
>>> db.session.add(dashboard)
>>> db.session.flush()  # INSERT INTO dashboards ...
>>> inspect(dashboard).detached
>>> False
>>> db.session.remove()
>>> inspect(dashboard).detached
True

Examples

The following examples illustrate where the session was mismanaged:

  • The TableSchemaView.expanded() method unnecessarily commits even though the function only fetches, i.e., there are no new, dirty, or deleted objects in the session.
  • The UpdateKeyValueCommand.update() method unnecessarily merges the dirty event object even though it was already persisted—per the Session.query() call—in the session.
  • The test_get_expired_entry test unnecessarily commits the entry object to the database given that the GetKeyValueCommand.run() method leverages the same Flask-SQLAlchemy session. This requires additional work to then delete the object from the database to ensure that the test remains idempotent.

Proposed Change

None beyond adhering to best practices when managing SQLAlchemy sessions—via the Flask-SQLAlchemy singleton scoped session.

New or Changed Public Interfaces

None.

New Dependencies

None.

Migration Plan and Compatibility

  • A wiki page will be created to specify the rules which should be adhered to.
  • A series of PRs to address existing rule violations.
  • New PRs (where possible) should adhere to the rules.

None.

Rejected Alternatives

None.

@john-bodley john-bodley added the sip Superset Improvement Proposal label Aug 28, 2023
@john-bodley john-bodley changed the title [SIP-98A] Primer on managing SQLAlchemy sessions [SIP-99A] Primer on managing SQLAlchemy sessions Aug 29, 2023
@eschutho
Copy link
Member

eschutho commented Oct 6, 2023

@john-bodley this is a really great summary. Thank you for writing it up, and I'm sure it will help a lot of people to better understand best practices in this space that is often quite murky.

@rusackas
Copy link
Member

rusackas commented Dec 6, 2023

Approved!

@rusackas
Copy link
Member

Anyone know what's left to do here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sip Superset Improvement Proposal
Projects
Development

No branches or pull requests

4 participants