Timeout migrations that take too long to run (#11704)
dstufft authored Jun 28, 2022
1 parent 6d39d8d commit 39daea1
Showing 3 changed files with 49 additions and 1 deletion.
30 changes: 30 additions & 0 deletions docs/development/database-migrations.rst
@@ -25,3 +25,33 @@
them in over time (for example, to rename a column you must add the column in
one migration + start writing to that column/reading from both, then you must
make a migration that backfills all of the data, then switch the code to stop
using the old column altogether, then finally you can remove the old column).
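The phased column rename described above can be sketched against an in-memory SQLite database (a stand-in used here only so the sketch is self-contained; PyPI runs PostgreSQL, and the table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# Migration 1: add the new column. Application code then starts writing to
# both columns and reading from whichever is populated.
conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")

# Migration 2 (a later deploy): backfill rows written before migration 1.
conn.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")

# After another deploy switches the code to use only full_name, a final
# migration would drop the old column:
#     ALTER TABLE users DROP COLUMN name

print(conn.execute("SELECT full_name FROM users ORDER BY id").fetchall())
# → [('alice',), ('bob',)]
```

Each step is backwards compatible with the code running on either side of it, which is what lets the deploys proceed without downtime.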

To help prevent an accidentally long-running migration from taking down
PyPI, by default a migration will time out if it waits more than 4s to
acquire a lock, or if any individual statement takes more than 5s.

The lock timeout protects against the case where a long-running app query
blocks the migration, and the migration in turn ends up blocking
short-running app queries that would otherwise have been able to run
concurrently with the long-running query.

The statement timeout protects against any single statement (often part of
a data migration) holding locks on the database for an extended period of time.
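As an illustration of what a statement timeout buys you, here is a rough client-side analogue built on sqlite3's progress handler (an assumption-laden sketch: PostgreSQL enforces ``statement_timeout`` server-side, and the helper name and values here are invented for the example):

```python
import sqlite3
import time

def execute_with_timeout(conn, sql, timeout_s):
    """Run a statement, aborting it if it exceeds timeout_s seconds.

    A rough analogue of PostgreSQL's statement_timeout: sqlite3 invokes the
    progress handler periodically while a statement runs, and a truthy
    return value interrupts the statement with OperationalError.
    """
    deadline = time.monotonic() + timeout_s
    conn.set_progress_handler(
        lambda: 1 if time.monotonic() > deadline else 0, 1000
    )
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.set_progress_handler(None, 0)

conn = sqlite3.connect(":memory:")

# A fast statement completes normally.
print(execute_with_timeout(conn, "SELECT 1", timeout_s=1.0))  # → [(1,)]

# A deliberately slow query (counting to ten million) gets interrupted
# instead of holding the connection for seconds.
slow = (
    "WITH RECURSIVE c(n) AS "
    "(SELECT 1 UNION ALL SELECT n + 1 FROM c WHERE n < 10000000) "
    "SELECT count(*) FROM c"
)
try:
    execute_with_timeout(conn, slow, timeout_s=0.05)
except sqlite3.OperationalError:
    print("statement timed out")
```

With Postgres none of this client-side machinery is needed; issuing ``SET statement_timeout`` on the connection is enough, which is exactly what the migration environment below does.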

It is possible to override these values inside of a migration. To do so,
add the following to your migration:

.. code-block:: python

    op.execute("SET statement_timeout = 5000")
    op.execute("SET lock_timeout = 4000")
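A minimal sketch of an ``upgrade()`` that raises both limits before running a known-slow backfill. The ``_FakeOp`` class is a tiny stand-in for alembic's ``op`` so the sketch runs standalone (a real migration would use ``from alembic import op``), and the timeout values are hypothetical:

```python
class _FakeOp:
    """Stand-in for alembic's `op` that just records executed statements."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)

op = _FakeOp()

def upgrade():
    # Hypothetical values: allow a known-slow data migration 5 minutes per
    # statement, while still failing fast (10s) if a lock cannot be acquired.
    # The SET applies to this migration's connection, not the whole database.
    op.execute("SET statement_timeout = 300000")
    op.execute("SET lock_timeout = 10000")
    # ... the slow backfill statements would follow here ...

upgrade()
print(op.statements)
```

Keeping the overrides inside the migration means they expire with its connection, so the conservative defaults still apply to every other migration.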


For more information on which kinds of operations are safe in a
high-availability environment like PyPI, see:

- `PostgreSQL at Scale: Database Schema Changes Without Downtime <https://medium.com/paypal-tech/postgresql-at-scale-database-schema-changes-without-downtime-20d3749ed680>`_
- `Move fast and migrate things: how we automated migrations in Postgres <https://benchling.engineering/move-fast-and-migrate-things-how-we-automated-migrations-in-postgres-d60aba0fc3d4>`_
- `PgHaMigrations <https://github.com/braintree/pg_ha_migrations>`_
3 changes: 3 additions & 0 deletions warehouse/migrations/env.py
@@ -50,6 +50,9 @@
def run_migrations_online():
connectable = create_engine(url, poolclass=pool.NullPool)

with connectable.connect() as connection:
connection.execute("SET statement_timeout = 5000")
connection.execute("SET lock_timeout = 4000")

context.configure(
connection=connection,
target_metadata=db.metadata,
17 changes: 16 additions & 1 deletion warehouse/migrations/script.py.mako
@@ -17,8 +17,9 @@
Revises: ${down_revision}
Create Date: ${create_date}
"""

from alembic import op
import sqlalchemy as sa

from alembic import op
${imports if imports else ""}

revision = ${repr(up_revision)}
@@ -32,6 +33,20 @@
down_revision = ${repr(down_revision)}
# up and running. Thus backwards incompatible changes must be broken up
# over multiple migrations inside of multiple pull requests in order to
# phase them in over multiple deploys.
#
# By default, migrations cannot wait more than 4s on acquiring a lock
# and each individual statement cannot take more than 5s. This helps
# prevent situations where a slow migration takes the entire site down.
#
# If you need to change these timeouts for a migration, you can do so
# by adding:
#
#     op.execute("SET statement_timeout = 5000")
#     op.execute("SET lock_timeout = 4000")
#
# to your migration, using whatever values are reasonable for it.


def upgrade():
${upgrades if upgrades else "pass"}
