Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Out of Order Migrations and Possible Solutions #697

Closed
tracehelms opened this issue Sep 14, 2023 · 13 comments · Fixed by #723
Closed

Out of Order Migrations and Possible Solutions #697

tracehelms opened this issue Sep 14, 2023 · 13 comments · Fixed by #723
Labels
enhancement New feature or request migrations Related to migrations

Comments

@tracehelms
Copy link
Contributor

Hi Kysely team 👋

First of all, I want to thank you for the awesome library. I've never seen such thorough type completion in a database library like this—it's incredible.

I'm using Kysely on a project at work and we're having an issue with migrations. Specifically, running migrations "out of order" and the ensureMigrationsNotCorrupted check. There was some discussion in #472 about this, but I wanted to open a new issue and brainstorm some possible solutions.

The Problem

We have multiple developers on our team. If two people create migrations asynchronously on separate branches, either those migrations have to be merged in the right order or we have to rebase and manually rename the "out of order" files before merging.

Example:

  • Developer A creates migration 2023-09-13-120000-migration-A.ts
  • Developer B creates migration 2023-09-14-120000-migration-B.ts
  • Both developers make a PR at roughly the same time
  • Developer B gets their PR approved, merged, and deployed first
  • Developer A then get their PR approved and merged
  • Developer A deploys to production and tries to run migrations

Currently, ensureMigrationsNotCorrupted would throw an error. But this is an extremely common scenario when developing on a team.

As a work-around right now, I'm trying to create a GitHub Action that will detect new migration files when a PR is merged, update the timestamp in the filename, and commit the changes. But it's complicated, has to be set up for each new project, and the author of any migrations has to roll back their migrations on their dev branch before checking out the main branch again.

We could also manually rebase and rename the files before merging, but that's tedius and prone to being forgotten.

Possible Solutions

Instead of work-arounds, I would prefer to help fix this for all Kysely users. I imagine this will come up over and over again for teams trying to adopt Kysely.

Here are a couple of ideas I have. I wanted to see if you liked either of them before trying to put a PR together.

  1. Give users a configurable option for verifying migration order: verifyMigrationOrder: 'strict' and verifyMigrationOrder: 'permissive'. This seems like the easiest method from a user's perspective. Strict by default, permissive when you need it.

  2. Allow users to pass in a MigrationNotCorruptedProvider, similar to how you have FileMigrationProvider. This seems more prone to user error but would be more customizable.

Anyways, sorry for the long-winded explanation, I just wanted to make sure I laid everything out clearly. This project is awesome and I'd love to help make it easier for people to adopt and use. Thanks!

@koskimas
Copy link
Member

koskimas commented Sep 15, 2023

I've always used a rebase flow where each developer is responsible for rebasing his/her branch on top of current main before merging. During this process they rename the migration and more importantly, make sure it still works in its new place. I don't understand a workflow where branches are blindly merged. There's no way to ensure the migrations work in that case.

Let's say you merge two migrations in parallel branches. One renames column a to b and the second renames the same column a to c. Things break.

@koskimas
Copy link
Member

But adding a verifyMigrationOrder option wouldn't hurt. We could add that.

@igalklebanov
Copy link
Member

igalklebanov commented Sep 15, 2023

I've always used a rebase flow where each developer is responsible for rebasing his/her branch on top of current main before merging. During this process they rename the migration and more importantly, make sure it still works in its new place. I don't understand a workflow where branches are blindly merged. There's no way to ensure the migrations work in that case.

Let's say you merge two migrations in parallel branches. One renames column a to b and the second renames the same column a to c. Things break.

I've seen a more automated approach with merge queues (e.g. bors, kodiak, or GH's new feature) and not requiring a branch to be synced with main as a way to speed things up (given that the changes are e2e'd).

Thing is, migrations were always manual. Someone "takes" the team to production, and if there are migrations to run, the person handles them before deploying the other changes on main.

I agree that teams running migrations automatically is quite dangerous and not worth the "velocity gain".

@tday
Copy link

tday commented Sep 15, 2023

I agree with the points that merging out of order can be dangerous. The default should be to require ordered migrations.

+1 to having an override to run out of order migrations built-in. It would enable many people to have a tool to quickly resolve conflicts out of the box AFTER manual verification that the conflict is safe to commit (.e.g, changes to two different, unrelated tables). Right now, users need to build their own tooling or manually do it which is avoidable toil.

If some precedence would help, flyway provides this type of flag

@tracehelms
Copy link
Contributor Author

Thanks for the responses everyone. I fully agree in-order should be the default and understand it's value there.

Most of my experience is with Elixir's Phoenix and Ruby on Rails. They both allow out of order migrations and reference the schema_migrations table to see which migrations have and haven't been run. You still rebase and make sure things work, but you don't have to rename your migration files if they don't conflict.

And you're right, it's totally possible to have two migrations out of order that step on each other by modifying the same tables. But in my experience, that's far less common than two developers working on separate, unrelated migrations. With permissive ordering, the onus is on the developer and team to be aware of any migrations that might conflict with one another.

So if you're cool with it, I'll try putting together a PR that adds a configurable option for permissive ordering. I'm thinking about adding a migrationOrder: 'strict' | 'permissive' option to MigratorProps and then using that in the getState function.

That sound OK?

@koskimas
Copy link
Member

koskimas commented Sep 15, 2023

PR would be great! Your plan sounds good.

@daniel-nelson
Copy link

Let's say you merge two migrations in parallel branches. One renames column a to b and the second renames the same column a to c. Things break.

To clarify the extent of the risk one is taking on by using the permissive flag: the renaming example could never happen (assuming sound Github + CI settings) because the migration that would happen in CI would always break, independent of order (since each migration would try to rename from the same starting point, whichever ran second would fail). The only type of migration that could make it through CI and result in different results based on run order would be migrations that change the same thing in a way that leaves it in a state where a similar migration can be run, in any order, without throwing an error. Precision changes come to mind. And if there were two changes made to the precision of a particular field, test coverage of the change (e.g. validating that a particular precision is supported and beyond that is not) in both branches would surface the conflicting end-state, regardless of the order in which they were merged.

The point isn't to blindly merge, but to enable automation rather than rely on human performance of a manual task to prevent deployments from failing.

@tracehelms
Copy link
Contributor Author

Hey @koskimas,

After looking through the code in migrator.ts more closely, I think I'll need to make some more structural changes to support this.

Currently, Migrator works by finding the index of the migration you want to go up / down to and then iterating through the list of migration files. This works fine if the migrations are all already run and the orders exactly match. But with permissive ordering, it breaks down.

Given:

Migrations Already Ran:  0  1  2     4
Migration Files:         0  1  2  3  4  5

In this case, Migrator would determine the index of the latest migration is 4 and start iterating through Migration Files, only migrating file 5. Migrating down would break entirely.

Instead, for up migrations, we can calculate a list of pending migrations, which is all migrations that haven't been run. Then, when we say to migrate up N times, it migrates N of those pending migrations. This would work the same as it does now for strict ordering and would allow permissive ordering too.

For down migrations, we can determine what to roll back by always deferring to what the database has already run. In the above example, rolling back N=1 times would put us back at migration 2. Again, this would work the same as it does now for strict ordering but also allow permissive ordering.

As a bonus, I think this can make the code simpler too. Instead of finding indexes and loops, we can pass a list of ordered pending migrations or already_run migrations in reverse order, and traverse N items from them.

I just wanted to get your thoughts on this before I got too far into implementing it. Thanks!

@koskimas
Copy link
Member

koskimas commented Sep 19, 2023

@tracehelms Sounds like a plan 👍

@koskimas
Copy link
Member

koskimas commented Sep 19, 2023

@daniel-nelson You're assuming everyhing is perfectly tested. In case of my example, yes, any kind of CI that just runs the migrations would catch it. But there are many ways to create migrations that technically work, but the result is somehow broken. Then it's up to how good your automated tests are. In an average project they are nonexistent or really basic.

@igalklebanov igalklebanov added enhancement New feature or request migrations Related to migrations labels Sep 22, 2023
@tracehelms
Copy link
Contributor Author

Added a PR, let me know what you think 🎉

@sgarciac
Copy link

sgarciac commented Nov 8, 2023

Just in case it might be useful for someone, I'm using the following github action check to prevent Pull Requests from being merged unless strict order of new migrations is respected:

name: fast-forward-migrations-check
run-name: Fast forward Migrations Check
on: [pull_request]
jobs:
  check-incremental:
    name: fast-forward-migrations-check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          path: "head"
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.base.ref }}
          path: "base"
      - run: echo ${{ github.event.pull_request.base.ref }}
      - run: |
          {
            ls head/service/src/db/migrations > head-migrations
            ls base/service/src/db/migrations > base-migrations
            oldest_new_migration_in_head=$(sort base-migrations base-migrations head-migrations | uniq -u | sort | head -1)
            [ -z "$oldest_new_migration_in_head" ] && echo "No new migrations found in head branch." && exit 0
            most_recent_migration_in_base=$(tail -1 base-migrations)
            if [[ $oldest_new_migration_in_head < $most_recent_migration_in_base ]]; then
              echo "There is at least one migration in the head branch that is older than the latest migration known in base branch (${oldest_new_migration_in_head} < ${most_recent_migration_in_base})."
              exit 1
            else
              echo "No conflicting migrations found."
              exit 0
            fi
          }

@seivan
Copy link

seivan commented May 9, 2024

Sorry for posting in an older issue but...

Honestly, allowUnorderedMigrations should be default on since the migrations are stored.

This makes it both safe and flexible for the use cases described here. Not having allowUnorderedMigrations is less safe and more error prone. The order in which a migration has been merged is irrelevant as a the new stuff wouldn't refer to un-merged things anyway.

Ecto also locks the schema_migrations table when running migrations, guaranteeing two different servers cannot run the same migration at the same time, but maybe this is already implemented ?

And you're right, it's totally possible to have two migrations out of order that step on each other by modifying the same tables.

This can be solved by runtime checking the migrations filename conventions or storing meta-data for the migrations.

If migrations alters a table and it's timestamp is "older" than an existing migration that has altered same table, then it should automatically fail.

Doesn't matter if that's with meta-data with a migration or filename conventions, as long as it's checked when running the migrations.

Developer A creates migration 2023-09-13-120000-alter-table-A.ts
Developer B creates migration 2023-09-14-120000-alter-table-A.ts

Assume Developer B got their migrations merged.
Once you run Developer A migration it should check that no migration has altered table-A with a newer date.

If the filenames are strict, the check can happen there.
Otherwise storing table modifications with the timestamp can be used for checking that you aren't modifying a table that's been timestamped after you.

E.g Alter-A timestamped 2023-09-14 should not run any alter table migrations with anything before it, so 2023-09-13` should fail until it gets reviewed.

Another approach, and a bit more complicated is that migrations refer to schema type. So when Developer A merges and runs their migration they get a type error when trying to remove a column that doesn't exist because Developer B removed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request migrations Related to migrations
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants