Conditional Migrations

Vsevolod Romashov edited this page Mar 13, 2019 · 9 revisions

Conditional migrations allow you to customize the transition to the desired database structure or migrate some data.

While basic DbSchema usage comes down to describing the database schema your application relies upon and letting DbSchema do the work, there are cases when it cannot correctly infer the required operations:

  • rename a table, column or enum type
  • change a column type with a USING clause
  • migrate data in any way

You can't rename a table just by changing its name in the schema definition: DbSchema would see that as dropping the old table and creating a similar table with the new name from scratch, effectively deleting all your data.

This is where conditional migrations come in. They look like a common ActiveRecord or Sequel migration describing the actions required to migrate the structure and data, but they also define the conditions under which the migration will get run.

db.migrate 'rename people to users' do |migration|
  migration.apply_if do |schema|
    # only run the migration if the `people` table still exists
    schema.has_table?(:people)
  end

  migration.skip_if do |schema|
    # don't run the migration if the `users.role` field already has :user_role type
    schema.table(:users).field(:role).type == :user_role
  end

  migration.run do |migrator|
    # rename the `people` table to `users`
    migrator.rename_table :people, to: :users

    # switch the `users.role` field from varchar to a dedicated enum type
    migrator.create_enum :user_role, [:user, :admin]
    migrator.alter_table :users do |t|
      t.alter_column_type :role, :user_role, using: 'role::user_role'
    end
  end
end

The apply_if block defines a positive condition while the skip_if block defines a negative one; the migration will only run if all apply_if conditions return true and all skip_if conditions return false. These blocks yield a schema object that can be queried about the current database structure; it is discussed in Schema analysis DSL.
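To make the condition semantics concrete, here is a minimal plain-Ruby sketch of how apply_if and skip_if combine. `ConditionalMigration` below is a toy stand-in, not DbSchema's actual class; it only mimics the behavior described above.

```ruby
# Toy model of conditional migration applicability (not DbSchema internals).
class ConditionalMigration
  def initialize
    @apply_conditions = []
    @skip_conditions  = []
  end

  def apply_if(&block)
    @apply_conditions << block
  end

  def skip_if(&block)
    @skip_conditions << block
  end

  # Applicable only if every apply_if returns true and every skip_if returns false.
  def applicable?(schema)
    @apply_conditions.all? { |c| c.call(schema) } &&
      @skip_conditions.none? { |c| c.call(schema) }
  end
end
```

With a schema stubbed as a hash, `applicable?` returns true only when all positive conditions hold and no negative condition does.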

The run block defines the body of the migration; it yields a migrator object that provides an interface for all supported operations (create_table, add_index, drop_enum, etc.). All migrator operations are described in Migration DSL.

Migrations are run in the order they are defined, skipping those whose conditions are not satisfied; then the (possibly) migrated database schema is compared to the definition and all remaining changes are applied. Keep in mind that you still need to change your schema definition accordingly, otherwise DbSchema will try to apply your old schema. The schema definition is the single source of truth about the database; migrations are just a way to customize the transition to a new structure.

Example

Suppose that you have the following users table:

DbSchema.describe do |db|
  db.table :users do |t|
    t.serial  :id,         primary_key: true
    t.varchar :first_name, null: false
    t.varchar :last_name,  null: false
    t.varchar :email,      null: false, unique: true
    t.varchar :password,   null: false
  end
end

And for some crazy reason you need to join the first_name and last_name columns into a single name column; it must be NOT NULL just like the existing columns, but you can't add a NOT NULL column to a table that already contains rows - first you have to fill it with data.

You could do it by adding a nullable column to your schema definition, deploying the app to production, manually filling the column with data, then adding null: false to the column in the schema definition and deploying again - but that sounds crazy.

Instead you can add a new field as NOT NULL right to your table definition and describe the schema transition in a conditional migration:

DbSchema.describe do |db|
  db.migrate 'Join users.first_name & users.last_name' do |migration|
    migration.apply_if do |schema|
      schema.table(:users).has_field?(:first_name) &&
        schema.table(:users).has_field?(:last_name)
    end

    migration.skip_if do |schema|
      schema.table(:users).has_field?(:name)
    end

    migration.run do |migrator|
      migrator.alter_table :users do |t|
        # add a nullable column first
        t.add_column :name, :varchar
      end

      # fill it with data
      migrator.execute "UPDATE users SET name = first_name || ' ' || last_name"

      migrator.alter_table :users do |t|
        # set the `name` column as NOT NULL
        t.disallow_null :name
        # and drop old columns
        t.drop_column :first_name
        t.drop_column :last_name
      end
    end
  end

  db.table :users do |t|
    t.serial  :id,         primary_key: true
    t.varchar :name,       null: false
    t.varchar :email,      null: false, unique: true
    t.varchar :password,   null: false
  end
end

And here's how DbSchema will process this:

  • iterate through all conditional migrations, and for each migration:
    • check if its conditions are satisfied (otherwise skip it)
    • run the migration body
    • re-read the database structure so that the next migration's conditions see the actual schema
  • check if the schema differs from the declarative schema definition
  • if it does, apply the necessary operations
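The steps above can be sketched in plain Ruby. This is a toy model of the flow, not DbSchema's implementation: schemas are bare hashes mapping table names to column lists, and each migration is a hash with an `:applicable` check and a `:run` body.

```ruby
# Toy model of DbSchema's processing order: run applicable migrations first,
# then diff against the desired schema and apply whatever still differs.
def converge(migrations, desired_schema, db)
  migrations.each do |migration|
    # conditions are checked against the *current* structure each time
    next unless migration[:applicable].call(db[:schema])
    migration[:run].call(db)
  end

  # declarative step: create tables still missing from the definition
  desired_schema.each do |table, columns|
    db[:schema][table] = columns unless db[:schema].key?(table)
  end
end
```

For example, a "rename people to users" migration followed by a definition that also declares a posts table ends with users and posts present and people gone; running `converge` a second time changes nothing because the rename is no longer applicable.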

So in our case DbSchema sees a migration whose conditions are satisfied (the users table has first_name and last_name columns but no name column) and runs it; after that the users table looks exactly like we defined it at the top, so the declarative mechanism does nothing. The next time DbSchema does its work it sees that the migration conditions are not satisfied (because there is now a users.name column), so it does nothing.

Actually we can drop the last alter_table :users from the migration body because DbSchema would do the same anyway. If the migration just adds a name column and fills it with data, the declarative mechanism sees a nullable name column that should be NOT NULL, plus first_name/last_name columns we don't need; so it adds a NOT NULL constraint to name and drops the old columns - exactly what we described in the migration.

We can also remove the condition that checks for the first_name & last_name columns - the whole operation is still idempotent because once the migration has run for the first time it won't be applicable anymore. So finally we have this:

DbSchema.describe do |db|
  db.migrate 'Join users.first_name & users.last_name' do |migration|
    migration.skip_if do |schema|
      schema.table(:users).has_field?(:name)
    end

    migration.run do |migrator|
      migrator.alter_table(:users) { |t| t.add_column :name, :varchar }
      migrator.execute "UPDATE users SET name = first_name || ' ' || last_name"
    end
  end

  db.table :users do |t|
    t.serial  :id,         primary_key: true
    t.varchar :name,       null: false
    t.varchar :email,      null: false, unique: true
    t.varchar :password,   null: false
  end
end

The main point here is that DbSchema first runs all the applicable migrations (they may affect one another, so keep them in the correct order), then the declarative mechanism compares the post-migration database structure with the structure described in the DbSchema DSL and applies the remaining changes.

You don't need to keep migrations forever, though; their goal is to transform the database to a new state, so once a migration has run in production you may keep it for a couple of weeks (giving your coworkers time to catch up) and then drop it from the schema definition. Not only does this clean up the definition, it also helps avoid situations where a migration accidentally becomes applicable again (imagine that someone removes the name column; DbSchema will try to run the migration and crash).

Running arbitrary code in migrations

The most efficient way to migrate data is the #execute method:

migration.run do |migrator|
  migrator.execute "UPDATE users SET status = 'confirmed' WHERE confirmed_at IS NOT NULL"
  migrator.execute "UPDATE users SET status = 'guest' WHERE confirmed_at IS NULL"
end

But sometimes you want to express your logic in Ruby rather than SQL - it's more readable and familiar. DbSchema yields a Sequel database object as the second argument to the migration body, so you can use Sequel dataset operations:

migration.run do |migrator, db|
  db[:users].where(external_id: nil).each do |user|
    external_id = ExternalService.find_user(user[:email]).user_id
    db[:users].where(id: user[:id]).update(external_id: external_id)
  end
end

Please don't use your ORM models in migrations. It is considered bad practice, and it would be really tricky to implement in an ORM-agnostic library, so DbSchema doesn't support it.