Conditional Migrations
Conditional migrations allow you to customize the transition to the desired database structure or migrate some data.
While basic DbSchema usage comes down to describing the database schema your application relies upon and letting DbSchema do the work, there are cases when it cannot correctly infer the required operations:
- rename a table, column or enum type
- change a column type with a `USING` clause
- migrate data in any way
You can't rename a table just by changing its name in the schema definition because that would result in that table being deleted and a similar table with the new name created from scratch (effectively deleting all your data).
This is where conditional migrations come in. They look like a common ActiveRecord or Sequel migration describing the actions required to migrate the structure and data, but they also define the conditions under which the migration will get run.
```ruby
db.migrate 'rename people to users' do |migration|
  migration.apply_if do |schema|
    # only run the migration if the `people` table still exists
    schema.has_table?(:people)
  end

  migration.skip_if do |schema|
    # don't run the migration if the `users.role` field already has :user_role type
    schema.table(:users).field(:role).type == :user_role
  end

  migration.run do |migrator|
    # rename the `people` table to `users`
    migrator.rename_table :people, to: :users

    # switch the `users.role` field from varchar to a dedicated enum type
    migrator.create_enum :user_role, [:user, :admin]
    migrator.alter_table :users do |t|
      t.alter_column_type :role, :user_role, using: 'role::user_role'
    end
  end
end
```
The `apply_if` block defines a positive condition while the `skip_if` block defines a negative one; so the migration will only run if all `apply_if` conditions return `true` and all `skip_if` conditions return `false`. These blocks yield a schema object which can be queried for data; it is discussed in Schema analysis DSL.
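For example, a migration can carry several condition blocks and they all have to pass. Here's a minimal sketch assuming a hypothetical `orders` table with a varchar `status` column being converted to a made-up `order_status` enum:

```ruby
db.migrate 'convert orders.status to an enum' do |migration|
  # all positive conditions must return true...
  migration.apply_if { |schema| schema.has_table?(:orders) }
  migration.apply_if { |schema| schema.table(:orders).has_field?(:status) }
  # ...and all negative conditions must return false for the migration to run
  migration.skip_if { |schema| schema.table(:orders).field(:status).type == :order_status }

  migration.run do |migrator|
    migrator.create_enum :order_status, [:pending, :shipped]
    migrator.alter_table :orders do |t|
      t.alter_column_type :status, :order_status, using: 'status::order_status'
    end
  end
end
```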
Then the `run` block defines the body of this migration - it yields a migrator object that provides an interface for all supported operations (`create_table`, `add_index`, `drop_enum` etc). All possible migrator operations are described in Migration DSL.
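A `run` body isn't limited to altering existing tables. Here's a hedged sketch assuming that `create_table` mirrors the `db.table` definition DSL and that `drop_enum` takes just the enum name; check Migration DSL for the exact signatures:

```ruby
migration.run do |migrator|
  # assumption: create_table accepts the same block DSL as db.table
  migrator.create_table :audit_logs do |t|
    t.serial :id, primary_key: true
    t.varchar :action, null: false
  end

  # assumption: drop_enum takes just the enum type name
  migrator.drop_enum :obsolete_role
end
```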
The migrations are run in the order they are defined, skipping those whose conditions are not satisfied; then the (possibly) migrated database schema is compared to the definition and all remaining changes are applied. Keep in mind that you still need to change your schema definition accordingly; otherwise DbSchema will try to apply your old schema. The schema definition is the only source of truth about the database, while migrations are just a way to customize the transition to a new structure.
Suppose that you have the following `users` table:
```ruby
DbSchema.describe do |db|
  db.table :users do |t|
    t.serial :id, primary_key: true
    t.varchar :first_name, null: false
    t.varchar :last_name, null: false
    t.varchar :email, null: false, unique: true
    t.varchar :password, null: false
  end
end
```
And for some crazy reason you need to join the `first_name` and `last_name` columns into one `name` column; it must be `NOT NULL` just like the existing columns, but you can't add a `NOT NULL` column to an existing table with records - first you need to fill it with some data.
You could do it by adding a new column to your schema definition, deploying the app to production, manually filling this column with data, then adding `null: false` to this new column in the schema definition and deploying again - but this sounds crazy. Instead you can add the new field as `NOT NULL` right in your table definition and describe the schema transition in a conditional migration:
```ruby
DbSchema.describe do |db|
  db.migrate 'Join users.first_name & users.last_name' do |migration|
    migration.apply_if do |schema|
      schema.table(:users).has_field?(:first_name) &&
        schema.table(:users).has_field?(:last_name)
    end

    migration.skip_if do |schema|
      schema.table(:users).has_field?(:name)
    end

    migration.run do |migrator|
      migrator.alter_table :users do |t|
        # add a nullable column
        t.add_column :name, :varchar
      end

      # fill it with data
      migrator.execute "UPDATE users SET name = first_name || ' ' || last_name"

      migrator.alter_table :users do |t|
        # set the `name` column as NOT NULL
        t.disallow_null :name
        # and drop old columns
        t.drop_column :first_name
        t.drop_column :last_name
      end
    end
  end

  db.table :users do |t|
    t.serial :id, primary_key: true
    t.varchar :name, null: false
    t.varchar :email, null: false, unique: true
    t.varchar :password, null: false
  end
end
```
And here's how DbSchema will process this:
- iterate through all conditional migrations, and for each migration:
  - check if its conditions are satisfied (otherwise skip this migration)
  - run the migration body
  - re-read the database structure so that the next migration's conditions check the actual schema
- check if the schema differs from the declarative schema definition
- and if it does, apply the necessary operations
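In rough pseudocode this flow looks like the following; the names here are made up for illustration and are not DbSchema's actual internals:

```ruby
# Pseudocode only - hypothetical names, not DbSchema's real implementation.
migrations.each do |migration|
  next unless migration.conditions_satisfied?(actual_schema)

  migration.run_body
  actual_schema = read_schema_from_database # re-read for the next migration's conditions
end

changes = diff(actual_schema, declared_schema) # compare with the definition
apply(changes) unless changes.empty?           # declarative step
```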
So in our case DbSchema sees a migration whose conditions are satisfied (the `users` table has `first_name` and `last_name` columns but doesn't have a `name` column) and runs it; then our `users` table looks exactly like we defined it at the top, so the declarative mechanism does nothing. Next time DbSchema does its work it will see that the migration conditions are not satisfied (because we have a `users.name` column) so it does nothing.
Actually we can drop the last `alter_table :users` from the migration body because DbSchema would still do the same. If the migration just adds a `name` column and fills it with data, then the declarative mechanism sees that there is a nullable `name` column that should be `NOT NULL` and also there are `first_name`/`last_name` columns we don't need; so it adds a `NOT NULL` constraint to `name` and drops old columns - exactly what we described in a migration.
We can also remove a condition that checks if we have these `first_name` & `last_name` columns - the whole operation will still be idempotent because once the migration has run for the first time it won't be applicable anymore. So finally we have this:
```ruby
DbSchema.describe do |db|
  db.migrate 'Join users.first_name & users.last_name' do |migration|
    migration.skip_if do |schema|
      schema.table(:users).has_field?(:name)
    end

    migration.run do |migrator|
      migrator.alter_table(:users) { |t| t.add_column :name, :varchar }
      migrator.execute "UPDATE users SET name = first_name || ' ' || last_name"
    end
  end

  db.table :users do |t|
    t.serial :id, primary_key: true
    t.varchar :name, null: false
    t.varchar :email, null: false, unique: true
    t.varchar :password, null: false
  end
end
```
The main point here is that DbSchema first runs all the applicable migrations (and they may affect one another, so keep them in the correct order), then the declarative mechanism compares the structure of the database after all migrations with the structure described with the DbSchema DSL and eliminates the remaining differences between them.
You don't need to keep migrations forever though; their goal is to transform the database to a new state, so once a migration has run in production you may keep it for a couple of weeks (so your coworkers have time to keep up) and then drop it from the schema definition. Not only does this clean up the definition, it also helps to avoid situations where a migration accidentally becomes applicable again (imagine that someone removed the `name` column; DbSchema would try to run the migration and it would crash).
The most efficient way to migrate data is to use the `#execute` method shown here:
```ruby
migration.run do |migrator|
  migrator.execute "UPDATE users SET status = 'confirmed' WHERE confirmed_at IS NOT NULL"
  migrator.execute "UPDATE users SET status = 'guest' WHERE confirmed_at IS NULL"
end
```
But sometimes you want to express your logic in Ruby rather than SQL - it's more readable and familiar. DbSchema yields a Sequel database object as the second argument to the migration body so you can use Sequel dataset operations:
```ruby
migration.run do |migrator, db|
  db[:users].where(external_id: nil).each do |user|
    external_id = ExternalService.find_user(user[:email]).user_id
    db[:users].where(id: user[:id]).update(external_id: external_id)
  end
end
```
Please don't use your ORM models in migrations. It is considered bad practice and is really tricky to implement in an ORM-agnostic library, so DbSchema doesn't support it.