Every time you deploy code with schema changes, you have to apply new Active Record migrations by running bin/rails db:migrate. This is a common step in deploy scripts (see Capistrano).
While running migrations as part of the deploy is the default approach at most companies, for some reason the Rails community has never really considered alternatives. Does it bring extra complexity to the release process?
- If a migration fails, should the deploy fail and be reverted?
- If you do revert, the new code has already been running in production for some time before the migration failed. That could cause even more issues when you roll back.
- If you use more than one database (maybe you use sharding), you have to apply the migration to each database
- If a migration takes a long time (hours), it blocks the deploy from finishing
- What if the actor who runs the migration loses their SSH connection?
- In cloud environments (Heroku, Kubernetes), there may be no "after deploy" hook to run the migrations
This post describes how we can take the migration step out of the deploy process. What we ended up with at Shopify is asynchronous migrations that are eventually applied after a deploy and controlled by humans.
How does that work?
First we need to understand what db:migrate really does.
If we look at the Active Record Rake tasks, we'll find a call to ActiveRecord::Base.connection.migration_context.migrate. That is the entry point for running migrations. When it's invoked with no arguments (such as a target version from ENV['VERSION']), MigrationContext#migrate creates a MigrationProxy instance for each migration class and calls Migrator.new.migrate.
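To make that flow concrete, here is a toy, plain-Ruby model of it. It is not Rails code: ToyMigration, ToyContext, and the in-memory list of applied versions are hypothetical stand-ins for MigrationProxy, MigrationContext, and the schema_migrations table. It only shows the shape of the process: collect every known migration, subtract the versions already recorded, and run what is left in version order.

```ruby
# Toy model of the db:migrate flow. All names here are hypothetical
# stand-ins, not Active Record classes.
ToyMigration = Struct.new(:version, :name, :body) do
  # Run the migration body (a stand-in for the real schema change).
  def migrate
    body.call
  end
end

class ToyContext
  attr_reader :migrations, :applied_versions

  def initialize(migrations, applied_versions)
    @migrations = migrations.sort_by(&:version)
    @applied_versions = applied_versions.dup
  end

  # Versions that exist as migrations but are not recorded as applied.
  def pending_versions
    migrations.map(&:version) - applied_versions
  end

  def needs_migration?
    pending_versions.any?
  end

  # Run every pending migration in version order and record each
  # version once it has run.
  def migrate
    migrations.each do |migration|
      next if applied_versions.include?(migration.version)

      migration.migrate
      applied_versions << migration.version
    end
  end
end

log = []
context = ToyContext.new(
  [
    ToyMigration.new(20190103, "AddIndexToUsers", -> { log << "AddIndexToUsers" }),
    ToyMigration.new(20190101, "CreateUsers", -> { log << "CreateUsers" }),
    ToyMigration.new(20190102, "CreatePosts", -> { log << "CreatePosts" })
  ],
  [20190101] # CreateUsers was applied by an earlier run
)

puts context.pending_versions.inspect
context.migrate
puts log.inspect
```

The real implementation does much more (locking, transactions, direction handling), so treat this only as a mental model of the pending-versions logic.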
Now that we understand how migrations are invoked, we can try to redesign the process to make it asynchronous and stop running migrations as part of the deploy. What if we ran the migrations from a background job instead?
Each time there is a pending migration, we would push a background job that would apply the actual migration and report the result. Let’s see how this could be implemented.
First, we need to schedule a recurring job (with a tool like sidekiq-cron) that would run every few minutes and check for pending migrations.
class MigrationAutoCannonJob < ApplicationJob
  def perform
    return unless migration_context.needs_migration?

    pending_migrations = migration_context.migrations.collect(&:version) - migration_context.get_all_versions
    # run them!
  end

  private

  def migration_context
    ActiveRecord::Base.connection.migration_context
  end
end
We must remember that running a migration is a blocking process - we can’t run the next migration before the previous one has finished. We also want to be able to monitor the state of running migrations, so let’s create an Active Record model to keep track of it.
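# version is a bigint: timestamp versions such as 20190101120000
# do not fit in a 4-byte integer column
$ rails generate model async_migration version:bigint state:text

# app/models/async_migration.rb
class AsyncMigration < ApplicationRecord
end
# don't forget to add unique indexes!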
Now let’s update our recurring “auto cannon” job to keep track of things, and only run one migration at a time:
class MigrationAutoCannonJob < ApplicationJob
  def perform
    return unless migration_context.needs_migration?

    if AsyncMigration.where(state: "processing").none?
      AsyncMigration.create!(version: pending_migrations.first, state: "processing")
    end
  end

  def pending_migrations
    migration_context.migrations.collect(&:version) - migration_context.get_all_versions
  end

  # rest of the job
Now the job creates an entry in the async_migrations table, but only when there are no other entries in the "processing" state. That protects us from running more than one migration at the same time. Keep in mind that the job is not protected from races, but that's OK because only one instance of it will be scheduled.
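The gating rule can be tried out without Rails. The sketch below is a plain-Ruby model with hypothetical names (ToyScheduler, ToyRecord, and an in-memory array instead of the async_migrations table). It shows that one run of the recurring job creates a record only when nothing is in the "processing" state, and that it always picks the oldest pending version.

```ruby
# Plain-Ruby model of the "one migration at a time" rule.
# ToyScheduler and ToyRecord are hypothetical stand-ins for
# MigrationAutoCannonJob and rows of the async_migrations table.
ToyRecord = Struct.new(:version, :state)

class ToyScheduler
  attr_reader :records

  def initialize(all_versions)
    @all_versions = all_versions.sort
    @records = []
  end

  # Versions with no "finished" record yet, oldest first.
  def pending_versions
    finished = records.select { |r| r.state == "finished" }.map(&:version)
    @all_versions - finished
  end

  # One run of the recurring job. Returns the new record, or nil when
  # there is nothing to do or a migration is still being processed.
  def tick
    return nil if pending_versions.empty?
    return nil if records.any? { |r| r.state == "processing" }

    record = ToyRecord.new(pending_versions.first, "processing")
    records << record
    record
  end
end

scheduler = ToyScheduler.new([20190102, 20190101])

first = scheduler.tick     # starts the oldest pending migration
blocked = scheduler.tick   # nil: the first one is still "processing"
first.state = "finished"   # what the processing job does on success
second = scheduler.tick    # now the next migration can start

puts first.version, blocked.inspect, second.version
```

One consequence worth noticing: a record that stays in "processing" forever (for example, because the migration crashed) blocks every later migration, which is why the state needs to be visible to humans.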
Now let's create a callback for the model to actually process the migration:
class AsyncMigration < ApplicationRecord
  after_commit :enqueue_processing_job, on: :create

  private

  def enqueue_processing_job
    MigrationProcessingJob.perform_later(async_migration_id: id)
  end
end
Each time an AsyncMigration is created, it will enqueue a MigrationProcessingJob that will run the actual migration. Let's see what that job might look like:
class MigrationProcessingJob < ApplicationJob
  def perform(params)
    async_migration = AsyncMigration.find(params.fetch(:async_migration_id))
    all_migrations = migration_context.migrations
    migration = all_migrations.find { |m| m.version == async_migration.version }

    # actual work!
    # Note: the arguments to Migrator.new differ between Rails versions;
    # this two-argument form matches Rails 5.2.
    ActiveRecord::Migrator.new(:up, [migration]).migrate
    async_migration.update!(state: "finished")
  end

  def migration_context
    ActiveRecord::Base.connection.migration_context
  end
end
There are quite a few things missing here, but you should get the idea by now: using a combination of two jobs and a database record, we can schedule migrations to run in the background one by one.
Keep in mind that the code examples are very much a work in progress. If you want to go further, you'd need to take care of these things:
- There's no error handling. We might want to update the status of AsyncMigration when a migration fails with an error
- There's no max retries defined for the job. Do you even want to retry migrations?
- You might want to measure and persist how much time the migration took
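For the error handling and timing points above, here is a minimal plain-Ruby sketch. Everything in it is an assumption rather than part of the code shown earlier: the run_tracked helper, the "failed" state, and the error and duration_seconds fields are hypothetical. In a real job, the block would wrap the Migrator call and the struct would be the AsyncMigration record.

```ruby
# Hypothetical in-memory stand-in for an AsyncMigration row with two extra
# columns (error, duration_seconds) that the examples above do not define.
TrackedMigration = Struct.new(:version, :state, :error, :duration_seconds)

# Runs the given block, then records the outcome and the elapsed time.
# Errors are captured on the record instead of being re-raised, so a job
# built this way would not be retried automatically.
def run_tracked(record)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  begin
    yield
    record.state = "finished"
  rescue StandardError => e
    record.state = "failed"
    record.error = "#{e.class}: #{e.message}"
  ensure
    record.duration_seconds = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
  record
end

ok = run_tracked(TrackedMigration.new(20190101, "processing")) { :schema_changed }
bad = run_tracked(TrackedMigration.new(20190102, "processing")) { raise "lock timeout" }

puts ok.state
puts bad.state
puts bad.error
```

If you add a "failed" state like this, the recurring job also has to decide what to do with that version. As written earlier, it would pick the same pending version again on its next run and try to create a second record for it.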
The possibilities are endless. You could even build an admin UI to run and monitor migrations, or send a message to a Slack channel when migrations complete or fail.
At Shopify we have hundreds of database shards, and on every schema change we have to run the migration on each of them. The release process would be far more fragile if those migrations were part of the deploy script. Instead, we use asynchronous migrations that are eventually applied after each release. That's one of the key features that allows us to release more than 50 times per day.
We even post status of migrations to a Slack channel.
If working on such things sounds exciting to you, come join my team at Shopify.