Meteor - Database - Migration

How can we write a migration?

A useful package for writing migrations is percolate:migrations, which provides a nice framework for switching between different versions of your schema.

Suppose, as an example, that we wanted to add a list.todoCount field, and ensure that it was set for all existing lists. Then we might write the following in server-only code (e.g. /server/migrations.js):

Migrations.add({
  version: 1,
  up() {
    Lists.find({todoCount: {$exists: false}}).forEach(list => {
      const todoCount = Todos.find({listId: list._id}).count();
      Lists.update(list._id, {$set: {todoCount}});
    });
  },
  down() {
    // On the server, update() only touches one document unless `multi` is set
    Lists.update({}, {$unset: {todoCount: true}}, {multi: true});
  }
});

This migration, which is sequenced to be the first migration to run over the database, will, when called, bring each list up to date with the current todo count.

To find out more about the API of the Migrations package, refer to its documentation.

What is this 'bulk change' stuff?

If your migration needs to change a lot of data, and especially if you need to stop your app server while it’s running, it may be a good idea to use a MongoDB Bulk Operation.

The advantage of a bulk operation is that it only requires a single round trip to MongoDB for the write, which usually means it is a lot faster. The downside is that if your migration is complex (which it usually is if you can’t just do an .update(.., .., {multi: true})), it can take a significant amount of time to prepare the bulk update.

This means that if users are accessing the site while the update is being prepared, the prepared data will likely go out of date! Also, a bulk update will lock the entire collection while it is being applied, which can cause a significant blip in your user experience if it takes a while. For these reasons, you often need to stop your server and let your users know you are performing maintenance while the update is happening.

We could write our above migration like so (note that you must be on MongoDB 2.6 or later for the bulk update operations to exist). We can access the native MongoDB API via Collection#rawCollection():

Migrations.add({
  version: 1,
  up() {
    // This is how to get access to the raw MongoDB node collection that the Meteor server collection wraps
    const batch = Lists.rawCollection().initializeUnorderedBulkOp();
    // Track whether anything was queued: executing an empty bulk operation throws
    let hasUpdates = false;
    Lists.find({todoCount: {$exists: false}}).forEach(list => {
      const todoCount = Todos.find({listId: list._id}).count();
      // We have to use pure MongoDB syntax here, thus the `{_id: X}`
      batch.find({_id: list._id}).updateOne({$set: {todoCount}});
      hasUpdates = true;
    });

    if (hasUpdates) {
      // We need to wrap the async function to get a synchronous API that migrations expects
      const execute = Meteor.wrapAsync(batch.execute, batch);
      return execute();
    }
  },
  down() {
    // On the server, update() only touches one document unless `multi` is set
    Lists.update({}, {$unset: {todoCount: true}}, {multi: true});
  }
});

Note that we could make this migration faster by using an Aggregation to gather the initial set of todo counts.
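
As a rough sketch of that idea, the counts could come from a `$group` stage over the todos collection. The names `todoCountPipeline` and `buildTodoCountOps` below are made up for illustration and are not part of Meteor or the migrations package. The helper is plain JavaScript that turns aggregation results into operations in the shape the MongoDB Node driver's `bulkWrite` accepts. It assumes string `_id` values (Meteor's default) and gives lists with no todos a count of 0.

```javascript
// Pipeline to run on the todos collection: one result document per list,
// shaped like {_id: <listId>, todoCount: <number of todos>}.
const todoCountPipeline = [
  {$group: {_id: '$listId', todoCount: {$sum: 1}}}
];

// Hypothetical helper: combine the ids of the lists that need a todoCount
// with the aggregation results, producing one updateOne operation per list.
function buildTodoCountOps(listIds, aggregatedCounts) {
  const countsByListId = new Map(
    aggregatedCounts.map(row => [row._id, row.todoCount])
  );
  return listIds.map(listId => ({
    updateOne: {
      filter: {_id: listId},
      // Lists that never appear in the aggregation have no todos.
      update: {$set: {todoCount: countsByListId.get(listId) || 0}}
    }
  }));
}

// Example with in-memory data standing in for real query results.
const exampleOps = buildTodoCountOps(
  ['list-a', 'list-b'],
  [{_id: 'list-a', todoCount: 3}]
);
```

Inside up(), the aggregation results would presumably come from `Todos.rawCollection().aggregate(todoCountPipeline).toArray()` and the operations would be passed to `Lists.rawCollection().bulkWrite(ops, {ordered: false})`. Both calls are asynchronous, so they need the same kind of wrapping as `batch.execute` above.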

How can we run a migration?

To run a migration against your development database, it’s easiest to use the Meteor shell:

// After running `meteor shell` on the command line:
Migrations.migrateTo('latest');

If the migration logs anything to the console, you’ll see it in the terminal window that is running the Meteor server. To run a migration against your production database, run your app locally in production mode (with production settings and environment variables, including database settings), and use the Meteor shell in the same way. What this does is run the up() function of all outstanding migrations, against your production database. In our case, it should ensure all lists have a todoCount field set.

A good way to do the above is to spin up a virtual machine close to your database that has Meteor installed and SSH access (a special EC2 instance that you start and stop for the purpose is a reasonable option), then shell into it and run the command there. That way most of the latency between your machine and the database is eliminated, and you can still be very careful about how the migration is run.

Note that you should always take a database backup before running any migration!

What are 'breaking schema changes'?

Sometimes when we change the schema of an application, we do so in a breaking way – so that the old schema doesn’t work properly with the new code base. For instance, if we had some UI code that heavily relied on all lists having a todoCount set, there would be a period, before the migration runs, in which the UI of our app would be broken after we deployed.

The simple way to work around the problem is to take the application down for the period in between deployment and completing the migration. This is far from ideal, especially considering some migrations can take hours to run (although using Bulk Updates probably helps a lot here).

A better approach is a multi-stage deployment. The basic idea is:

  1. Deploy a version of your application that can handle both the old and the new schema. In our case, it’d be code that doesn’t expect the todoCount to be there, but which correctly updates it when new todos are created.
  2. Run the migration. At this point you should be confident that all lists have a todoCount.
  3. Deploy the new code that relies on the new schema and no longer knows how to deal with the old schema. Now we are safe to rely on list.todoCount in our UI.
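
Step 1 is where the code needs the most care. Here is a minimal sketch of a read path that tolerates both schemas. `getTodoCount` is a hypothetical helper, and the `countTodos` callback stands in for a query such as `Todos.find({listId}).count()`:

```javascript
// Hypothetical helper for the transitional release (step 1): prefer the
// denormalized field when it exists, otherwise fall back to counting.
function getTodoCount(list, countTodos) {
  if (typeof list.todoCount === 'number') {
    return list.todoCount; // new schema
  }
  return countTodos(list._id); // old schema: compute on demand
}

// In-memory stand-in for the Todos collection.
const todos = [
  {_id: 't1', listId: 'old-list'},
  {_id: 't2', listId: 'old-list'}
];
const countTodos = listId => todos.filter(t => t.listId === listId).length;

const migratedList = {_id: 'new-list', todoCount: 5};
const unmigratedList = {_id: 'old-list'};

console.log(getTodoCount(migratedList, countTodos));   // 5
console.log(getTodoCount(unmigratedList, countTodos)); // 2
```

Once step 3 ships, the fallback branch can be deleted and the UI can read list.todoCount directly.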

Another thing to be aware of, especially with such multi-stage deploys, is that being prepared to roll back is important! For this reason, the migrations package allows you to specify a down() function and call Migrations.migrateTo(x) to migrate back to version x.

So if we wanted to reverse our migration above, we’d run:

// The "0" migration is the unmigrated (before the first migration) state
Migrations.migrateTo(0);

If you find you need to roll your code version back, you’ll need to be careful about the data, and step carefully through your deployment steps in reverse.
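
To make the version bookkeeping concrete, here is a toy in-memory runner. It is not how percolate:migrations is implemented: `createRunner` and its internals are invented for illustration, and it only accepts numeric targets. Only the idea comes from the text above: run up() in ascending version order, run down() in descending order, and treat 0 as the unmigrated state.

```javascript
// Toy migration runner (illustrative only, not the percolate:migrations code).
function createRunner() {
  const migrations = [];
  let currentVersion = 0; // 0 = unmigrated state

  return {
    add(migration) {
      migrations.push(migration);
      migrations.sort((a, b) => a.version - b.version);
    },
    migrateTo(target) {
      if (target > currentVersion) {
        // Moving forward: run up() for each outstanding version, lowest first.
        migrations
          .filter(m => m.version > currentVersion && m.version <= target)
          .forEach(m => { m.up(); currentVersion = m.version; });
      } else {
        // Rolling back: run down() for each applied version, highest first.
        migrations
          .filter(m => m.version <= currentVersion && m.version > target)
          .reverse()
          .forEach(m => { m.down(); });
        currentVersion = target;
      }
      return currentVersion;
    },
    getVersion() { return currentVersion; }
  };
}

// In-memory "database" so the example runs without MongoDB.
const lists = [{_id: 'a'}, {_id: 'b'}];
const runner = createRunner();
runner.add({
  version: 1,
  up() { lists.forEach(list => { list.todoCount = 0; }); },
  down() { lists.forEach(list => { delete list.todoCount; }); }
});

runner.migrateTo(1); // every list now has a todoCount
runner.migrateTo(0); // todoCount is removed again
```

The point of the sketch is that a rollback is only as good as the down() functions it runs, which is why each migration above defines one.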

What are some caveats regarding database migration?

Some aspects of the migration strategy outlined above are not ideal, although they may be appropriate in many situations. Here are some other things to be aware of:

  1. Usually it is better to not rely on your application code in migrations (because the application will change over time, and the migrations should not). For instance, having your migrations pass through your Collection2 collections (and thus check schemas, set autovalues etc) is likely to break them as your schemas change. One way to avoid this problem is simply to not run old migrations on your database. This is a little bit limiting but can be made to work.
  2. Running the migration on your local machine will probably make it take a lot longer as your machine isn’t as close to the production database as it could be.

Deploying a special “migration application” to the same hardware as your real application is probably the best way to solve the above issues. It’d be amazing if such an application kept track of which migrations ran when, kept logs, and provided a UI to examine and run them. Perhaps a boilerplate application to do so could be built (if you do so, please let us know and we’ll link to it here!).

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License