Meteor - Database schema design, validation, and migration

meteor
meteor-database

https://atmospherejs.com/aldeed/simple-schema
https://github.com/aldeed/meteor-simple-schema
https://github.com/aldeed/meteor-collection2

What can we use a schema for?

Not only that we can use it for validation, we can also use it for generating form to capture the data for the collection. For this purpose, we have the following schema attributes:

  1. label
  2. max
  3. min
  4. defaultData
  5. optional
  6. autoform
  7. allowedValues
  8. type: String, Number, Boolean, Object, or use any constructor you like (such as Date or something you've created) http://www.webtempest.com/meteor-js-autoform-tutorial

Why should we define a schema?

Although MongoDB is a schema-less database, which allows maximum flexibility in data structuring, it is generally good practice to use a schema to constrain the contents of your collection to conform to a known format. If you don’t, then you tend to end up needing to write defensive code to check and confirm the structure of your data as it comes out of the database, instead of when it goes into the database. As in most things, you tend to read data more often than you write it, and so it’s usually easier, and less buggy to use a schema when writing.

In Meteor, the pre-eminent schema package is aldeed:simple-schema. It’s an expressive, MongoDB based schema that’s used to insert and update documents. Another alternative is jagi:astronomy which is a full Object Model (OM) layer offering schema definition, server/client side validators, object methods and event handlers.

How can we create a schema object using simple-schema?

To create a schema object using simple-schema, you can simply create a new instance of the SimpleSchema class:

Lists.schema = new SimpleSchema({
  name: {type: String},
  incompleteCount: {type: Number, defaultValue: 0},
  userId: {type: String, regEx: SimpleSchema.RegEx.Id, optional: true}
});
  1. We specify that the name field of a list is required and must be a string.
  2. We specify the incompleteCount is a number, which on insertion is set to 0 if not otherwise specified.
  3. We specify that the userId, which is optional, must be a string that looks like the ID of a user document.

We attach the schema to the namespace of Lists directly, which allows us to check objects against this schema directly whenever we want, such as in a form or Method. In the next section we’ll see how to use this schema automatically when writing to the collection.

You can see that with relatively little code we’ve managed to restrict the format of a list significantly. You can read more about more complex things that can be done with schemas in the Simple Schema docs.

How can we validate a document against a schema?

It’s pretty straightforward to validate a document with a schema. We can write:

const list = {
  name: 'My list',
  incompleteCount: 3
};

Lists.schema.validate(list);

In this case, as the list is valid according to the schema, the validate() line will run without problems. If however, we wrote:

const list = {
  name: 'My list',
  incompleteCount: 3,
  madeUpField: 'this should not be here'
};

Lists.schema.validate(list);

Then the validate() call will throw a ValidationError which contains details about what is wrong with the list document.

Although there are a variety of ways that you can run data through a Simple Schema before sending it to your collection (for instance you could check a schema in every method call), the simplest and most reliable is to use the aldeed:collection2 package to run every mutator (insert/update/upsert call) through the schema. To do so, we use attachSchema():

Lists.attachSchema(Lists.schema);

What this means is that now every time we call Lists.insert(), Lists.update(), Lists.upsert(), first our document or modifier will be automatically checked against the schema (in subtly different ways depending on the exact mutator).

What happens during the data cleaning process?

One thing that Collection2 does is “clean” the data before sending it to the database. This includes but is not limited to:

  1. Coercing types - converting strings to numbers
  2. Removing attributes not in the schema
  3. Assigning default values based on the defaultValue in the schema definition

SimpleSchema instances provide a clean method that cleans or alters data in a number of ways. It's intended to be called prior to validation to avoid any avoidable validation errors. The clean method takes the object to be cleaned as its first argument and the following optional options as its second argument:

  1. filter: Filter out properties not found in the schema? True by default. Removes any keys not explicitly or implicitly allowed by the schema, which prevents errors being thrown for those keys during validation.
  2. autoConvert: Type convert properties into the correct type where possible? True by default. Helps eliminate unnecessary validation messages by automatically converting values where possible. For example, non-string values can be converted to a String if the schema expects a String, and strings that are numbers can be converted to Numbers if the schema expects a Number.
  3. removeEmptyStrings: Remove keys in normal object or $set where the value is an empty string? True by default.
  4. trimStrings: Remove all leading and trailing spaces from string values? True by default.
  5. getAutoValues: Run autoValue functions and inject automatic and defaultValue values? True by default.
  6. isModifier: Is the first argument a modifier object? False by default.
  7. extendAutoValueContext: This object will be added to the this context of autoValue functions. Can be used to give your autoValue functions additional valuable information, such as userId. (Note that operations done using the Collection2 package automatically add userId to the autoValue context already.)

The object is cleaned in place. That is, the original referenced object will be cleaned. You do not have to use the return value of the clean method.

Do we have to call the clean method?

No. The Collection2 package always calls clean before every insert, update, or upsert.

How can we subclass Mongo.Collection and write our own insert, update or remove method?

One thing that Collection2 does is “clean” the data before sending it to the database. This includes but is not limited to:

  1. Coercing types - converting strings to numbers
  2. Removing attributes not in the schema
  3. Assigning default values based on the defaultValue in the schema definition

However, sometimes it’s useful to do more complex initialization to documents before inserting them into collections. For instance, in the Todos app, we want to set the name of new lists to be List X where X is the next available unique letter. To do so, we can subclass Mongo.Collection and write our own insert() method:

class ListsCollection extends Mongo.Collection {
  insert(list, callback) {
    if (!list.name) {
      let nextLetter = 'A';
      list.name = `List ${nextLetter}`;

      while (!!this.findOne({name: list.name})) {
        // not going to be too smart here, can go past Z
        nextLetter = String.fromCharCode(nextLetter.charCodeAt(0) + 1);
        list.name = `List ${nextLetter}`;
      }
    }

    // Call the original `insert` method, which will validate
    // against the schema
    return super.insert(list, callback);
  }
}

Lists = new ListsCollection('Lists');

The technique above can also be used to provide a location to “hook” extra functionality into the collection. For instance, when removing a list, we always want to remove all of its todos at the same time. We can use a subclass for this case as well, overriding the remove() method:

class ListsCollection extends Mongo.Collection {
  // ...
  remove(selector, callback) {
    Package.todos.Todos.remove({listId: selector});
    return super.remove(selector, callback);
  }
}

This technique has a few disadvantages:

  1. Mutators can get very long when you want to hook in multiple times.
  2. Sometimes a single piece of functionality can be spread over multiple mutators.
  3. It can be a challenge to write a hook in a completely general way (that covers every possible selector and modifier), and it may not be necessary for your application (because perhaps you only ever call that mutator in one way).

A way to deal with points 1. and 2. is to separate out the set of hooks into their own module, and simply use the mutator as a point to call out to that module in a sensible way. We’ll see an example of that below.

Point 3. can usually be resolved by placing the hook in the Method that calls the mutator, rather than the hook itself. Although this is an imperfect compromise (as we need to be careful if we ever add another Method that calls that mutator in the future), it is better than writing a bunch of code that is never actually called (which is guaranteed to not work!), or giving the impression that your hook is more general that it actually is.

Denormalization may need to happen on various mutators of several collections. Therefore, it’s sensible to define the denormalization logic in one place, and hook it into each mutator with one line of code. The advantage of this approach is that the denormalization logic is one place rather than spread over many files, but you can still examine the code for each collection and fully understand what happens on each update.

In the Todos example app, we build a incompleteCountDenormalizer to abstract the counting of incomplete todos on the lists. This code needs to run whenever a todo item is inserted, updated (checked or unchecked), or removed. The code looks like:

const incompleteCountDenormalizer = {
  _updateList(listId) {
    // Recalculate the correct incomplete count direct from MongoDB
    const incompleteCount = Todos.find({
      listId,
      checked: false
    }).count();

    Lists.update(listId, {$set: {incompleteCount}});
  },
  afterInsertTodo(todo) {
    this._updateList(todo.listId);
  },
  afterUpdateTodo(selector, modifier) {
    // We only support very limited operations on todos
    check(modifier, {$set: Object});

    // We can only deal with $set modifiers, but that's all we do in this app
    if (_.has(modifier.$set, 'checked')) {
      Todos.find(selector, {fields: {listId: 1}}).forEach(todo => {
        this._updateList(todo.listId);
      });
    }
  },
  // Here we need to take the list of todos being removed, selected *before* the update
  // because otherwise we can't figure out the relevant list id(s) (if the todo has been deleted)
  afterRemoveTodos(todos) {
    todos.forEach(todo => this._updateList(todo.listId));
  }
};

We are then able to wire in the denormalizer into the mutations of the Todos collection like so:

class TodosCollection extends Mongo.Collection {
  insert(doc, callback) {
    doc.createdAt = doc.createdAt || new Date();
    const result = super.insert(doc, callback);
    incompleteCountDenormalizer.afterInsertTodo(doc);
    return result;
  }
}

Note that we only handled the mutators we actually use in the application—we don’t deal with all possible ways the todo count on a list could change. For example, if you changed the listId on a todo item, it would need to change the incompleteCount of two lists. However, since our application doesn’t do this, we don’t handle it in the denormalizer.

Dealing with every possible MongoDB operator is difficult to get right, as MongoDB has a rich modifier language. Instead we focus on just dealing with the modifiers we know we’ll see in our app. If this gets too tricky, then moving the hooks for the logic into the Methods that actually make the relevant modifications could be sensible (although you need to be diligent to ensure you do it in all the relevant places, both now and as the app changes in the future).

It could make sense for packages to exist to completely abstract some common denormalization techniques and actually attempt to deal with all possible modifications. If you write such a package, please let us know!

How can we handle data validation using aldeed/meteor-collection2 with Meteor?

By default, you’ll have to manually validate the data that users are inserting, editing, and removing from the database. Collection2 helps with this process by extending Meteor’s functionality, allowing it to “provide support for specifying a schema and then validating against that schema when inserting and updating.” For example, you can make it so a “Books” collection has a title field that must be a string, and a lastCheckedOut field that must be a date. Here’s an example schema:

var Schemas = {};

Schemas.Book = new SimpleSchema({
    title: {
        type: String,
        label: "Title",
        max: 200
    },
    author: {
        type: String,
        label: "Author"
    },
    copies: {
        type: Number,
        label: "Number of copies",
        min: 0
    },
    lastCheckedOut: {
        type: Date,
        label: "Last date this book was checked out",
        optional: true
    },
    summary: {
        type: String,
        label: "Brief summary",
        optional: true,
        max: 1000
    }
});

To add this package to a project, write the following command:

meteor add aldeed:collection2

How can we validate data context inside a component?

In order to ensure your component always gets the data you expect, you should validate the data context provided to it. This is just like validating the arguments to any Meteor Method or publication, and lets you write your validation code in one place and then assume that the data is correct. You can do this in a Blaze component’s onCreated() callback, like so:

Template.Lists_show.onCreated(function() {
  this.autorun(() => {
    new SimpleSchema({
      list: {type: Function},
      todosReady: {type: Boolean},
      todos: {type: Mongo.Cursor}
    }).validate(Template.currentData());
  });
});

We use an autorun() here to ensure that the data context is re-validated whenever it changes.

What is the purpose of the "check" package?

Another package that the Meteor Development Group includes with every project by default is known as the “check” package, and the “check” package allows us to write check functions that, appropriately enough, check whether a certain piece of data is of a certain type or not.

Basically, we can use the “check” package to ask questions like, “Is this variable a string?” To install this package, run the following command:

meteor add check

The check function accepts two arguments. The first argument will be the piece of data that we want to check. The second argument will be the object type that we’re expecting. Since we’re expecting a string, we’ll pass through an argument of String:

check(playerNameVar, String);
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License