MongoDB - Indexes


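Does Mongo use indexes like a traditional RDBMS?

Yes. MongoDB indexes are B-tree structures defined per collection, much as in a traditional RDBMS, though some index types (such as multikey indexes on arrays) have no direct relational counterpart.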

What are the different types of indexes supported by Mongo?

  1. Default _id Index
  2. Secondary Index
  3. Compound Index
  4. Multikey Index
  5. Multikey Compound Index

Can we define multiple indexes for a single collection?

Yes. Although more than one index can be defined on a collection, a query generally uses a single index during its execution (some cases, such as $or clauses and index intersection in newer versions, can combine indexes). MongoDB’s query optimizer chooses the best of the available indexes at runtime.

How does Mongo use the Default _id Index?

By default, MongoDB creates an index on the _id field for each collection. Each document has a unique _id field as its primary key; if we don’t supply one, MongoDB generates a 12-byte ObjectId. This index serves queries on _id. Queries on other fields that have no suitable index fall back to scanning the whole collection.

How can we view the indexes that have been defined for a collection?

To view the indexes for a collection, open the MongoDB shell and do the following:

use project
db.posts.getIndexes()

How can we interpret the result displayed by getIndexes()?

mongo-1-1.png

The getIndexes() method returns all of the indexes for our collection. As you can see, we have the default index with name _id_. The key field indicates that the index is on the _id field, and the value of 1 indicates an ascending order.

What is the definition of Secondary Index?

When we want to index fields other than the _id field, we have to define custom indexes. Suppose we want to search for posts based on the user_name field. In this case, we’ll define a custom index on the user_name field of the collection. Such custom indexes, other than the default index, are called secondary indexes.

How can we create a secondary index?

mongo-1-3.png

We created an index on the user_name field in the posts collection using the ensureIndex() method (deprecated since MongoDB 3.0 in favor of createIndex()). Note the value passed for the field: it sets the index order, either ascending (1) or descending (-1). For example, each document has a timestamp field; an index meant to serve the most recent posts first would use descending order, and one meant to serve the oldest posts first would use ascending order. For a single-field index the choice matters little, because MongoDB can traverse the index in either direction.

What is the definition of Compound Index?

It is a composite index: a single index that contains multiple fields (keys).

How can we define a compound index?

mongo-1-6.png

Can a compound index be used to query a subset of its fields?

An important point is that a compound index can also serve queries on a subset of its fields, provided the subset is a prefix of the index (its leading fields, in order). For example, suppose there is a compound index {field1,field2,field3}. This index can be used to query on:

  1. field1
  2. field1, field2
  3. field1, field2, field3

So, if we’ve defined the index {field1,field2,field3}, we don’t need to define separate {field1} and {field1,field2} indexes. However, queries on field2 alone, or on field2 and field3, do not match a prefix of the index, so it cannot support them efficiently. We can still force its use with hint(), but MongoDB then has to scan the whole index.
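The prefix rule can be sketched in a few lines of JavaScript. This is only a model of the rule, not MongoDB's query planner, and usablePrefixLength is a hypothetical helper name. It counts how many leading fields of the index the query filters on; a result of 0 means the query does not start at the front of the index.

```javascript
// Model of the prefix rule for compound indexes (not MongoDB's planner).
// indexFields: ordered field names of the compound index.
// queryFields: field names the query filters on (their order is irrelevant).
function usablePrefixLength(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  let length = 0;
  // Walk the index from its first field and stop at the first field
  // the query does not filter on.
  while (length < indexFields.length && wanted.has(indexFields[length])) {
    length++;
  }
  return length;
}

const index = ["field1", "field2", "field3"];
console.log(usablePrefixLength(index, ["field1"]));           // 1
console.log(usablePrefixLength(index, ["field2", "field1"])); // 2
console.log(usablePrefixLength(index, ["field2"]));           // 0
console.log(usablePrefixLength(index, ["field2", "field3"])); // 0
```

A query on field1 and field3 yields 1 in this model: the index can narrow the search by field1, but not by field3.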

How can we use hint() to force MongoDB to use a specific index?

The hint() method can be used to force MongoDB to use an index we specify, overriding the default selection and query optimization process. You can specify the field names used in the index as an argument, as shown below:

<?php
// query to find posts with type public and 100 likes
// use hint() to force MongoDB to use the index we created
$cursor = $collection
    ->find(
        array(
            "post_type" => "public",
            "post_likes" => 100
        )
    )
    ->hint(
        array(
            "post_type" => 1,
            "post_likes" => 1
        )
    );

What is the definition of Multikey Index?

When an index is created on an array field, it is called a multikey index. Consider our post document again; we can apply a multikey index on post_tags. The multikey index indexes each element of the array, so in this case separate index entries would be created for the post_tags values: MongoDB, Tutorial, Indexing, and so on.

mongo-1-8.png


Why should we use a multikey index selectively?

Indexes on array fields must be used very selectively, because one index entry is created for each array element, so the index can grow large and consume a lot of memory.
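To make the cost concrete, here is a simplified JavaScript model of how a multikey index grows. It is a sketch, not MongoDB's storage format: multikeyEntries is a hypothetical helper, and the two sample posts are made up for illustration.

```javascript
// Simplified model: a multikey index holds one entry per array element,
// each pointing back to the document that contains the element.
function multikeyEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    // A non-array value contributes a single entry.
    const elements = Array.isArray(value) ? value : [value];
    for (const element of elements) {
      entries.push({ key: element, docId: doc._id });
    }
  }
  return entries;
}

// Hypothetical sample documents.
const posts = [
  { _id: 1, post_tags: ["MongoDB", "Tutorial", "Indexing"] },
  { _id: 2, post_tags: ["MongoDB", "PHP"] },
];

const entries = multikeyEntries(posts, "post_tags");
console.log(entries.length); // 5 entries for only 2 documents
```

The entry count tracks the total number of array elements rather than the number of documents, which is why long arrays make these indexes expensive.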

What is the definition of Multikey Compound Index?

We can create a multikey compound index, but with the limitation that at most one field in the index can be an array. So, if field1 is a string and field2 and field3 are both arrays, we can define the index {field1,field2} or {field1,field3}, but we can’t define the index {field2,field3}, since both of its fields are arrays.

mongo-1-9.png
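The restriction can be expressed as a small check on a single document. This is an illustrative sketch; canIndexDocument is a hypothetical helper, not a MongoDB API, and the sample document is made up.

```javascript
// A compound index may include at most one field whose value is an array
// in any given document.
function canIndexDocument(doc, indexFields) {
  const arrayFields = indexFields.filter((field) => Array.isArray(doc[field]));
  return arrayFields.length <= 1;
}

// Hypothetical document: field1 is a string, field2 and field3 are arrays.
const doc = { field1: "a", field2: [1, 2], field3: ["x", "y"] };

console.log(canIndexDocument(doc, ["field1", "field2"])); // true
console.log(canIndexDocument(doc, ["field1", "field3"])); // true
console.log(canIndexDocument(doc, ["field2", "field3"])); // false
```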

How can we create an index?

We can use the createIndex() method. Older versions also offered ensureIndex(), which has been deprecated since MongoDB 3.0 in favor of createIndex().

Use the createIndex() method to create an index on a collection. Indexes can support the efficient execution of queries. MongoDB automatically creates an index on the _id field upon the creation of a collection.

To create an index on a field or fields, pass to the createIndex() method an index key specification document that lists the fields to index and the index type for each field:

{ <field1>: <type1>, ...}
  1. For an ascending index type, specify 1 for <type>.
  2. For a descending index type, specify -1 for <type>.

createIndex() only creates an index if the index does not exist.

Create an ascending index on the "cuisine" field of the restaurants collection:

db.restaurants.createIndex( { "cuisine": 1 } )

MongoDB supports compound indexes, which are indexes on multiple fields. The order of the fields determines how the index stores its keys. For example, the following operation creates a compound index on the "cuisine" field and the "address.zipcode" field. The index orders its entries first by ascending "cuisine" values, and then, within each "cuisine", by descending "address.zipcode" values.

db.restaurants.createIndex( { "cuisine": 1, "address.zipcode": -1 } )
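The ordering that such a key specification describes can be reproduced with an ordinary comparator. The sketch below is a model only: getPath and indexComparator are hypothetical helpers, the restaurant documents are made up, and values are compared with JavaScript's own < and > operators, which is a simplification of MongoDB's comparison rules.

```javascript
// Resolve a dot-notation path such as "address.zipcode" inside a document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Build a comparator from an index key specification.
// A direction of 1 means ascending, -1 means descending.
function indexComparator(spec) {
  const fields = Object.entries(spec);
  return (a, b) => {
    for (const [path, direction] of fields) {
      const x = getPath(a, path);
      const y = getPath(b, path);
      if (x < y) return -direction;
      if (x > y) return direction;
    }
    return 0; // equal on every indexed field
  };
}

// Hypothetical sample documents.
const restaurants = [
  { cuisine: "Italian", address: { zipcode: "10012" } },
  { cuisine: "Bakery", address: { zipcode: "10001" } },
  { cuisine: "Italian", address: { zipcode: "11201" } },
  { cuisine: "Bakery", address: { zipcode: "10462" } },
];

const sorted = [...restaurants].sort(
  indexComparator({ cuisine: 1, "address.zipcode": -1 })
);
console.log(sorted.map((r) => r.cuisine + " " + r.address.zipcode));
// [ 'Bakery 10462', 'Bakery 10001', 'Italian 11201', 'Italian 10012' ]
```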

What are the limitations with indexes in Mongo?

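It is important to know that indexes can’t be used, or can’t be used efficiently, in queries which use:

  1. regular expressions that are not anchored to the start of the string (a case-sensitive prefix expression such as /^abc/ can use an index efficiently)
  2. negation operators (e.g. $ne, $not, etc.), which are rarely selective even when an index is used
  3. arithmetic operators (e.g. $mod, etc.)
  4. JavaScript expressions in the $where clause
  5. some other cases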

What are some factors that we must consider when deciding whether or not to add new indexes?

  1. Each index occupies space as well as causes extra overhead on each insert, update, and delete operation on the collection. You need to consider the read:write ratio for each collection; indexing is beneficial to read-heavy collections, but may not be for write-heavy collections.
  2. MongoDB performs best when its indexes fit in RAM. Make sure that the total index size does not exceed the available RAM. If it does, parts of the indexes must be read from disk, and queries will slow down.

How many indexes can a collection have?

A collection can have a maximum of 64 indexes.

What happens when we create an index on a sub-document field?

Suppose we want to search posts based on where the user lives. We can create an index on the location field, which holds a sub-document with city, state, and country. MongoDB then indexes each location sub-document as a single value, so the index supports equality matches on the whole sub-document:

<?php
// query to find posts whose location sub-document matches exactly;
// the field names, their order, and their values must all match
$cursor = $collection->find(
    array(
        "location" => array(
            "city" => "Los Angeles",
            "state" => "California",
            "country" => "USA"
        )
    ),
    array()
);

A query such as "location" => "Los Angeles" does not search inside the sub-document: it compares the whole location value against the string and matches nothing. The index does not create separate entries for location.city, location.state, and location.country, so it cannot serve queries on an individual sub-field.

Because every index entry stores the complete sub-document, such an index can be large and should be used with care, since each index occupies space in memory.

How can we create an index on embedded fields?

It will happen sometimes that we won’t need indexes on all of the fields of a sub-document. If in our application we only want to find posts based on city but not state or country, we can create the index on the embedded field city.

mongo-2-1.png

We can now use this index in queries to find posts based on city:

<?php
// query to find posts from the city of Los Angeles
$cursor = $collection->find(
    array(
        "location.city" => "Los Angeles"
    ),
    array()
);

What is the definition of a covered query?

A covered query is one in which:

  1. all fields used in the query are part of an index used in the query, and
  2. all the fields returned in the results are in the same index

Since all the fields are covered in the index itself, MongoDB can match the query condition as well as return the result fields using the same index without looking inside the documents. Since indexes are stored in RAM or sequentially located on disk, such access is a lot faster.
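The two conditions can be written as a small check. This is a sketch of the rule, not MongoDB's planner, and isCovered is a hypothetical helper. It also encodes one detail that is not stated above: MongoDB returns _id unless the projection explicitly excludes it, so _id counts as a returned field unless the projection sets it to 0.

```javascript
// Model of the covered-query conditions.
// indexFields: fields in the index; queryFields: fields in the filter;
// projection: an inclusion projection such as { user_name: 1, _id: 0 }.
function isCovered(indexFields, queryFields, projection) {
  const indexed = new Set(indexFields);
  const keys = Object.keys(projection);

  // Without an inclusion projection, whole documents are returned,
  // so the index alone can never satisfy the query.
  if (!keys.some((field) => projection[field] === 1)) return false;

  const returned = keys.filter((field) => projection[field] === 1);
  // _id is returned by default unless it is explicitly excluded.
  if (projection._id !== 0 && !returned.includes("_id")) returned.push("_id");

  return (
    queryFields.every((field) => indexed.has(field)) &&
    returned.every((field) => indexed.has(field))
  );
}

const indexFields = ["post_type", "post_likes"];
console.log(isCovered(indexFields, ["post_type"], { post_likes: 1, _id: 0 })); // true
console.log(isCovered(indexFields, ["post_type"], { post_likes: 1 }));         // false
console.log(isCovered(indexFields, ["user_name"], { post_likes: 1, _id: 0 })); // false
```

The second call is not covered because _id would be returned but is not part of the index.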

How can we determine if our query is a covered query?

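To check if the query is covered, we can look at the indexOnly field in the result of the explain() method. A value of true indicates that ours was a covered query. In MongoDB 3.0 and later, explain() no longer reports indexOnly; there, a covered query shows totalDocsExamined: 0 in the executionStats output.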

mongo-2-3.png

It’s important to know that an index can’t cover a query if:

  1. any of the indexed fields is an array, or
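  2. any of the indexed fields are fields in sub-documents (this restriction applies to versions before MongoDB 3.6; later versions can cover queries on embedded fields)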

Thus, it’s always a good practice to check your query index usage with explain().

How can we determine the current index size for a collection?

To check the current index size for a collection, we can use the totalIndexSize() method, which returns the total size of all its indexes in bytes.

mongo-2-4.png

We just have to ensure that we have enough RAM available to accommodate indexes as well as the data that MongoDB manages and uses regularly.

How can we remove indexes?

To delete an existing index, and thus free up resources, we use the dropIndex() method:

mongo2-5.png
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License