MySQL - Optimizing SQL Statements

The primary optimization technique for reducing lookup times is to use indexing properly. This is true for retrievals (SELECT statements), and indexing also reduces row lookup time for UPDATE and DELETE statements.

The way a query is written might prevent indexes from being used even if they are available. Rewriting the query will often allow the optimizer to use an index and process the query faster.

Use the right data type.

Use the right storage engine. Queries run more efficiently when you choose a storage engine with properties that best match application requirements.

Use EXPLAIN to optimize the queries.

Use a different approach to the problem. In some cases, query processing for a task can be improved by using a different approach to the problem. This includes techniques such as generating summary tables rather than selecting from the raw data repeatedly.

Using Indexes for Optimization:

Indexes contain sorted values. This allows MySQL to find rows containing particular values faster. The effect can be particularly dramatic for joins, which have the potential to require many combinations of rows to be examined.

Indexes result in less disk I/O. The server can use an index to go directly to the relevant table records, which reduces the number of records it needs to read. Furthermore, if a query displays information only from indexed columns, MySQL might be able to process it by reading only the indexes, without accessing the data rows at all.

Why worry about optimization?

A query that takes less time to run doesn't hold locks as long, so other clients that are trying to update a table don't have to wait as long. This reduces the chance of a query backlog building up.

A slow query takes machine resources that could be devoted to processing other queries.

A PRIMARY KEY is a unique-valued index. That is, every key value is required to be different from all others, and every key value must be not NULL.

A UNIQUE index is a unique-valued index, like a PRIMARY KEY, but it can be defined to allow NULL values. NULL is an exception to uniqueness because NULL values may occur multiple times.

A non-unique index is one in which any key value may occur multiple times. This type of index is defined with the keyword INDEX or KEY.

A FULLTEXT index is specially designed for text searching.

A SPATIAL index can be used with the spatial data types.

To define indexes when you're initially creating a table, use CREATE TABLE. To add indexes to an already existing table, use ALTER TABLE, or CREATE INDEX.
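As a rough illustration of the index types and statements described above, the following sketch defines several kinds of index when the table is created and adds two more afterward. The table, column, and index names are hypothetical. The FULLTEXT index assumes a storage engine that supports it (MyISAM, or InnoDB in MySQL 5.6 and later). A SPATIAL index is omitted for brevity.

```sql
-- Hypothetical table; all names are for illustration only.
CREATE TABLE article
(
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    slug    VARCHAR(100) NOT NULL,
    author  VARCHAR(50)  NOT NULL,
    editor  VARCHAR(50)  NOT NULL,
    created DATE         NOT NULL,
    body    TEXT,
    PRIMARY KEY (id),      -- unique-valued, never NULL
    UNIQUE (slug),         -- unique-valued
    INDEX (author),        -- non-unique
    FULLTEXT (body)        -- for text searching
);

-- Adding indexes to the existing table, one with each statement type.
ALTER TABLE article ADD INDEX idx_created (created);
CREATE INDEX idx_editor ON article (editor);
```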

Indexes can be used with varying degrees of success. Keep the following index-related considerations in mind when designing tables:

  1. Declare an indexed column NOT NULL if possible. Although NULL values can be indexed, NULL is a special value that requires additional decisions by the server when performing comparisons on key values. An index without NULL can be processed more simply and thus faster.
  2. Avoid over indexing. Don't index a column just because you can. If you never refer to a column in comparisons (such as in WHERE, ORDER BY, or GROUP BY clauses), there is no need to index it.
  3. Unnecessary indexing slows down table updates, which may in turn slow down other queries. If you insert a row, an entry must be added to each of the table's indexes. Indexes help when looking up values for UPDATE and DELETE statements, but any change to indexed columns requires the appropriate indexes to be updated as well.
  4. One strategy the MySQL optimizer uses is that if it estimates that an index will return a large percentage of the records in the table, it assumes that scanning the table will be just as fast as incurring the overhead required to process the index. As a consequence, an index on a column that has very few distinct values is unlikely to do much good. Suppose that a column is declared as ENUM('Y','N') and the values are roughly evenly distributed such that a search for either value returns about half of the records. In this case, an index on the column is unlikely to result in faster queries.
  5. Choose unique and non-unique indexes appropriately. The choice might be influenced by the data type of the column. If the column is declared as an ENUM, the number of distinct column values that can be stored in it is fixed. This number is equal to the number of enumeration elements, plus one for the '' empty string element that is used when you attempt to store an illegal value. Should you choose to index an ENUM column, you likely should create a non-unique index. A PRIMARY KEY would allow only as many rows as the number of distinct enumeration values. A UNIQUE index enforces a similar restriction, except that unless the column is declared NOT NULL, the UNIQUE index allows NULL values.
  6. Index a column prefix rather than the entire column. MySQL caches index information in memory whenever possible. Shortening the length of key values can improve performance by reducing the amount of disk I/O needed to read the index and by increasing the number of key values that can fit into the key cache. Short index values can be processed more quickly than long ones. When you index a column, consider whether it is sufficient to index partial column values rather than complete values. This technique of indexing a column prefix can be applied to string data types. See "Indexing Column Prefixes".
  7. Avoid creating multiple indexes that overlap (have the same initial columns). This is wasteful because MySQL can use a multiple-column index even when a query uses just the initial columns for lookups.
  8. Index creation itself can be optimized if you are creating more than one index for a given table. ALTER TABLE can add several indexes in the same statement, which is faster than processing each one separately. CREATE INDEX and DROP INDEX allow only one index to be added or dropped at a time.
  9. For MyISAM and InnoDB tables, keeping the internal index statistics up to date helps the query optimizer process queries more efficiently. You can update the statistics with ANALYZE TABLE.
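Two of the considerations above (adding several indexes in one statement, and keeping index statistics current) can be sketched as follows. The orders table and its index names are hypothetical.

```sql
-- Hypothetical table; names are for illustration only.
CREATE TABLE orders
(
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    customer_id INT UNSIGNED NOT NULL,
    placed_on   DATE         NOT NULL,
    PRIMARY KEY (id)
);

-- Add two indexes in a single ALTER TABLE statement.
ALTER TABLE orders
    ADD INDEX idx_customer (customer_id),
    ADD INDEX idx_placed   (placed_on);

-- Refresh the index statistics that the optimizer relies on.
ANALYZE TABLE orders;

-- Inspect the indexes; the Cardinality column is the estimated
-- number of distinct values in each index.
SHOW INDEX FROM orders;
```

A low Cardinality value relative to the number of rows suggests a column with few distinct values, which is the case in which an index is unlikely to help.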

Indexing Column Prefixes:

If you expect column values to be distinct most of the time in the first 15 characters, index only that many characters. To specify a prefix length for a column, follow the column name in the index definition by a number in parentheses:

CREATE TABLE t
(
    name CHAR(255),
    INDEX (name(15))
);

Indexing a column prefix can speed up query processing, but works best when the prefix values tend to have about the same amount of uniqueness as the original values. Don't use such a short prefix that you produce a very high frequency of duplicate values in the index. It might require some testing to find the optimal balance between long index values that provide good uniqueness and shorter values that compare more quickly but have more duplicates. To determine the number of records in the table, the number of distinct values in the column, and the number of duplicates, use a query like this:

SELECT
 COUNT(*) AS 'Total Rows',
 COUNT(DISTINCT name) AS 'Distinct Values',
 COUNT(*) - COUNT(DISTINCT name) AS 'Duplicate Values'
 FROM t;

The query gives you an estimate of the amount of uniqueness in the name values. Then run a similar query on the prefix values:

SELECT
 COUNT(DISTINCT LEFT(name,n)) AS 'Distinct Prefix Values',
 COUNT(*) - COUNT(DISTINCT LEFT(name,n)) AS 'Duplicate Prefix Values'
 FROM t;

This tells you how the uniqueness changes when you use an n-character prefix of the name values. Run the queries with different values of n to determine an acceptable prefix length.

If an index on a full column is a PRIMARY KEY or UNIQUE index, you'll probably have to change the index to be non-unique if you decide to index prefix values instead. When you index partial column values, it is more likely that the prefix values will contain duplicates.
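To compare several candidate prefix lengths in one pass over the table, the queries above can be combined. This is only a sketch: the lengths 5, 10, and 15 and the column aliases are arbitrary choices, and t is the table defined earlier in this section.

```sql
-- Compare the uniqueness of the full name values with that of
-- 5-, 10-, and 15-character prefixes in a single scan.
SELECT
 COUNT(*)                      AS total_rows,
 COUNT(DISTINCT name)          AS distinct_full,
 COUNT(DISTINCT LEFT(name,5))  AS distinct_5,
 COUNT(DISTINCT LEFT(name,10)) AS distinct_10,
 COUNT(DISTINCT LEFT(name,15)) AS distinct_15
 FROM t;
```

The shortest prefix whose distinct count is close to distinct_full is a reasonable starting point for the prefix length.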

Leftmost Index Prefixes:

In a table that has a composite (multiple-column) index, MySQL can use the leftmost index prefixes of that index. A leftmost prefix of a composite index consists of one or more of the initial columns of the index. MySQL's capability to use leftmost index prefixes enables you to avoid creating unnecessary indexes.
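A minimal sketch of what this means in practice, using a hypothetical people table with one three-column index:

```sql
-- Hypothetical table; names are for illustration only.
CREATE TABLE people
(
    last_name  CHAR(30)     NOT NULL,
    first_name CHAR(30)     NOT NULL,
    birth      DATE         NOT NULL,
    notes      VARCHAR(100),
    INDEX idx_name_birth (last_name, first_name, birth)
);

-- These conditions match a leftmost prefix of idx_name_birth,
-- so the index can be used to look up rows.
SELECT * FROM people WHERE last_name = 'Smith';
SELECT * FROM people WHERE last_name = 'Smith' AND first_name = 'Ann';
SELECT * FROM people
 WHERE last_name = 'Smith' AND first_name = 'Ann' AND birth = '1970-01-01';

-- These conditions skip last_name, so they do not match a leftmost
-- prefix and the index cannot be used to look up rows directly.
SELECT * FROM people WHERE first_name = 'Ann';
SELECT * FROM people WHERE birth = '1970-01-01';
```

Because the single composite index already serves lookups on last_name alone and on (last_name, first_name), separate indexes on those column combinations would be redundant.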

Query Rewriting Techniques:

Don't refer to an indexed column within an expression that must be evaluated for every row in the table. Doing so prevents use of the index. Instead, isolate the column onto one side of a comparison when possible. Suppose that a table t contains a DATE column d that is indexed. One way to select rows containing date values from the year 1994 and up is as follows:

SELECT * FROM t WHERE YEAR(d) >= 1994;

In this case, the value of YEAR(d) must be evaluated for every row in the table, so the index cannot be used. Instead, write the query like this:

SELECT * FROM t WHERE d >= '1994-01-01';

In the rewritten expression, the indexed column stands by itself on one side of the comparison and MySQL can apply the index to optimize the query.
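The same rewrite applies to other expressions that wrap the indexed column. The two queries below are a sketch that assumes the table t from the text, with an indexed DATE column d. The 30-day window is an arbitrary example value.

```sql
-- Rows from the year 1994 only. Rather than WHERE YEAR(d) = 1994,
-- use a half-open date range so that d stands by itself.
SELECT * FROM t WHERE d >= '1994-01-01' AND d < '1995-01-01';

-- Rows from the last 30 days. Rather than
--   WHERE d + INTERVAL 30 DAY >= CURDATE()
-- move the arithmetic to the constant side of the comparison.
SELECT * FROM t WHERE d >= CURDATE() - INTERVAL 30 DAY;
```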

Indexes are particularly beneficial for joins that compare columns from two tables. You should index both of the joined columns when you use inner joins because the optimizer might process the tables in either order.

When comparing an indexed column to a value, use a value that has the same data type as the column. In MySQL, type conversion may cause an index not to be used.
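One common case is comparing a string column with a number. The sketch below uses a hypothetical account table to show the problem and the fix.

```sql
-- Hypothetical table; names are for illustration only.
CREATE TABLE account
(
    acct_no VARCHAR(20) NOT NULL,
    INDEX (acct_no)
);

-- String column compared with a number: MySQL converts the column
-- values for the comparison, so the index cannot be used to look up
-- the value quickly.
SELECT * FROM account WHERE acct_no = 12345;

-- String column compared with a string: the index can be used.
SELECT * FROM account WHERE acct_no = '12345';
```

The two queries are not strictly equivalent. The numeric comparison also matches strings that convert to the same number, such as '012345'.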

In certain cases, MySQL can use an index for pattern-matching operations performed with the LIKE operator. This is true if the pattern begins with a literal prefix value rather than a wildcard character, because such a pattern match is logically equivalent to a search for a range of values. For example, name LIKE 'de%' is equivalent to: WHERE name >= 'de' AND name < 'df'. When a pattern starts with a wildcard character, MySQL cannot make efficient use of any indexes associated with that column. That is, even if an index is used, the entire index must be scanned.
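The contrast can be sketched with the table t and its indexed name column from the earlier prefix example. The patterns themselves are arbitrary.

```sql
-- Pattern begins with a literal prefix: the index on name can be
-- used to narrow the search to a range of values.
SELECT * FROM t WHERE name LIKE 'de%';

-- Pattern begins with a wildcard: no range can be derived from it,
-- so every value has to be examined.
SELECT * FROM t WHERE name LIKE '%de';
SELECT * FROM t WHERE name LIKE '%de%';
```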

Using EXPLAIN to obtain optimizer information:

mysql> EXPLAIN SELECT * FROM t WHERE d >= '1994-01-01';

For this query, MySQL can perform a range scan using the index on column d, drastically reducing the number of rows that need to be examined.
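Running EXPLAIN on both forms of the date query from the previous section makes the difference visible. This sketch assumes the table t with an indexed DATE column d. The exact output depends on the data and the MySQL version, so none is shown here.

```sql
-- Column wrapped in a function: expect no usable key for the lookup.
EXPLAIN SELECT * FROM t WHERE YEAR(d) >= 1994;

-- Column by itself: the index on d is a candidate for a range scan.
EXPLAIN SELECT * FROM t WHERE d >= '1994-01-01';
```

In the output, compare the type, possible_keys, key, and rows columns. Note that the optimizer may still choose a table scan for the second query if it estimates that most rows fall on or after the given date.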

Optimize queries by using LIMIT when you need only part of a result. Less information is returned over the network. In many cases, LIMIT also allows the server to terminate query processing earlier than it would otherwise. Some row-sorting techniques have the property that the first n rows can be known to be in final order even before the sort has been done completely. This means that when LIMIT n is combined with ORDER BY, the server might be able to determine the first n rows and then terminate the sort operation early.

Don't abuse LIMIT. Use a WHERE clause to restrict the initial result set.
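A brief sketch of both points, using a hypothetical score table. The cutoff date and the limit of 10 are arbitrary.

```sql
-- Hypothetical table; names are for illustration only.
CREATE TABLE score
(
    player_id INT  NOT NULL,
    points    INT  NOT NULL,
    played_on DATE NOT NULL,
    INDEX (points),
    INDEX (played_on)
);

-- Only the ten highest scores are needed, so say so with LIMIT.
SELECT player_id, points
 FROM score
 ORDER BY points DESC
 LIMIT 10;

-- If only recent scores are of interest, restrict them in the WHERE
-- clause rather than fetching a large LIMIT and discarding rows in
-- the client.
SELECT player_id, points
 FROM score
 WHERE played_on >= '2024-01-01'
 ORDER BY points DESC
 LIMIT 10;
```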

Select only the columns that you need. This is also referred to as limiting the output horizontally. It is faster because MySQL has to return less information. If an index exists for the named columns, you get even more improvement for two reasons: the index can be used, and, depending on the storage engine, the server might not read the table rows at all. If the values requested by the query are in the index, then by reading the index MySQL already has the information that the client requested.
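The index-only case can be sketched with a hypothetical customer table whose composite index contains every column the query names.

```sql
-- Hypothetical table; names are for illustration only.
CREATE TABLE customer
(
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    last_name  VARCHAR(50)  NOT NULL,
    first_name VARCHAR(50)  NOT NULL,
    notes      TEXT,
    PRIMARY KEY (id),
    INDEX idx_name (last_name, first_name)
);

-- SELECT * must read the table rows to fetch the notes column.
SELECT * FROM customer WHERE last_name = 'Smith';

-- Both requested columns are stored in idx_name, so the query can be
-- answered from the index alone.
SELECT last_name, first_name FROM customer WHERE last_name = 'Smith';

-- When the index alone is used, EXPLAIN reports "Using index" in the
-- Extra column.
EXPLAIN SELECT last_name, first_name FROM customer WHERE last_name = 'Smith';
```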

Use a summary table. Suppose that you run an analysis consisting of a set of retrievals, each of which performs the same complex SELECT and differs only in the way it summarizes the records. A better technique is to select the records once into a temporary table, and then use the temporary table to generate the summaries. You can use CREATE TEMPORARY TABLE … SELECT. You can create appropriate indexes on the temporary table.

If the original table is of a type that is subject to table-level locking, such as a MyISAM table, using a summary table leaves the original table available more of the time for updates by other clients, because it reduces the amount of time that the table remains locked.

If the summary table is small enough that it's reasonable to hold in memory, you can increase performance even more by making it a MEMORY table.

When a summary consists of a single value, you need not create a table at all. If you assign the value to a user variable, you can use the variable for comparison purposes in subsequent queries without having to calculate the value again.
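The summary-table and user-variable techniques might look like the following sketch. The sale table, the sale_1994 temporary table, and the @avg_amount variable are hypothetical names chosen for illustration.

```sql
-- Hypothetical source table; names are for illustration only.
CREATE TABLE sale
(
    id      INT UNSIGNED  NOT NULL AUTO_INCREMENT,
    region  VARCHAR(20)   NOT NULL,
    sold_on DATE          NOT NULL,
    amount  DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (id),
    INDEX (sold_on)
);

-- Select the records of interest once. For a small result, adding
-- ENGINE = MEMORY before SELECT would keep the table in memory.
CREATE TEMPORARY TABLE sale_1994
 SELECT region, sold_on, amount
 FROM sale
 WHERE sold_on >= '1994-01-01' AND sold_on < '1995-01-01';

-- Index the temporary table to suit the summaries.
ALTER TABLE sale_1994 ADD INDEX (region);

-- Generate several summaries without touching the original table.
SELECT region, SUM(amount) AS total FROM sale_1994 GROUP BY region;
SELECT MONTH(sold_on) AS m, COUNT(*) AS sales FROM sale_1994 GROUP BY m;

-- A single-value summary fits in a user variable and can be reused.
SELECT AVG(amount) INTO @avg_amount FROM sale_1994;
SELECT * FROM sale_1994 WHERE amount > @avg_amount;
```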

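In MySQL versions before 5.6, EXPLAIN works only with SELECT statements. To analyze a DELETE or UPDATE in those versions, use EXPLAIN on a SELECT with the same WHERE clause. MySQL 5.6 and later also accept EXPLAIN for DELETE, INSERT, REPLACE, and UPDATE statements.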

Use a multiple-row INSERT instead of multiple single-row INSERT statements. A multiple-row INSERT statement allows the server to perform all the updates at once and flush the index a single time.

If you are using an InnoDB table, you can get better performance for single-row INSERT statements by grouping them within a transaction rather than executing them with autocommit mode enabled. Using a transaction allows InnoDB to flush all the changes at commit time. In autocommit mode, InnoDB flushes the changes for each INSERT individually.
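Both forms are sketched below with a hypothetical log_entry table.

```sql
-- Hypothetical table; names are for illustration only.
CREATE TABLE log_entry
(
    id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
    msg VARCHAR(100) NOT NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- One multiple-row INSERT instead of three single-row statements.
INSERT INTO log_entry (msg) VALUES ('first'), ('second'), ('third');

-- When single-row INSERT statements are unavoidable, group them in a
-- transaction so that the changes are flushed once, at commit time.
START TRANSACTION;
INSERT INTO log_entry (msg) VALUES ('fourth');
INSERT INTO log_entry (msg) VALUES ('fifth');
COMMIT;
```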

For any storage engine, LOAD DATA INFILE is faster than multiple-row INSERT.

You can disable index updating when loading data into an empty MyISAM table to speed up the operation. LOAD DATA INFILE does this automatically for non-unique indexes if the table is empty.
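Disabling index updates explicitly might look like the sketch below. The table name and the file path are hypothetical, the file is assumed to be in the default LOAD DATA format (tab-separated columns, one row per line), and the server must be permitted to read it.

```sql
-- Hypothetical table; names are for illustration only.
CREATE TABLE import_demo
(
    id   INT      NOT NULL,
    name CHAR(50) NOT NULL,
    INDEX (name)
) ENGINE = MyISAM;

-- Stop updating the non-unique indexes during the load.
ALTER TABLE import_demo DISABLE KEYS;

-- Hypothetical path; adjust to a file the server can read.
LOAD DATA INFILE '/tmp/import_demo.txt' INTO TABLE import_demo;

-- Rebuild the disabled indexes in one pass.
ALTER TABLE import_demo ENABLE KEYS;
```

Because LOAD DATA INFILE already does this for an empty table, the explicit form is mainly of interest when the rows are loaded some other way, such as with a series of INSERT statements.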

To replace existing rows, use REPLACE rather than DELETE and INSERT.
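A short sketch with a hypothetical setting table follows. REPLACE identifies the row to replace by its PRIMARY KEY or UNIQUE index value, so the table needs one of these for the statement to behave differently from INSERT.

```sql
-- Hypothetical table; names are for illustration only.
CREATE TABLE setting
(
    setting_name  VARCHAR(50)  NOT NULL,
    setting_value VARCHAR(100) NOT NULL,
    PRIMARY KEY (setting_name)
);

-- No row with this key exists yet, so this acts like INSERT.
REPLACE INTO setting (setting_name, setting_value) VALUES ('theme', 'dark');

-- A row with this key exists, so it is replaced in one statement
-- instead of a separate DELETE followed by an INSERT.
REPLACE INTO setting (setting_name, setting_value) VALUES ('theme', 'light');
```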
