Disable DNS Hostname Lookup
When this option (skip-name-resolve) is enabled, MySQL no longer resolves client hostnames, so the Host values in the MySQL grant tables must be IP addresses (or localhost). Even if you leave name resolution on, never use hostnames for MySQL authentication. DNS answers can be spoofed, which makes name-based host checks dangerous. Use only IP addresses for host authentication.
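Here is a minimal sketch of the setup, assuming MySQL 5.7 or later. The option itself goes in the [mysqld] section of my.cnf as skip-name-resolve and takes effect after a server restart. The account names, password, schema name blog, and the 192.0.2.x addresses (a documentation-only range) are all hypothetical.

```sql
-- Confirm the option is active (it is read-only at runtime; set it in my.cnf).
SHOW GLOBAL VARIABLES LIKE 'skip_name_resolve';

-- Rough check for accounts whose Host looks like a hostname (contains a letter);
-- those accounts stop matching once name resolution is off.
-- Note: IPv6 hex digits also match this pattern.
SELECT user, host
FROM mysql.user
WHERE host REGEXP '[a-zA-Z]' AND host <> 'localhost';

-- Hypothetical application account bound to a single IP address.
CREATE USER 'app'@'192.0.2.10' IDENTIFIED BY 'change-me';
GRANT SELECT, INSERT, UPDATE, DELETE ON blog.* TO 'app'@'192.0.2.10';

-- A whole subnet can be matched with an IP/netmask pair instead of a hostname pattern.
CREATE USER 'report'@'192.0.2.0/255.255.255.0' IDENTIFIED BY 'change-me';
GRANT SELECT ON blog.* TO 'report'@'192.0.2.0/255.255.255.0';
```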
COUNT(*) in MySQL can be either almost free or painfully slow, depending on which storage engine you’re using.
Under what conditions does your database invalidate caches? When does it sort on disk rather than in memory? When does it need to create temporary tables?
Profile, profile, profile! So you’re able to generate high-stress situations, but now you need to find the cracks. That is what profiling is for. Profiling lets you find the bottlenecks in your configuration, whether they are in memory, CPU, network, or disk I/O, or, more likely, some combination of them. The very first thing you should do is turn on the MySQL slow query log and install mtop. Together they point you at the absolute worst offenders. Have a ten-second query ruining your web application? These tools will show you the query right away.
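The slow query log can be switched on at runtime without a restart. This is a minimal sketch, assuming MySQL 5.1 or later and an account allowed to set global variables (SUPER, or SYSTEM_VARIABLES_ADMIN in MySQL 8.0). The log file path and the one-second threshold are assumptions to adjust for your system.

```sql
-- Write slow queries to this file (the mysqld process must be able to write to it).
SET GLOBAL slow_query_log_file = '/var/log/mysql/mysql-slow.log';

-- Log any statement that takes longer than one second.
-- The global value applies to connections opened after this point.
SET GLOBAL long_query_time = 1;

-- Optionally also log queries that use no index, regardless of how long they take.
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Turn logging on and confirm the settings.
SET GLOBAL slow_query_log = 'ON';
SHOW GLOBAL VARIABLES LIKE 'slow_query%';
SHOW GLOBAL VARIABLES LIKE 'long_query_time';
```

Settings made with SET GLOBAL are lost on restart. To keep them, put the same variables under [mysqld] in my.cnf.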
After you’ve identified the slow queries, learn MySQL’s internal tools, such as EXPLAIN, SHOW STATUS, and SHOW PROCESSLIST. They tell you where resources are being spent and what side effects your queries have, e.g., whether your heinous triple-join subselect query is sorting in memory or on disk. Of course, you should also use your usual command-line tools, such as top, procinfo, and vmstat, to get more general system performance information.
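This sketch shows what to look for, assuming hypothetical users and posts tables (users.id, users.name, posts.author_id). In EXPLAIN’s Extra column, "Using temporary" means MySQL builds an internal temporary table. "Using filesort" means it sorts the rows itself instead of reading them in index order. Whether that work spilled to disk shows up in the status counters.

```sql
-- Ask the optimizer how it will run the query; check the Extra column.
EXPLAIN
SELECT u.name, COUNT(*) AS post_count
FROM users AS u
JOIN posts AS p ON p.author_id = u.id
GROUP BY u.name
ORDER BY post_count DESC;

-- Internal temporary tables: compare Created_tmp_disk_tables with Created_tmp_tables.
SHOW GLOBAL STATUS LIKE 'Created_tmp%';

-- A growing Sort_merge_passes means sorts are overflowing sort_buffer_size onto disk.
SHOW GLOBAL STATUS LIKE 'Sort_merge_passes';

-- What every connection is doing right now, with the full query text.
SHOW FULL PROCESSLIST;
```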
Partition Your Tables
Often you have a table in which only a few columns are accessed frequently.
CREATE TABLE posts (
  id int UNSIGNED NOT NULL AUTO_INCREMENT,
  author_id int UNSIGNED NOT NULL,
  title varchar(128),
  created timestamp NOT NULL,
  PRIMARY KEY(id)
);

CREATE TABLE posts_data (
  post_id int UNSIGNED NOT NULL,
  teaser text,
  body text,
  PRIMARY KEY(post_id)
);
The above is optimized for reading. Frequently accessed columns live in one table and rarely accessed ones live in another. Because the rows of the hot table no longer carry the bulky text columns, they are smaller, so more of them fit in memory. You can also optimize for writing: keep frequently changed data in one table and rarely changed data in another. This makes caching more efficient. MySQL’s query cache invalidates every cached result for a table whenever that table changes, so after the split, writes to the volatile table no longer expire cached results for data that probably hasn’t changed. (The query cache was removed in MySQL 8.0.)
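This sketch shows how the split tables above might be queried. It also adds a hypothetical post_counters table to illustrate the write-side split. That table, its columns, and the id values are assumptions.

```sql
-- Listing page: touches only the narrow, frequently read posts table.
SELECT id, title, created
FROM posts
WHERE author_id = 42
ORDER BY created DESC
LIMIT 20;

-- Detail page: fetch the bulky text columns only for the post being shown.
SELECT p.id, p.title, p.created, d.body
FROM posts AS p
JOIN posts_data AS d ON d.post_id = p.id
WHERE p.id = 1234;

-- Write-side split (hypothetical table): a frequently updated counter lives
-- in its own table, so bumping it never modifies posts or posts_data.
CREATE TABLE post_counters (
  post_id int UNSIGNED NOT NULL,
  view_count int UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (post_id)
);

-- Insert the row on first view, increment it afterwards.
INSERT INTO post_counters (post_id, view_count)
VALUES (1234, 1)
ON DUPLICATE KEY UPDATE view_count = view_count + 1;
```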
Don’t Overuse Artificial Primary Keys
Artificial primary keys are nice because they can make the schema less volatile. If we keyed US geographic data by zip code, say, and the zip code system suddenly changed, we’d be in a bit of trouble. On the other hand, there are often perfectly good natural keys. One example is the join table of a many-to-many relationship. What not to do:
CREATE TABLE posts_tags (
  relation_id int UNSIGNED NOT NULL AUTO_INCREMENT,
  post_id int UNSIGNED NOT NULL,
  tag_id int UNSIGNED NOT NULL,
  PRIMARY KEY(relation_id),
  UNIQUE INDEX(post_id, tag_id)
);
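Not only is the artificial key entirely redundant given the unique index on (post_id, tag_id), but it also caps the number of post-tag relations at the range of relation_id (4,294,967,295 for an int UNSIGNED column). Every insert must also maintain two indexes instead of one. Instead, one should do: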
CREATE TABLE posts_tags (
  post_id int UNSIGNED NOT NULL,
  tag_id int UNSIGNED NOT NULL,
  PRIMARY KEY(post_id, tag_id)
);
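This sketch shows the natural-key table in use. The post and tag ids and the index name idx_tag are hypothetical. A composite primary key on (post_id, tag_id) serves lookups by post_id, its leftmost column, but it does not serve lookups by tag_id alone. If you query in that direction, a secondary index is a common addition.

```sql
-- The composite primary key rejects a duplicate pair on its own.
INSERT INTO posts_tags (post_id, tag_id) VALUES (7, 3);
INSERT IGNORE INTO posts_tags (post_id, tag_id) VALUES (7, 3); -- skipped with a warning

-- Tags for a post: served by the primary key's leftmost column.
SELECT tag_id FROM posts_tags WHERE post_id = 7;

-- Posts for a tag: needs its own index to avoid a full scan.
ALTER TABLE posts_tags ADD INDEX idx_tag (tag_id);
SELECT post_id FROM posts_tags WHERE tag_id = 3;
```

With InnoDB, a secondary index implicitly carries the primary key columns. As a result, idx_tag behaves like an index on (tag_id, post_id), and the last query can be answered from the index alone.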
Indexes: You should probably create indexes on any column you filter (WHERE), group, order, or join on. Each index takes space proportional to the number of rows in the table, so too many indexes end up consuming more memory and disk. You also pay on writes, since every insert, update, or delete must update the affected indexes as well. There is a balance point, which you can find by profiling. It varies from system to system and from application to application.
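This sketch uses the posts table from the partitioning example. The index name and author id are hypothetical. A single composite index can serve both the filter and the ORDER BY of a common query.

```sql
-- Filter on author_id, then order by created: one index can serve both.
CREATE INDEX idx_author_created ON posts (author_id, created);

-- Check the plan: key should name idx_author_created, and Extra
-- should not say "Using filesort" (the index is read in reverse for DESC).
EXPLAIN
SELECT id, title
FROM posts
WHERE author_id = 42
ORDER BY created DESC
LIMIT 20;
```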
You can also give MySQL index hints about which indexes to use. MySQL usually chooses well. When it doesn’t, the cause is usually a bad schema or a poorly written query.
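The three hint forms in MySQL syntax are shown below as a sketch. It assumes the posts table from the partitioning example and a hypothetical index idx_author_created on posts (author_id, created). Because MySQL usually chooses well, hints are best kept for the cases where EXPLAIN shows it choosing badly.

```sql
-- USE INDEX: consider only the listed index(es); a full table scan is still allowed.
SELECT id, title
FROM posts USE INDEX (idx_author_created)
WHERE author_id = 42
ORDER BY created DESC
LIMIT 20;

-- FORCE INDEX: like USE INDEX, but a table scan is treated as very expensive.
SELECT id, title
FROM posts FORCE INDEX (idx_author_created)
WHERE author_id = 42
ORDER BY created DESC
LIMIT 20;

-- IGNORE INDEX: never use the listed index for this query.
SELECT COUNT(*)
FROM posts IGNORE INDEX (idx_author_created)
WHERE author_id = 42;
```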
Database indexes are auxiliary data structures that allow faster retrieval of data. The most common type is the B-tree index. It has very good general performance characteristics and supports a wide range of comparisons, including both equality and range conditions (<, <=, >, >=, BETWEEN).
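This sketch shows which conditions a B-tree index can serve. It uses the posts table from the partitioning example and a hypothetical index idx_title on posts (title). "Can use" is deliberate: the optimizer may still prefer a scan when a condition matches most of the table.

```sql
CREATE INDEX idx_title ON posts (title);

-- Can use idx_title to seek: equality, ranges, and LIKE with a constant prefix.
SELECT id FROM posts WHERE title = 'Hello, world';
SELECT id FROM posts WHERE title BETWEEN 'A' AND 'M';
SELECT id FROM posts WHERE title > 'M';
SELECT id FROM posts WHERE title LIKE 'MySQL%';

-- Cannot use idx_title to seek: a leading wildcard leaves no prefix to search on.
SELECT id FROM posts WHERE title LIKE '%MySQL';
```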
The penalty for having an index is the cost of keeping it up to date. That cost is paid whenever rows are inserted or deleted, or indexed columns are updated. There is also some space overhead, although an index is usually smaller than the table it indexes.
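One way to see the space overhead is to compare data and index sizes per table. This sketch queries information_schema. For InnoDB these figures are estimates, and DATABASE() assumes you have selected the schema you want to inspect.

```sql
-- Data vs. index size, in MiB, for every table in the current schema.
SELECT table_name,
       engine,
       ROUND(data_length  / 1024 / 1024, 1) AS data_mib,
       ROUND(index_length / 1024 / 1024, 1) AS index_mib
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
ORDER BY index_length DESC;
```

For InnoDB, data_length includes the clustered (primary key) index, so index_length counts only the secondary indexes.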
For specific data types, other index structures may suit better than a B-tree. R-trees, for example, allow faster retrieval of spatial data, and MySQL uses them for SPATIAL indexes. For columns with only a few distinct values, bitmap indexes might be appropriate, although MySQL itself does not offer them.
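This is a sketch of a SPATIAL (R-tree) index in MySQL. It assumes MySQL 8.0, where the SRID column attribute lets the optimizer use the index (SRID 0 is a plain Cartesian plane). The places table, its columns, and the coordinates are hypothetical.

```sql
CREATE TABLE places (
  id int UNSIGNED NOT NULL AUTO_INCREMENT,
  name varchar(128) NOT NULL,
  location POINT NOT NULL SRID 0,  -- columns in a SPATIAL index must be NOT NULL
  PRIMARY KEY (id),
  SPATIAL INDEX (location)
) ENGINE=InnoDB;

INSERT INTO places (name, location)
VALUES ('cafe', ST_GeomFromText('POINT(3 4)', 0)),
       ('park', ST_GeomFromText('POINT(30 40)', 0));

-- Bounding-box search that the R-tree index can serve.
SELECT id, name
FROM places
WHERE MBRContains(
  ST_GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))', 0),
  location
);
```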
Understand Your Engines
In the broadest sense, MyISAM is good for read-heavy data and InnoDB is good for write-heavy data, though there are cases where the opposite is true. The biggest gotcha is how the two differ in the cost of COUNT(*).
MyISAM stores table metadata such as the exact number of rows. As a result, a COUNT(*) over a single table with no WHERE clause is answered from that stored value at essentially no cost.
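InnoDB, however, does not store an exact row count, because concurrent transactions can see different numbers of rows at the same time. To answer COUNT(*), it has to scan an index.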
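For a concrete example, let’s say we’re paginating a query. To show page links for SELECT * FROM users LIMIT 5,10, you also need the total, SELECT COUNT(*) FROM users. Don’t copy the LIMIT into the count: COUNT(*) returns a single row, so LIMIT 5,10 would skip it and return nothing. With MyISAM that count is essentially free. With InnoDB it has to scan an entire index, so its cost grows with the number of rows in the table.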
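This sketch shows the two statements a paginated page typically runs, against a hypothetical users table with id and name columns. An ORDER BY is added because, without one, MySQL does not guarantee the same rows land on the same page between requests.

```sql
-- The page itself: skip 5 rows, return the next 10.
SELECT id, name
FROM users
ORDER BY id
LIMIT 5, 10;

-- The total for the page links: no LIMIT here.
-- MyISAM answers this from its stored row count; InnoDB scans an index.
SELECT COUNT(*) FROM users;
```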
MySQL has an SQL_CALC_FOUND_ROWS option. It tells the server to count all matching rows while it runs a LIMIT query, and you can then retrieve that total by executing SELECT FOUND_ROWS() on the same connection. This is very MySQL-specific, and it has been deprecated as of MySQL 8.0.17. It can still be necessary in certain situations, particularly if you use InnoDB for its other features (e.g., transactions, row-level locking, foreign keys).
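This sketch shows the SQL_CALC_FOUND_ROWS form of a paginated page, against a hypothetical users table with id and name columns. FOUND_ROWS() must run on the same connection, immediately after the SELECT.

```sql
-- Run the page query and have the server count every matching row.
SELECT SQL_CALC_FOUND_ROWS id, name
FROM users
ORDER BY id
LIMIT 5, 10;

-- Total rows the previous SELECT would have returned without its LIMIT.
SELECT FOUND_ROWS();
```

The server still has to find every matching row to produce the total, so this is not automatically faster than a separate COUNT(*). Since MySQL 8.0.17, the documented replacement is that separate SELECT COUNT(*) with the same WHERE clause.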