MySQL - Server and Environment


The InnoDB log buffer holds information about modifications made during transaction processing. Ideally, you want a transaction's changes to be held in the buffer until the transaction commits, at which point they can be written to the InnoDB log file all at once. If the buffer is too small, changes might need to be written several times before commit time, resulting in additional disk activity. The log buffer size is controlled by the innodb_log_buffer_size system variable. Typical values range from 1MB to 8MB. The default is 1MB.
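
For example, the log buffer can be enlarged in the [mysqld] section of my.cnf; the 8M value below is only an illustration, not a recommendation:

[mysqld]
# Hold larger transactions in memory before they are flushed to the InnoDB log file
innodb_log_buffer_size = 8M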

Database performance depends heavily on disk I/O speed. Disk accesses are very slow compared to memory operations. In addition to using faster disks, database performance can be improved by minimizing disk accesses. Also, once data is brought into memory (the data buffer), keeping it there for future use as long as memory is available reduces disk accesses, thereby speeding up query execution. The larger the data buffer, the more data that can be kept in memory. See http://www.learn-mysql-tutorial.com/TuneMySQL.cfm

Disks are mechanical devices with moving parts. The time to read/write data consists of three components in order of decreasing time consumption:

  • Seek Time: Time taken by the disk head to move to the appropriate track.
  • Rotation Time: Time taken by the platter to rotate so that the appropriate block on the track is under the disk head.
  • Transfer Time: Time to read or write the data block.

Optimizing Disk Accesses

Database systems try to optimize disk access time by storing tables, indexes, and other database items to minimize seek and rotational times. Rows inserted into a table are stored contiguously in a block with other rows or in adjacent blocks, as appropriate. The execution time of a query, such as a SELECT statement without a WHERE clause, that needs to access a complete table will be minimized if the table is stored in contiguous blocks. Data placement decisions are made by the database system and are hidden from the users.

A disk block is the unit of disk I/O and storage. The size of the block can be explicitly specified by a database designer or a DBA, overriding the default size. A typical block size ranges from 2K to 16K bytes. The block size is specified when a database is created and typically cannot be altered once defined.

Large disk block sizes increase the number of bytes transferred per (expensive) seek, thus reducing the number of seeks. However, for applications whose updates touch several blocks, using large blocks makes each update take longer, because writing a larger block is more expensive. The default block size should suffice for most applications. For databases with special needs, such as multimedia databases with large items, a larger block size may be beneficial because it reduces the number of items that must be spread over more than one disk block for storage.

The disk-seek problem becomes more pronounced as data volumes grow so large that effective caching becomes harder. For large databases with random data access, you will likely need at least one disk seek per read and a couple of disk seeks per write. For database servers, use disks with low seek times.

Increase the number of available disk spindles (and thereby reduce the seek overhead) by either symlinking files to different disks or striping the disks.

Using Symbolic Links

You can move tables and databases from the database directory to other locations and replace them with symbolic links to the new locations. You might want to do this, for example, to move a database to a file system with more free space or to increase the speed of your system by spreading your tables across different disks.

The recommended way to do this is simply to symlink databases to a different disk.

Striping

If an installation has several disks, striping means putting the first block on the first disk, the second block on the second disk, and in general the N-th block on the (N MOD number_of_disks) disk, and so on.

If your average data size is less than or equal to the stripe size, performance for multiple reads will improve.

Striping may be suitable for some parallel disk-access systems such as RAID.

You can also tune the operating system settings for the filesystem on which the database is deployed.

Memory

If the needed data is in memory (the data buffer), then disk accesses are not required, saving time. If not, then the disk blocks containing the data must be read from the disk into the data buffer. Larger data buffers increase the chance of finding the needed data in the buffer.

If the data buffer is running out of space for new items, then some items in the buffer must be deleted. And if these memory blocks have been altered via updates, then they must be written to disk before removal. To determine which items should be deleted, database systems use algorithms similar to those used in operating systems, such as the least-recently-used (LRU) and the first-in-first-out (FIFO) algorithms and their variants. Note that the data buffer is conceptually similar to a cache in an operating system.

The DBA has the primary responsibility of determining and specifying memory needs of a database system to ensure optimal performance. To tune performance, the DBA must determine and specify buffer sizes.

In MySQL, it is possible to configure the buffer sizes for:

  • Indexes
  • Joins
  • Data
  • Sorting
  • Caching queries
  • Caching query results
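
A minimal my.cnf sketch that touches each of these buffers; the variable names follow the MySQL versions discussed on this page, and the values are purely illustrative:

[mysqld]
key_buffer_size         = 64M    # index blocks (MyISAM)
join_buffer_size        = 1M     # joins that cannot use indexes
innodb_buffer_pool_size = 128M   # InnoDB data and index cache
sort_buffer_size        = 2M     # ORDER BY / GROUP BY sorting
query_cache_type        = 1      # cache queries...
query_cache_size        = 16M    # ...and their results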

Buffers for caching queries and their results are very useful in situations where the database does not change often and the same queries are executed repeatedly. This is typical for many web servers that serve dynamic pages using database content.

Add more memory to allow larger buffers to be used. This improves caching so that disk activity can be minimized. The performance effect can be considerable because it's much faster to read information from memory than from disk. Adding memory can also reduce the amount of swapping the operating system needs to do.

Maximize on-board processing power:

  • Use a 64-bit CPU rather than a 32-bit CPU. A 64-bit CPU allows certain mathematical (and other) functions to complete faster. It also allows MySQL to support larger internal cache sizes.
  • Use a multiprocessor system. If MySQL uses kernel threads, it can take advantage of multiple processors.
  • A faster main logic board (motherboard) improves general system throughput.

Use a faster network, so that the server can transfer information to clients faster. This lets the server process queries faster, reducing resource contention.

Choose disks with better performance.

Configuring Disks

MySQL server makes heavy use of disk resources. All storage engines except the MEMORY engine store table contents on disk, and log files are recorded on disk. Consequently, the physical characteristics of your disks and disk-related subsystems strongly influence server performance:

Physical disk characteristics are important because slow disks hinder the server. However, disk speed can be measured in various ways. The most important parameter is seek time, not transfer rate. It is more important for the heads to move quickly from track to track than for platters to spin more quickly. A RAM disk reduces seek time to near zero, because there is no physical movement at all.

With a heavy I/O burden, a faster disk controller helps improve disk subsystem throughput. So does installing an additional controller and dividing disk assignments between controllers.

RAID drives can improve retrieval performance, and some forms of RAID also boost write performance. Other benefits of RAID drives include data redundancy through mirroring and parity checking. Some RAID systems enable you to replace a disk without powering down the server host.

Using disks with better physical characteristics improves server performance. Additionally, the way you employ your disks has a bearing on performance:

Distributing parts of your MySQL installation onto different disks can improve performance by splitting up database-related disk activity to distribute it more evenly. You can do this in several ways:

  • Put log files on one disk and databases on another disk. This can be done using server options. Each option that enables a log allows you to specify the log file location. To move the entire data directory, copy it to a different location and specify the new location with the --datadir option.
  • Use a separate disk for temporary file storage. This can be done using the --tmpdir server option.
  • Distribute databases among several disks. To do this for a given database, move it to a different location, and then create a symbolic link in the data directory that points to the new location. This achieves a redistribution of storage and disk I/O. (Moving databases using symbolic links, described below, is different from MyISAM table symlinking.)
  • A strategy for distributing disk activity that is possible but not necessarily recommended is to put individual MyISAM tables on different disks by using CREATE TABLE options. Table symlinking is not universally supported on all systems, and spreading your tables around can make it difficult to keep track of how much table storage you're using on which file systems. Additionally, some filesystem commands do not understand symbolic links.
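
As a rough sketch, the relevant my.cnf options might look like the following; the paths are hypothetical mount points on separate disks:

[mysqld]
datadir = /disk1/mysql/data           # databases on one disk
log-bin = /disk2/mysql/logs/bin-log   # binary/update logs on another
tmpdir  = /disk3/mysql/tmp            # temporary files on a third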

Use a type of filesystem that is suited for the tables you have. MySQL can run on pretty much any kind of filesystem supported by your operating system, but some types of filesystems might be better for your installation than others. Two factors to consider are the maximum table size you need and the number of tables in your databases.

In general, you can use larger MyISAM tables with filesystems or operating systems that allow larger files. The MyISAM storage engine has an internal file size limit of 65,536TB, but MyISAM tables cannot actually use files that large unless the filesystem allows it. For example, older Linux kernels may impose a file size limit of 2GB. With a recent Linux kernel, the file size limit goes up considerably.

The number of tables in a database can affect table-opening time and the time to check files after a machine crash. MySQL represents a MyISAM table on disk by three files (the .frm format file, the .MYD data file, and the .MYI index file), which translates into many small files in the database directory if you have many small MyISAM tables in a database. For some filesystem types, this results in significantly increased directory lookup times. In situations like this, filesystems such as ReiserFS or ext3 can help performance. They're designed to deal well with large numbers of small files and to provide good directory lookup times. Also, the recovery time to check the filesystem after a machine crash is very good.

Have one dedicated disk for the operating system, programs, and temporary files. If you make very many changes, put the update logs and transaction logs on dedicated disks.

Low seek time is important for the database disk. For big tables you can estimate that you will need

log(row_count) / log(index_block_length / 3 * 2 / (key_length + data_ptr_length)) + 1

seeks to find a row. For a table with 500,000 rows indexed on a MEDIUMINT column:

log(500,000) / log(1024 / 3 * 2 / (3 + 4)) + 1 = 4 seeks

The above index would require 500,000 * 7 * 3/2 = 5.2MB. In real life, most of the blocks will be buffered, so probably only 1-2 seeks are needed.

For writes, you will need (as above) 4 seek requests to find where to place the new key, and normally 2 seeks to update the index and write the row.

For REALLY big databases, your application will be bound by the speed of your disk seeks, which increase by N log N as you get more data.

Split databases and tables over different disks. In MySQL you can use symbolic links for this.

Striping disks (RAID 0) will increase both read and write throughput.

Striping with mirroring (RAID 0+1) will give you safety and increase the read speed. Write speed will be slightly lower.

Don't use mirroring or RAID (except RAID 0) on the disk for temporary files or for data that can be easily re-generated.

On Linux, use hdparm -m16 -d1 on the disks at boot to enable reading/writing of multiple sectors at a time, and DMA. This may improve performance by 5-50%.

On Linux, mount the disks with async (default) and noatime.
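
For example, an /etc/fstab entry for a dedicated data disk might look like this (the device and mount point are hypothetical; async is already implied by "defaults"):

/dev/hdb1  /data  ext3  defaults,noatime  0  2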

For some specific applications, one may want to use a RAM disk for a few very specific tables, but normally this is not needed.

Using Database Symbolic Links on Unix

To relocate a database directory under Unix, use the following procedure:

  1. Stop the server
  2. Move the database directory from the data directory to its new location
  3. In the data directory, create a symbolic link that has the same name as the database and that points to the new location
  4. Restart the server
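
A minimal shell sketch of this procedure, assuming the data directory is /usr/local/mysql/var (as used later on this page), a hypothetical database named sales, and a target disk mounted at /disk2:

mysqladmin -u root -p shutdown                        # 1. stop the server
mv /usr/local/mysql/var/sales /disk2/mysql/sales      # 2. move the database directory
ln -s /disk2/mysql/sales /usr/local/mysql/var/sales   # 3. symlink with the same name
safe_mysqld &                                         # 4. restart the server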

Network Speed

Network speed is important. With a faster network, the server can transfer information to clients faster. This lets the server process queries faster, reducing resource contention. Consider using Gigabit Ethernet if your situation allows.

The speed of your network is important, but is not the only significant factor. All other things being equal, the network is effectively faster for clients that are near to the server than for those far away. Information need not travel as far for closer clients, so the latency of response is not as great. If possible, keep the network path short. The introduction of routers, switches, and other devices along the way can also increase latency.

The Operating System

MySQL has many configurable parameters that you can change to optimize server performance. Several of these are related to operating system resources and thus cannot be set higher than what the operating system allows; MySQL operates within the boundaries of the limits set by the OS. You can sometimes gain more latitude by increasing the operating system limits, after which MySQL can take advantage of the higher limits. Some of the relevant operating system limits include:

  • The per-process limit on the number of open files. This limits the maximum size of the table cache that holds file descriptors for table files. You can tell MySQL to allocate more file descriptors with the --open-files-limit option, but that option cannot be increased beyond the per-process limit allowed by the operating system. If your operating system can be reconfigured, you might be able to increase this limit, which effectively allows a larger maximum table cache size.
  • The maximum number of clients that can be connected simultaneously. This limit is controlled by the max_connections server variable. You can increase this variable, but not beyond the number of threads allowed to each process by the operating system.
  • The number of queued network connections for clients that are waiting to connect. For a busy server with a high rate of client connections, increasing the backlog allowed by the operating system allows you to increase the value of the back_log server variable that governs the size of the server's queue.
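
As an illustrative sketch only, raising the operating system limit and then the corresponding MySQL settings might look like this (all numbers are arbitrary examples):

# In the shell or the safe_mysqld script: raise the per-process open-file limit
ulimit -n 8192

# In my.cnf, within what the operating system now allows
[mysqld]
open_files_limit = 8192   # file descriptors available for the table cache
max_connections  = 500    # simultaneous clients, bounded by OS thread limits
back_log         = 128    # pending connections waiting to be accepted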

For large databases, the optimization order is normally RAM, Fast disks, CPU power.

More RAM can speed up key updates by keeping most of the used key pages in RAM.

If you are not using transaction-safe tables, or you have big disks and want to avoid long file checks, a UPS is a good idea so that you can take the system down cleanly in case of a power failure.

For systems where the database is on a dedicated server, one should look at 1G Ethernet. Latency is as important as throughput.

No swap; if you have memory problems, add more RAM instead or configure your system to use less memory.

Don't use NFS disks for data (you will have problems with NFS locking).

Increase the number of open files for the system and for the SQL server (add ulimit -n # in the safe_mysqld script).

Increase the number of processes and threads for the system.

If you have relatively few big tables, tell your file system not to break up files across different cylinders (Solaris).

Use file systems that support big files (Solaris).

Choose which file system to use; Reiserfs on Linux is very fast for open, read and write. File checks take just a couple of seconds.

Use persistent connections.

Cache things in your application to lessen the load of the SQL server.

Don't query columns that you don't need in your application.

Don't use SELECT * FROM table_name…

Use LOCK TABLES if you do a lot of changes in a batch; for example, group multiple UPDATEs or DELETEs together, as in the sketch below.
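
For example (table and column names are hypothetical):

LOCK TABLES orders WRITE;
UPDATE orders SET status = 'shipped' WHERE batch_id = 1;
UPDATE orders SET status = 'shipped' WHERE batch_id = 2;
DELETE FROM orders WHERE status = 'cancelled';
UNLOCK TABLES;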

If you have a slow net connection to the database, use the compressed client/server protocol.

Don't use GRANT on table level or column level if you don't really need it.

If possible, run OPTIMIZE TABLE once in a while. This is especially important for tables with variable-length rows that are updated a lot.

Update the key distribution statistics in your tables once in a while with myisamchk -a; Remember to take down MySQL before doing this!

If you get fragmented files, it may be worth it to copy all files to another disk, clear the old disk and copy the files back.

If you have problems, check your tables with myisamchk or CHECK TABLE.

Monitor MySQL status with: mysqladmin -i10 processlist extended-status

Use mysqladmin debug to get information about locks and performance.

Use keys wisely. Keys are good for searches, but bad for inserts / updates of key columns.

Keep your data in third normal form, but don't be afraid of duplicating information or creating summary tables if you need more speed.

Instead of doing a lot of GROUP BYs on a big table, create summary tables from the big table and query those instead, as in the sketch below.
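
A sketch of the idea, using hypothetical table and column names:

# Build (or periodically rebuild) the summary table once...
CREATE TABLE sales_per_day
  SELECT sale_date, SUM(amount) AS total
  FROM sales
  GROUP BY sale_date;

# ...then query the small summary table instead of the big one
SELECT sale_date, total FROM sales_per_day WHERE sale_date >= '2006-01-01';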

Don't create indexes you are not going to use.

Use the fact that MySQL can search on a prefix of an index: if you have an INDEX (a,b), you don't need an index on (a).

Instead of creating an index on a long CHAR/VARCHAR column, index just a prefix of the column to save space:

CREATE TABLE table_name (hostname CHAR(255) NOT NULL, INDEX (hostname(10)))

Columns with identical information in different tables should be declared identically and have identical names.

Important start-up options

back_log
thread_cache_size
key_buffer_size
table_cache
delay_key_write
log_slow_queries
max_heap_table_size // Used with GROUP BY
sort_buffer // Used with ORDER BY and GROUP BY
myisam_sort_buffer_size // Used with REPAIR TABLE
join_buffer_size // When doing a join without keys
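
To see what a running server is actually using for these options, list its variables; both forms below are standard MySQL tools:

mysqladmin -u root -p variables     # full list from the command line
SHOW VARIABLES LIKE '%buffer%';     # from the mysql client, filtered by name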

http://dev.mysql.com/tech-resources/presentations/presentation-oscon2000-20000719/index.html

How can we instruct mysqld to keep the update log?

Start the server using:

safe_mysqld --log-update=update

Alternatively, we can find the my.cnf file and add the following line:

log_update=/usr/backups/mysql/update

The update log is stored in the server's data directory (/usr/local/mysql/var) unless an absolute path is given, as in the my.cnf example above.

How can we tell mysql to flush its logs?

mysqladmin -u root -ppassword flush-logs

How can we apply a particular update log?

mysql -u root -ppassword --one-database dbname < update.100