Large and Complex Table Changes and Maintenance

mysql

How can we handle large complex table changes and maintenance with minimal downtime?

I have heard about master-master, dual master, redundant, fail-over setup. Ideally, when we need to deploy changes or do maintenance, we should remove some database nodes from normal cluster, stop replication, do the changes, resume replication, wait for the replication to be caught up, put these nodes back into the cluster, but promote one or some of them as master as appropriate, and repeat the above steps for the remaining nodes. We probably want to do the same with the web servers as well. Perhaps not all code changes can follow this approach, but we probably need to test the deployment steps as part of the QA process. Automated deployment is nice, but perhaps when the database is really large, we may need to do manually deployment some of the time. We should strive for automated deployment while minimizing downtime.

Complex table structure changes usually require a manual move process involving these steps:

  1. Create a new table with the new column layout.
  2. Use INSERT SELECT statement to copy data from the old table to the new table.
  3. Verify that the new table contains the desired data.
  4. Rename the old table (or delete it if you are brave)
  5. Rename the new table with the name previously used by the old table.
  6. Recreate any triggers, stored procedures, indexes, and foreign keys as needed.

What are some of the table maintenance operations?

  1. TABLE CHECK: perform an integrity check to make sure that the table's structure and content have no problems. This operation can be done for MyISAM and InnoDB tables.
  2. TABLE REPAIR: corrects integrity problem to restore the table to a known, usable state. This operation can be done for MyISAM tables.
  3. TABLE ANALYSIS: updates statistics about the distribution of index key values. This is information that the optimizer use to generate execution plans for queries. This operation can be done for MyISAM and InnoDB tables.
  4. TABLE OPTIMIZATION: re-organizes a table so that its contents can be accessed more efficiently. This operation can be done for MyISAM and InnoDB tables.

How frequent should we do TABLE ANALYSIS and TABLE OPTIMIZATION operation?

Periodically. TABLE ANALYSIS and TABLE OPTIMIZATION are operations that you might want to perform periodically, perhaps once every 3 or 6 month

What happens when MySQL analyzes a MyISAM or InnoDB table?

It updates the index statistics. The optimizer uses these statistics when processing queries to make better decisions about how best to lookup records in the table and the order in which to read tables in a join.

When MySQL optimizes a MyISAM table, it defragment the data file to reclaim unused space, sort the indexes, and updates the index statistics. Periodic defragmenting is useful for speeding up table access for tables that contain variable-length columns such as VARCHAR, VARBINARY, BLOB, or TEXT.

How can we use CHECK TABLE, REPAIR TABLE, ANALYZE TABLE, and OPTIMIZE TABLE?

CHECK TABLE table_name;
REPAIR TABLE table_name;
ANALYZE TABLE table_name;
OPTIMIZE TABLE table_name;
  1. CHECK TABLE statement perform an integrity check on table structure and contents. It works for MyISAM and InnoDB tables. For MyISAM tables, it also updates the index statistics. If the table is a view, it verifies the view definition.
  2. REPAIR TABLE statement corrects problems in a table that has become corrupted. REPAIR TABLE only works for MyISAM tables. You can tell the server to repair MyISAM tables automatically.
  3. ANALYZE TABLE statement updates the table with information about the distribution of key values in the table. This statement works for MyISAM and InnoDB tables.
  4. OPTIMIZE TABLE statement cleans up a MyISAM table by defragmenting it. This reclaims unused space resulting from deletes and updates, and coalescing records that have become split and stored non-contiguously. OPTIMIZE TABLE also sorts the index pages if they are out of order and updates the index statistics. OPTIMIZE TABLE also works for InnoDB tables, but maps to ALTER TABLE, which rebuilds the table. This updates index statistics and free space in the clustered index.

MySQL Administrator offers point-and-click interface for table check, repair, and optimize operations. When you select one of these operations, MySQL Administrator sends the corresponding SQL statement to the server.

How can we use the mysqlcheck utility program?

mysqlcheck can check, repair, analyze, and optimize tables. It determines which options were given on the command line, and then sends appropriate SQL statements to the MySQL server to perform the requested operation.**

mysqlcheck has properties that in some context make it more convenient than issuing SQL statements directly. If you name a database, it determines what tables the database contains, and issues statements to process them all. Because, mysqlcheck is a command-line program, it can easily be used in a cron job.

mysqlcheck has three general modes of operation, depending on the arguments with which you invoke it:

  1. By default, mysqlcheck interprets its first non-option argument as a database name and checks all the tables in that database. If any other arguments follow the database name, mysqlcheck treats them as table names and check just those tables.
  2. With the —databases (or -B) option, mysqlcheck interprets its non-option arguments as database names and check all tables in each of the named databases.
  3. With the —all-databases (or -A) option, mysqlcheck check all tables in all databases.

mysqlcheck also supports options that indicate which operation to perform on the specified tables (—check, —repair, —analyze, and —optimize). The default is to check all tables if none of these options is given.

A recommended strategy is to run mysqlcheck with no options. If any errors occur, run mysqlcheck again with —repair and —quick options to attempt a quick repair. If that fails, run mysqlcheck with —repair for a normal repair, and then if necessary, with —repair and —force.

How can we use the myisamchk utility program?

myisamchk is a utility program. It directly reads and modifies the table files. The myisamchk utility performs table maintenance on MyISAM tables. Conceptually, myisamchk is similar to mysqlcheck, but the two programs do differs in certain ways:

  1. Both programs can check, repair, and analyze MyISAM tables. mysqlcheck can optimize MyISAM tables, as well as check InnoDB tables. There are certain operations that myisamchk can perform that mysqlcheck cannot, such as disabling or enabling indexes.
  2. mysqlcheck is a client program, whereas myisamchk is a utility program.

To perform table maintenance with myisamchk:

  1. Stop the server
  2. From a command prompt, change location to the database directory.
  3. Invoke myisamchk with options indicating the operation you want to perform, followed by arguments that name the tables. Each of these arguments can be either a table name, or the name of the table's index file. **An index file name is the same as the table name, plus a .MYI suffix.
  4. Restart the server

The default myisamchk operation is to check tables.

shell> myisamchk City
shell> myisamchk City.MYI
shell> myisamchk --recover City

If a repair operation performed with —recover encounters problems that it cannot fix, try using the —safe-recover option.

What are the command line options for mysqlcheck and myisamchk?

—analyze or -a Analyze the distribution of key values in the table. This can improve performance of queries by speeding up index-based lookups.

—auto-repair (mysqlcheck only) Repairs tables automatically if a check operation discovers problems.

—check or -c Check tables for problems. This is the default action if no other operation is specified.

—check-only-changed or -C Skip table checking except for tables that have been changed since they were last checked or tables that haven't been properly closed. The latter condition might occur if the server crashes while a table is open.

—fast or -F Skip table checking except for tables that haven't been properly closed.

—extended (for mysqlcheck), —extended-check (for myisamchk), or -e (for both programs) Run an extended table check. For mysqlcheck, when this option is given in conjunction with a repair option, a more thorough repair is performed.

—medium-check or -m Run a medium table check

—quick or -q For mysqlcheck, —quick without a repair option causes only the index file to be checked, leaving the data file alone. For both programs, —quick in conjuction with a repair option cause the program to repair only the index file, leaving the data file alone.

—repair (for mysqlcheck), —recover (for myisamchk), or -r (for both programs) Run a table repair operation.

Can we use REPAIR TABLE to repair InnoDB tables?

No. REPAIR TABLE only work on MyISAM.

How can we repair InnoDB Tables?

REPAIR TABLE only applies to MyISAM.

If a table check indicates that an InnoDB table has problems, you should be able to restore the table to a consistent state by dumping it with mysqldump, dropping it, and re-creating it from the dump file:

shell> mysqldump db_name table_name > dump_file
shell> mysql db_name < dump_file

In the event of a crash, some InnoDB tables might need repairs. Normally, it suffices simply to restart the server because InnoDB performs auto-recovery. In rare cases, the server might not start up due to failure of InnoDB auto-recovery. If that happens, use the following procedure:

  1. Restart the server with the —innodb_force_recovery option set to a value in the range from 1 to 6. These values indicate increasing levels of caution in avoiding a crash, and increasing levels of tolerance for possible inconsistency in the recovered tables. A good value to start with is 4.
  2. When you start the server with —innodb_force_recovery set to non-zero value, InnoDB treats the tablespace as read-only. Consequently, you should dump the InnoDB tables with mysqldump and then drop them while the option is in effect. Then restart the server without the —innodb_force_recovery option. When the server comes up, recover the InnoDB tables from the dump files.
  3. If the preceding steps fail, it is necessary to restore the InnoDB tables from a previous backup.

How can we enable MyISAM Auto-Repair?

MySQL server can be instructed to check and repair MyISAM tables automatically. With automatic repair enabled, the server checks each MyISAM table when it opens it to see whether the table was closed properly that last time it was used, and is not marked as needing repair. If the table is not okay, the server repairs it.

To enable automatic MyISAM table maintenance, start the server with the —myisam-recover option. The option value can consist of a comma-separated list of one or more of the following values:

  1. DEFAULT for the default checking
  2. BACKUP tells the server to make a backup of any table that it must change.
  3. FORCE causes table recovery to be performed even if it would cause the loss of more than one row of data
  4. QUICK performs quick recovery: Tables that have no holes are skipped.

Is it possible to use the ALTER TABLE statement without locking the table?

One approach is to create a new table, and then copy the contents of the old table over to the new table a chunk at a time while always being cautious of any INSERT/UPDATE/DELETE on the source table. This can be managed by a trigger, and would cause a slow down, it's not a lock. Once finished, change the name of the source table, then change the name of the new table preferably in a transaction. Once finished, recompile any stored procedures, etc that use that table.

Percona makes a tool called pt-online-schema-change that allows this to be done. It essentially makes a copy of the table and modifies the new table. To keep the new table in sync with the original it uses triggers to update. This allows the original table to be accessed while the new table is prepared in the background.

Facebook made a similar tool: https://www.facebook.com/notes/mysql-at-facebook/online-schema-change-for-mysql/430801045932

MySQL now offer a solution, a feature that improves the performance, concurrency, and availability of InnoDB tables during DDL (primarily ALTER TABLE) operations. See Section 14.11, “InnoDB and Online DDL” for details. The details vary according to the type of operation. In some cases, the table can be modified concurrently while the ALTER TABLE is in progress. The operation might be able to be performed without doing a table copy, or using a specially optimized type of table copy. Space usage is controlled by the innodb_online_alter_log_max_size configuration option.

It lets you adjust the balance between performance and concurrency during the DDL operation, by choosing whether to block access to the table entirely (LOCK=EXCLUSIVE clause), allow queries but not DML (LOCK=SHARED clause), or allow full query and DML access to the table (LOCK=NONE clause). When you omit the LOCK clause or specify LOCK=DEFAULT, MySQL allows as much concurrency as possible depending on the type of operation.

In MySQL the operation will copy data to a new table while blocking transactions, which has been main pain for MySQL DBAs prior to v. 5.6. The good news is that since MySQL 5.6 release the restriction has been mostly lifted and you now can enjoy the true power of the MYSQL DB.

Performing changes in-place where possible, rather than creating a new copy of the table, avoids temporary increases in disk space usage and I/O overhead associated with copying the table and reconstructing secondary indexes.

Alternatively you can use ALTER ONLINE TABLE to ensure that your ALTER TABLE does not block concurrent operations (takes no locks). It is equivalent to LOCK=NONE.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License