MySQL - Estimating Size

mysql

https://dev.mysql.com/doc/mysql-utilities/1.5/en/mysqldiskusage.html
https://github.com/frail/mysql-size-estimator
https://chartio.com/resources/tutorials/how-to-get-the-size-of-a-table-in-mysql/ - printed
https://www.a2hosting.com/kb/developer-corner/mysql/determining-the-size-of-mysql-databases-and-tables - printed
http://www.tocker.ca/2013/05/02/fastest-way-to-count-rows-in-a-table.html - printed
https://www.percona.com/blog/2008/03/17/researching-your-mysql-table-sizes/
https://tecadmin.net/how-to-check-mysql-database-tables-size/# - done reading
http://stackoverflow.com/questions/9620198/how-to-get-the-sizes-of-the-tables-of-a-mysql-database
https://www.mkyong.com/mysql/how-to-calculate-the-mysql-database-size/
http://www.databasejournal.com/features/mysql/calculating-mysql-database-table-and-column-sizes.html
https://forums.mysql.com/read.php?10,384443
https://www.toadworld.com/platforms/oracle/w/wiki/4891.script-to-calculate-table-average-row-size
https://www.toadworld.com/products/toad-data-modeler/f/30/t/2679

SELECT table_schema AS "Database", 
ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS "Size (MB)" 
FROM information_schema.TABLES 
GROUP BY table_schema;

Depending on how many databases you have and how large they are, this command may take a minute or two to complete. After the command finishes, it displays a list of all of your databases and their corresponding size (in megabytes).

To determine the sizes of all of the tables in a specific database:

SELECT table_name AS "Table",
ROUND(((data_length + index_length) / 1024 / 1024), 2) AS "Size (MB)"
FROM information_schema.TABLES
WHERE table_schema = "database_name"
ORDER BY (data_length + index_length) DESC;

After the command finishes, it displays a list of all of the tables and their corresponding size (in megabytes), with the largest table at the top and smallest table at the bottom.

How can we determine the size of a given schema?

SELECT table_schema "Database Name", SUM( data_length + index_length)/1024/1024
"Database Size (MB)" FROM information_schema.TABLES where table_schema = 'mydb';

The above SQL statement calculate size of single database in mysql server.

How can we determine the size for all schemas on a database server?

SELECT table_schema "Database Name", SUM(data_length+index_length)/1024/1024
"Database Size (MB)"  FROM information_schema.TABLES GROUP BY table_schema;

How can we determine the size of a given table in a given schema?

SELECT table_name "Table Name", table_rows "Rows Count", round(((data_length + index_length)/1024/1024),2)
"Table Size (MB)" FROM information_schema.TABLES WHERE table_schema = "mydb" AND table_name ="table_one";

The above SQL statement calculates size of single table in a database in mysql server. Please change ‘mydb‘ with your actual database name and ‘table_one‘ with your actual table name.

How can we determine the size for each table in a given schema?

SELECT table_name "Table Name", table_rows "Rows Count", round(((data_length + index_length)/1024/1024),2)
"Table Size (MB)" FROM information_schema.TABLES WHERE table_schema = "mydb";

The above SQL statement calculates size of all tables in a database in mysql server. Please change ‘mydb‘ with your actual database name. It will also list number of rows in each table.

How can we estimate the size of the database growth?

For example, assume that we have a table named City. This table uses the InnoDB engine. Assume that it will have 1 million rows in the next 1 year. How can we estimate the size of the database will be in the next 1 year? The structure of this table:

mysql> desc City;
+-------------+----------+------+-----+---------+----------------+
| Field       | Type     | Null | Key | Default | Extra          |
+-------------+----------+------+-----+---------+----------------+
| ID          | int(11)  | NO   | PRI | NULL    | auto_increment |
| Name        | char(35) | NO   |     |         |                |
| CountryCode | char(3)  | NO   | MUL |         |                |
| District    | char(20) | NO   |     |         |                |
| Population  | int(11)  | NO   |     | 0       |                |
+-------------+----------+------+-----+---------+----------------+

Given the above table structure, we have:

  1. 66 bytes per row of data
  2. 4 bytes per row for the primary key
  3. 7 bytes per row for country code index
    1. 3 bytes for the country
    2. 4 bytes for clustered key attached to the country code

This results in 77 bytes of data and keys. This does not include housekeeping for BTREEs or Tablespace fragmentation. For 1 million rows, that would be 77,000,000 bytes or 73.43MB.

As for measuring the table, for a given table mydb.mytable, we can run this query:

SELECT 
    CONCAT(FORMAT(DAT/POWER(1024,pw1),2),' ',SUBSTR(units,pw1*2+1,2)) DATSIZE,
    CONCAT(FORMAT(NDX/POWER(1024,pw2),2),' ',SUBSTR(units,pw2*2+1,2)) NDXSIZE,
    CONCAT(FORMAT(TBL/POWER(1024,pw3),2),' ',SUBSTR(units,pw3*2+1,2)) TBLSIZE
FROM
(
    SELECT DAT,NDX,TBL,IF(px>4,4,px) pw1,IF(py>4,4,py) pw2,IF(pz>4,4,pz) pw3
    FROM 
    (
        SELECT data_length DAT,index_length NDX,data_length+index_length TBL,
        FLOOR(LOG(IF(data_length=0,1,data_length))/LOG(1024)) px,
        FLOOR(LOG(IF(index_length=0,1,index_length))/LOG(1024)) py,
        FLOOR(LOG(data_length+index_length)/LOG(1024)) pz
        FROM information_schema.tables
        WHERE table_schema='mydb'
        AND table_name='mytable'
    ) AA
) A,(SELECT 'B KBMBGBTB' units) B;

To measure all tables grouped by Database and Storage Engine:

SELECT
    IF(ISNULL(DB)+ISNULL(ENGINE)=2,'Database Total',
    CONCAT(DB,' ',IFNULL(ENGINE,'Total'))) "Reported Statistic",
    LPAD(CONCAT(FORMAT(DAT/POWER(1024,pw1),2),' ',
    SUBSTR(units,pw1*2+1,2)),17,' ') "Data Size",
    LPAD(CONCAT(FORMAT(NDX/POWER(1024,pw2),2),' ',
    SUBSTR(units,pw2*2+1,2)),17,' ') "Index Size",
    LPAD(CONCAT(FORMAT(TBL/POWER(1024,pw3),2),' ',
    SUBSTR(units,pw3*2+1,2)),17,' ') "Total Size"
FROM
(
    SELECT DB,ENGINE,DAT,NDX,TBL,
    IF(px>4,4,px) pw1,IF(py>4,4,py) pw2,IF(pz>4,4,pz) pw3
    FROM 
    (SELECT *,
        FLOOR(LOG(IF(DAT=0,1,DAT))/LOG(1024)) px,
        FLOOR(LOG(IF(NDX=0,1,NDX))/LOG(1024)) py,
        FLOOR(LOG(IF(TBL=0,1,TBL))/LOG(1024)) pz
    FROM
    (SELECT
        DB,ENGINE,
        SUM(data_length) DAT,
        SUM(index_length) NDX,
        SUM(data_length+index_length) TBL
    FROM
    (
       SELECT table_schema DB,ENGINE,data_length,index_length FROM
       information_schema.tables WHERE table_schema NOT IN
       ('information_schema','performance_schema','mysql')
       AND ENGINE IS NOT NULL
    ) AAA GROUP BY DB,ENGINE WITH ROLLUP
) AAA) AA) A,(SELECT ' BKBMBGBTB' units) B;

If you are using InnoDB tables, you can get the size for data/individual indexes from mysql.innodb_index_stats. The 'size' stat contains the answer, in pages, so you have to multiply it by the page-size, that is 16K by default.

select database_name, table_name, index_name, stat_value*@@innodb_page_size
from mysql.innodb_index_stats where stat_name='size';

The index PRIMARY is the data itself.

If I were in your shoes, I'd build the table, populate it with a million rows of test data, and measure the change in size. Depending on your application, you might need to take the size of transaction log files into account, too.

How can we collect statistics regarding the size of our tables?

How can we use the mysqldiskusage utility?

This utility displays disk space usage for one or more databases. The utility optionally displays disk usage for the binary log, slow query log, error log, general query log, relay log, and InnoDB tablespaces. The default is to only show database disk usage.

If the command-line lists no databases, the utility shows the disk space usage for all databases. Sizes displayed without a unit indicator (such as MB) are in bytes.

The utility determines the location of the data directory by requesting it from the server. For a local server, the utility obtains size information directly from files in the data directory and InnoDB home directory. In this case, you must have file system access to read those directories. Disk space usage shown includes the sum of all storage engine- specific files such as the .MYI and .MYD files for MyISAM and the tablespace files for InnoDB.

If the file system read fails, or if the server is not local, the utility cannot determine exact file sizes. It is limited to information that can be obtained from the system tables, which therefore should be considered an estimate. For information read from the server, the account used to connect to the server must have the appropriate permissions to read any objects accessed during the operation.

If information requested requires file system access but is not available that way, the utility prints a message that the information is not accessible. This occurs, for example, if you request log usage but the server is not local and the log files cannot be examined directly.

To specify how to display output, use one of the following values with the —format option:

  • grid (default): Display output in grid or table format like that of the mysql client command-line tool.
  • csv: Display output in comma-separated values format.
  • tab: Display output in tab-separated format.
  • vertical: Display output in single-column format like that of the \G command for the mysql client command-line tool.

To turn off the headers for grid, csv, or tab display format, specify the —no-headers option.

mysqldiskusage accepts the following command-line options:

  1. all, -a: Display all disk usage. This includes usage for databases, logs, and InnoDB tablespaces.
  2. binlog, -b: Display binary log usage.
  3. empty, -m: Include empty databases.
  4. innodb, -i: Display InnoDB tablespace usage. This includes information about the shared InnoDB tablespace as well as .idb files for InnoDB tables with their own tablespace.
  5. logs, -l: Display general query log, error log, and slow query log usage.
  6. quiet, -q: Suppress informational messages.
  7. relaylog, -r: Display relay log usage.
  8. server=server: Connection information for the server.
  9. —verbose, -v: Specify how much information to display. Use this option multiple times to increase the amount of information. For example, -v = verbose, -vv = more verbose, -vvv = debug.

To connect to a server, it is necessary to specify connection parameters such as the user name, host name, password, and either a port or socket. MySQL Utilities provides a number of ways to supply this information. All of the methods require specifying your choice via a command-line option such as —server, —master, —slave, etc. The methods include the following in order of most secure to least secure.

  1. Use login-paths from your .mylogin.cnf file (encrypted, not visible). Example : login-path[:port][:socket]
  2. Use a configuration file (unencrypted, not visible) Note: available in release-1.5.0. Example : configuration-file-path[:section]
  3. Specify the data on the command-line (unencrypted, visible). Example : user[:passwd]@host[:port][:socket]
mysqldiskusage --server=root@localhost employees test // displays sizes for the employees and test databases
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License