Choosing column order
From Stack Overflow:
I know you can
ALTER
the column order in MySQL withFIRST
andAFTER
, but why would you want to bother?Since good queries explicitly name columns when inserting data, is there really any reason to care what order your columns are in in the table?
Well, there is reason.
MySQL stores some types (out of which VARCHAR
is probably the most important) with variable length.
That means that each variable field stored in the row is prepended with its actual length, like in Pascal string type.
Since the rows have no TOC, it means that the engine should traverse all previous values to find out where the required value begins.
If we need the value of the first column (first non-PRIMARY KEY
column, of course), the engine will do it once per row.
But if we need the value of, say, 20-th column, the engine should find out the lengths of previous 19 columns and add them together.
Let's create a sample table and see how big the impact is:
CREATE TABLE filler ( id INT NOT NULL PRIMARY KEY AUTO_INCREMENT ) ENGINE=Memory; CREATE TABLE t_column ( id INT NOT NULL PRIMARY KEY, col1 VARCHAR(10), col2 VARCHAR(10), col3 VARCHAR(10), col4 VARCHAR(10), col5 VARCHAR(10), col6 VARCHAR(10), col7 VARCHAR(10), col8 VARCHAR(10), col9 VARCHAR(10), col10 VARCHAR(10), col11 VARCHAR(10), col12 VARCHAR(10), col13 VARCHAR(10), col14 VARCHAR(10), col15 VARCHAR(10), col16 VARCHAR(10), col17 VARCHAR(10), col18 VARCHAR(10), col19 VARCHAR(10), col20 VARCHAR(10) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; DELIMITER $$ CREATE PROCEDURE prc_filler(cnt INT) BEGIN DECLARE _cnt INT; SET _cnt = 1; WHILE _cnt <= cnt DO INSERT INTO filler SELECT _cnt; SET _cnt = _cnt + 1; END WHILE; END $$ DELIMITER ; START TRANSACTION; CALL prc_filler(1000000); COMMIT; INSERT INTO t_column SELECT id, LPAD('', RAND(20090521) * 10, 'a'), LPAD('', RAND(20090521 << 1) * 10, 'a'), LPAD('', RAND(20090521 << 2) * 10, 'a'), LPAD('', RAND(20090521 << 3) * 10, 'a'), LPAD('', RAND(20090521 << 4) * 10, 'a'), LPAD('', RAND(20090521 << 5) * 10, 'a'), LPAD('', RAND(20090521 << 6) * 10, 'a'), LPAD('', RAND(20090521 << 7) * 10, 'a'), LPAD('', RAND(20090521 << 8) * 10, 'a'), LPAD('', RAND(20090521 << 9) * 10, 'a'), LPAD('', RAND(20090521 << 10) * 10, 'a'), LPAD('', RAND(20090521 << 11) * 10, 'a'), LPAD('', RAND(20090521 << 12) * 10, 'a'), LPAD('', RAND(20090521 << 13) * 10, 'a'), LPAD('', RAND(20090521 << 14) * 10, 'a'), LPAD('', RAND(20090521 << 15) * 10, 'a'), LPAD('', RAND(20090521 << 16) * 10, 'a'), LPAD('', RAND(20090521 << 17) * 10, 'a'), LPAD('', RAND(20090521 << 18) * 10, 'a'), LPAD('', RAND(20090521 << 19) * 10, 'a') FROM filler;
We have a table with an INTEGER
id
and 20 nullable VARCHAR(10)
columns, filled with strings of random length.
Now, let's try to find the sum of lengths of the first column:
SELECT SUM(LENGTH(col1)) FROM t_column
SUM(LENGTH(col1)) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5005271 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0001s (1.0937s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | t_column | ALL | 997594 | 100.00 |
This query takes 1.09 seconds.
Same on the 10'th column:
SELECT SUM(LENGTH(col10)) FROM t_column
SUM(LENGTH(col10)) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5002918 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0001s (1.2031s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | t_column | ALL | 997594 | 100.00 |
Now, it 1.20 seconds, or 10% more.
And, finally, same query for the 20'th column:
SELECT SUM(LENGTH(col20)) FROM t_column
SUM(LENGTH(col20)) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5006319 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0001s (1.2812s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | t_column | ALL | 997594 | 100.00 |
It's 1.28 seconds, or 17% more than for the first column.
Summary:
In case of many variable length columns on the table, the column order may affect the performance of the queries.
The less close a column is to the beginning of the row, the more preceding columns InnoDB engine should examine to find out the offset of a given one.
Columns that are closer to the beginning of the table are selected faster.
Further experimenting (and probably newer machine, version (5.6.12), etc):
col1 takes 0.34s; any other col takes 0.36s (6% slower). From that, I conclude that the first column is special cased.
We really need someone to look at the code and see what differences are really likely.
Rick James
9 Jun 15 at 21:38