Yesterday I wrote an article on how to emulate analytiс function
ROW_NUMBER() that is present in SQL Server and Oracle, but absent in MySQL.
Today, we will try to optimize this query.
The original query scans all table, filtering out rows when necessary. This, as has already been said, if efficient if the number of rows is comparable to the number of
But in real tables, the number of
grouper's tends to be much less than the number of rows, leaving us some place for optimization.
The main drawback of the query above is that it always scans the whole table, while in fact it needs just a few items for each
What the query really needs to do, is:
- Select all
- For each
grouper, select 15 first rows, that is rows with minimal
- Return the rows to the client
This could be easily achieved in any procedural language using simple conditional loops. But in MySQL, we cannot just loop a table in an
SQL query with a counter. We need to supply the join conditions first. And, unfortunately for us, the
NESTED LOOP count cannot be used as a join condition. We don't even have access to this variable from within the query.
We, though, may try to outsmart the database and fetch the join conditions we need.
What we really need here, is the 1st and the 15th
id's for each
As we try to select rows sorted by
id for each
grouper here, we really need all rows that satisfy this condition:
(grouper, id[grouper]) <= (row.grouper, row.id) <= (row.grouper, id[grouper])
id canbe easily selected as:
SELECT id FROM t_limiter li WHERE li.grouper = @grouper ORDER BY li.grouper, li.id LIMIT 14, 1
This will work instantly, as it will utilize the index on
(grouper, id) which we created along with the table.
Now, what if we don't have the 15's value in the table? Like in sample table, for which I deliberatly created only 1 record for
In this case the query will return no rows. Empty recordset, beign returned as result a correlated subquery, is treated like a
NULL, which we can easily coalesce to
MAXVALUE (that is,
2^32 - 1).
Summarizing all said above, we will write the following query:
SELECT l.* FROM ( SELECT grouper, COALESCE( ( SELECT id FROM t_limiter li WHERE li.grouper = dlo.grouper ORDER BY li.grouper, li.id LIMIT 14, 1 ), CAST(0xFFFFFFFF AS DECIMAL)) AS mid FROM ( SELECT DISTINCT grouper FROM t_limiter dl ) dlo ) lo, t_limiter l WHERE l.grouper = lo.grouper AND l.id <= lo.mid
|481 rows fetched in 0,0047s (3,2374s)|
This query works for… hmm, 3,2374 seconds, which is hardly an improvement compared to 3,4241 seconds from our previous query.
Let's look into the plan:
|4||DERIVED||dl||range||ix_limiter_grouper_id||4||225||100.00||Using index for group-by|
|3||DEPENDENT SUBQUERY||li||ref||ix_limiter_grouper_id||ix_limiter_grouper_id||4||dlo.grouper||4495||100.00||Using where; Using index|
As you can see, the plan for some reason doesn't use
range access method for the very outer instance on
t_limiter (aliased as
l in our query). Instead, it tries to scan the portions of index with a given
grouper and filter out the matching
id's using where.
This kills all performance benefit from using indexes, as it's exactly same thing we did in our previous, inefficient query.
Can we somehow convince MySQL to use the range access method instead?
Fortunately for us, yes.
As MySQL documentation says here:
Range checked for each record (index map: N)
MySQL found no good index to use, but found that some of indexes might be used after column values from preceding tables are known. For each row combination in the preceding tables, MySQL checks whether it is possible to use a
index_mergeaccess method to retrieve rows. This is not very fast, but is faster than performing a join with no index at all.
All we need to outsmart MySQL here is to add some uncertainty into the join conditions, so that the
index will not be immediately eligible for
SELECT l.* FROM ( SELECT grouper, COALESCE( ( SELECT id FROM t_limiter li WHERE li.grouper = dlo.grouper ORDER BY li.grouper, li.id LIMIT 14, 1 ), CAST(0xFFFFFFFF AS DECIMAL)) AS mid FROM ( SELECT DISTINCT grouper FROM t_limiter dl ) dlo ) lo, t_limiter l WHERE l.grouper >= lo.grouper AND l.grouper <= lo.grouper AND l.id <= lo.mid
Note the new join conditions:
l.grouper >= lo.grouper AND l.grouper <= lo.grouper. This condition of course is equivalent to
l.grouper = lo.grouper, but MySQL now delays selection of access method till the query is run:
|1||PRIMARY||l||ALL||PRIMARY,ix_limiter_grouper_id||997544||100.00||Range checked for each record (index map: 0x3)|
|4||DERIVED||dl||range||ix_limiter_grouper_id||4||19||100.00||Using index for group-by|
|3||DEPENDENT SUBQUERY||li||ref||ix_limiter_grouper_id||ix_limiter_grouper_id||4||dlo.grouper||55419||100.00||Using where; Using index|
Index map: 0x3 in the plan above is the bitmask of indices the optimizer will choose from when it comes to selecting an access method. In our case, it will choose from both indexes present in our table:
PRIMARY ON id and
ix_limiter_grouper_id ON (grouper, id)
Let's run the query and hope that MySQL will make the right decision:
|481 rows fetched in 0,0033s (0,0072s)|