Late row lookups: InnoDB
Answering questions asked on the site.
Aryé asks:
Thanks for your article about late row lookups in MySQL.
I have two questions for you please:
- Is this workaround specific to MyISAM engine?
- How does PostgreSQL handle this?
The questions concerns a certain workaround for MySQL LIMIT … OFFSET
queries like this:
SELECT * FROM mytable ORDER BY id LIMIT 10 OFFSET 10000
which can be improved using a little rewrite:
SELECT m.* FROM ( SELECT id FROM mytable ORDER BY id LIMIT 10 OFFSET 10000 ) q JOIN mytable m ON m.id = q.id ORDER BY m.id
For the rationale behind this improvement, please read the original article.
Now, to the questions.
The second questions is easy: PostgreSQL won't pull the fields from the table until it really needs them. If a query involving an ORDER BY
along with LIMIT
and OFFSET
is optimized to use the index for the ORDER BY
part, the table lookups won't happen for the records skipped.
Though PostgreSQL does not reflect the table lookups in the EXPLAIN
output, a simple test would show us that they are done only LIMIT
times, not OFFSET + LIMIT
(like MySQL does).
Now, let's try to answer the first question: will this trick improve the queries against an InnoDB table?
To do this, we will create a sample table:
There is one indexed INT
column and two TEXT
columns, shorttxt
storing short strings (1 to 100 characters), longtxt
storing long strings (8193 to 8293 characters).
Let's run some queries against this table.
PRIMARY KEY and the INT column
No rewrite:
SELECT value FROM lookup ORDER BY id LIMIT 10 OFFSET 90000
Rewrite:
As you can see, there is almost no difference (41 ms vs 40 ms).
InnoDB tables are clustered on their PRIMARY KEY
columns, which means the the index on id
(used to serve the ORDER BY
condition) contains all the data the query needs. There is a negligible benefit from not lookup up the value
columns at the index pages because the column is tiny and the index pages need to be read anyway.
PRIMARY KEY and the short TEXT column
Rewrite:
SELECT LENGTH(l.shorttxt) FROM ( SELECT id FROM lookup ORDER BY id LIMIT 10 OFFSET 90000 ) q JOIN lookup l ON l.id = q.id ORDER BY q.id
No rewrite:
SELECT LENGTH(shorttxt) FROM lookup ORDER BY id LIMIT 10 OFFSET 90000
There is quite a significant difference (92 ms vs 40 ms) the reasons for which we will discuss a little bit later, after we see the results of the third query.
PRIMARY KEY and the long TEXT column
Rewrite:
SELECT LENGTH(l.longtxt) FROM ( SELECT id FROM lookup ORDER BY id LIMIT 10 OFFSET 90000 ) q JOIN lookup l ON l.id = q.id ORDER BY q.id
No rewrite:
SELECT LENGTH(longtxt) FROM lookup ORDER BY id LIMIT 10 OFFSET 90000
The query employing the trick runs for same 40 ms, while the straightforward query takes as much as 30 seconds (30,359 ms, to be exact).
Why such a difference?
The reason is that InnoDB
, despite the fact it stores the data in the clustered index, is still able to move some data out of the index. This is called external storage
.
With COMPACT
row format I used to create the tables, InnoDB
, when trying to insert a record with two TEXT
columns on the page, will try to fit both of them on the page. If this is not possible, then it will split the longest of the records in two parts: the first 768 bytes will be stored on the page and the remaining data will be stored on a separate page (or pages), with a pointer to these data stored in the original clustered index. This will be repeated until all TEXT
columns fit on the page of there is no more space there (in which case an error would be thrown).
This means that all TEXT
columns shorter than 768 bytes will be stored completely on the page, while those longer can be either split of stored as a whole (with at least first 768 bytes still being on the page).
With the column lengths chosen (1 to 100 and 8K to 8K + 100, accordingly), it can be easily seen that shorttxt
will always be stored on-page, while longtxt
will always be split (since InnoDB
allows at most 8K
per record (minus some overhead) to be stored on one page).
Now, this becomes more clear. As with MyISAM, the straightforward query involving longtxt
should perform two page lookups per each record scanned: the first one on the clustered index, the second one on the external storage. This takes lots of time and may even spoil the InnoDB cache with unneeded data (which would lead to increased cache miss ratio).
The query on shorttxt
is not that bad, since it only requires come extra CPU cycles per record to calculate the string length.
Now, let's check one more query which orders by a secondary indexed field rather than id
:
Secondary index and the short text column
Rewrite:
SELECT LENGTH(l.shorttxt) FROM ( SELECT id, value FROM lookup ORDER BY value LIMIT 10 OFFSET 90000 ) q JOIN lookup l ON l.id = q.id ORDER BY q.value
No rewrite:
SELECT LENGTH(shorttxt) FROM lookup ORDER BY value LIMIT 10 OFFSET 90000
The execution times for these queries vary tenfold: 26 ms vs 266 ms.
As with MyISAM, the secondary index requires an extra lookup to retrieve the actual values from the table (the only difference is that any index is secondary in MyISAM, including that used to police the PRIMARY KEY
).
The first query does not perform these row lookups on the skipped records and hence is ten times as fast. It is even faster than queries ordering on the PRIMARY KEY
, because the secondary index contains significantly less data than the PRIMARY KEY
, holds much more records per page, is hence more shallow and can be traversed faster.
The second query does perform the early row lookups, as usual.
Conclusion
A trick used to avoid early row lookups for the LIMIT … OFFSET
queries is useful on InnoDB tables too, though to different extent, depending on the ORDER BY
condition and the columns involved:
- It's very useful on queries involving columns stored off-page (long
TEXT
,BLOB
andVARCHAR
columns) - It's very useful on
ORDER BY
conditions served by secondary indexes - It's quite useful on moderate sized columns (still stored on page) or CPU-intensive expressions
- It's almost useless on short columns without complex CPU-intensive processing
Hope that helps.
I'm always glad to answer the questions regarding database queries.
first all,thanks for your two articles.I have a question about early row lookup:In the book ,it said mysql using early row lookup(single pass) in sort avoids reading the rows from the table twice.But if using index for sort why mysql still use early row lookup?It is obvious late row lookup better,before read this book,I consider mysql use late row lookup.
howard
9 Mar 11 at 07:15
@howard: I don’t develop MySQL. Probably it would better to ask its developers why doesn’t it implement late row lookups.
Quassnoi
9 Mar 11 at 13:35
your articles are full of info. keep going man. thank you very much.
it’s generally advised to avoid joins in mysql. i always assumed that. i’m surprised to learn i shouldn’t! do you have any guidelines to tell about joins to decide when to use or not?
speaking low level, we are essentially joining b-tree index and full table. is this true or we really are joining two tables?
one more question, do you think this method is suitable for a moderate-to-high trafficked site? continously joining can be expensive over time?
Ken
16 Mar 11 at 21:57
@Ken:
You should use joins when you need to, that’s what relational databases are for.
Speaking low-level, a B-Tree index is a table with records organized in a special way. So yes, we are joining two tables (both of them organized as a B-Tree, since it’s InnoDB).
This method is more suitable for low, moderate and high trafficked site than using pure
LIMIT
. If your site can handleLIMIT
, it can handle this too with greater ease.Any index lookup involves a join between the index and the table, though it’s may not be as obvious.
Quassnoi
16 Mar 11 at 22:04
Would you say that this behaviour is the same as report in this bug report?
http://bugs.mysql.com/bug.php?id=60281
Slight variation, but seems to be the same cause….
Steve
7 Feb 12 at 21:58
@Steve: not exactly but similar (the bug deals with MyISAM while this article is about InnoDB). MyISAM issue is covered in my earlier article: https://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/
Quassnoi
7 Feb 12 at 22:36
Yes I noticed the previous article. However, I have a feeling that they both have the same root cause, which is why I draw the similarity with that bug. I think the optimizer / something in the mysql layer is to blame in both cases here. The storage engines are simly doing as instructed by the MySQL layer… I think the workaround you have presented serves to illustrate the fact that the storage engines can indeed be used in a more efficient manner, thus my blame of the upper layers.
Steve
8 Feb 12 at 00:00
so, innodb still use early row lookup?
huawei
17 Dec 18 at 05:49