EXPLAIN EXTENDED

How to create fast database queries

Late row lookups: InnoDB


Answering questions asked on the site.

Aryé asks:

Thanks for your article about late row lookups in MySQL.

I have two questions for you please:

  • Is this workaround specific to MyISAM engine?
  • How does PostgreSQL handle this?

The question concerns a certain workaround for MySQL LIMIT … OFFSET queries like this:

SELECT  *
FROM    mytable
ORDER BY
        id
LIMIT   10
OFFSET  10000

which can be improved using a little rewrite:

SELECT  m.*
FROM    (
        SELECT  id
        FROM    mytable
        ORDER BY
                id
        LIMIT   10
        OFFSET  10000
        ) q
JOIN    mytable m
ON      m.id = q.id
ORDER BY
        m.id

For the rationale behind this improvement, please read the original article.

Now, to the questions.

The second question is easy: PostgreSQL won't pull the fields from the table until it really needs them. If a query involving an ORDER BY along with LIMIT and OFFSET is optimized to use the index for the ORDER BY part, the table lookups won't happen for the records skipped.

Though PostgreSQL does not reflect the table lookups in the EXPLAIN output, a simple test would show us that they are done only LIMIT times, not OFFSET + LIMIT times (as in MySQL).
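The article does not show that test, but one way to probe the behavior yourself (a sketch, assuming a PostgreSQL table of the same shape as the MySQL one used below) is EXPLAIN (ANALYZE, BUFFERS), which counts the buffers actually read:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT  longtxt
FROM    lookup
ORDER BY
        id
LIMIT   10
OFFSET  90000;
```

Since PostgreSQL detoasts large out-of-line values only for the rows it actually returns, the TOAST reads for the skipped rows never show up in the buffer counts.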

Now, let's try to answer the first question: will this trick improve the queries against an InnoDB table?

To do this, we will create a sample table:

Table creation details

There is one indexed INT column and two TEXT columns: shorttxt, storing short strings (1 to 100 characters), and longtxt, storing long strings (8193 to 8293 characters).
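The exact creation script is collapsed above; a minimal sketch consistent with the description (the table and column names are taken from the queries that follow, while the exact types and constraints are assumptions) could look like this:

```sql
-- Hypothetical reconstruction of the test table; the original script is
-- collapsed in the article. COMPACT row format is the one mentioned later
-- in the text. The table needs at least 90,010 rows for OFFSET 90000.
CREATE TABLE lookup (
        id INT NOT NULL PRIMARY KEY,
        value INT NOT NULL,
        shorttxt TEXT NOT NULL,
        longtxt TEXT NOT NULL,
        KEY ix_lookup_value (value)
) ENGINE=InnoDB ROW_FORMAT=COMPACT;
```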

Let's run some queries against this table.

PRIMARY KEY and the INT column

No rewrite:

SELECT  value
FROM    lookup
ORDER BY
        id
LIMIT   10
OFFSET  90000

Query results

Rewrite:
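The rewritten query is collapsed in the original; following the pattern of the other rewrites in this article, it would be:

```sql
SELECT  l.value
FROM    (
        SELECT  id
        FROM    lookup
        ORDER BY
                id
        LIMIT   10
        OFFSET  90000
        ) q
JOIN    lookup l
ON      l.id = q.id
ORDER BY
        q.id
```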

Query results

As you can see, there is almost no difference (41 ms vs 40 ms).

InnoDB tables are clustered on their PRIMARY KEY columns, which means that the index on id (used to serve the ORDER BY condition) contains all the data the query needs. There is a negligible benefit from not looking up the value column in the index pages, because the column is tiny and the index pages need to be read anyway.

PRIMARY KEY and the short TEXT column

Rewrite:

SELECT  LENGTH(l.shorttxt)
FROM    (
        SELECT  id
        FROM    lookup
        ORDER BY
                id
        LIMIT   10
        OFFSET  90000
        ) q
JOIN    lookup l
ON      l.id = q.id
ORDER BY
        q.id

Query results

No rewrite:

SELECT  LENGTH(shorttxt)
FROM    lookup
ORDER BY
        id
LIMIT   10
OFFSET  90000

Query results

There is quite a significant difference (92 ms vs 40 ms), the reasons for which we will discuss a little later, after we see the results of the third query.

PRIMARY KEY and the long TEXT column

Rewrite:

SELECT  LENGTH(l.longtxt)
FROM    (
        SELECT  id
        FROM    lookup
        ORDER BY
                id
        LIMIT   10
        OFFSET  90000
        ) q
JOIN    lookup l
ON      l.id = q.id
ORDER BY
        q.id

Query results

No rewrite:

SELECT  LENGTH(longtxt)
FROM    lookup
ORDER BY
        id
LIMIT   10
OFFSET  90000

Query results

The query employing the trick runs for the same 40 ms, while the straightforward query takes as much as 30 seconds (30,359 ms, to be exact).

Why such a difference?

The reason is that InnoDB, despite storing the data in the clustered index, is still able to move some of it out of the index. This is called external storage.

With the COMPACT row format I used to create the table, InnoDB, when inserting a record with two TEXT columns, will first try to fit both of them on the page. If that is not possible, it will split the longest of the columns in two parts: the first 768 bytes will be stored on the page, and the remaining data will go to a separate page (or pages), with a pointer to that data stored in the original clustered index. This is repeated until all TEXT columns fit on the page or there is no more space left (in which case an error is thrown).

This means that all TEXT columns shorter than 768 bytes will be stored completely on the page, while longer ones can be either split or stored as a whole (with at least the first 768 bytes still kept on the page).

With the column lengths chosen (1 to 100 and 8K to 8K + 100 bytes, respectively), it is easy to see that shorttxt will always be stored on-page, while longtxt will always be split (since InnoDB allows at most about 8K per record, minus some overhead, to be stored on one page).

Now this becomes clearer. As with MyISAM, the straightforward query involving longtxt has to perform two page lookups for each record scanned: the first one in the clustered index, the second one in the external storage. This takes a lot of time and may even pollute the InnoDB buffer pool with unneeded data (which would increase the cache miss ratio).

The query on shorttxt is not that bad, since it only requires some extra CPU cycles per record to calculate the string length.

Now, let's check one more query which orders by a secondary indexed field rather than id:

Secondary index and the short text column

Rewrite:

SELECT  LENGTH(l.shorttxt)
FROM    (
        SELECT  id, value
        FROM    lookup
        ORDER BY
                value
        LIMIT   10
        OFFSET  90000
        ) q
JOIN    lookup l
ON      l.id = q.id
ORDER BY
        q.value

Query results

No rewrite:

SELECT  LENGTH(shorttxt)
FROM    lookup
ORDER BY
        value
LIMIT   10
OFFSET  90000

Query results

The execution times for these queries vary tenfold: 26 ms vs 266 ms.

As with MyISAM, the secondary index requires an extra lookup to retrieve the actual values from the table (the only difference being that in MyISAM every index is secondary, including the one used to enforce the PRIMARY KEY).

The first query does not perform these row lookups on the skipped records and hence is ten times as fast. It is even faster than the queries ordering on the PRIMARY KEY, because the secondary index contains significantly less data than the clustered index, holds many more records per page, is hence shallower and can be traversed faster.

The second query does perform the early row lookups, as usual.
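As a side note, since InnoDB secondary indexes store the PRIMARY KEY value along with each entry, the inner query of the rewrite above is served by the index on value alone. EXPLAIN on the inner query (a sketch; the article does not show this plan) would report Using index in the Extra column, i.e. a covering index scan:

```sql
EXPLAIN
SELECT  id, value
FROM    lookup
ORDER BY
        value
LIMIT   10
OFFSET  90000;
```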

Conclusion

The trick used to avoid early row lookups for LIMIT … OFFSET queries is useful on InnoDB tables too, though to a different extent, depending on the ORDER BY condition and the columns involved:

  • It's very useful on queries involving columns stored off-page (long TEXT, BLOB and VARCHAR columns)
  • It's very useful on ORDER BY conditions served by secondary indexes
  • It's quite useful on moderate sized columns (still stored on page) or CPU-intensive expressions
  • It's almost useless on short columns without complex CPU-intensive processing

Hope that helps.


I'm always glad to answer questions regarding database queries.

Ask me a question

Written by Quassnoi

February 11th, 2011 at 11:00 pm

Posted in MySQL

7 Responses to 'Late row lookups: InnoDB'


  1. First of all, thanks for your two articles. I have a question about early row lookups. The book I read says that MySQL uses an early row lookup (single pass) in sorting to avoid reading the rows from the table twice. But if an index is used for the sort, why does MySQL still use an early row lookup? It is obvious that a late row lookup is better; before reading the book, I assumed MySQL used late row lookups.

    howard

    9 Mar 11 at 07:15

  2. @howard: I don't develop MySQL. It would probably be better to ask its developers why it doesn't implement late row lookups.

    Quassnoi

    9 Mar 11 at 13:35

  3. your articles are full of info. keep going man. thank you very much.

    it’s generally advised to avoid joins in mysql. i always assumed that. i’m surprised to learn i shouldn’t! do you have any guidelines to tell about joins to decide when to use or not?

    speaking low level, we are essentially joining b-tree index and full table. is this true or we really are joining two tables?

    one more question, do you think this method is suitable for a moderate-to-high trafficked site? continuously joining can be expensive over time?

    Ken

    16 Mar 11 at 21:57

  4. @Ken:

    You should use joins when you need to, that’s what relational databases are for.

    Speaking low-level, a B-Tree index is a table with records organized in a special way. So yes, we are joining two tables (both of them organized as a B-Tree, since it’s InnoDB).

    This method is more suitable for low, moderate and highly trafficked sites than using pure LIMIT. If your site can handle LIMIT, it can handle this too with greater ease.

    Any index lookup involves a join between the index and the table, though it may not be as obvious.

    Quassnoi

    16 Mar 11 at 22:04

  5. Would you say that this behaviour is the same as reported in this bug report?

    http://bugs.mysql.com/bug.php?id=60281

    Slight variation, but seems to be the same cause….

    Steve

    7 Feb 12 at 21:58

  6. @Steve: not exactly but similar (the bug deals with MyISAM while this article is about InnoDB). MyISAM issue is covered in my earlier article: http://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/

    Quassnoi

    7 Feb 12 at 22:36

  7. Yes I noticed the previous article. However, I have a feeling that they both have the same root cause, which is why I draw the similarity with that bug. I think the optimizer / something in the MySQL layer is to blame in both cases here. The storage engines are simply doing as instructed by the MySQL layer… I think the workaround you have presented serves to illustrate the fact that the storage engines can indeed be used in a more efficient manner, thus my blame of the upper layers.

    Steve

    8 Feb 12 at 00:00
