EXPLAIN EXTENDED

How to create fast database queries

MySQL: LIMIT on LIMIT

Comments enabled. I *really* need your comment

From Stack Overflow:

Here's my situation: I want to select first ten entries from a database with id = $id.

But I want the results to be listed in a certain priority: if criteria = $criteria, then I want those results displayed first.

Otherwise, I just want to keep displaying the rest of the rows.

This can be done with a very simple query:

SELECT  *
FROM    t_preference
ORDER BY
        category = 1 DESC, id DESC
LIMIT 10

In this query, category = 1 is a boolean expression which returns TRUE when category is equal to 1, FALSE otherwise.

Since TRUE is 1 and FALSE is 0, TRUE is greater than FALSE from ORDER BY's point of view. We should apply DESC for the TRUE results (i. e. rows with category = 1) to be returned first.

However, this query is not very index friendly, and it will need to select all rows and sort them.

To improve performance, we should split this query into two index-friendly queries and merge them with a UNION.

Let's create a sample table and see how do we do it:

CREATE TABLE filler (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

CREATE TABLE t_preference (
        id INT NOT NULL PRIMARY KEY,
        category INT NOT NULL,
        value VARCHAR(300) NOT NULL,
        KEY ix_preference_category (category)
) ENGINE=InnoDB;

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt <= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

START TRANSACTION;
CALL prc_filler(500000);
COMMIT;

INSERT
INTO    t_preference (id, category, value)
SELECT  id, FLOOR(RAND(20090805) * 100000), RPAD(CONCAT('Preference ', id), 300, ' ')
FROM    filler;
&#91;/sourcecode&#93;

The table contains <strong>500,000</strong> rows with <strong>100,000</strong> random categories (<strong>5</strong> rows per <code>category</code> in average).

The original query:


SELECT  *
FROM    t_preference
ORDER BY
        category = 1 DESC, id DESC
LIMIT 10

id category value
500000 83542 Preference 500000
299866 83542 Preference 299866
190943 83542 Preference 190943
20733 83542 Preference 20733
499999 19569 Preference 499999
499998 71435 Preference 499998
499997 45869 Preference 499997
499996 19303 Preference 499996
499995 50216 Preference 499995
499994 43926 Preference 499994
10 rows fetched in 0.0006s (1.0162s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE t_preference ALL 500047 100.00 Using filesort

runs for more than 1 second, quite inefficient. From the plan we can see that it uses a filesort on the whole table.

To return first 10 rows ordered as requested above, we should do the following:

  1. Select first 10 rows with category = $category, ordered by id DESC. This can be done efficiently using an index on category.
  2. Select first 10 rows from the whole table, ordered by id DESC. This can be done efficiently using a PRIMARY KEY (on which any InnoDB table is clustered).
  3. UNION these resultsets. We need to use UNION here, not UNION ALL, since the resultsets can intersect.
  4. Order the combined resultset on category = $category DESC, id DESC

Note than the index on category is in fact is an index on (category, id), since InnoDB tables are clustered and id (which is a PRIMARY KEY in our table) serves as a row pointer and hence is implicitly included into any index leaf.

If id were not a PRIMARY KEY, or the table were MyISAM, we would need to create a composite index on (category, id) for the step 1 to be efficient.

Let's make the query in steps.

First, let's select top 10 records with category = 83542:

SELECT  *
FROM    t_preference
WHERE   category = 83542
ORDER BY
        id DESC
LIMIT 10

id category value
500000 83542 Preference 500000
299866 83542 Preference 299866
190943 83542 Preference 190943
20733 83542 Preference 20733
4 rows fetched in 0.0002s (0.0023s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE t_preference ref ix_preference_category ix_preference_category 4 const 4 100.00 Using where

This query uses the index and is instant.

Now, let's select the top 10 records from the whole table:

SELECT  *
FROM    t_preference
ORDER BY
        id DESC
LIMIT 10

id category value
500000 83542 Preference 500000
499999 19569 Preference 499999
499998 71435 Preference 499998
499997 45869 Preference 499997
499996 19303 Preference 499996
499995 50216 Preference 499995
499994 43926 Preference 499994
499993 23138 Preference 499993
499992 90588 Preference 499992
499991 76601 Preference 499991
10 rows fetched in 0.0006s (0.0022s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE t_preference index PRIMARY 4 10 5000470.00

This query is instant too.

Now, let's merge the resultsets. Note that row 500,000 is contained in both of them.

SELECT  *
FROM    (
        SELECT  *
        FROM    t_preference
        WHERE   category = 83542
        ORDER BY
                id DESC
        LIMIT 10
        ) q1
UNION
SELECT  *
FROM    (
        SELECT  *
        FROM    t_preference
        ORDER BY
                id DESC
        LIMIT 10
        ) q2

id category value
500000 83542 Preference 500000
299866 83542 Preference 299866
190943 83542 Preference 190943
20733 83542 Preference 20733
499999 19569 Preference 499999
499998 71435 Preference 499998
499997 45869 Preference 499997
499996 19303 Preference 499996
499995 50216 Preference 499995
499994 43926 Preference 499994
499993 23138 Preference 499993
499992 90588 Preference 499992
499991 76601 Preference 499991
13 rows fetched in 0.0007s (0.0032s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL 4 100.00
2 DERIVED t_preference ref ix_preference_category ix_preference_category 4 4 100.00 Using where
3 UNION <derived4> ALL 10 100.00
4 DERIVED t_preference index PRIMARY 4 10 5000470.00
UNION RESULT <union1,3> ALL

Since row 500,000 is a duplicate, this resultset returns 13 rows, not 14.

Finally, we should just ORDER and LIMIT the resultset:

SELECT  *
FROM    (
        SELECT  *
        FROM    t_preference
        WHERE   category = 83542
        ORDER BY
                id DESC
        LIMIT 10
        ) q1
UNION
SELECT  *
FROM    (
        SELECT  *
        FROM    t_preference
        ORDER BY
                id DESC
        LIMIT 10
        ) q2
ORDER BY
        category = 83542 DESC, id DESC
LIMIT 10
id category value
500000 83542 Preference 500000
299866 83542 Preference 299866
190943 83542 Preference 190943
20733 83542 Preference 20733
499999 19569 Preference 499999
499998 71435 Preference 499998
499997 45869 Preference 499997
499996 19303 Preference 499996
499995 50216 Preference 499995
499994 43926 Preference 499994
10 rows fetched in 0.0006s (0.0030s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL 4 100.00
2 DERIVED t_preference ref ix_preference_category ix_preference_category 4 4 100.00 Using where
3 UNION <derived4> ALL 10 100.00
4 DERIVED t_preference index PRIMARY 4 10 5000470.00
UNION RESULT <union1,3> ALL Using filesort

This query uses filesort, but in this case at most 20 rows will be sorted, so it's not a problem.

The query takes but 3 ms, next to instant.

Written by Quassnoi

August 5th, 2009 at 11:00 pm

Posted in MySQL

Leave a Reply