EXPLAIN EXTENDED

How to create fast database queries

Emulating SKIP SCAN

Comments enabled. I *really* need your comment

In the previous article I wrote about ranged conditions (<, >, <=, >= or BETWEEN) on one field combined with ORDER BY another field.

We have to choose an index either for filtering or for sorting, as a single index cannot be used for both purposes.

Well, actually, it can.

There is a special access method called SKIP SCAN, designed just for these cases. This method can be used when a filtering condition applied to a secondary column of a composite index. Like, we have and index on (orderer, ranger) and search for ranger < 10.

This method scans the index bottom to top, selecting each possible orderer values. For each orderer, it applies the filtering condition on ranger, thus obtaining a valid range condition on the index that can be iterated. As soon and this range is iterated, the method defines the range for the next orderer, skipping all other values (hence the name).

SKIP SCAN does not perform as well as a simple index scan, but a query can benefit from it is there are few orderer's in the table.

Oracle supports this method directly, but MySQL, of course, not. SKIP SCAN, though, can be easily emulated.

Let's create the sample tables:

Table creation details

, and try to select the rows having ranger < 10, ordered by orderer.

For the sake of brevity, we'll just select 10 rows starting from 20,000th.

We can use the index either for ordering or for filtering. Let's see what is faster.

This query uses the index suitable for filtering:

SELECT  *
FROM    t_indexer i  FORCE INDEX (ix_indexer_ranger)
WHERE   i.ranger < 10
ORDER BY
        orderer
LIMIT   20000, 10
id ranger orderer
151140 9 9
151564 9 9
151838 9 9
152852 9 9
152956 9 9
154557 9 9
156156 9 9
156758 9 9
157391 9 9
160167 9 9
10 rows fetched in 0.0003s (0.0819s)

This query uses the index suitable for ordering:

SELECT  *
FROM    t_indexer i  FORCE INDEX (ix_indexer_orderer_ranger)
WHERE   i.ranger < 10
ORDER BY
        orderer
LIMIT   20000, 10
id ranger orderer
151140 9 9
151564 9 9
151838 9 9
152852 9 9
152956 9 9
154557 9 9
156156 9 9
156758 9 9
157391 9 9
160167 9 9
10 rows fetched in 0.0004s (0.1117s)

As you can see, the former query (with the ranging index) is slightly more performant than the latter one (with the ordering index).

There are only 10 distinct orderer's in our table, so we can try to emulate SKIP SCAN.

We do it in the following steps:

  1. Retrieve the sorted set of orderer's. This can be done using the index.
  2. For each orderer, retrieve the sorted set of ranger's, applying the filtering condition. This also can be done using the index.

Here's the query:

SELECT  i.*
FROM    (
        SELECT  DISTINCT orderer
        FROM    t_indexer
        ORDER BY
                orderer
        ) io
STRAIGHT_JOIN
        t_indexer i
WHERE   i.orderer <= io.orderer
        AND i.orderer >= io.orderer
        AND i.ranger < 10
LIMIT   20000, 10
id ranger orderer
151140 9 9
151564 9 9
151838 9 9
152852 9 9
152956 9 9
154557 9 9
156156 9 9
156758 9 9
157391 9 9
160167 9 9
10 rows fetched in 0.0003s (0.0200s)

5 times faster.

Note the usage of i.orderer <= io.orderer AND i.orderer >= io.orderer instead of mere i.orderer = io.orderer.

Using the former construction forces MySQL to use range access method on every NESTED LOOP iteration:

EXPLAIN EXTENDED
SELECT  i.*
FROM    (
        SELECT  DISTINCT orderer
        FROM    t_indexer
        ORDER BY
                orderer
        ) io
STRAIGHT_JOIN
        t_indexer i
WHERE   i.orderer <= io.orderer
        AND i.orderer >= io.orderer
        AND i.ranger < 10
LIMIT   20000, 10
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL 10 100.00
1 PRIMARY i ALL ix_indexer_orderer_ranger 200604 18.01 Range checked for each record (index map: 0x2)
2 DERIVED t_indexer range ix_indexer_orderer_ranger 4 9 100.00 Using index for group-by
3 rows fetched in 0.0004s (0.0030s)

If we had put simple equatily condition i.orderer = io.orderer, ref access method on orderer would be chosen instead of range access on (orderer, ranger), thus using inefficient where checking instead of just liming rows.

Written by Quassnoi

April 2nd, 2009 at 11:00 pm

Posted in MySQL

Leave a Reply