EXPLAIN EXTENDED

How to create fast database queries

MySQL: selecting rows before and after filtered one

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a basic MySQL table, terms, comprised of an id and term field.

I want to create an alphabetically sorted dictionary index (in the literal sense), that would list ten 10 terms above the selected term, and 20 below it

An example of this could be found on Urban Dictionary where on the left column you see the current term highlighted, and a number of terms above it, and some below, all sorted alphabetically.

As we all know, MySQL doesn't support a ROW_NUMBER() or a similar function so we end up resorting to user variables and sub-selects.

I also cannot create a view with user defined variables because MySQL doesn't allow that.

MySQL indeed doesn't support ROW_NUMBER() and this function indeed can be emulated using session variables:

However, it will be an overkill in this case, since a more simple and more efficient set-based solution using LIMIT can be used instead.

Let's create a sample table and see how:

CREATE TABLE filler (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

CREATE TABLE terms (
id INT NOT NULL PRIMARY KEY,
term VARCHAR(50) NOT NULL,
KEY ix_terms_term (term)
) ENGINE=InnoDB DEFAULT CHARSET=UTF8;

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
DECLARE _cnt INT;
SET _cnt = 1;
WHILE _cnt <= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

START TRANSACTION;
CALL prc_filler(10000);
COMMIT;

INSERT
INTO    terms
SELECT  id,
        (
        SELECT  GROUP_CONCAT(CONCAT(
                SUBSTRING('bcdfghjklmnpqrstvwxz', FLOOR(RAND(20090725) * 20) + 1, 1), 
                SUBSTRING('aeiouy', FLOOR(RAND(20090725 << 1) * 6) + 1, 1)
                ) SEPARATOR '')
        FROM    (
                SELECT  *
                FROM    filler
                LIMIT 5
                ) q
        )
FROM    filler;

This script generates a table of 10,000 random non-words, like these:

id term
1 gecotyqy
2 wykajelo
3 hozedequ
999 fetojaju
1000 sopiqeke

To search for a given term, we need to do the following:

  1. Find the closest matching term (this is matching or a one just below the match)
  2. Find the 10 terms next to the closest (using ORDER BY term LIMIT 10)
  3. Find the 20 terms before the closest (using ORDER BY term DESC LIMIT 20)
  4. UNION ALL the results
  5. Order the results by term then id
  6. Add fields to show if a given row is a match

For instance, let's search for the word serendipity. For the sake of brevity, I'll output 5 records after and 10 records before.

SELECT  *,
IF(
(term, id) = (
SELECT  term, id
FROM    terms
WHERE   term <= 'serendipity'
                     ORDER BY
                             term DESC, id DESC
                     LIMIT 1
                     ),
        term = 'serendipity',
        NULL
        ) AS hit
FROM    (
        SELECT  *       
        FROM    (
                SELECT  *
                FROM    terms
                WHERE   (term, id) >=
(
SELECT  term, id
FROM    terms
WHERE   term <= 'serendipity'
                        ORDER BY
                                term DESC, id DESC
                        LIMIT 1
                        )
                ORDER BY
                        term, id
                LIMIT 5
                ) q
        UNION ALL
        SELECT  *
        FROM    (
                SELECT  *
                FROM    terms
                WHERE   (term, id) <
                        (       
                        SELECT  term, id
                        FROM    terms
                        WHERE   term <= 'serendipity'
                        ORDER BY
                                term DESC, id DESC
                        LIMIT 1
                        )
                ORDER BY
                        term DESC, id DESC
                LIMIT 10
                ) q
        ) q2
ORDER BY
        term, id

id term hit
9412 sepatobe
5538 sepesewy
8552 sepipida
7302 sepiqoke
9242 sepyreqo
3699 seqezoca
9663 seqezowa
2953 seqicylo
7177 seqowyqa
4526 seqoxawi
1108 seqozixa 0
6741 serojeme
7428 serudaqi
6830 serugaca
3683 sesejeni
15 rows fetched in 0.0005s (0.0125s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived3> ALL 15 100.00 Using filesort
3 DERIVED <derived4> ALL 5 100.00
4 DERIVED terms index ix_terms_term 152 5 196840.00 Using where; Using index
5 SUBQUERY terms range ix_terms_term ix_terms_term 152 4921 100.00 Using where; Using index
6 UNION <derived7> ALL 10 100.00
7 DERIVED terms index ix_terms_term 152 10 98420.00 Using where; Using index
8 SUBQUERY terms range ix_terms_term ix_terms_term 152 4921 100.00 Using where; Using index
UNION RESULT <union3,6> ALL
2 SUBQUERY terms range ix_terms_term ix_terms_term 152 4921 100.00 Using where; Using index

We see that there were no exact match, and the query returned us a closest match possible (marking it with a 0).

If we use the same query to search a non-word bakatora (which happens to exists in Japanese and mean stupid tiger), we'll get the following:

SELECT  *,
IF(
(term, id) = (
SELECT  term, id
FROM    terms
WHERE   term &lt;= 'bakatora'
                     ORDER BY
                             term DESC, id DESC
                     LIMIT 1
                     ),
        term = 'bakatora',
        NULL
        ) AS hit
FROM    (
        SELECT  *       
        FROM    (
                SELECT  *
                FROM    terms
                WHERE   (term, id) &gt;=
(
SELECT  term, id
FROM    terms
WHERE   term &lt;= 'bakatora'
                        ORDER BY
                                term DESC, id DESC
                        LIMIT 1
                        )
                ORDER BY
                        term, id
                LIMIT 5
                ) q
        UNION ALL
        SELECT  *
        FROM    (
                SELECT  *
                FROM    terms
                WHERE   (term, id) &lt;
                        (       
                        SELECT  term, id
                        FROM    terms
                        WHERE   term &lt;= 'bakatora'
                        ORDER BY
                                term DESC, id DESC
                        LIMIT 1
                        )
                ORDER BY
                        term DESC, id DESC
                LIMIT 10
                ) q
        ) q2
ORDER BY
        term, id

id term hit
3116 baholudy
8424 bahugygi
9524 bahujuwy
2242 bahyheni
3707 bajaqace
2995 bajaqyce
6116 bajeraju
5569 bajipoba
5323 bajonisi
9946 bajoquci
1993 bakatora 1
3549 bakesema
5204 bakesile
7307 bakewefe
9021 bakuwije
15 rows fetched in 0.0005s (0.0151s)

, we see an exact match.

These queries are very fast, since the only filesort they use sorts the final result on but 15 rows, and the queries complete in but 10 ms.

Written by Quassnoi

July 25th, 2009 at 11:00 pm

Posted in MySQL

Leave a Reply