MySQL: selecting rows before and after filtered one at EXPLAIN EXTENDED

I have a basic MySQL table, terms, comprised of an id and term field.

I want to create an alphabetically sorted dictionary index (in the literal sense), that would list ten 10 terms above the selected term, and 20 below it

An example of this could be found on Urban Dictionary where on the left column you see the current term highlighted, and a number of terms above it, and some below, all sorted alphabetically.

As we all know, MySQL doesn't support a ROW_NUMBER() or a similar function so we end up resorting to user variables and sub-selects.

I also cannot create a view with user defined variables because MySQL doesn't allow that.

MySQL indeed doesn't support ROW_NUMBER() and this function indeed can be emulated using session variables:

Emulating ROW_NUMBER() in MySQL

However, it will be an overkill in this case, since a more simple and more efficient set-based solution using LIMIT can be used instead.

Let's create a sample table and see how:

CREATE TABLE filler (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

CREATE TABLE terms (
id INT NOT NULL PRIMARY KEY,
term VARCHAR(50) NOT NULL,
KEY ix_terms_term (term)
) ENGINE=InnoDB DEFAULT CHARSET=UTF8;

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
DECLARE _cnt INT;
SET _cnt = 1;
WHILE _cnt &lt;= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

START TRANSACTION;
CALL prc_filler(10000);
COMMIT;

INSERT
INTO    terms
SELECT  id,
        (
        SELECT  GROUP_CONCAT(CONCAT(
                SUBSTRING('bcdfghjklmnpqrstvwxz', FLOOR(RAND(20090725) * 20) + 1, 1), 
                SUBSTRING('aeiouy', FLOOR(RAND(20090725 &lt;&lt; 1) * 6) + 1, 1)
                ) SEPARATOR '')
        FROM    (
                SELECT  *
                FROM    filler
                LIMIT 5
                ) q
        )
FROM    filler;

This script generates a table of 10,000 random non-words, like these:

id	term
1	gecotyqy
2	wykajelo
3	hozedequ

999	fetojaju
1000	sopiqeke

To search for a given term, we need to do the following:

Find the closest matching term (this is matching or a one just below the match)
Find the 10 terms next to the closest (using ORDER BY term LIMIT 10)
Find the 20 terms before the closest (using ORDER BY term DESC LIMIT 20)
UNION ALL the results
Order the results by term then id
Add fields to show if a given row is a match

For instance, let's search for the word serendipity. For the sake of brevity, I'll output 5 records after and 10 records before.

SELECT  *,
IF(
(term, id) = (
SELECT  term, id
FROM    terms
WHERE   term &lt;= 'serendipity'
                     ORDER BY
                             term DESC, id DESC
                     LIMIT 1
                     ),
        term = 'serendipity',
        NULL
        ) AS hit
FROM    (
        SELECT  *       
        FROM    (
                SELECT  *
                FROM    terms
                WHERE   (term, id) &gt;=
(
SELECT  term, id
FROM    terms
WHERE   term &lt;= 'serendipity'
                        ORDER BY
                                term DESC, id DESC
                        LIMIT 1
                        )
                ORDER BY
                        term, id
                LIMIT 5
                ) q
        UNION ALL
        SELECT  *
        FROM    (
                SELECT  *
                FROM    terms
                WHERE   (term, id) &lt;
                        (       
                        SELECT  term, id
                        FROM    terms
                        WHERE   term &lt;= 'serendipity'
                        ORDER BY
                                term DESC, id DESC
                        LIMIT 1
                        )
                ORDER BY
                        term DESC, id DESC
                LIMIT 10
                ) q
        ) q2
ORDER BY
        term, id

id	term	hit
9412	sepatobe
5538	sepesewy
8552	sepipida
7302	sepiqoke
9242	sepyreqo
3699	seqezoca
9663	seqezowa
2953	seqicylo
7177	seqowyqa
4526	seqoxawi
1108	seqozixa	0
6741	serojeme
7428	serudaqi
6830	serugaca
3683	sesejeni
15 rows fetched in 0.0005s (0.0125s)

id	select_type	table	type	possible_keys	key	key_len	rows	filtered	Extra
1	PRIMARY	<derived3>	ALL				15	100.00	Using filesort
3	DERIVED	<derived4>	ALL				5	100.00
4	DERIVED	terms	index		ix_terms_term	152	5	196840.00	Using where; Using index
5	SUBQUERY	terms	range	ix_terms_term	ix_terms_term	152	4921	100.00	Using where; Using index
6	UNION	<derived7>	ALL				10	100.00
7	DERIVED	terms	index		ix_terms_term	152	10	98420.00	Using where; Using index
8	SUBQUERY	terms	range	ix_terms_term	ix_terms_term	152	4921	100.00	Using where; Using index
	UNION RESULT	<union3,6>	ALL
2	SUBQUERY	terms	range	ix_terms_term	ix_terms_term	152	4921	100.00	Using where; Using index

We see that there were no exact match, and the query returned us a closest match possible (marking it with a 0).

If we use the same query to search a non-word bakatora (which happens to exists in Japanese and mean stupid tiger), we'll get the following:

SELECT  *,
IF(
(term, id) = (
SELECT  term, id
FROM    terms
WHERE   term &lt;= 'bakatora'
                     ORDER BY
                             term DESC, id DESC
                     LIMIT 1
                     ),
        term = 'bakatora',
        NULL
        ) AS hit
FROM    (
        SELECT  *       
        FROM    (
                SELECT  *
                FROM    terms
                WHERE   (term, id) &gt;=
(
SELECT  term, id
FROM    terms
WHERE   term &lt;= 'bakatora'
                        ORDER BY
                                term DESC, id DESC
                        LIMIT 1
                        )
                ORDER BY
                        term, id
                LIMIT 5
                ) q
        UNION ALL
        SELECT  *
        FROM    (
                SELECT  *
                FROM    terms
                WHERE   (term, id) &lt;
                        (       
                        SELECT  term, id
                        FROM    terms
                        WHERE   term &lt;= 'bakatora'
                        ORDER BY
                                term DESC, id DESC
                        LIMIT 1
                        )
                ORDER BY
                        term DESC, id DESC
                LIMIT 10
                ) q
        ) q2
ORDER BY
        term, id

id	term	hit
3116	baholudy
8424	bahugygi
9524	bahujuwy
2242	bahyheni
3707	bajaqace
2995	bajaqyce
6116	bajeraju
5569	bajipoba
5323	bajonisi
9946	bajoquci
1993	bakatora	1
3549	bakesema
5204	bakesile
7307	bakewefe
9021	bakuwije
15 rows fetched in 0.0005s (0.0151s)

, we see an exact match.

These queries are very fast, since the only filesort they use sorts the final result on but 15 rows, and the queries complete in but 10 ms.

Written by Quassnoi

July 25th, 2009 at 11:00 pm

Posted in MySQL

EXPLAIN EXTENDED

MySQL: selecting rows before and after filtered one

Leave a Reply

Subscribe

Subscribe by email

Contacts

Should I?

Recent articles

Calendar

Archives

Categories

Stack Overflow