EXPLAIN EXTENDED

How to create fast database queries

Selecting options

Comments enabled. I *really* need your comment

From Stack Overflow:

I have three checkboxes in my application.

If the user ticks a combination of the boxes, I want to return matches for the boxes ticked, and in the case no box is checked I just want to return everything.

Can I do this with a single SQL query?

Assuming MySQL.

The principle is simple: we need to enable filtering only if there is some data to filter. If there are no data, we should return all values.

This is best achieved by using two mutually exclusive SELECT statements with a UNION ALL.

Let's create a sample table and try to construct the query:

CREATE TABLE filler (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

CREATE TABLE t_data (
        id INT NOT NULL PRIMARY KEY,
        opt INT NOT NULL,
        value VARCHAR(50) NOT NULL,
        KEY ix_data_opt (opt)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE t_option (
        id INT NOT NULL PRIMARY KEY,
        name INT NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE t_checked (
        session INT NOT NULL,
        opt INT NOT NULL,
        PRIMARY KEY pk_checked_session_option (session, opt)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt <= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

START TRANSACTION;
CALL prc_filler(100000);
COMMIT;

INSERT
INTO    t_data (id, opt, value)
SELECT  id, FLOOR(RAND(20090526) * 10) + 1, CONCAT('Data ', id)
FROM    filler;

INSERT
INTO    t_option (id, name)
SELECT  id, CONCAT('Option ', id)
FROM    filler
LIMIT 10;

INSERT
INTO    t_checked (session, opt)
VALUES
        (1, 1),
        (1, 3),
        (1, 5),
        (1, 7);
&#91;/sourcecode&#93;

There are <strong>100,000</strong> rows in <code>t_data</code>, which is a table to filter.

The filtering field is called <code>opt</code>, it's randomly assigned with options from <strong>1</strong> to <strong>10</strong>.

Checked options are contained in the table <code>t_checked</code>. It has a <code>PRIMARY KEY</code> on two fields: <code>session</code> and options.

This table containts <code>options</code> checked for a given <code>session</code>.

For session <strong>1</strong>, options <strong>1</strong>, <strong>3</strong>, <strong>5</strong> and <strong>7</strong> are checked; for session <strong>2</strong>, nothing is checked.

Now, let's try to select all <code>data</code> for session <strong>1</strong>:


SELECT  COUNT(*)
FROM    (
        SELECT  *
        FROM    t_data
        WHERE   opt IN
                (
                SELECT  opt
                FROM    t_checked
                WHERE   session = 1
                )
        UNION ALL
        SELECT  *
        FROM    t_data
        WHERE   NOT EXISTS (
                SELECT  1
                FROM    t_checked
                WHERE   session = 1
                )
        ) q

COUNT(*)
40090
1 row fetched in 0.0001s (0.7031s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY Select tables optimized away
2 DERIVED t_data ALL 100908 100.00 Using where
3 DEPENDENT SUBQUERY t_checked eq_ref PRIMARY PRIMARY 8 func 1 100.00 Using where; Using index
4 UNION Impossible WHERE noticed after reading const tables
5 SUBQUERY t_checked ref PRIMARY PRIMARY 4 2 100.00 Using index
UNION RESULT <union2,4> ALL

There are two SELECT's in the query.

The first one selects all rows that satisfy the IN condition. This is quite straightforrward.

The second one select all rows as long as there are no checked options for a given session.

It's easy to see that these SELECT's are mutually exclusive: the first one can return rows only if there are some values in t_checked (otherwise IN condition will never be satisifed); the second one can return only if there are no rows.

For session 1 there are 40,009 rows satisfying our query, which is just right for 4 of 10 options checked.

Now, let's run the query for session 2:

SELECT  COUNT(*)
FROM    (
        SELECT  *
        FROM    t_data
        WHERE   opt IN
                (
                SELECT  opt
                FROM    t_checked
                WHERE   session = 2
                )
        UNION ALL
        SELECT  *
        FROM    t_data
        WHERE   NOT EXISTS (
                SELECT  1
                FROM    t_checked
                WHERE   session = 2
                )
        ) q

COUNT(*)
100000
1 row fetched in 0.0001s (1.3281s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY Select tables optimized away
2 DERIVED t_data ALL 100908 100.00 Using where
3 DEPENDENT SUBQUERY t_checked eq_ref PRIMARY PRIMARY 8 func 1 100.00 Using where; Using index
4 UNION t_data ALL 100908 100.00
5 SUBQUERY t_checked ref PRIMARY PRIMARY 4 1 100.00 Using index
UNION RESULT <union2,4> ALL

For session 2, it returns all 100,000 rows.

However, for session 2 this query runs twice as long as for session 1. This is not how it should be, since there is an extra check for session 1.

If we look into the plans for these queries, we see that the second query is optimized away for session 1.

MySQL is able to see that there are some rows for this session in t_checked, NOT EXISTS condition will fail and the query will return nothing.

That's why it doesn't even try to execute it.

However, for session 2 this is not the case. MySQL executes both queries. The first one returns an empty resultset, but all 100,000 rows are browsed anyway.

To assist MySQL a little, we just need to add a condition that can be easily checked in the beginning to see if the query will return nothing.

This condition, is, of course, a reversal of the condition in the second query, i. e. EXISTS:

SELECT  COUNT(*)
FROM    (
        SELECT  *
        FROM    t_data
        WHERE   EXISTS (
                SELECT  1
                FROM    t_checked
                WHERE   session = 2
                )
                AND opt IN
                (
                SELECT  opt
                FROM    t_checked
                WHERE   session = 2
                )
        UNION ALL
        SELECT  *
        FROM    t_data
        WHERE   NOT EXISTS (
                SELECT  1
                FROM    t_checked
                WHERE   session = 2
                )
        ) q

COUNT(*)
100000
1 row fetched in 0.0001s (0.3160s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY Select tables optimized away
2 DERIVED Impossible WHERE noticed after reading const tables
4 DEPENDENT SUBQUERY t_checked eq_ref PRIMARY PRIMARY 8 const,func 1 100.00 Using where; Using index
3 SUBQUERY t_checked ref PRIMARY PRIMARY 4 1 100.00 Using index
5 UNION t_data ALL 100908 100.00
6 SUBQUERY t_checked ref PRIMARY PRIMARY 4 1 100.00 Using index
UNION RESULT <union2,5> ALL

Now MySQL sees an IMPOSSIBLE WHERE in the first query, the first query is not executed and the whole query takes only 0.31 seconds instead of 1.32, being 4 times as fast as the original, unoptimized query.

Written by Quassnoi

May 26th, 2009 at 11:00 pm

Posted in MySQL

Leave a Reply