Selecting options
Comments enabled. I *really* need your comment
From Stack Overflow:
I have three checkboxes in my application.
If the user ticks a combination of the boxes, I want to return matches for the boxes ticked, and in the case no box is checked I just want to return everything.
Can I do this with a single
SQL
query?
Assuming MySQL.
The principle is simple: we need to enable filtering only if there is some data to filter. If there are no data, we should return all values.
This is best achieved by using two mutually exclusive SELECT
statements with a UNION ALL
.
Let's create a sample table and try to construct the query:
CREATE TABLE filler ( id INT NOT NULL PRIMARY KEY AUTO_INCREMENT ) ENGINE=Memory; CREATE TABLE t_data ( id INT NOT NULL PRIMARY KEY, opt INT NOT NULL, value VARCHAR(50) NOT NULL, KEY ix_data_opt (opt) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; CREATE TABLE t_option ( id INT NOT NULL PRIMARY KEY, name INT NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8; CREATE TABLE t_checked ( session INT NOT NULL, opt INT NOT NULL, PRIMARY KEY pk_checked_session_option (session, opt) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; DELIMITER $$ CREATE PROCEDURE prc_filler(cnt INT) BEGIN DECLARE _cnt INT; SET _cnt = 1; WHILE _cnt <= cnt DO INSERT INTO filler SELECT _cnt; SET _cnt = _cnt + 1; END WHILE; END $$ DELIMITER ; START TRANSACTION; CALL prc_filler(100000); COMMIT; INSERT INTO t_data (id, opt, value) SELECT id, FLOOR(RAND(20090526) * 10) + 1, CONCAT('Data ', id) FROM filler; INSERT INTO t_option (id, name) SELECT id, CONCAT('Option ', id) FROM filler LIMIT 10; INSERT INTO t_checked (session, opt) VALUES (1, 1), (1, 3), (1, 5), (1, 7); [/sourcecode] There are <strong>100,000</strong> rows in <code>t_data</code>, which is a table to filter. The filtering field is called <code>opt</code>, it's randomly assigned with options from <strong>1</strong> to <strong>10</strong>. Checked options are contained in the table <code>t_checked</code>. It has a <code>PRIMARY KEY</code> on two fields: <code>session</code> and options. This table containts <code>options</code> checked for a given <code>session</code>. For session <strong>1</strong>, options <strong>1</strong>, <strong>3</strong>, <strong>5</strong> and <strong>7</strong> are checked; for session <strong>2</strong>, nothing is checked. Now, let's try to select all <code>data</code> for session <strong>1</strong>: SELECT COUNT(*) FROM ( SELECT * FROM t_data WHERE opt IN ( SELECT opt FROM t_checked WHERE session = 1 ) UNION ALL SELECT * FROM t_data WHERE NOT EXISTS ( SELECT 1 FROM t_checked WHERE session = 1 ) ) q
COUNT(*) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
40090 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0001s (0.7031s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | Select tables optimized away | ||||||||
2 | DERIVED | t_data | ALL | 100908 | 100.00 | Using where | ||||
3 | DEPENDENT SUBQUERY | t_checked | eq_ref | PRIMARY | PRIMARY | 8 | func | 1 | 100.00 | Using where; Using index |
4 | UNION | Impossible WHERE noticed after reading const tables | ||||||||
5 | SUBQUERY | t_checked | ref | PRIMARY | PRIMARY | 4 | 2 | 100.00 | Using index | |
UNION RESULT | <union2,4> | ALL |
There are two SELECT
's in the query.
The first one selects all rows that satisfy the IN
condition. This is quite straightforrward.
The second one select all rows as long as there are no checked options for a given session.
It's easy to see that these SELECT
's are mutually exclusive: the first one can return rows only if there are some values in t_checked
(otherwise IN
condition will never be satisifed); the second one can return only if there are no rows.
For session 1 there are 40,009 rows satisfying our query, which is just right for 4 of 10 options checked.
Now, let's run the query for session 2:
SELECT COUNT(*) FROM ( SELECT * FROM t_data WHERE opt IN ( SELECT opt FROM t_checked WHERE session = 2 ) UNION ALL SELECT * FROM t_data WHERE NOT EXISTS ( SELECT 1 FROM t_checked WHERE session = 2 ) ) q
COUNT(*) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
100000 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0001s (1.3281s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | Select tables optimized away | ||||||||
2 | DERIVED | t_data | ALL | 100908 | 100.00 | Using where | ||||
3 | DEPENDENT SUBQUERY | t_checked | eq_ref | PRIMARY | PRIMARY | 8 | func | 1 | 100.00 | Using where; Using index |
4 | UNION | t_data | ALL | 100908 | 100.00 | |||||
5 | SUBQUERY | t_checked | ref | PRIMARY | PRIMARY | 4 | 1 | 100.00 | Using index | |
UNION RESULT | <union2,4> | ALL |
For session 2, it returns all 100,000 rows.
However, for session 2 this query runs twice as long as for session 1. This is not how it should be, since there is an extra check for session 1.
If we look into the plans for these queries, we see that the second query is optimized away for session 1.
MySQL is able to see that there are some rows for this session in t_checked
, NOT EXISTS
condition will fail and the query will return nothing.
That's why it doesn't even try to execute it.
However, for session 2 this is not the case. MySQL executes both queries. The first one returns an empty resultset, but all 100,000 rows are browsed anyway.
To assist MySQL a little, we just need to add a condition that can be easily checked in the beginning to see if the query will return nothing.
This condition, is, of course, a reversal of the condition in the second query, i. e. EXISTS
:
SELECT COUNT(*) FROM ( SELECT * FROM t_data WHERE EXISTS ( SELECT 1 FROM t_checked WHERE session = 2 ) AND opt IN ( SELECT opt FROM t_checked WHERE session = 2 ) UNION ALL SELECT * FROM t_data WHERE NOT EXISTS ( SELECT 1 FROM t_checked WHERE session = 2 ) ) q
COUNT(*) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
100000 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0001s (0.3160s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | Select tables optimized away | ||||||||
2 | DERIVED | Impossible WHERE noticed after reading const tables | ||||||||
4 | DEPENDENT SUBQUERY | t_checked | eq_ref | PRIMARY | PRIMARY | 8 | const,func | 1 | 100.00 | Using where; Using index |
3 | SUBQUERY | t_checked | ref | PRIMARY | PRIMARY | 4 | 1 | 100.00 | Using index | |
5 | UNION | t_data | ALL | 100908 | 100.00 | |||||
6 | SUBQUERY | t_checked | ref | PRIMARY | PRIMARY | 4 | 1 | 100.00 | Using index | |
UNION RESULT | <union2,5> | ALL |
Now MySQL sees an IMPOSSIBLE WHERE
in the first query, the first query is not executed and the whole query takes only 0.31 seconds instead of 1.32, being 4 times as fast as the original, unoptimized query.