Using UNION along with ORDER BY and LIMIT at EXPLAIN EXTENDED

When I optimize my 2 single queries to run in less than 0.02 seconds and then UNION them the resulting query takes over 1 second to run.

Also, a UNION ALL takes longer than a UNION DISTINCT.

I would assume allowing duplicates would make the query run faster and not slower.

Am I really just better off running the 2 queries separately? I would prefer to use the UNION.

This kinds of things happen when the queries employ ORDER BY along with LIMIT.

Let's create the sample tables and see what is happening:

CREATE TABLE t_a (
        id INT NOT NULL PRIMARY KEY,
        orderer INT NOT NULL,
        value VARCHAR(50) NOT NULL,
        KEY ix_a_orderer (orderer)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE t_b (
        id INT NOT NULL PRIMARY KEY,
        orderer INT NOT NULL,
        value VARCHAR(50) NOT NULL,
        KEY ix_b_orderer (orderer)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt <= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

START TRANSACTION;
CALL prc_filler(100000);
COMMIT;

INSERT
INTO    t_a (id, orderer, value)
SELECT  id, RAND(20090515) * 100000, CONCAT('Value A ', id)
FROM    filler;

INSERT
INTO    t_b (id, orderer, value)
SELECT  id, RAND(20090515 << 1) * 100000, CONCAT('Value B ', id)
FROM    filler;
&#91;/sourcecode&#93;

We have two tables with identical structure and <strong>100,000</strong> records in each.

Field <code>orderer</code> is filled with random values and indexed in both tables.

Now, we can make two queries like this:


SELECT  *
FROM    t_a
ORDER BY
        orderer
LIMIT 10

id	orderer	value
92149	0	Value A 92149
37448	2	Value A 37448
49557	4	Value A 49557
81178	6	Value A 81178
47959	7	Value A 47959
90409	7	Value A 90409
7928	8	Value A 7928
50904	9	Value A 50904
73478	10	Value A 73478
38321	14	Value A 38321
10 rows fetched in 0.0003s (0.0019s)

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	t_a	index		ix_a_orderer	4		100048	100.00

SELECT  *
FROM    t_b
ORDER BY
        orderer
LIMIT 10

id	orderer	value
4863	1	Value B 4863
68014	1	Value B 68014
4667	2	Value B 4667
8278	2	Value B 8278
57729	2	Value B 57729
71955	2	Value B 71955
36856	4	Value B 36856
99213	4	Value B 99213
35509	5	Value B 35509
57936	5	Value B 57936
10 rows fetched in 0.0003s (0.0019s)

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	t_b	index		ix_b_orderer	4		100316	100.00

We can see that both queries are instant, since they employ index scan and limit the recordset.

Now, let's try to order the combined resultset:

SELECT  *
FROM    t_a
UNION   ALL
SELECT  *
FROM    t_b
ORDER BY
        orderer
LIMIT 20

id	orderer	value
92149	0	Value A 92149
4863	1	Value B 4863
68014	1	Value B 68014
57729	2	Value B 57729
71955	2	Value B 71955
4667	2	Value B 4667
8278	2	Value B 8278
37448	2	Value A 37448
36856	4	Value B 36856
49557	4	Value A 49557
99213	4	Value B 99213
35509	5	Value B 35509
57936	5	Value B 57936
81178	6	Value A 81178
1417	6	Value B 1417
90409	7	Value A 90409
93064	7	Value B 93064
47959	7	Value A 47959
7928	8	Value A 7928
83244	8	Value B 83244
20 rows fetched in 0.0006s (0.8750s)

id	select_type	table	type	rows	filtered	Extra
1	PRIMARY	t_a	ALL	100048	100.00
2	UNION	t_b	ALL	100316	100.00
	UNION RESULT	<union1,2>	ALL			Using filesort

We see that the query now works for almost 1 second.

This is because we have no index on the final resulting set. MySQL needs to retrieve all values, sort them and limit the sorted set. That's why we see full table scans and a filesort in the query plan.

Of course, we don't need to scan and sort the whole values to get the top 20 results. We just need to get 20 records from each table and then combine, sort and limit only these records.

Unfortunately, MySQL is not smart enough to do this right in the query plan.

So we need to assist it a little:

SELECT  *
FROM    (
        SELECT  *
        FROM    t_a
        ORDER BY
                orderer
        LIMIT 20
        ) ao
UNION   ALL
SELECT  *
FROM    (
        SELECT  *
        FROM    t_b
        ORDER BY
                orderer
        LIMIT 20
        ) bo
ORDER BY
        orderer
LIMIT 20

id	orderer	value
92149	0	Value A 92149
68014	1	Value B 68014
4863	1	Value B 4863
71955	2	Value B 71955
4667	2	Value B 4667
57729	2	Value B 57729
37448	2	Value A 37448
8278	2	Value B 8278
49557	4	Value A 49557
99213	4	Value B 99213
36856	4	Value B 36856
35509	5	Value B 35509
57936	5	Value B 57936
81178	6	Value A 81178
1417	6	Value B 1417
90409	7	Value A 90409
47959	7	Value A 47959
93064	7	Value B 93064
83244	8	Value B 83244
7928	8	Value A 7928
20 rows fetched in 0.0006s (0.0028s)

id	select_type	table	type	key	key_len	rows	filtered	Extra
1	PRIMARY	<derived2>	ALL			20	100.00
2	DERIVED	t_a	index	ix_a_orderer	4	100048	100.00
3	UNION	<derived4>	ALL			20	100.00
4	DERIVED	t_b	index	ix_b_orderer	4	100316	100.00
	UNION RESULT	<union1,3>	ALL					Using filesort

Now, the query is instant again.

Note, that this query is not the same as two simple queries above combined.

For instance, the resulting query has records 1417, 93064 and 83244, which are absent in the original query for table t_b.

This is because the logic is slightly different for these two queries. The resulting query first combines the records from the two tables, then orders them.

Random values have been distributed so that if we take first 20 values from the combined recordset, 13 of them will belong to table t_b, and only 7 records to t_a.

That's why we see 3 records here that were absent in the original query, that took only 10 first records from table t_b.

To just combine the results of the first two queries and sort them, we need to put them into subqueries and get rid of the final LIMIT clause, leaving only ORDER BY:

SELECT  *
FROM    (
        SELECT  *
        FROM    t_a
        ORDER BY
                orderer
        LIMIT 10
        ) ao
UNION   ALL
SELECT  *
FROM    (
        SELECT  *
        FROM    t_b
        ORDER BY
                orderer
        LIMIT 10
        ) bo
ORDER BY
        orderer

id	orderer	value
92149	0	Value A 92149
68014	1	Value B 68014
4863	1	Value B 4863
57729	2	Value B 57729
8278	2	Value B 8278
71955	2	Value B 71955
37448	2	Value A 37448
4667	2	Value B 4667
99213	4	Value B 99213
36856	4	Value B 36856
49557	4	Value A 49557
57936	5	Value B 57936
35509	5	Value B 35509
81178	6	Value A 81178
90409	7	Value A 90409
47959	7	Value A 47959
7928	8	Value A 7928
50904	9	Value A 50904
73478	10	Value A 73478
38321	14	Value A 38321
20 rows fetched in 0.0006s (0.0024s)

id	select_type	table	type	key	key_len	rows	filtered	Extra
1	PRIMARY	<derived2>	ALL			10	100.00
2	DERIVED	t_a	index	ix_a_orderer	4	100048	100.00
3	UNION	<derived4>	ALL			10	100.00
4	DERIVED	t_b	index	ix_b_orderer	4	100316	100.00
	UNION RESULT	<union1,3>	ALL					Using filesort

This is also instant but always yields 10 record per table.

One final note: these solutions use indexes, but still need a filesort to order the final resultset. This is not a problem for 20 records, but may be inefficient if the final resultset is large (1000 records or more).

In this case it's better to use an approach described in one of my previous articles:

Emulating FULL OUTER JOIN in MySQL

This is a little harder to implement, but always yields perfect performance even for large resultsets.

To be continued.

Written by Quassnoi

May 15th, 2009 at 11:00 pm

Posted in MySQL

EXPLAIN EXTENDED

Ordering UNION’s

Leave a Reply

Subscribe

Subscribe by email

Contacts

Should I?

Recent articles

Calendar

Archives

Categories

Stack Overflow