MySQL: GROUP BY in UNION
From Stack Overflow:
I have a query where I have a custom developed UDF that is used to calculate whether or not certain points are within a polygon (first query in
UNION
) or circular (second query inUNION
) shape.SELECT a.geo_boundary_id, … FROM geo_boundary_vertex a, … … GROUP BY a.geo_boundary_id UNION SELECT b.geo_boundary_id, … FROM geo_boundary b, … … GROUP BY b.geo_boundary_idWhen I run an explain for the query I get
filesort
for both queries within theUNION
.Now, I can split the queries up and use the
ORDER BY NULL
trick to get rid of thefilesort
however when I attempt to add that to the end of aUNION
it doesn't work.How do I get rid of the
filesort
?
In MySQL, GROUP BY
also implies ORDER BY
on the same set of expressions in the same order. That's why it adds an additional filesort
operation to sort the resultset if it does not come out naturally sorted (say, from an index).
This is not always a desired behavior, and MySQL manual suggests adding ORDER BY NULL
to the queries where sorting is not required. This can improve performance of the queries.
Let's create a sample table and see:
The table contains 3,000,000 random records with value1
and value2
between 1 and 300,000.
Here's the plan we get with a mere UNION
of two GROUP BY
queries:
SELECT value1 AS value FROM grouping GROUP BY value1 UNION SELECT value2 AS value FROM grouping GROUP BY value2 LIMIT 10
value | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 rows fetched in 0.0002s (8.4998s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | grouping | ALL | 3000279 | 100.00 | Using temporary; Using filesort | ||||
2 | UNION | grouping | ALL | 3000279 | 100.00 | Using temporary; Using filesort | ||||
UNION RESULT | <union1,2> | ALL |
select `20110330_group`.`grouping`.`value1` AS `value` from `20110330_group`.`grouping` group by `20110330_group`.`grouping`.`value1` union select `20110330_group`.`grouping`.`value2` AS `value` from `20110330_group`.`grouping` group by `20110330_group`.`grouping`.`value2` limit 10
We see that there is a filesort
in each of the queries.
MySQL does allow using ORDER BY
in the queries merged with UNION
or UNION ALL
. To do this, we just need to wrap each query into a set of parentheses:
( SELECT value1 AS value FROM grouping GROUP BY value1 ORDER BY NULL ) UNION ( SELECT value2 AS value FROM grouping GROUP BY value2 ORDER BY NULL ) LIMIT 10
value | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 rows fetched in 0.0002s (8.4792s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | grouping | ALL | 3000279 | 100.00 | Using temporary; Using filesort | ||||
2 | UNION | grouping | ALL | 3000279 | 100.00 | Using temporary; Using filesort | ||||
UNION RESULT | <union1,2> | ALL |
(select `20110330_group`.`grouping`.`value1` AS `value` from `20110330_group`.`grouping` group by `20110330_group`.`grouping`.`value1` order by NULL) union (select `20110330_group`.`grouping`.`value2` AS `value` from `20110330_group`.`grouping` group by `20110330_group`.`grouping`.`value2` order by NULL) limit 10
However, the plan remained the same. Why?
MySQL's documentation says:
Use of
ORDER BY
for individualSELECT
statements implies nothing about the order in which the rows appear in the final result because UNION by default produces an unordered set of rows. Therefore, the use ofORDER BY
in this context is typically in conjunction withLIMIT
, so that it is used to determine the subset of the selected rows to retrieve for theSELECT
, even though it does not necessarily affect the order of those rows in the finalUNION
result. IfORDER BY
appears withoutLIMIT
in aSELECT
, it is optimized away because it will have no effect anyway.
This means that the optimizer just removes ORDER BY
from the UNION
parts if they are not used along with LIMIT
.
This is of course a good idea: since individual ORDER BY
have no effect on the order of the final query anyway, there is no use in executing them or even taking them into account.
However, this idea would be much better if the same were also true for GROUP BY
. Currently, the optimizer does not optimize away the ordering behavior of the GROUP BY
queries which are parts of a UNION
and they cannot be cured with ORDER BY NULL
(whose only goal is not to order) since this is removed by the optimizer.
However, since only ORDER BY
clauses not accompanied with a LIMIT
are thrown away, we could just add a LIMIT
. Of course it should be large enough to guarantee that all record would be returned.
Let's see:
( SELECT value1 AS value FROM grouping GROUP BY value1 ORDER BY NULL LIMIT 10000000000 ) UNION ( SELECT value2 AS value FROM grouping GROUP BY value2 ORDER BY NULL LIMIT 10000000000 ) LIMIT 10
value | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
12462 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
205466 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
89941 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
133309 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
96722 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
83683 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
128249 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
90196 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
66232 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
60571 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 rows fetched in 0.0002s (6.9842s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | grouping | ALL | 3000279 | 100.00 | Using temporary | ||||
2 | UNION | grouping | ALL | 3000279 | 100.00 | Using temporary | ||||
UNION RESULT | <union1,2> | ALL |
(select `20110330_group`.`grouping`.`value1` AS `value` from `20110330_group`.`grouping` group by `20110330_group`.`grouping`.`value1` order by NULL limit 10000000000) union (select `20110330_group`.`grouping`.`value2` AS `value` from `20110330_group`.`grouping` group by `20110330_group`.`grouping`.`value2` order by NULL limit 10000000000) limit 10
Now, there are no filesort
operations in the plan and the query runs 20% faster.
This is not a very elegant solution of course. More than that, a solution similar to it was used for SQL Server 2000 which does not allow using ORDER BY
without a TOP
in the inline views. In SQL Server 2000, TOP 100%
forced the order of the nested queries and usually made them spooled or otherwise materialized.
It was quite a nasty surprise when SQL Server 2005 has improved
its optimizer to detect such tricks and ignore ORDER BY
for TOP 100% queries. However, with all improvements introduced in SQL Server 2005, most of these queries could be just rewritten in a more clean and efficient way.
Nevertheless, this solution is still safe to use, because it does not change the semantics of the query (if LIMIT
is chosen large enough), but is just an optimizer hack. In the worst case, the query will just become as slow as it initially was, and an extra filesort
is not the worst of all things that can happen to a query.
Meanwhile, I've posted it as bug 60702 to MySQL bug database and am hoping they'll fix it in the next release.
I’m not sure what your workaround is doing, but the results for your workaround are different from the original query, so the claim that its faster doesn’t really apply – if the results are not the same, its not the same query :).
As I understand it, a group by will necessarily have to sort the data in order to group it. I suppose you could group by without sorting, but I doubt it will be as efficient.
Raj
1 Apr 11 at 00:05
@Raj: I didn’t append an
ORDER BY
to the trailing result so that it is obvious that the last query does not sort. You may add anORDER BY
before the trailingLIMIT
if you like and see that the results are identical, the latter query still being more efficient. This is not the point of the article.Grouping does not necessarily sort the data, and a
sort group by
is not necessarily the most efficient method. In fact, thefilesort
you see in the plans is used to sort the data after they were grouped.Quassnoi
1 Apr 11 at 02:07