EXPLAIN EXTENDED

How to create fast database queries

My latest article on SQL in general: 5 Claims About SQL, Explained. You're welcome to read and comment on it.

MySQL: splitting aggregate queries

with 2 comments

Answering questions asked on the site.

Victor asks:

I have a table which I will call sale to protect the innocent:

Sale
id product price amount date

I need to retrieve ultimate values of price, amount and date for each product:

SELECT  product,
        MIN(price), MAX(price),
        MIN(amount), MAX(amount),
        MIN(date), MAX(date)
FROM    sale
GROUP BY
        product

The query only returns about 100 records.

I have all these fields indexed (together with product), but this still produces a full table scan over 3,000,000 records.

How do I speed up this query?

To retrieve the ultimate values of the fields, MySQL would just need to make a loose index scan over each index and take the max and min value of the field for each product.

However, the optimizer won't do when multiple indexes are involved. Instead, it will revert to a full scan.

There is a workaround for it. Let's create a sample table and see them:

Table creation details

This table contains 3M random records for 100 distinct products and is indexed appropriately.

Here's a straightforward query:

SELECT  product,
        MIN(price), MAX(price),
        MIN(amount), MAX(amount),
        MIN(dt), MAX(dt)
FROM    sale
GROUP BY
        product

product MIN(price) MAX(price) MIN(amount) MAX(amount) MIN(dt) MAX(dt)
1 20.01 120.00 101 1100 2010-12-01 06:14:22 2011-03-26 23:42:32
2 20.01 120.00 101 1100 2010-12-01 06:39:38 2011-03-26 23:58:25
3 20.01 120.00 101 1100 2010-12-01 06:13:34 2011-03-26 23:54:32
4 20.01 120.00 101 1100 2010-12-01 06:14:30 2011-03-26 23:58:45
100 20.01 120.00 101 1100 2010-12-01 06:32:58 2011-03-26 23:58:24
100 rows fetched in 0.0052s (13.8279s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE sale index ix_sale_product_amount 8 3000409 100.00
select `20110327_split`.`sale`.`product` AS `product`,min(`20110327_split`.`sale`.`price`) AS `MIN(price)`,max(`20110327_split`.`sale`.`price`) AS `MAX(price)`,min(`20110327_split`.`sale`.`amount`) AS `MIN(amount)`,max(`20110327_split`.`sale`.`amount`) AS `MAX(amount)`,min(`20110327_split`.`sale`.`dt`) AS `MIN(dt)`,max(`20110327_split`.`sale`.`dt`) AS `MAX(dt)` from `20110327_split`.`sale` group by `20110327_split`.`sale`.`product`

As was expected, the query uses a full table scan (or, rather, an index scan) and takes almost 14 seconds to complete.

In order to make the engine use three separate indexes, we would need to split the query into three queries each searching for ultimate values on one column, and then combine the results.

Each of these queries would be a MIN / MAX query on a trailing column of a composite index, combined with a GROUP BY on the index's leading columns, and, as such, would be subject to loose index scan optimization.

To combine the results, we will of course just join them on product. This will be quite efficient too since the resultsets are going to be quite small (100 records each, exactly).

And here is the query:

SELECT  *
FROM    (
        SELECT  product, MIN(amount), MAX(amount)
        FROM    sale
        GROUP BY
                product
        ) qa
NATURAL JOIN
        (
        SELECT  product, MIN(price), MAX(price)
        FROM    sale
        GROUP BY
                product
        ) qp
NATURAL JOIN
        (
        SELECT  product, MIN(dt), MAX(dt)
        FROM    sale
        GROUP BY
                product
        ) qd

product MIN(amount) MAX(amount) MIN(price) MAX(price) MIN(dt) MAX(dt)
1 101 1100 20.01 120.00 2010-12-01 06:14:22 2011-03-26 23:42:32
2 101 1100 20.01 120.00 2010-12-01 06:39:38 2011-03-26 23:58:25
3 101 1100 20.01 120.00 2010-12-01 06:13:34 2011-03-26 23:54:32
4 101 1100 20.01 120.00 2010-12-01 06:14:30 2011-03-26 23:58:45
100 101 1100 20.01 120.00 2010-12-01 06:32:58 2011-03-26 23:58:24
100 rows fetched in 0.0053s (0.0105s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL 100 100.00
1 PRIMARY <derived3> ALL 100 100.00 Using where; Using join buffer
1 PRIMARY <derived4> ALL 100 100.00 Using where; Using join buffer
4 DERIVED sale range ix_sale_product_dt 4 19 100.00 Using index for group-by
3 DERIVED sale range ix_sale_product_price 4 19 100.00 Using index for group-by
2 DERIVED sale range ix_sale_product_amount 4 1067 100.00 Using index for group-by
select `qa`.`product` AS `product`,`qa`.`MIN(amount)` AS `MIN(amount)`,`qa`.`MAX(amount)` AS `MAX(amount)`,`qp`.`MIN(price)` AS `MIN(price)`,`qp`.`MAX(price)` AS `MAX(price)`,`qd`.`MIN(dt)` AS `MIN(dt)`,`qd`.`MAX(dt)` AS `MAX(dt)` from (select `20110327_split`.`sale`.`product` AS `product`,min(`20110327_split`.`sale`.`amount`) AS `MIN(amount)`,max(`20110327_split`.`sale`.`amount`) AS `MAX(amount)` from `20110327_split`.`sale` group by `20110327_split`.`sale`.`product`) `qa` join (select `20110327_split`.`sale`.`product` AS `product`,min(`20110327_split`.`sale`.`price`) AS `MIN(price)`,max(`20110327_split`.`sale`.`price`) AS `MAX(price)` from `20110327_split`.`sale` group by `20110327_split`.`sale`.`product`) `qp` join (select `20110327_split`.`sale`.`product` AS `product`,min(`20110327_split`.`sale`.`dt`) AS `MIN(dt)`,max(`20110327_split`.`sale`.`dt`) AS `MAX(dt)` from `20110327_split`.`sale` group by `20110327_split`.`sale`.`product`) `qd` where ((`qp`.`product` = `qa`.`product`) and (`qd`.`product` = `qa`.`product`))

Each individual query is executed with using index for group-by, that is jumping over the min and max values in a loose index scan.

The queries are combined in a join which, despite not using the indexes, is still very fast because only 100 records are joined (of course fitting into a join buffer).

The overall query time is only 10 ms.

Hope that helps.


I'm always glad to answer the questions regarding database queries.

Ask me a question

Written by Quassnoi

March 28th, 2011 at 11:00 pm

Posted in MySQL

2 Responses to 'MySQL: splitting aggregate queries'

Subscribe to comments with RSS or TrackBack to 'MySQL: splitting aggregate queries'.

  1. Learning quite a few things while surfing your page.
    You’re an SQL-Wizzard ^^
    Keep up the good work!

    sneils

    18 Jun 12 at 12:55

  2. I did know this technique, but I’m glad I read your post, because now I’m very interested in the methods you use to populate test tables. I hope you’ll post about this topic in the future.

    Federico

    11 Jan 13 at 15:21

Leave a Reply