EXPLAIN EXTENDED

Archive for November 10th, 2009

MySQL: average of sums using indexes

My table looks like:

person_id	car_id	miles
1	1	100
1	2	200
2	3	1000
2	4	500

I need to total the miles for each person and then average those totals.

This is a good question to demonstrate how to improve MySQL queries using covering indexes.

The query should return a single value. The average of the per-person totals is the sum of all totals divided by the number of persons, which is SUM(miles) / COUNT(DISTINCT person_id). Since this query does not relate the fields to each other (only the total sum and the total count matter), it could be performed using only the indexes on miles and person_id. However, MySQL is not able to do this by default and needs a little help from the developer.
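To see why the two formulations agree, here is a small sketch on the four rows from the question. The table name car_miles is hypothetical (the question does not name its table), and the columns are assumed to be NOT NULL, as in the sample table used in the rest of the article.

```sql
-- Hypothetical scratch table holding the four rows from the question.
CREATE TEMPORARY TABLE car_miles (
        person_id INT NOT NULL,
        car_id INT NOT NULL,
        miles INT NOT NULL
);

INSERT
INTO    car_miles
VALUES  (1, 1, 100), (1, 2, 200), (2, 3, 1000), (2, 4, 500);

-- Literal reading of the question: total per person, then average the totals.
-- Person 1 totals 300, person 2 totals 1500, so the average is (300 + 1500) / 2 = 900.
SELECT  AVG(person_total) AS avg_of_totals
FROM    (
        SELECT  person_id, SUM(miles) AS person_total
        FROM    car_miles
        GROUP BY
                person_id
        ) t;

-- Equivalent single-pass form: 1800 / 2 = 900.
SELECT  SUM(miles) / COUNT(DISTINCT person_id) AS avg_of_totals
FROM    car_miles;
```

The second form never needs to know which miles belong to which person, and that is what makes it possible to compute the two aggregates from separate indexes.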

Let's create a sample table:

Table creation details

CREATE TABLE filler (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

CREATE TABLE persons (
        id INT NOT NULL PRIMARY KEY,
        person_id INT NOT NULL,
        miles INT NOT NULL,
        stuffing VARCHAR(200) NOT NULL,
        KEY ix_persons_person (person_id),
        KEY ix_persons_miles (miles)
) ENGINE=InnoDB;

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt <= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

START TRANSACTION;
CALL prc_filler(1000000);
COMMIT;

INSERT
INTO    persons
SELECT  id, ((id - 1) % 1000) + 1, FLOOR(RAND(20091110) * 10000) + 1,
        RPAD('', 200, '*')
FROM    filler;

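This table has 1,000,000 rows with 1,000 distinct person_id values. There are separate indexes on miles and on person_id, and each row also carries 200 characters of stuffing to make the table records large.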
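As an optional sanity check, the following sketch confirms that the table was filled as described. The column aliases are only illustrative.

```sql
-- By construction of the fill script: 1,000,000 rows, 1,000 distinct persons,
-- and miles generated as FLOOR(RAND(...) * 10000) + 1, i.e. within 1..10000.
SELECT  COUNT(*) AS row_count,
        COUNT(DISTINCT person_id) AS person_count,
        MIN(miles) AS min_miles,
        MAX(miles) AS max_miles
FROM    persons;
```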

A plain query would look like this:

SELECT  SUM(miles) / COUNT(DISTINCT person_id) AS total
FROM    persons

total
4998778.6000
1 row fetched in 0.0001s (2.1468s)

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	persons	ALL					1000066	100.00

select (sum(`20091110_indexes`.`persons`.`miles`) / count(distinct `20091110_indexes`.`persons`.`person_id`)) AS `total` from `20091110_indexes`.`persons`

The query takes more than 2 seconds.

If we look at the query plan, we see that a full table scan is used. The query needs the values of both person_id and miles, and no single index contains both. MySQL therefore reads all the table records, which are quite large.

But although the query needs both values, it does not need to match them to each other row by row. It can calculate the SUM first (using the index on miles) and COUNT(DISTINCT person_id) in another query. Since index records are much smaller than table records, the separate aggregates over them are calculated much faster.
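A minimal sketch of that split, run against the persons table created above: each statement touches a single column, so the matching single-column index can serve as a covering index. The exact plan depends on the server version and on table statistics, so treat the expected output as an assumption to verify rather than a guarantee.

```sql
-- Only miles is read, so ix_persons_miles can cover the query.
-- In the plan, look for type = index and Extra containing "Using index".
EXPLAIN
SELECT  SUM(miles)
FROM    persons;

-- Only person_id is read, so ix_persons_person can cover the query.
EXPLAIN
SELECT  COUNT(DISTINCT person_id)
FROM    persons;
```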

We can improve the query even more by replacing COUNT(DISTINCT person_id) with a COUNT(*) over a subquery that selects DISTINCT person_id. The subquery uses a loose index scan (shown as Using index for group-by in the plan) and is therefore almost instant.
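The distinct-counting part can be checked on its own before it is combined with the sum. This is a sketch against the same persons table; whether the optimizer picks the loose index scan may vary between MySQL versions.

```sql
-- The plan is expected to show ix_persons_person with
-- "Using index for group-by" in the Extra column.
EXPLAIN
SELECT  DISTINCT person_id
FROM    persons;

-- Counting the distinct values; the sample data has 1,000 of them.
SELECT  COUNT(*)
FROM    (
        SELECT  DISTINCT person_id
        FROM    persons
        ) q;
```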

Here's what the improved query looks like:

SELECT  SUM(miles) /
        (
        SELECT  COUNT(*)
        FROM    (
                SELECT  DISTINCT person_id
                FROM    persons
                ) q
        ) AS total
FROM    persons

total
4998778.6000
1 row fetched in 0.0001s (0.7344s)

id	select_type	table	type	key	key_len	rows	filtered	Extra
1	PRIMARY	persons	index	ix_persons_miles	4	1000066	100.00	Using index
2	SUBQUERY							Select tables optimized away
3	DERIVED	persons	range	ix_persons_person	4	2021	100.00	Using index for group-by

select (sum(`20091110_indexes`.`persons`.`miles`) / (select count(0) AS `COUNT(*)` from (select distinct `20091110_indexes`.`persons`.`person_id` AS `person_id` from `20091110_indexes`.`persons`) `q`)) AS `total` from `20091110_indexes`.`persons`

This query uses a separate index for each aggregate, and thanks to the loose index scan (Using index for group-by), the number of distinct person_id values is counted almost instantly.

Overall, the query takes about 730 ms, almost three times as fast as the original query.

Written by Quassnoi

November 10th, 2009 at 11:00 pm

Posted in MySQL
