MIN on a composite index at EXPLAIN EXTENDED

MAX and MIN on a composite index

Comments enabled. I *really* need your comment

Answering questions asked on the site.

Ivo Radev asks:

I am trying to make a very simple query.

We have a log table which different machines write to. Given the machine list, I need to find the latest log timestamp.

Currently, the query looks like this:
SELECT  MAX(log_time)
FROM    log_table
WHERE   log_machine IN ($machines)
, and I pass the comma-separated list of $machines from PHP.

The weird thing is that the query is literally instant when there is only one machine (any) in the list but slow when there are multiple machines.

I'm considering doing it in separate queries and then process the results in PHP. However I'd like to know if there is a fast solution in MySQL.

Most probably, there is a composite index on (log_machine, log_time) which is being used for the query.

Usually, a query like this:

SELECT  MAX(log_time)
FROM    log_table

on the indexed field log_time can be served with a single index seek on the index.

Indeed, the MAX(log_time), by definition, is the latest entry in the index order, and can be fetched merely by finding the trailing index entry. It's a matter of several page reads in the B-Tree, each one following the rightmost link to the lower-level page.

Similarly, this query:

SELECT  MAX(log_time)
FROM    log_table
WHERE   log_machine = $my_machine

can be served with a single index seek too. However, the index should include log_machine as a leading column.

In this case, a set of records satisfying the WHERE clause of the query is represented by a single logically continuous block of records in the index, each one sharing the same value of log_machine. MAX(log_time) will of course be held by the last record in this block. MySQL just finds that last record and takes the log_time out of it.

Now, what if we have a multiple condition on log_machine?

The index remains the same, but the record holding MAX(log_time) is not the last record in a single continuous block anymore. Instead, there are multiple blocks each having its own MAX(log_date). log_time cannot be found merely by taking the last record from the index block: it is not known which one is the correct one.

On composite indexes, however, MySQL offers loose index scan. This means that it jumps over the distinct values of the leading column, doing an index seek (instead of index scan) to retrieve each next value.

As stated in the documentation, this method is ideal to doing the queries like that:

SELECT  log_machine, MAX(log_time)
FROM    log_table
WHERE   log_machine IN ($my_machine_list)

As we said earlier, for each log_machine, its MAX(log_time) can be returned very fast, and the list of the log_machines could be obtained with a loose index scan, by seeking the keys in the index.

This query, however, will not produce a single MAX(log_time): instead, it will return as many maximums as there are values in the list (which are found in the table, of course).

But this can be easily worked around: we just select the greatest one of these records. Since the subquery will only return several records, the greatest one if them can be found almost instantly.

Let's create a sample table:

Table creation details

CREATE TABLE filler (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

CREATE TABLE log_table (
id INT NOT NULL PRIMARY KEY,
log_machine VARCHAR(20) NOT NULL,
log_time DATETIME NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
DECLARE _cnt INT;
SET _cnt = 1;
WHILE _cnt &lt;= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

START TRANSACTION;
CALL prc_filler(1000000);
COMMIT;

INSERT
INTO    log_table
SELECT  id,
        CONCAT('Machine ' , CEILING(RAND(20100508) * 10)),
        '2010-05-08' - INTERVAL CEILING(RAND(20100508 &lt;&lt; 1) * 10000000) SECOND
FROM    filler f1;

CREATE INDEX ix_log_machine_time ON log_table (log_machine, log_time);

The table has 1,000,000 records

Now, let's see how the original query performs:

SELECT  MAX(log_time) AS maxtime
FROM    log_table
WHERE   log_machine IN ('Machine 3', 'Machine 5', 'Machine 7', 'Machine 9')

maxtime
2010-05-07 23:59:49
1 row fetched in 0.0001s (0.6406s)

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	log_table	range	ix_log_machine_time	ix_log_machine_time	62		826326	100.00	Using where; Using index

select max(`20100508_max`.`log_table`.`log_time`) AS `maxtime` from `20100508_max`.`log_table` where (`20100508_max`.`log_table`.`log_machine` in ('Machine 3','Machine 5','Machine 7','Machine 9'))

The query uses range access to retrieve the records and browses all records to find the maximum. It takes 640 ms on a table 1,000,000 log records (which is about a day's output of a single web server under a load decent but not super hard).

Now, let's try to select the greatest of the group-wise maximums:

SELECT  MAX(log_time) AS maxtime
FROM    log_table
WHERE   log_machine IN ('Machine 3', 'Machine 5', 'Machine 7', 'Machine 9')
GROUP BY
log_machine
ORDER BY
1 DESC
LIMIT 1

maxtime
2010-05-07 23:59:49
1 row fetched in 0.0001s (0.0020s)

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	log_table	range	ix_log_machine_time	ix_log_machine_time	62		16	100.00	Using where; Using index for group-by; Using temporary; Using filesort

select max(`20100508_max`.`log_table`.`log_time`) AS `maxtime` from `20100508_max`.`log_table` where (`20100508_max`.`log_table`.`log_machine` in ('Machine 3','Machine 5','Machine 7','Machine 9')) group by `20100508_max`.`log_table`.`log_machine` order by 1 desc limit 1

Now, it's instant, as it should be.

As it often happens, by appending three seemingly redundant clauses to a query we made MySQL to choose a more efficient plan and the query is now instant even with multiple machines in the list.

Hope that helps.

I'm always glad to answer the questions regarding database queries.

Ask me a question

Written by Quassnoi

May 8th, 2010 at 11:00 pm

Posted in MySQL

M	T	W	T	F	S	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30
31

EXPLAIN EXTENDED

MAX and MIN on a composite index

Leave a Reply

Subscribe

Subscribe by email

Contacts

Should I?

Recent articles

Calendar

Archives

Categories

Stack Overflow

EXPLAIN EXTENDED

MAX and MIN on a composite index

Share this:

Leave a Reply

Subscribe

Subscribe by email

Contacts

Should I?

Recent articles

Calendar

Archives

Categories

Stack Overflow