EXPLAIN EXTENDED

How to create fast database queries

Archive for June 30th, 2010

Indexing for ORDER BY / LIMIT

Comments enabled. I *really* need your comment

Answering questions asked on the site.

Frode Underhill asks:

I have some applications that are logging to a MySQL database table.

The table is pretty standard on the form:

time
BIGINT(20)
source
TINYINT(4)
severity
ENUM
text
VARCHAR(255)

, where source identifies the system that generated the log entry.

There are very many entries in the table (>100 million), of which 99.9999% are debug or info messages.

I'm making an interface for browsing this log, which means I'll be doing queries like

SELECT  *
FROM    log
WHERE   source = 2
AND severity IN (1,2)
AND time > 12345
ORDER BY
time ASC
LIMIT 30

, if I want to find debug or info log entries from a certain point in time, or

SELECT  *
FROM    log
WHERE   source = 2
AND severity IN (1,2)
AND time < 12345 
ORDER BY
        time DESC
LIMIT 30

for finding entries right before a certain time.

How would one go about indexing & querying such a table?

I thought I had it figured out (I pretty much just tried every different combination of columns in an index), but there's always some set of parameters that results in a really slow query.

The problem is that you cannot use a single index both for filtering and ordering if you have a ranged condition (severity IN (1, 2) in this case).

Recently I wrote an article with a proposal to improve SQL optimizer to handle these conditions. If a range has low cardinality (this is, there are few values that con possibly satisfy the range), then the query could be improved by rewriting the range as a series of individual queries, each one using one of the values constituting the range in an equijoin:

No optimizers can handle this condition automatically yet, so we'll need to emulate it.

Since the severity field is defined as an enum with only 5 values possible, any range condition on this field can be satisfied by no more than 5 distinct values, thus making this table ideal for rewriting the query.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

June 30th, 2010 at 11:00 pm

Posted in MySQL