EXPLAIN EXTENDED

How to create fast database queries

Archive for June, 2010

Indexing for ORDER BY / LIMIT

Comments enabled. I *really* need your comment

Answering questions asked on the site.

Frode Underhill asks:

I have some applications that are logging to a MySQL database table.

The table is pretty standard, of the form:

time        BIGINT(20)
source      TINYINT(4)
severity    ENUM
text        VARCHAR(255)

where source identifies the system that generated the log entry.

There are very many entries in the table (>100 million), of which 99.9999% are debug or info messages.

I'm making an interface for browsing this log, which means I'll be doing queries like

SELECT  *
FROM    log 
WHERE   source = 2
        AND severity IN (1,2) 
        AND time > 12345 
ORDER BY
        time ASC
LIMIT 30

, if I want to find debug or info log entries from a certain point in time, or

SELECT  *
FROM    log 
WHERE   source = 2
        AND severity IN (1,2) 
        AND time < 12345 
ORDER BY
        time DESC
LIMIT 30

for finding entries right before a certain time.

How would one go about indexing & querying such a table?

I thought I had it figured out (I pretty much just tried every different combination of columns in an index), but there's always some set of parameters that results in a really slow query.

The problem is that you cannot use a single index for both filtering and ordering if you have a range condition (severity IN (1, 2) in this case).

Recently I wrote an article proposing an improvement to SQL optimizers that would handle such conditions. If a range has low cardinality (that is, there are few values that can possibly satisfy it), the query can be improved by rewriting the range as a series of individual queries, each one using one of the values constituting the range in an equijoin.

No optimizer can handle this condition automatically yet, so we'll need to emulate it.

Since the severity field is defined as an ENUM with only 5 possible values, any range condition on this field can be satisfied by no more than 5 distinct values, which makes this table ideal for this rewrite.
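For this table, the rewrite replaces the IN list with one query block per severity value, so that every block filters on equalities plus a single range on time and can read its rows straight from a composite index, already ordered. Here is a sketch of the first query rewritten this way, assuming an index on (source, severity, time) (the index name is illustrative):

ALTER TABLE log ADD INDEX ix_log_source_severity_time (source, severity, time);

(
SELECT  *
FROM    log
WHERE   source = 2
        AND severity = 1
        AND time > 12345
ORDER BY
        time ASC
LIMIT 30
)
UNION ALL
(
SELECT  *
FROM    log
WHERE   source = 2
        AND severity = 2
        AND time > 12345
ORDER BY
        time ASC
LIMIT 30
)
ORDER BY
        time ASC
LIMIT 30

Each block returns at most 30 index-ordered rows, so the outer ORDER BY only has to merge and trim at most 60 pre-sorted records instead of sorting a huge filtered set.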

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

June 30th, 2010 at 11:00 pm

Posted in MySQL

GROUP_CONCAT in SQL Server

with one comment

I'm finally back from my vacation. Tunisia's great: dates, Carthage, sea and stuff.

Now, to the questions.

Mahen asks:

Create a table called Group:

Group
id prodname
1 X
1 Y
1 Z
2 A
2 B
2 C

The resultset should look like this:

id prodname
1 X,Y,Z
2 A,B,C

Can you please help me to solve the above problem using a recursive CTE?

This is our good old friend, GROUP_CONCAT. It's an aggregate function that returns all strings within a group, concatenated. It's somewhat different from the other aggregate functions because, first, dealing with the concatenated string can be quite a tedious task for groups with lots of records (large strings tend to overflow), and, second, the result depends on the order of the arguments (which is normally not the case for aggregate functions). It's not part of standard SQL and is currently implemented only by MySQL, with some extra vendor-specific keywords (like ORDER BY within the argument list).
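In MySQL itself, Mahen's resultset would take a single call to this function. A sketch (GROUP is a reserved word, hence the backticks; the separator defaults to a comma anyway but is spelled out here):

SELECT  id,
        GROUP_CONCAT(prodname ORDER BY prodname SEPARATOR ',') AS prodname
FROM    `group`
GROUP BY
        id

The ORDER BY inside the argument list is what fixes the order of the concatenated values, which, as noted above, the other aggregate functions never need.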

This functionality, however, is often asked for and I have written some articles about implementing this in PostgreSQL and Oracle.

Now, let's see how to do it in SQL Server.

Usually, SQL Server's FOR XML clause is exploited to concatenate the strings. To do this, we obtain a list of group identifiers and, for each group, retrieve all its product names with a subquery appended with FOR XML PATH(''). This makes a single XML column out of the recordset:
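A sketch of that pattern for Mahen's table (GROUP is reserved in T-SQL as well, hence the brackets; STUFF merely strips the leading comma):

SELECT  id,
        STUFF(
        (
        SELECT  ',' + prodname
        FROM    [group] gi
        WHERE   gi.id = go.id
        ORDER BY
                prodname
        FOR XML PATH('')
        ), 1, 1, '') AS prodname
FROM    [group] go
GROUP BY
        id

The correlated subquery emits one comma-prefixed value per row of the group, FOR XML PATH('') glues them into a single string, and STUFF(..., 1, 1, '') removes the first comma.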
Read the rest of this entry »

Written by Quassnoi

June 21st, 2010 at 11:00 pm

Posted in SQL Server