EXPLAIN EXTENDED

How to create fast database queries

Archive for February, 2011

Things SQL needs: SERIES()

with 9 comments

Recently I had to deal with several scenarios which required processing and aggregating continuous series of data.

I believe this could be best illustrated with an example:

id source value
1 1 10
2 1 20
3 2 15
4 2 25
5 1 45
6 3 50
7 3 35
8 1 40
9 1 10

The records are ordered by id, and within this order there are continuous series of records which share the same value of source. In the table above, the series are separated by thick lines.

We want to calculate some aggregates across each of the series: MIN, MAX, SUM, AVG, whatever:

source min max sum avg
1 10 20 30 15.00
2 15 25 40 20.00
1 45 45 45 45.00
3 35 50 85 42.50
1 10 40 50 25.00

This can be used for different things. I used that for:

  • Reading sensors from a moving elevator (thus tracking its position)
  • Recording user's activity on a site
  • Tracking the primary node in a server cluster

, but almost any seasoned database developer can recall a need for such a query.

As you can see, the values of source are repeating so a mere GROUP BY won't work here.

In the systems supporting window functions there is a workaround for that:

Read the rest of this entry »

Written by Quassnoi

February 18th, 2011 at 11:00 pm

Posted in Miscellaneous

Late row lookups: InnoDB

with 7 comments

Answering questions asked on the site.

Aryé asks:

Thanks for your article about late row lookups in MySQL.

I have two questions for you please:

  • Is this workaround specific to MyISAM engine?
  • How does PostgreSQL handle this?

The questions concerns a certain workaround for MySQL LIMIT … OFFSET queries like this:

SELECT  *
FROM    mytable
ORDER BY
        id
LIMIT   10
OFFSET  10000

which can be improved using a little rewrite:

SELECT  m.*
FROM    (
        SELECT  id
        FROM    mytable
        ORDER BY
                id
        LIMIT   10
        OFFSET  10000
        ) q
JOIN    mytable m
ON      m.id = q.id
ORDER BY
        m.id

For the rationale behind this improvement, please read the original article.

Now, to the questions.

The second questions is easy: PostgreSQL won't pull the fields from the table until it really needs them. If a query involving an ORDER BY along with LIMIT and OFFSET is optimized to use the index for the ORDER BY part, the table lookups won't happen for the records skipped.

Though PostgreSQL does not reflect the table lookups in the EXPLAIN output, a simple test would show us that they are done only LIMIT times, not OFFSET + LIMIT (like MySQL does).

Now, let's try to answer the first question: will this trick improve the queries against an InnoDB table?

To do this, we will create a sample table:

Read the rest of this entry »

Written by Quassnoi

February 11th, 2011 at 11:00 pm

Posted in MySQL