Archive for February, 2011
Recently I had to deal with several scenarios which required processing and aggregating continuous series of data.
I believe this could be best illustrated with an example:
The records are ordered by
id, and within this order there are continuous series of records which share the same value of
source. In the table above, the series are separated by thick lines.
We want to calculate some aggregates across each of the series:
This can be used for different things. I used that for:
- Reading sensors from a moving elevator (thus tracking its position)
- Recording user’s activity on a site
- Tracking the primary node in a server cluster
, but almost any seasoned database developer can recall a need for such a query.
As you can see, the values of
source are repeating so a mere
GROUP BY won’t work here.
In the systems supporting window functions there is a workaround for that:
Answering questions asked on the site.
Thanks for your article about late row lookups in MySQL.
I have two questions for you please:
- Is this workaround specific to MyISAM engine?
- How does PostgreSQL handle this?
The questions concerns a certain workaround for MySQL
LIMIT … OFFSET queries like this:
SELECT * FROM mytable ORDER BY id LIMIT 10 OFFSET 10000
which can be improved using a little rewrite:
SELECT m.* FROM ( SELECT id FROM mytable ORDER BY id LIMIT 10 OFFSET 10000 ) q JOIN mytable m ON m.id = q.id ORDER BY m.id
For the rationale behind this improvement, please read the original article.
Now, to the questions.
The second questions is easy: PostgreSQL won’t pull the fields from the table until it really needs them. If a query involving an
ORDER BY along with
OFFSET is optimized to use the index for the
ORDER BY part, the table lookups won’t happen for the records skipped.
Though PostgreSQL does not reflect the table lookups in the
EXPLAIN output, a simple test would show us that they are done only
LIMIT times, not
OFFSET + LIMIT (like MySQL does).
Now, let’s try to answer the first question: will this trick improve the queries against an InnoDB table?
To do this, we will create a sample table: