EXPLAIN EXTENDED

How to create fast database queries

Archive for August 19th, 2009

MySQL: difference between sets

with one comment

From Stack Overflow:

I have a table that holds data about items that existed at a certain time (regular snapshots taken).

Simple example:

Timestamp ID
1 A
1 B
2 A
2 B
2 C
3 A
3 D
4 D
4 E

In this case, item C gets created sometime between snapshot 1 and 2, sometime between snapshot 2 and 3 B and C disappear and D gets created, etc.

The table is reasonably large (millions of records) and for each timestamp there are about 50 records.

What's the most efficient way of selecting the item ids for items that disappear between two consecutive timestamps?

So for the above example I would like to get the following:

Previous snapshot Current snapshot Removed
1 2 NULL
2 3 B, C
3 4 A

If it doesn't make the query inefficient, can it be extended to automatically use the latest (i. e. MAX) timestamp and the previous one?

We basically need to do the following things here:

  1. Split the table into sets grouped by timestamp
  2. Compare each set with the one of previous timestamp
  3. Find the values missing in the current set and concatenate them

This is possible to do using only the standard ANSI SQL operators, however, this will be inefficient in MySQL.

Let's create a sample table and see how to work around this:

Read the rest of this entry »

Written by Quassnoi

August 19th, 2009 at 11:00 pm

Posted in MySQL