Archive for August 19th, 2009
MySQL: difference between sets
From Stack Overflow:
I have a table that holds data about items that existed at a certain time (regular snapshots taken).
Simple example:
Timestamp ID 1 A 1 B 2 A 2 B 2 C 3 A 3 D 4 D 4 E In this case, item
C
gets created sometime between snapshot 1 and 2, sometime between snapshot 2 and 3B
andC
disappear andD
gets created, etc.The table is reasonably large (millions of records) and for each timestamp there are about 50 records.
What's the most efficient way of selecting the item
id
s for items that disappear between two consecutive timestamps?So for the above example I would like to get the following:
Previous snapshot Current snapshot Removed 1 2 NULL 2 3 B, C 3 4 A If it doesn't make the query inefficient, can it be extended to automatically use the latest (i. e.
MAX
) timestamp and the previous one?
We basically need to do the following things here:
- Split the table into sets grouped by timestamp
- Compare each set with the one of previous timestamp
- Find the values missing in the current set and concatenate them
This is possible to do using only the standard ANSI SQL operators, however, this will be inefficient in MySQL
.
Let's create a sample table and see how to work around this: