EXPLAIN EXTENDED

How to create fast database queries

MySQL: difference between sets

with one comment

From Stack Overflow:

I have a table that holds data about items that existed at a certain time (regular snapshots taken).

Simple example:

Timestamp ID
1 A
1 B
2 A
2 B
2 C
3 A
3 D
4 D
4 E

In this case, item C gets created sometime between snapshot 1 and 2, sometime between snapshot 2 and 3 B and C disappear and D gets created, etc.

The table is reasonably large (millions of records) and for each timestamp there are about 50 records.

What's the most efficient way of selecting the item ids for items that disappear between two consecutive timestamps?

So for the above example I would like to get the following:

Previous snapshot Current snapshot Removed
1 2 NULL
2 3 B, C
3 4 A

If it doesn't make the query inefficient, can it be extended to automatically use the latest (i. e. MAX) timestamp and the previous one?

We basically need to do the following things here:

  1. Split the table into sets grouped by timestamp
  2. Compare each set with the one of previous timestamp
  3. Find the values missing in the current set and concatenate them

This is possible to do using only the standard ANSI SQL operators, however, this will be inefficient in MySQL.

Let's create a sample table and see how to work around this:


Table creation details

Written by Quassnoi

August 19th, 2009 at 11:00 pm

Posted in MySQL

One Response to 'MySQL: difference between sets'

Subscribe to comments with RSS

  1. Thank you, very usefull article.
    I’m sorry but I don’t see any sense for the subquery (recs. 06-13)
    ( SELECT sp.ts
    FROM t_set sp
    WHERE sp.ts = so.pts
    ORDER BY
    sp.ts DESC
    LIMIT 1 )
    I replaced it with the construction “si.ts = so.pts” with the same output result. But may be I just not got the idea ?

    Max

    5 May 17 at 17:03

Leave a Reply