EXPLAIN EXTENDED

How to create fast database queries

Archive for June 13th, 2009

Flattening timespans: MySQL

Comments enabled. I *really* need your comment

In the previous articles:

, I described how to merge intersecting timespans into several contiguous ranges.

In SQL Server, this task required a table-valued function (TVF) and a cursor.

In Oracle, we had to employ window functions to do this.

Now let's see how to solve this task in MySQL.

A quick reminder of the problem, taken from Stack Overflow:

I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan.

To make things a bit clearer, take a look at the sample data for 03.06.2009:

The following timespans are overlapping or contiunous and need to merge into one timespan:

date start stop
2009.06.03 05:54:48:000 10:00:13:000
2009.06.03 09:26:45:000 09:59:40:000

The resulting timespan would be from 05:54:48 to 10:00:13.

Since there's a gap between 10:00:13 and 10:12:50, we also have the following timespans:

date start stop
2009.06.03 10:12:50:000 10:27:25:000
2009.06.03 10:13:12:000 11:14:56:000
2009.06.03 10:27:25:000 10:27:31:000
2009.06.03 10:27:39:000 13:53:38:000
2009.06.03 11:14:56:000 11:15:03:000
2009.06.03 11:15:30:000 14:02:14:000
2009.06.03 13:53:38:000 13:53:43:000
2009.06.03 14:02:14:000 14:02:31:000

which result in one merged timespan from 10:12:50 to 14:02:31, since they're overlapping or adjacent.

Any solution, be it SQL or not, is appreciated.

MySQL offers easy use of session variables. These are loosely typed variables defined within the session scope which you can assign and evaluate right inside the query.

Using these variables may come in handy for tasks like this.

In MySQL, just like in previous solutions, we will need to order the timespans by start anc calculate the running maximum of the stop.

This is easily done using session variable @edate which we will use to store running maximum. We will update in in the SELECT clause.

We will also declare another variable @r, initialize it with a zero and use it to keep a range number.

If we have a timespan whose start exceeds the previous value of the running maximum then all timespans that had started eariler had also been stopped by this time, and we have a gap.

The current timespan, therefore, belongs to another range, and we need to increment the value of @r.

Finally, we will just group the timespans on the range number and return MIN(start) and MAX(stop) for all timespans in a range. Easy to see that these will be the start and the stop for the whole range.

Here is the sample table:
Read the rest of this entry »

Written by Quassnoi

June 13th, 2009 at 11:00 pm

Posted in MySQL