Archive for June 13th, 2009
Flattening timespans: MySQL
Comments enabled. I *really* need your comment
In the previous articles:
, I described how to merge intersecting timespans into several contiguous ranges.
In SQL Server, this task required a table-valued function (TVF) and a cursor.
In Oracle, we had to employ window functions to do this.
Now let's see how to solve this task in MySQL.
A quick reminder of the problem, taken from Stack Overflow:
I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan.
To make things a bit clearer, take a look at the sample data for 03.06.2009:
The following timespans are overlapping or contiunous and need to merge into one timespan:
date start stop 2009.06.03 05:54:48:000 10:00:13:000 2009.06.03 09:26:45:000 09:59:40:000 The resulting timespan would be from 05:54:48 to 10:00:13.
Since there's a gap between 10:00:13 and 10:12:50, we also have the following timespans:
date start stop 2009.06.03 10:12:50:000 10:27:25:000 2009.06.03 10:13:12:000 11:14:56:000 2009.06.03 10:27:25:000 10:27:31:000 2009.06.03 10:27:39:000 13:53:38:000 2009.06.03 11:14:56:000 11:15:03:000 2009.06.03 11:15:30:000 14:02:14:000 2009.06.03 13:53:38:000 13:53:43:000 2009.06.03 14:02:14:000 14:02:31:000 which result in one merged timespan from 10:12:50 to 14:02:31, since they're overlapping or adjacent.
Any solution, be it SQL or not, is appreciated.
MySQL offers easy use of session variables. These are loosely typed variables defined within the session scope which you can assign and evaluate right inside the query.
Using these variables may come in handy for tasks like this.
In MySQL, just like in previous solutions, we will need to order the timespans by start
anc calculate the running maximum of the stop
.
This is easily done using session variable @edate
which we will use to store running maximum. We will update in in the SELECT
clause.
We will also declare another variable @r
, initialize it with a zero and use it to keep a range number
.
If we have a timespan whose start
exceeds the previous value of the running maximum then all timespans that had started eariler had also been stopped by this time, and we have a gap.
The current timespan, therefore, belongs to another range, and we need to increment the value of @r
.
Finally, we will just group the timespans on the range number and return MIN(start)
and MAX(stop)
for all timespans in a range. Easy to see that these will be the start
and the stop
for the whole range.
Here is the sample table:
Read the rest of this entry »