Archive for June 11th, 2009
Flattening timespans: SQL Server
Comments enabled. I *really* need your comment
From Stack Overflow:
I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan.
To make things a bit clearer, take a look at the sample data for 03.06.2009:
The following timespans are overlapping or contiunous and need to merge into one timespan:
date start stop 2009.06.03 05:54:48:000 10:00:13:000 2009.06.03 09:26:45:000 09:59:40:000 The resulting timespan would be from 05:54:48 to 10:00:13.
Since there's a gap between 10:00:13 and 10:12:50, we also have the following timespans:
date start stop 2009.06.03 10:12:50:000 10:27:25:000 2009.06.03 10:13:12:000 11:14:56:000 2009.06.03 10:27:25:000 10:27:31:000 2009.06.03 10:27:39:000 13:53:38:000 2009.06.03 11:14:56:000 11:15:03:000 2009.06.03 11:15:30:000 14:02:14:000 2009.06.03 13:53:38:000 13:53:43:000 2009.06.03 14:02:14:000 14:02:31:000 which result in one merged timespan from 10:12:50 to 14:02:31, since they're overlapping or adjacent.
Any solution, be it SQL or not, is appreciated.
This is also quite a common task.
The algorithm here is simple:
- Order all timespans by
start
- Initialize a variable which will hold a running maximum of the timespans'
stop
- Take the timespans one by one, and for each next timespan, check its
start
against a running maximum of the previousstop
's. If the currentstart
is greater than the maximum of previousstop
's, it means that all timespans that started earlier have been ended, and we have a gap - Update the running maximum
Unfortunately, SQL Server
does not provide a simple way to calculate running maximums. We could use a correlated subquery of course, but it would take forever.
This is one of quite rare cases when cursor in SQL Server
work faster than set-based operations.
Let's create a sample table and see how it works:
Read the rest of this entry »