Flattening timespans: SQL Server
Comments enabled. I *really* need your comment
From Stack Overflow:
I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan.
To make things a bit clearer, take a look at the sample data for 03.06.2009:
The following timespans are overlapping or contiunous and need to merge into one timespan:
date start stop 2009.06.03 05:54:48:000 10:00:13:000 2009.06.03 09:26:45:000 09:59:40:000 The resulting timespan would be from 05:54:48 to 10:00:13.
Since there's a gap between 10:00:13 and 10:12:50, we also have the following timespans:
date start stop 2009.06.03 10:12:50:000 10:27:25:000 2009.06.03 10:13:12:000 11:14:56:000 2009.06.03 10:27:25:000 10:27:31:000 2009.06.03 10:27:39:000 13:53:38:000 2009.06.03 11:14:56:000 11:15:03:000 2009.06.03 11:15:30:000 14:02:14:000 2009.06.03 13:53:38:000 13:53:43:000 2009.06.03 14:02:14:000 14:02:31:000 which result in one merged timespan from 10:12:50 to 14:02:31, since they're overlapping or adjacent.
Any solution, be it SQL or not, is appreciated.
This is also quite a common task.
The algorithm here is simple:
- Order all timespans by
start - Initialize a variable which will hold a running maximum of the timespans'
stop - Take the timespans one by one, and for each next timespan, check its
startagainst a running maximum of the previousstop's. If the currentstartis greater than the maximum of previousstop's, it means that all timespans that started earlier have been ended, and we have a gap - Update the running maximum
Unfortunately, SQL Server does not provide a simple way to calculate running maximums. We could use a correlated subquery of course, but it would take forever.
This is one of quite rare cases when cursor in SQL Server work faster than set-based operations.
Let's create a sample table and see how it works:
CREATE SCHEMA [20090611_timespans]
CREATE TABLE t_span (
s_date DATETIME NOT NULL,
s_start DATETIME NOT NULL,
s_stop DATETIME NOT NULL
)
GO
CREATE INDEX ix_span_date ON [20090611_timespans].t_span (s_date)
GO
DECLARE @day INT
DECLARE @cnt FLOAT
SELECT RAND(20090611)
BEGIN TRANSACTION
SET @day = 499
WHILE @day >= 0
BEGIN
SET @cnt = 0
WHILE @cnt < 2880
BEGIN
INSERT
INTO [20090611_timespans].t_span
VALUES (
CAST('2009-06-11' AS DATETIME) - @day,
CAST('00:00:00' AS DATETIME) + @cnt / 2880,
CAST('00:00:00' AS DATETIME) + (@cnt + (RAND() * 10) + 0.5) / 2880
)
SET @cnt = @cnt + 1
END
SET @day = @day - 1
END
COMMIT
GO
This table has 1,440,000 rows, with timespans starting each 30 seconds and having a random length from 15 to 315 seconds.
Dates and times are stored separately, and only the s_date field is indexed. It's not the best design possible, but since the author mentioned it's a legacy table, there's not much that can be done with it.
Now, here's the function:
CREATE FUNCTION [20090611_timespans].fn_spans(@p_from DATETIME, @p_till DATETIME)
RETURNS @t TABLE
(
q_start DATETIME NOT NULL,
q_end DATETIME NOT NULL
)
AS
BEGIN
DECLARE @qs DATETIME
DECLARE @qe DATETIME
DECLARE @ms DATETIME
DECLARE @me DATETIME
DECLARE cr_span CURSOR FAST_FORWARD
FOR
SELECT s_date + s_start AS q_start,
s_date + s_stop AS q_end
FROM [20090611_timespans].t_span
WHERE s_date BETWEEN @p_from - 1 AND @p_till
AND s_date + s_start >= @p_from
AND s_date + s_stop <= @p_till
ORDER BY
q_start
OPEN cr_span
FETCH NEXT
FROM cr_span
INTO @qs, @qe
SET @ms = @qs
SET @me = @qe
WHILE @@FETCH_STATUS = 0
BEGIN
FETCH NEXT
FROM cr_span
INTO @qs, @qe
IF @qs > @me
BEGIN
INSERT
INTO @t
VALUES (@ms, @me)
SET @ms = @qs
END
SET @me = CASE WHEN @qe > @me THEN @qe ELSE @me END
END
IF @ms IS NOT NULL
BEGIN
INSERT
INTO @t
VALUES (@ms, @me)
END
CLOSE cr_span
RETURN
END
GO
This is quite straightforward. We use the variable @me to store the running maximum and an additional variable @ms to store the beginning of the current contiguous timespan. This will save us a join.
Notice the extra condition in a cursor query: WHERE s_date BETWEEN @p_from - 1 AND @p_till
This may seem to be redundant, but actually this is a coarse filter for the index on s_date to be usable. Additional conditions in WHERE will fine filter the rows.
Here's the result of the query:
SELECT *
FROM [20090611_timespans].fn_spans('2007-01-01', '2011-01-01')
| q_start | q_end | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2008-01-29 00:00:00.000 | 2008-01-30 09:24:28.333 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2008-01-30 09:24:30.000 | 2008-02-05 16:51:48.270 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2008-02-05 16:52:00.000 | 2008-02-09 22:01:57.027 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2008-02-09 22:02:00.000 | 2008-02-11 16:07:20.913 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2008-02-11 16:07:30.000 | 2008-02-16 11:43:58.093 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2008-02-16 11:44:00.000 | 2008-02-17 18:39:52.677 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2008-02-17 18:40:00.000 | 2008-02-28 00:53:57.173 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2008-02-28 00:54:00.000 | 2008-03-03 17:34:59.030 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2008-03-03 17:35:00.000 | 2008-03-04 23:35:50.357 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2008-03-04 23:36:00.000 | 2008-03-10 10:27:25.783 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2009-05-30 04:58:00.000 | 2009-06-01 00:31:51.777 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2009-06-01 00:32:00.000 | 2009-06-08 22:19:57.493 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2009-06-08 22:20:00.000 | 2009-06-12 00:03:03.817 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 99 rows fetched in 0.0030s (24.4530s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The query completes in 25 seconds, which is quite an acceptable time for a million and a half rows.
And SQL Server cursors don't seem so terrible after all, being properly used.
Let's try to run this query on a shorter range, like a day, which according to the author's comment is the most popular query condition:
SELECT *
FROM [20090611_timespans].fn_spans('2008-01-30', '2008-01-31')
| q_start | q_end | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2008-01-30 00:00:00.000 | 2008-01-30 09:24:28.333 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2008-01-30 09:24:30.000 | 2008-01-30 23:58:49.763 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2 rows fetched in 0.0002s (0.3208s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This is only 0.32 seconds which is next to instant.
Subscribe in a reader