Inverting date ranges at EXPLAIN EXTENDED

In an SQL table I keep bookings for various resouces, with a StartDate / EndDate column:

ResourceID StartDate EndDate

1 2009-01-01 2009-01-05

1 2009-01-07 2009-01-10

2 2009-01-03 2009-01-18

A record means that the given resource is busy.

I need to produce a list of all resources that are available for at least X consecutive days in a given time interval, e. g. from 2009-01-01 to 2009-01-20 and 5 consecutive days.

ResourceID	StartDate	EndDate
1	2009-01-01	2009-01-05
1	2009-01-07	2009-01-10
2	2009-01-03	2009-01-18

This would be quite an easy task if the table records reflected available resources. We could just check that a resource exists using EXISTS and that it has given length using DATEDIFF.

However, the table reflects the busy resources. This means we should invert the date ranges: given a list of consecutive non-overlapping date ranges, we should build a list of their complements, or the gaps between them.

Here's how we do it.

First, we should build an ordered set of the date ranges. Since the ranges are not overlapping, the set can be ordered by any column (StartDate or EndDate), with the result being the same:

Table

Then we should make a copy of this set and shift it one record up:

Tables

Finally, we should form the records, taking the EndDate from the first set and the StartDate from the second set, substituting missing values with NULLs:

Join

The resulting set (outlined with the black lines on the picture above) is the inversion of the original set. The original date ranges became the gaps and the original gaps became the date ranges.

To implement this in SQL we should do the following:

Filter the table so that only the date ranges falling within the given interval are selected.
Assign the row numbers (paritioned by ResourceID and ordered by StartDate) to each row, using ROW_NUMBER
Make a self-join using FULL OUTER JOIN, using a row number shifted by one as a join condition. The first row and the last row of each partition will have no counterpart in the JOIN, and the FULL OUTER JOIN will substitute the missing counterparts with NULLs. This make each partition one record longer, exactly as shown on the picture.
Replace the NULLs with the values of the beginning and the end of the interval.
Finally, select the ranges that are at least 5 days long.

Let's generate a sample rowset:

View query details

WITH    data (ResourceID, StartDate, EndDate) AS
(
SELECT  1, CAST('2008-12-24' AS DATETIME), CAST('2009-12-31' AS DATETIME)
UNION ALL
SELECT  1, '2009-01-01', '2009-01-05'
UNION ALL
SELECT  1, '2009-01-07', '2009-01-10'
UNION ALL
SELECT  1, '2009-01-21', '2009-01-25'
UNION ALL
SELECT  2, '2009-01-03', '2009-01-18'
)
SELECT  *
FROM    data

ResourceID	StartDate	EndDate
1	2008-12-24 00:00:00.000	2009-12-31 00:00:00.000
1	2009-01-01 00:00:00.000	2009-01-05 00:00:00.000
1	2009-01-07 00:00:00.000	2009-01-10 00:00:00.000
1	2009-01-21 00:00:00.000	2009-01-25 00:00:00.000
2	2009-01-03 00:00:00.000	2009-01-18 00:00:00.000
5 rows fetched in 0.0003s (0.0005s)

The next step is to filter the rows and order them partition-wise:

View query details

WITH    data (ResourceID, StartDate, EndDate) AS
(
SELECT  1, CAST('2008-12-24' AS DATETIME), CAST('2009-12-31' AS DATETIME)
UNION ALL
SELECT  1, '2009-01-01', '2009-01-05'
UNION ALL
SELECT  1, '2009-01-07', '2009-01-10'
UNION ALL
SELECT  1, '2009-01-21', '2009-01-25'
UNION ALL
SELECT  2, '2009-01-03', '2009-01-18'
),
rows AS
(
SELECT  ResourceID, StartDate, EndDate,
ROW_NUMBER() OVER (PARTITION BY ResourceID ORDER BY StartDate) AS rn
FROM    data
WHERE   StartDate &gt;= '2009-01-01'
AND EndDate &lt;= DATEADD(day, 1, '2009-01-20')
        )
SELECT  *
FROM    rows

ResourceID	StartDate	EndDate	rn
1	2009-01-01 00:00:00.000	2009-01-05 00:00:00.000	1
1	2009-01-07 00:00:00.000	2009-01-10 00:00:00.000	2
2	2009-01-03 00:00:00.000	2009-01-18 00:00:00.000	1
3 rows fetched in 0.0003s (0.0006s)

To invert the range list, we should make a self-fulljoin:

View query details

WITH    data (ResourceID, StartDate, EndDate) AS
(
SELECT  1, CAST('2008-12-24' AS DATETIME), CAST('2009-12-31' AS DATETIME)
UNION ALL
SELECT  1, '2009-01-01', '2009-01-05'
UNION ALL
SELECT  1, '2009-01-07', '2009-01-10'
UNION ALL
SELECT  1, '2009-01-21', '2009-01-25'
UNION ALL
SELECT  2, '2009-01-03', '2009-01-18'
),
rows AS
(
SELECT  ResourceID, StartDate, EndDate,
ROW_NUMBER() OVER (PARTITION BY ResourceID ORDER BY StartDate) AS rn
FROM    data
WHERE   StartDate &gt;= '2009-01-01'
AND EndDate &lt;= DATEADD(day, 1, '2009-01-20')
        )
SELECT  COALESCE(rl.ResourceID, rr.ResourceID) AS ResourceID,
        rl.rn AS LeftRN,
        CONVERT(CHAR(10), rl.StartDate, 120) AS LeftStartDate,
        CONVERT(CHAR(10), rl.EndDate, 120) AS LeftEndDate,
        CONVERT(CHAR(10), rr.StartDate, 120) AS RightStartDate,
        CONVERT(CHAR(10), rr.EndDate, 120) AS RightEndDate,
        rr.rn AS RightRN
FROM    rows rl
FULL JOIN
        rows rr
ON      rr.ResourceID = rl.ResourceID
        AND rr.rn = rl.rn + 1

ResourceID	LeftRN	LeftStartDate	LeftEndDate	RightStartDate	RightEndDate	RightRN
1				2009-01-01	2009-01-05	1
1	1	2009-01-01	2009-01-05	2009-01-07	2009-01-10	2
1	2	2009-01-07	2009-01-10
2				2009-01-03	2009-01-18	1
2	1	2009-01-03	2009-01-18
5 rows fetched in 0.0005s (0.0176s)

Finally, we should replace the NULL values generated by the FULL JOIN with the beginning and the end of the interval and filter the resulting values so that only the ranges longer than 5 days are returned:

WITH    data (ResourceID, StartDate, EndDate) AS
(
SELECT  1, CAST('2008-12-24' AS DATETIME), CAST('2009-12-31' AS DATETIME)
UNION ALL
SELECT  1, '2009-01-01', '2009-01-05'
UNION ALL
SELECT  1, '2009-01-07', '2009-01-10'
UNION ALL
SELECT  1, '2009-01-21', '2009-01-25'
UNION ALL
SELECT  2, '2009-01-03', '2009-01-18'
),
rows AS
(
SELECT  ResourceID, StartDate, EndDate,
ROW_NUMBER() OVER (PARTITION BY ResourceID ORDER BY StartDate) AS rn
FROM    data
WHERE   StartDate &gt;= '2009-01-01'
AND EndDate &lt;= DATEADD(day, 1, '2009-01-20')
        ),
        invert AS
        (
        SELECT  COALESCE(rl.ResourceID, rr.ResourceID) AS ResourceID,
                COALESCE(rl.EndDate, '2009-01-01') AS StartDate,
                COALESCE(rr.StartDate, '2009-01-20') AS EndDate
        FROM    rows rl
        FULL JOIN
                rows rr
        ON      rr.ResourceID = rl.ResourceID
                AND rr.rn = rl.rn + 1        
        )
SELECT  *
FROM    invert
WHERE   DATEDIFF(day, StartDate, EndDate) &gt;= 5

ResourceID	StartDate	EndDate
1	2009-01-10 00:00:00.000	2009-01-20 00:00:00.000
1 row fetched in 0.0002s (0.0008s)

, which is the query we were after.

Written by Quassnoi

November 9th, 2009 at 11:00 pm

Posted in SQL Server

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30

EXPLAIN EXTENDED

Inverting date ranges

Leave a Reply

Subscribe

Subscribe by email

Contacts

Should I?

Recent articles

Calendar

Archives

Categories

Stack Overflow

EXPLAIN EXTENDED

Inverting date ranges

Share this:

Leave a Reply

Subscribe

Subscribe by email

Contacts

Should I?

Recent articles

Calendar

Archives

Categories

Stack Overflow