EXPLAIN EXTENDED

How to create fast database queries

Inverting date ranges

Comments enabled. I *really* need your comment

From Stack Overflow:

In an SQL table I keep bookings for various resouces, with a StartDate / EndDate column:

ResourceID StartDate EndDate
1 2009-01-01 2009-01-05
1 2009-01-07 2009-01-10
2 2009-01-03 2009-01-18

A record means that the given resource is busy.

I need to produce a list of all resources that are available for at least X consecutive days in a given time interval, e. g. from 2009-01-01 to 2009-01-20 and 5 consecutive days.

This would be quite an easy task if the table records reflected available resources. We could just check that a resource exists using EXISTS and that it has given length using DATEDIFF.

However, the table reflects the busy resources. This means we should invert the date ranges: given a list of consecutive non-overlapping date ranges, we should build a list of their complements, or the gaps between them.

Here's how we do it.

First, we should build an ordered set of the date ranges. Since the ranges are not overlapping, the set can be ordered by any column (StartDate or EndDate), with the result being the same:

Table

Then we should make a copy of this set and shift it one record up:

Tables

Finally, we should form the records, taking the EndDate from the first set and the StartDate from the second set, substituting missing values with NULLs:

Join

The resulting set (outlined with the black lines on the picture above) is the inversion of the original set. The original date ranges became the gaps and the original gaps became the date ranges.

To implement this in SQL we should do the following:

  1. Filter the table so that only the date ranges falling within the given interval are selected.
  2. Assign the row numbers (paritioned by ResourceID and ordered by StartDate) to each row, using ROW_NUMBER
  3. Make a self-join using FULL OUTER JOIN, using a row number shifted by one as a join condition. The first row and the last row of each partition will have no counterpart in the JOIN, and the FULL OUTER JOIN will substitute the missing counterparts with NULLs. This make each partition one record longer, exactly as shown on the picture.
  4. Replace the NULLs with the values of the beginning and the end of the interval.
  5. Finally, select the ranges that are at least 5 days long.

Let's generate a sample rowset:

View query details

The next step is to filter the rows and order them partition-wise:

View query details

To invert the range list, we should make a self-fulljoin:

View query details

Finally, we should replace the NULL values generated by the FULL JOIN with the beginning and the end of the interval and filter the resulting values so that only the ranges longer than 5 days are returned:

WITH    data (ResourceID, StartDate, EndDate) AS
(
SELECT  1, CAST('2008-12-24' AS DATETIME), CAST('2009-12-31' AS DATETIME)
UNION ALL
SELECT  1, '2009-01-01', '2009-01-05'
UNION ALL
SELECT  1, '2009-01-07', '2009-01-10'
UNION ALL
SELECT  1, '2009-01-21', '2009-01-25'
UNION ALL
SELECT  2, '2009-01-03', '2009-01-18'
),
rows AS
(
SELECT  ResourceID, StartDate, EndDate,
ROW_NUMBER() OVER (PARTITION BY ResourceID ORDER BY StartDate) AS rn
FROM    data
WHERE   StartDate >= '2009-01-01'
AND EndDate <= DATEADD(day, 1, '2009-01-20')
        ),
        invert AS
        (
        SELECT  COALESCE(rl.ResourceID, rr.ResourceID) AS ResourceID,
                COALESCE(rl.EndDate, '2009-01-01') AS StartDate,
                COALESCE(rr.StartDate, '2009-01-20') AS EndDate
        FROM    rows rl
        FULL JOIN
                rows rr
        ON      rr.ResourceID = rl.ResourceID
                AND rr.rn = rl.rn + 1        
        )
SELECT  *
FROM    invert
WHERE   DATEDIFF(day, StartDate, EndDate) >= 5
ResourceID StartDate EndDate
1 2009-01-10 00:00:00.000 2009-01-20 00:00:00.000
1 row fetched in 0.0002s (0.0008s)

, which is the query we were after.

Written by Quassnoi

November 9th, 2009 at 11:00 pm

Posted in SQL Server

Leave a Reply