From Stack Overflow:
Using SQL Server 2008.
I have a stored proc which takes a start date and an end date as input parameters for a date range.
I'm looking for a single SQL query with a BETWEEN
predicate on the start and end dates in the WHERE
clause which can handle both cases: the dates are either both NULL
or both have values.
This can be done by writing an IS NULL
check in the expression, like this:
WHERE (@startDate IS NULL OR date >= @startDate)
AND (@endDate IS NULL OR date <= @endDate)
or by using ISNULL (or COALESCE)
on the value of the date itself, so that the range always matches:
WHERE date BETWEEN ISNULL(@startDate, date) AND ISNULL(@endDate, date)
However, neither of these conditions uses an index on date
efficiently.
Let's create a sample table and see:
Table creation details
CREATE SCHEMA [20091005_range]
CREATE TABLE [20091005_range].t_events
(
id INT NOT NULL PRIMARY KEY,
date DATETIME NOT NULL,
value VARCHAR(20) NOT NULL,
stuffing VARCHAR(200) NOT NULL
)
GO
CREATE INDEX IX_events_date ON [20091005_range].t_events (date)
BEGIN TRANSACTION
DECLARE @cnt INT
SET @cnt = 1
WHILE @cnt <= 1000000
BEGIN
INSERT
INTO [20091005_range].t_events (id, date, value, stuffing)
VALUES (
@cnt,
DATEADD(minute, -@cnt, CONVERT(DATETIME, '20091005', 112)),
'Event ' + CAST(@cnt AS VARCHAR),
REPLICATE('*', 200)
)
SET @cnt = @cnt + 1
END;
COMMIT
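As an aside, the row-by-row loop above takes a while to run. On SQL Server 2008 the same million rows can be generated with a single set-based INSERT; here's a sketch that derives the row numbers from a cross join of the (undocumented but long-stable) master..spt_values helper table:

```sql
-- Set-based alternative to the WHILE loop: cross-join a catalog helper table
-- with itself to produce well over 1,000,000 row numbers, then insert at once.
;WITH nums AS
(
SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
FROM    master..spt_values v1
CROSS JOIN
        master..spt_values v2
)
INSERT
INTO    [20091005_range].t_events (id, date, value, stuffing)
SELECT  n,
        DATEADD(minute, -n, CONVERT(DATETIME, '20091005', 112)),
        'Event ' + CAST(n AS VARCHAR),
        REPLICATE('*', 200)
FROM    nums
WHERE   n <= 1000000
```

Either way the resulting table is the same; the loop version is kept above because it makes the per-row values easy to follow.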
This table has 1,000,000 records stuffed with 200 bytes of data each.
IS NULL OR equal
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
SET @startDate = '20091001'
SET @endDate = NULL
SELECT SUM(LEN(stuffing))
FROM [20091005_range].t_events
WHERE (@startDate IS NULL OR date >= @startDate)
AND (@endDate IS NULL OR date <= @endDate)
1152000

1 row fetched in 0.0001s (5.6717s)
Table 't_events'. Scan count 3, logical reads 34424, physical reads 76, read-ahead reads 26830, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 313 ms, elapsed time = 5662 ms.
|--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [globalagg1008]=(0) THEN NULL ELSE [globalagg1010] END))
|--Stream Aggregate(DEFINE:([globalagg1008]=SUM([partialagg1007]), [globalagg1010]=SUM([partialagg1009])))
|--Parallelism(Gather Streams)
|--Stream Aggregate(DEFINE:([partialagg1007]=COUNT_BIG([Expr1006]), [partialagg1009]=SUM([Expr1006])))
|--Compute Scalar(DEFINE:([Expr1006]=len([test].[20091005_range].[t_events].[stuffing])))
|--Clustered Index Scan(OBJECT:([test].[20091005_range].[t_events].[PK__t_events__71FCD09A]), WHERE:(([@startDate] IS NULL OR [test].[20091005_range].[t_events].[date]>=[@startDate]) AND ([@endDate] IS NULL OR [test].[20091005_range].[t_events].[date]<=[@endDate])))
This query uses a Clustered Index Scan over the whole table. The records are quite large, which is why scanning them all takes so long: the query runs for 5.6 seconds.
ISNULL
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
SET @startDate = '20091001'
SET @endDate = NULL
SELECT SUM(LEN(stuffing)) AS total
FROM [20091005_range].t_events
WHERE date BETWEEN ISNULL(@startDate, date) AND ISNULL(@endDate, date)
total
1152000

1 row fetched in 0.0001s (4.8905s)
Table 't_events'. Scan count 3, logical reads 34424, physical reads 59, read-ahead reads 24375, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 328 ms, elapsed time = 4887 ms.
|--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [globalagg1006]=(0) THEN NULL ELSE [globalagg1008] END))
|--Stream Aggregate(DEFINE:([globalagg1006]=SUM([partialagg1005]), [globalagg1008]=SUM([partialagg1007])))
|--Parallelism(Gather Streams)
|--Stream Aggregate(DEFINE:([partialagg1005]=COUNT_BIG([Expr1004]), [partialagg1007]=SUM([Expr1004])))
|--Compute Scalar(DEFINE:([Expr1004]=len([test].[20091005_range].[t_events].[stuffing])))
|--Clustered Index Scan(OBJECT:([test].[20091005_range].[t_events].[PK__t_events__71FCD09A]), WHERE:([test].[20091005_range].[t_events].[date]>=isnull([@startDate],[test].[20091005_range].[t_events].[date]) AND [test].[20091005_range].[t_events].[date]<=isnull([@endDate],[test].[20091005_range].[t_events].[date])))
This is just like the previous query: a very expensive Clustered Index Scan.
MIN / MAX
Both conditions above are not sargable: SQL Server cannot derive a single index range from them.
To make the condition sargable, we could use predefined constants instead of the row value in ISNULL
or COALESCE.
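That variant could look like this (a sketch; the sentinel dates are simply the lower and upper bounds of the DATETIME type, wide enough to cover any data):

```sql
-- ISNULL against constants: the bounds no longer depend on the row,
-- which should let SQL Server seek on a single index range.
SELECT  SUM(LEN(stuffing)) AS total
FROM    [20091005_range].t_events
WHERE   date BETWEEN ISNULL(@startDate, '17530101')
                 AND ISNULL(@endDate, '99991231')
```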
But it's better to use the actual minimal and maximal values of date
for this purpose.
MIN
and MAX
are almost free on an indexed field, and we can easily use them to show the actual values of the range bounds, or even return them in the OUTPUT
parameters of the stored proc.
Here's the query to do this:
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
SET @startDate = '20091001'
SET @endDate = NULL
;
WITH limits AS
(
SELECT COALESCE(@startDate, MIN(date)) AS startDate,
COALESCE(@endDate, MAX(date)) AS endDate
FROM [20091005_range].t_events
)
SELECT startDate, endDate, total
FROM limits
CROSS APPLY
(
SELECT SUM(LEN(stuffing)) AS total
FROM [20091005_range].t_events
WHERE date BETWEEN startDate and endDate
) totals
startDate | endDate | total
2009-10-01 00:00:00.000 | 2009-10-04 23:59:00.000 | 1152000

1 row fetched in 0.0002s (0.0532s)
Table 't_events'. Scan count 3, logical reads 17679, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 53 ms.
|--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1005], [Expr1006]))
|--Compute Scalar(DEFINE:([Expr1005]=CASE WHEN [@startDate] IS NOT NULL THEN [@startDate] ELSE [Expr1003] END, [Expr1006]=CASE WHEN [@endDate] IS NOT NULL THEN [@endDate] ELSE [Expr1004] END))
| |--Stream Aggregate(DEFINE:([Expr1003]=MIN([test].[20091005_range].[t_events].[date]), [Expr1004]=MAX([test].[20091005_range].[t_events].[date])))
| |--Nested Loops(Inner Join)
| |--Top(TOP EXPRESSION:((1)))
| | |--Index Scan(OBJECT:([test].[20091005_range].[t_events].[IX_events_date]), ORDERED FORWARD)
| |--Top(TOP EXPRESSION:((1)))
| |--Index Scan(OBJECT:([test].[20091005_range].[t_events].[IX_events_date]), ORDERED BACKWARD)
|--Compute Scalar(DEFINE:([Expr1010]=CASE WHEN [Expr1020]=(0) THEN NULL ELSE [Expr1021] END))
|--Stream Aggregate(DEFINE:([Expr1020]=COUNT_BIG([Expr1011]), [Expr1021]=SUM([Expr1011])))
|--Compute Scalar(DEFINE:([Expr1011]=len([test].[20091005_range].[t_events].[stuffing])))
|--Nested Loops(Inner Join, OUTER REFERENCES:([test].[20091005_range].[t_events].[id], [Expr1019]) WITH UNORDERED PREFETCH)
|--Sort(ORDER BY:([test].[20091005_range].[t_events].[id] ASC))
| |--Index Seek(OBJECT:([test].[20091005_range].[t_events].[IX_events_date]), SEEK:([test].[20091005_range].[t_events].[date] >= [Expr1005] AND [test].[20091005_range].[t_events].[date] <= [Expr1006]) ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([test].[20091005_range].[t_events].[PK__t_events__71FCD09A]), SEEK:([test].[20091005_range].[t_events].[id]=[test].[20091005_range].[t_events].[id]) LOOKUP ORDERED FORWARD)
We see not only the result, but also the actual values of startDate
and endDate
that were substituted.
Note that the index range is additionally sorted by id
. Since index records are much smaller than the rows, it's worth sorting them by id
to make the row lookups sequential (LOOKUP ORDERED FORWARD
). This saves a lot of time on traversing the B-Tree that stores the clustered records.
This query takes only 50 ms, which is about 100 times as fast.
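Wrapped into a stored proc, the same logic can hand the resolved bounds back through OUTPUT parameters, as mentioned above. A sketch (the proc name is made up for illustration):

```sql
-- The OUTPUT parameters carry the dates in; if either is NULL, it is
-- replaced with the actual bound of the data and returned to the caller.
CREATE PROCEDURE [20091005_range].prc_total_stuffing
        @startDate DATETIME OUTPUT,
        @endDate   DATETIME OUTPUT
AS
BEGIN
        SELECT  @startDate = COALESCE(@startDate, MIN(date)),
                @endDate   = COALESCE(@endDate, MAX(date))
        FROM    [20091005_range].t_events

        SELECT  SUM(LEN(stuffing)) AS total
        FROM    [20091005_range].t_events
        WHERE   date BETWEEN @startDate AND @endDate
END
```

In T-SQL an OUTPUT parameter also accepts a value on the way in, so the caller passes NULL (or a date) and gets the effective bound back in the same variable.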
i need query for either both values are null or one may be null or both dates have values
d.karthick
2 Jan 15 at 15:48
@d.karthick: have you read the article?
Quassnoi
4 Jan 15 at 04:31