From Stack Overflow:
Using SQL Server 2008.
I have a stored proc which takes a start date and an end date as input parameters for a date range.
I'm looking for a single SQL query with a BETWEEN
predicate on the start and end dates in the WHERE
clause which can handle both cases: the dates are either both NULL
or both have values.
This can be done by writing an IS NULL
check in the expression, like this:
WHERE (@startDate IS NULL OR date >= @startDate)
AND (@endDate IS NULL OR date <= @endDate)
or by using ISNULL (or COALESCE)
on the value of the date itself, so that the range always matches:
WHERE date BETWEEN ISNULL(@startDate, date) AND ISNULL(@endDate, date)
However, neither of these conditions uses an index on date
efficiently.
Let's create a sample table and see:
Table creation details
CREATE SCHEMA [20091005_range]
CREATE TABLE [20091005_range].t_events
(
id INT NOT NULL PRIMARY KEY,
date DATETIME NOT NULL,
value VARCHAR(20) NOT NULL,
stuffing VARCHAR(200) NOT NULL
)
GO
CREATE INDEX IX_events_date ON [20091005_range].t_events (date)
BEGIN TRANSACTION
DECLARE @cnt INT
SET @cnt = 1
WHILE @cnt <= 1000000
BEGIN
INSERT
INTO [20091005_range].t_events (id, date, value, stuffing)
VALUES (
@cnt,
DATEADD(minute, -@cnt, CONVERT(DATETIME, '20091005', 112)),
'Event ' + CAST(@cnt AS VARCHAR),
REPLICATE('*', 200)
)
SET @cnt = @cnt + 1
END;
COMMIT
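As an aside, the row-by-row loop above takes a while to run. On SQL Server 2008 the same million rows can be generated with a single set-based INSERT; here's a sketch that derives the row numbers from a cross join of the (undocumented but long-stable) master..spt_values helper table:

```sql
-- Set-based alternative to the WHILE loop: cross-join a catalog helper table
-- with itself to produce well over 1,000,000 row numbers, then insert at once.
;WITH nums AS
(
SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
FROM    master..spt_values v1
CROSS JOIN
        master..spt_values v2
)
INSERT
INTO    [20091005_range].t_events (id, date, value, stuffing)
SELECT  n,
        DATEADD(minute, -n, CONVERT(DATETIME, '20091005', 112)),
        'Event ' + CAST(n AS VARCHAR),
        REPLICATE('*', 200)
FROM    nums
WHERE   n <= 1000000
```

Either way the resulting table is the same; the loop version is kept above because it makes the per-row values easy to follow.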
This table has 1,000,000 records stuffed with 200 bytes of data each.
IS NULL OR equal
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
SET @startDate = '20091001'
SET @endDate = NULL
SELECT SUM(LEN(stuffing))
FROM [20091005_range].t_events
WHERE (@startDate IS NULL OR date >= @startDate)
AND (@endDate IS NULL OR date <= @endDate)
1152000

1 row fetched in 0.0001s (5.6717s)
Table 't_events'. Scan count 3, logical reads 34424, physical reads 76, read-ahead reads 26830, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 313 ms, elapsed time = 5662 ms.
|--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [globalagg1008]=(0) THEN NULL ELSE [globalagg1010] END))
|--Stream Aggregate(DEFINE:([globalagg1008]=SUM([partialagg1007]), [globalagg1010]=SUM([partialagg1009])))
|--Parallelism(Gather Streams)
|--Stream Aggregate(DEFINE:([partialagg1007]=COUNT_BIG([Expr1006]), [partialagg1009]=SUM([Expr1006])))
|--Compute Scalar(DEFINE:([Expr1006]=len([test].[20091005_range].[t_events].[stuffing])))
|--Clustered Index Scan(OBJECT:([test].[20091005_range].[t_events].[PK__t_events__71FCD09A]), WHERE:(([@startDate] IS NULL OR [test].[20091005_range].[t_events].[date]>=[@startDate]) AND ([@endDate] IS NULL OR [test].[20091005_range].[t_events].[date]<=[@endDate])))
This query uses a Clustered Index Scan over the whole table. The records are quite large, which is why scanning them all takes so long: the query runs for 5.6 seconds.
ISNULL
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
SET @startDate = '20091001'
SET @endDate = NULL
SELECT SUM(LEN(stuffing)) AS total
FROM [20091005_range].t_events
WHERE date BETWEEN ISNULL(@startDate, date) AND ISNULL(@endDate, date)
total
1152000

1 row fetched in 0.0001s (4.8905s)
Table 't_events'. Scan count 3, logical reads 34424, physical reads 59, read-ahead reads 24375, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 328 ms, elapsed time = 4887 ms.
|--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [globalagg1006]=(0) THEN NULL ELSE [globalagg1008] END))
|--Stream Aggregate(DEFINE:([globalagg1006]=SUM([partialagg1005]), [globalagg1008]=SUM([partialagg1007])))
|--Parallelism(Gather Streams)
|--Stream Aggregate(DEFINE:([partialagg1005]=COUNT_BIG([Expr1004]), [partialagg1007]=SUM([Expr1004])))
|--Compute Scalar(DEFINE:([Expr1004]=len([test].[20091005_range].[t_events].[stuffing])))
|--Clustered Index Scan(OBJECT:([test].[20091005_range].[t_events].[PK__t_events__71FCD09A]), WHERE:([test].[20091005_range].[t_events].[date]>=isnull([@startDate],[test].[20091005_range].[t_events].[date]) AND [test].[20091005_range].[t_events].[date]<=isnull([@endDate],[test].[20091005_range].[t_events].[date])))
This is just like the previous query: a very expensive Clustered Index Scan.
MIN / MAX
Both conditions above are not sargable: SQL Server cannot derive a single index range from them.
To make the condition sargable, we could use predefined constants instead of the row value in ISNULL
or COALESCE.
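That variant could look like this (a sketch; the sentinel dates are simply the lower and upper bounds of the DATETIME type, wide enough to cover any data):

```sql
-- ISNULL against constants: the bounds no longer depend on the row,
-- which should let SQL Server seek on a single index range.
SELECT  SUM(LEN(stuffing)) AS total
FROM    [20091005_range].t_events
WHERE   date BETWEEN ISNULL(@startDate, '17530101')
                 AND ISNULL(@endDate, '99991231')
```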
But it's better to use the actual minimal and maximal values of date
for this purpose.
MIN
and MAX
are almost free on an indexed field, and we can easily use them to show the actual values of the range bounds, or even return them in the OUTPUT
parameters of the stored proc.
Here's the query to do this:
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
SET @startDate = '20091001'
SET @endDate = NULL
;
WITH limits AS
(
SELECT COALESCE(@startDate, MIN(date)) AS startDate,
COALESCE(@endDate, MAX(date)) AS endDate
FROM [20091005_range].t_events
)
SELECT startDate, endDate, total
FROM limits
CROSS APPLY
(
SELECT SUM(LEN(stuffing)) AS total
FROM [20091005_range].t_events
WHERE date BETWEEN startDate and endDate
) totals
startDate | endDate | total
2009-10-01 00:00:00.000 | 2009-10-04 23:59:00.000 | 1152000

1 row fetched in 0.0002s (0.0532s)
Table 't_events'. Scan count 3, logical reads 17679, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 53 ms.
|--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1005], [Expr1006]))
|--Compute Scalar(DEFINE:([Expr1005]=CASE WHEN [@startDate] IS NOT NULL THEN [@startDate] ELSE [Expr1003] END, [Expr1006]=CASE WHEN [@endDate] IS NOT NULL THEN [@endDate] ELSE [Expr1004] END))
| |--Stream Aggregate(DEFINE:([Expr1003]=MIN([test].[20091005_range].[t_events].[date]), [Expr1004]=MAX([test].[20091005_range].[t_events].[date])))
| |--Nested Loops(Inner Join)
| |--Top(TOP EXPRESSION:((1)))
| | |--Index Scan(OBJECT:([test].[20091005_range].[t_events].[IX_events_date]), ORDERED FORWARD)
| |--Top(TOP EXPRESSION:((1)))
| |--Index Scan(OBJECT:([test].[20091005_range].[t_events].[IX_events_date]), ORDERED BACKWARD)
|--Compute Scalar(DEFINE:([Expr1010]=CASE WHEN [Expr1020]=(0) THEN NULL ELSE [Expr1021] END))
|--Stream Aggregate(DEFINE:([Expr1020]=COUNT_BIG([Expr1011]), [Expr1021]=SUM([Expr1011])))
|--Compute Scalar(DEFINE:([Expr1011]=len([test].[20091005_range].[t_events].[stuffing])))
|--Nested Loops(Inner Join, OUTER REFERENCES:([test].[20091005_range].[t_events].[id], [Expr1019]) WITH UNORDERED PREFETCH)
|--Sort(ORDER BY:([test].[20091005_range].[t_events].[id] ASC))
| |--Index Seek(OBJECT:([test].[20091005_range].[t_events].[IX_events_date]), SEEK:([test].[20091005_range].[t_events].[date] >= [Expr1005] AND [test].[20091005_range].[t_events].[date] <= [Expr1006]) ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([test].[20091005_range].[t_events].[PK__t_events__71FCD09A]), SEEK:([test].[20091005_range].[t_events].[id]=[test].[20091005_range].[t_events].[id]) LOOKUP ORDERED FORWARD)
We see not only the result, but also the actual values of startDate
and endDate
that were substituted.
Note that the index range is additionally sorted by id
. Since index records are much smaller than the rows, it's worth sorting them by id
to make the row lookups sequential (LOOKUP ORDERED FORWARD
). This saves a lot of time on traversing the B-Tree that stores the clustered records.
This query takes only 50 ms, which is about 100 times as fast.
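Wrapped into a stored proc, the same logic can hand the resolved bounds back through OUTPUT parameters, as mentioned above. A sketch (the proc name is made up for illustration):

```sql
-- The OUTPUT parameters carry the dates in; if either is NULL, it is
-- replaced with the actual bound of the data and returned to the caller.
CREATE PROCEDURE [20091005_range].prc_total_stuffing
        @startDate DATETIME OUTPUT,
        @endDate   DATETIME OUTPUT
AS
BEGIN
        SELECT  @startDate = COALESCE(@startDate, MIN(date)),
                @endDate   = COALESCE(@endDate, MAX(date))
        FROM    [20091005_range].t_events

        SELECT  SUM(LEN(stuffing)) AS total
        FROM    [20091005_range].t_events
        WHERE   date BETWEEN @startDate AND @endDate
END
```

In T-SQL an OUTPUT parameter also accepts a value on the way in, so the caller passes NULL (or a date) and gets the effective bound back in the same variable.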
i need query for either both values are null or one may be null or both dates have values
d.karthick
2 Jan 15 at 15:48
@d.karthick: have you read the article?
Quassnoi
4 Jan 15 at 04:31