Archive for August 20th, 2009
SQL Server: efficient DISTINCT on dates
Comments enabled. I *really* need your comment
From Stack Overflow:
I am wondering if there is a good-performing query to select distinct dates (ignoring times) from a table with a datetime field in SQL Server.
My problem isn't getting the server to actually do this (I've seen this question already, and we had something similar already in place using
DISTINCT
).The problem is whether there is any trick to get it done more quickly.
With the data we are using, our current query is returning ~80 distinct days for which there are ~40,000 rows of data (after filtering on another indexed column), there is an index on the date column, and the query always manages to take 5+ seconds. Which is too slow.
The problem with this query is that no distinct list of truncated dates is easily accessible.
That's why SQL Server has to fetch all the dates from the table (or from the index) and distinctify them somehow.
In Oracle we could probably create a function-based index that would index the truncated part of a DATETIME
column. Then it would just use jump over the distinct index keys.
Same is possible in MySQL (however, the latter does not support function-based indexes, so one has to update the date-only column manually).
SQL Server does support indexes on computed columns, however, its optimizer lacks ability to skip distinct records in an index.
That's why creating an index on a computed column doesn't help.
But since we are talking about the dates, it's possible to emulate this behavior to some extent using a recursive CTE that would generate a list of dates, and the checking each date.
Let's create a sample table: