EXPLAIN EXTENDED

How to create fast database queries

Archive for August 20th, 2009

SQL Server: efficient DISTINCT on dates

Comments enabled. I *really* need your comment

From Stack Overflow:

I am wondering if there is a good-performing query to select distinct dates (ignoring times) from a table with a datetime field in SQL Server.

My problem isn't getting the server to actually do this (I've seen this question already, and we had something similar already in place using DISTINCT).

The problem is whether there is any trick to get it done more quickly.

With the data we are using, our current query is returning ~80 distinct days for which there are ~40,000 rows of data (after filtering on another indexed column), there is an index on the date column, and the query always manages to take 5+ seconds. Which is too slow.

The problem with this query is that no distinct list of truncated dates is easily accessible.

That's why SQL Server has to fetch all the dates from the table (or from the index) and distinctify them somehow.

In Oracle we could probably create a function-based index that would index the truncated part of a DATETIME column. Then it would just use jump over the distinct index keys.

Same is possible in MySQL (however, the latter does not support function-based indexes, so one has to update the date-only column manually).

SQL Server does support indexes on computed columns, however, its optimizer lacks ability to skip distinct records in an index.

That's why creating an index on a computed column doesn't help.

But since we are talking about the dates, it's possible to emulate this behavior to some extent using a recursive CTE that would generate a list of dates, and the checking each date.

Let's create a sample table:

Read the rest of this entry »

Written by Quassnoi

August 20th, 2009 at 11:00 pm

Posted in SQL Server