EXPLAIN EXTENDED

How to create fast database queries

My latest article on SQL in general: Happy New Year!. You're welcome to read and comment on it.

Constraints and the optimizer in SQL Server: PRIMARY KEY and UNIQUE

Answering questions asked on the site.

Mitch asks:

I was wondering: do the constraints declared on tables in SQL Server affect the decisions made by the optimizer?

SQL Server allows to define the following constraints on the columns:

  • PRIMARY KEY
  • UNIQUE
  • FOREIGN KEY
  • CHECK
  • DEFAULT

Since these constraints imply some restrictions on data, it gives a theoretical possibility to use these restrictions while optimizing the queries so that some extra checks (which had already been made by the constraints) can be skipped.

SQL Server does use these possibilities and optimizes its plans taking the constraints into account.

In this article we will see how SQL Server uses PRIMARY KEY and UNIQUE to optimize the plans.

PRIMARY KEY and UNIQUE

Let’s create a pair of sample tables:

Table creation details

t_primary has 100,000 records, t_foreign has 1,000,000 records.

There are a PRIMARY KEY, a UNIQUE, a FOREIGN KEY and a CHECK defined on t_foreign.

PRIMARY KEY and UNIQUE are very similar in SQL Server. There are only two major differences:

  • There can be only one PRIMARY KEY in a table but multiple UNIQUE constraints are possible
  • A PRIMARY KEY cannot accept NULL values while a UNIQUE column can. Unlike other systems, NULL count as distinct values, there can be only one NULL in a unique column.

SQL Server‘s optimizer can use these constraints to build more optimal plans. Since the values of the constrained fields are guaranteed to be UNQIUE, there are at least two optimizations that SQL Server does:

  • If there is a UNIQUE column in the SELECT clause, DISTINCT is ignored
  • If there is a sort on a UNIQUE column, all subsequent sort expressions are ignored

DISTINCT

Let’s run two identical queries on uval and dval in t_foreign:

SELECT  SUM(LEN(name)) AS usum
FROM    (
        SELECT  DISTINCT uval, name
        FROM    [20091014_constraint].t_foreign
        ) q

View query details

This query uses DISTINCT on a column marked as UNIQUE. Since the DISTINCT is redundant here, it is ignored and the query uses a single Stream Aggregate to calculate the resulting aggregate function.

This query completes in 0.35 seconds.

Now, let’s run the same query on a non-unique field:

SELECT  SUM(LEN(name)) AS dsum
FROM    (
        SELECT  DISTINCT dval, name
        FROM    [20091014_constraint].t_foreign
        ) q

View query details

Now, there is no more UNIQUE constraint on a column and the engine has to do the DISTINCT. To do this it uses an additional Hash Match which of course doesn’t add to performance.

The query now runs for 4.5 seconds, or 10 times as slow.

ORDER BY

Again, two identical queries. The first one sorts on uval, dval, the second one sorts on dval, uval.

SELECT  TOP 10
        *
FROM    [20091014_constraint].t_foreign
ORDER BY
        uval, dval

View query details

Since uval is unique and declared as such, ordering on dval is useless and can be reduced to a mere ORDER BY uval. This is servable by the index on uval (which is implicitly created for any UNIQUE constraint), and yields the execution plan with a Nested Loops to join the index and the table and a Top to return just first 10 values.

This query is within 1 ms, that is instant.

Now, the second query:

SELECT  TOP 10
        *
FROM    [20091014_constraint].t_foreign
ORDER BY
        dval, uval

View query details

Though an index exists on dval, it does not contain enough information to do all the sorting. SQL Server would need to join this index back to the table, fetch the correponsing value of uval and sort on both fields.

This is quite an expensive operation, and SQL Server decided to ignore it. Instead, a full scan on the table itself is performed (to fetch the values of both uval and dval in one pass).

This query works for 0.7 seconds.

Summary

UNIQUE constraints help the optimizer to do the following things:

  • Ignore DISTINCT on a SELECT clause containing a UNIQUE column
  • Ignore the values after the UNIQUE column is ORDER BY

This makes the corresponding queries more efficient.

To be continued.

Written by Quassnoi

October 14th, 2009 at 11:00 pm

Posted in SQL Server

Comments are closed.