Constraints and the optimizer in SQL Server: PRIMARY KEY and UNIQUE
Comments enabled. I *really* need your comment
Answering questions asked on the site.
Mitch asks:
I was wondering: do the constraints declared on tables in SQL Server affect the decisions made by the optimizer?
SQL Server allows to define the following constraints on the columns:
PRIMARY KEYUNIQUEFOREIGN KEYCHECKDEFAULT
Since these constraints imply some restrictions on data, it gives a theoretical possibility to use these restrictions while optimizing the queries so that some extra checks (which had already been made by the constraints) can be skipped.
SQL Server does use these possibilities and optimizes its plans taking the constraints into account.
In this article we will see how SQL Server uses PRIMARY KEY and UNIQUE to optimize the plans.
PRIMARY KEY and UNIQUE
Let's create a pair of sample tables:
Table creation details
CREATE SCHEMA [20091014_constraint]
CREATE TABLE t_primary (
id INT NOT NULL,
name VARCHAR(30) NOT NULL,
CONSTRAINT PK_primary_id PRIMARY KEY (id)
)
CREATE TABLE t_foreign (
id INT NOT NULL,
pid INT NOT NULL,
uval INT NOT NULL,
dval INT NOT NULL,
name VARCHAR(30) NOT NULL,
CONSTRAINT PK_foreign_id PRIMARY KEY (id),
CONSTRAINT FK_foreign_pid_primary FOREIGN KEY (pid) REFERENCES t_primary,
CONSTRAINT UQ_foreign_uval UNIQUE (uval),
CONSTRAINT CK_foreign_uval_dval CHECK (uval = dval)
)
GO
CREATE INDEX IX_foreign_dval ON [20091014_constraint].t_foreign (dval)
GO
BEGIN TRANSACTION
SELECT RAND(20091014)
DECLARE @cnt INT
DECLARE @ld INT
DECLARE @r INT
SET @cnt = 1
WHILE @cnt <= 100000
BEGIN
INSERT
INTO [20091014_constraint].t_primary
VALUES (@cnt, 'Primary ' + CAST(@cnt AS VARCHAR))
SET @ld = (@cnt - 1) * 10 + 1
WHILE @ld <= @cnt * 10
BEGIN
SET @r = @ld * 100 + FLOOR(RAND() * 100)
INSERT
INTO [20091014_constraint].t_foreign
VALUES (
@ld,
@cnt,
@r,
@r,
'Foreign ' + CAST(@ld AS VARCHAR)
)
SET @ld = @ld + 1
END
SET @cnt = @cnt + 1
END
COMMIT
GO
[/sourcecode]
</div>
<code>t_primary</code> has <strong>100,000</strong> records, <code>t_foreign</code> has <strong>1,000,000</strong> records.
There are a <code>PRIMARY KEY</code>, a <code>UNIQUE</code>, a <code>FOREIGN KEY</code> and a <code>CHECK</code> defined on <code>t_foreign</code>.
<code>PRIMARY KEY</code> and <code>UNIQUE</code> are very similar in <strong>SQL Server</strong>. There are only two major differences:
<ul>
<li>There can be only one <code>PRIMARY KEY</code> in a table but multiple <code>UNIQUE</code> constraints are possible</li>
<li>A <code>PRIMARY KEY</code> cannot accept <code>NULL</code> values while a <code>UNIQUE</code> column can. Unlike other systems, <code>NULL</code> count as distinct values, there can be only one <code>NULL</code> in a unique column.</li>
</ul>
<strong>SQL Server</strong>'s optimizer can use these constraints to build more optimal plans. Since the values of the constrained fields are guaranteed to be <code>UNQIUE</code>, there are at least two optimizations that <strong>SQL Server</strong> does:
<ul>
<li>If there is a <code>UNIQUE</code> column in the <code>SELECT</code> clause, <code>DISTINCT</code> is ignored</li>
<li>If there is a sort on a <code>UNIQUE</code> column, all subsequent sort expressions are ignored</li>
</ul>
<h4>DISTINCT</h4>
Let's run two identical queries on <code>uval</code> and <code>dval</code> in <code>t_foreign</code>:
SELECT SUM(LEN(name)) AS usum
FROM (
SELECT DISTINCT uval, name
FROM [20091014_constraint].t_foreign
) q
| usum | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13888896 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0002s (0.3641s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Table 't_foreign'. Scan count 3, logical reads 7744, physical reads 0, read-ahead reads 409, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 407 ms, elapsed time = 359 ms.
|--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [globalagg1006]=(0) THEN NULL ELSE [globalagg1008] END))
|--Stream Aggregate(DEFINE:([globalagg1006]=SUM([partialagg1005]), [globalagg1008]=SUM([partialagg1007])))
|--Parallelism(Gather Streams)
|--Stream Aggregate(DEFINE:([partialagg1005]=COUNT_BIG([Expr1004]), [partialagg1007]=SUM([Expr1004])))
|--Compute Scalar(DEFINE:([Expr1004]=len([test].[20091014_constraint].[t_foreign].[name])))
|--Clustered Index Scan(OBJECT:([test].[20091014_constraint].[t_foreign].[PK_foreign_id]))
This query uses DISTINCT on a column marked as UNIQUE. Since the DISTINCT is redundant here, it is ignored and the query uses a single Stream Aggregate to calculate the resulting aggregate function.
This query completes in 0.35 seconds.
Now, let's run the same query on a non-unique field:
SELECT SUM(LEN(name)) AS dsum
FROM (
SELECT DISTINCT dval, name
FROM [20091014_constraint].t_foreign
) q
| dsum | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13888896 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0002s (4.5311s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Table 't_foreign'. Scan count 3, logical reads 7744, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 3563 ms, elapsed time = 4516 ms.
|--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [globalagg1006]=(0) THEN NULL ELSE [globalagg1008] END))
|--Stream Aggregate(DEFINE:([globalagg1006]=SUM([partialagg1005]), [globalagg1008]=SUM([partialagg1007])))
|--Parallelism(Gather Streams)
|--Stream Aggregate(DEFINE:([partialagg1005]=COUNT_BIG([Expr1004]), [partialagg1007]=SUM([Expr1004])))
|--Compute Scalar(DEFINE:([Expr1004]=len([test].[20091014_constraint].[t_foreign].[name])))
|--Hash Match(Aggregate, HASH:([test].[20091014_constraint].[t_foreign].[dval], [test].[20091014_constraint].[t_foreign].[name]), RESIDUAL:([test].[20091014_constraint].[t_foreign].[dval] = [test].[20091014_constraint].[t_foreign].[dval] AND [test].[20091014_constraint].[t_foreign].[name] = [test].[20091014_constraint].[t_foreign].[name]))
|--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([test].[20091014_constraint].[t_foreign].[dval], [test].[20091014_constraint].[t_foreign].[name]))
|--Clustered Index Scan(OBJECT:([test].[20091014_constraint].[t_foreign].[PK_foreign_id]))
Now, there is no more UNIQUE constraint on a column and the engine has to do the DISTINCT. To do this it uses an additional Hash Match which of course doesn't add to performance.
The query now runs for 4.5 seconds, or 10 times as slow.
ORDER BY
Again, two identical queries. The first one sorts on uval, dval, the second one sorts on dval, uval.
SELECT TOP 10
*
FROM [20091014_constraint].t_foreign
ORDER BY
uval, dval
| id | pid | uval | dval | name | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 134 | 134 | Foreign 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2 | 1 | 221 | 221 | Foreign 2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3 | 1 | 345 | 345 | Foreign 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 4 | 1 | 449 | 449 | Foreign 4 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 5 | 1 | 505 | 505 | Foreign 5 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 6 | 1 | 677 | 677 | Foreign 6 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 7 | 1 | 727 | 727 | Foreign 7 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 8 | 1 | 857 | 857 | Foreign 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 9 | 1 | 962 | 962 | Foreign 9 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 10 | 1 | 1073 | 1073 | Foreign 10 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 10 rows fetched in 0.0006s (0.0007s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Table 't_foreign'. Scan count 1, logical reads 38, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.
|--Top(TOP EXPRESSION:((10)))
|--Nested Loops(Inner Join, OUTER REFERENCES:([test].[20091014_constraint].[t_foreign].[id], [Expr1003]) WITH ORDERED PREFETCH)
|--Index Scan(OBJECT:([test].[20091014_constraint].[t_foreign].[UQ_foreign_uval]), ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([test].[20091014_constraint].[t_foreign].[PK_foreign_id]), SEEK:([test].[20091014_constraint].[t_foreign].[id]=[test].[20091014_constraint].[t_foreign].[id]) LOOKUP ORDERED FORWARD)
Since uval is unique and declared as such, ordering on dval is useless and can be reduced to a mere ORDER BY uval. This is servable by the index on uval (which is implicitly created for any UNIQUE constraint), and yields the execution plan with a Nested Loops to join the index and the table and a Top to return just first 10 values.
This query is within 1 ms, that is instant.
Now, the second query:
SELECT TOP 10
*
FROM [20091014_constraint].t_foreign
ORDER BY
dval, uval
| id | pid | uval | dval | name | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 134 | 134 | Foreign 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2 | 1 | 221 | 221 | Foreign 2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3 | 1 | 345 | 345 | Foreign 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 4 | 1 | 449 | 449 | Foreign 4 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 5 | 1 | 505 | 505 | Foreign 5 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 6 | 1 | 677 | 677 | Foreign 6 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 7 | 1 | 727 | 727 | Foreign 7 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 8 | 1 | 857 | 857 | Foreign 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 9 | 1 | 962 | 962 | Foreign 9 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 10 | 1 | 1073 | 1073 | Foreign 10 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 10 rows fetched in 0.0006s (0.7031s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Table 't_foreign'. Scan count 3, logical reads 7744, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 749 ms, elapsed time = 696 ms.
|--Top(TOP EXPRESSION:((10)))
|--Parallelism(Gather Streams, ORDER BY:([test].[20091014_constraint].[t_foreign].[dval] ASC, [test].[20091014_constraint].[t_foreign].[uval] ASC))
|--Sort(TOP 10, ORDER BY:([test].[20091014_constraint].[t_foreign].[dval] ASC, [test].[20091014_constraint].[t_foreign].[uval] ASC))
|--Clustered Index Scan(OBJECT:([test].[20091014_constraint].[t_foreign].[PK_foreign_id]))
Though an index exists on dval, it does not contain enough information to do all the sorting. SQL Server would need to join this index back to the table, fetch the correponsing value of uval and sort on both fields.
This is quite an expensive operation, and SQL Server decided to ignore it. Instead, a full scan on the table itself is performed (to fetch the values of both uval and dval in one pass).
This query works for 0.7 seconds.
Summary
UNIQUE constraints help the optimizer to do the following things:
- Ignore
DISTINCTon aSELECTclause containing aUNIQUEcolumn - Ignore the values after the
UNIQUEcolumn isORDER BY
This makes the corresponding queries more efficient.
To be continued.
Written by Quassnoi
October 14th, 2009 at 11:00 pm
Posted in SQL Server
Leave a Reply
Subscribe
Subscribe by email
Contacts
Should I?
Yes. Feel free to ask questions and write me. An interesting question is a pleasure to answer, and I really enjoy receiving feedback
Recent articles
- Happy New Year: Diffusion Model image generator in about 700 lines of pure SQL
- Happy New Year: GPT in 500 lines of SQL
- Happy New Year: solving the Rubik’s Cube in SQL
- A good first word for Wordle
- Happy New Year: quantum computer emulator in SQL
Calendar
Archives
- December 2024
- December 2023
- December 2022
- January 2022
- December 2021
- December 2020
- December 2019
- December 2018
- December 2017
- December 2016
- December 2015
- December 2014
- July 2014
- December 2013
- October 2013
- September 2013
- August 2013
- July 2013
- June 2013
- May 2013
- March 2013
- January 2013
- December 2012
- December 2011
- June 2011
- April 2011
- March 2011
- February 2011
- December 2010
- November 2010
- October 2010
- September 2010
- August 2010
- July 2010
- June 2010
- May 2010
- April 2010
- March 2010
- February 2010
- January 2010
- December 2009
- November 2009
- October 2009
- September 2009
- August 2009
- July 2009
- June 2009
- May 2009
- April 2009
- March 2009
- February 2009
Subscribe in a reader