Constraints and the optimizer in SQL Server: PRIMARY KEY and UNIQUE
Comments enabled. I *really* need your comment
Answering questions asked on the site.
Mitch asks:
I was wondering: do the constraints declared on tables in SQL Server affect the decisions made by the optimizer?
SQL Server allows to define the following constraints on the columns:
PRIMARY KEY
UNIQUE
FOREIGN KEY
CHECK
DEFAULT
Since these constraints imply some restrictions on data, it gives a theoretical possibility to use these restrictions while optimizing the queries so that some extra checks (which had already been made by the constraints) can be skipped.
SQL Server does use these possibilities and optimizes its plans taking the constraints into account.
In this article we will see how SQL Server uses PRIMARY KEY
and UNIQUE
to optimize the plans.
PRIMARY KEY and UNIQUE
Let's create a pair of sample tables:
Table creation details
CREATE SCHEMA [20091014_constraint] CREATE TABLE t_primary ( id INT NOT NULL, name VARCHAR(30) NOT NULL, CONSTRAINT PK_primary_id PRIMARY KEY (id) ) CREATE TABLE t_foreign ( id INT NOT NULL, pid INT NOT NULL, uval INT NOT NULL, dval INT NOT NULL, name VARCHAR(30) NOT NULL, CONSTRAINT PK_foreign_id PRIMARY KEY (id), CONSTRAINT FK_foreign_pid_primary FOREIGN KEY (pid) REFERENCES t_primary, CONSTRAINT UQ_foreign_uval UNIQUE (uval), CONSTRAINT CK_foreign_uval_dval CHECK (uval = dval) ) GO CREATE INDEX IX_foreign_dval ON [20091014_constraint].t_foreign (dval) GO BEGIN TRANSACTION SELECT RAND(20091014) DECLARE @cnt INT DECLARE @ld INT DECLARE @r INT SET @cnt = 1 WHILE @cnt <= 100000 BEGIN INSERT INTO [20091014_constraint].t_primary VALUES (@cnt, 'Primary ' + CAST(@cnt AS VARCHAR)) SET @ld = (@cnt - 1) * 10 + 1 WHILE @ld <= @cnt * 10 BEGIN SET @r = @ld * 100 + FLOOR(RAND() * 100) INSERT INTO [20091014_constraint].t_foreign VALUES ( @ld, @cnt, @r, @r, 'Foreign ' + CAST(@ld AS VARCHAR) ) SET @ld = @ld + 1 END SET @cnt = @cnt + 1 END COMMIT GO [/sourcecode] </div> <code>t_primary</code> has <strong>100,000</strong> records, <code>t_foreign</code> has <strong>1,000,000</strong> records. There are a <code>PRIMARY KEY</code>, a <code>UNIQUE</code>, a <code>FOREIGN KEY</code> and a <code>CHECK</code> defined on <code>t_foreign</code>. <code>PRIMARY KEY</code> and <code>UNIQUE</code> are very similar in <strong>SQL Server</strong>. There are only two major differences: <ul> <li>There can be only one <code>PRIMARY KEY</code> in a table but multiple <code>UNIQUE</code> constraints are possible</li> <li>A <code>PRIMARY KEY</code> cannot accept <code>NULL</code> values while a <code>UNIQUE</code> column can. Unlike other systems, <code>NULL</code> count as distinct values, there can be only one <code>NULL</code> in a unique column.</li> </ul> <strong>SQL Server</strong>'s optimizer can use these constraints to build more optimal plans. Since the values of the constrained fields are guaranteed to be <code>UNQIUE</code>, there are at least two optimizations that <strong>SQL Server</strong> does: <ul> <li>If there is a <code>UNIQUE</code> column in the <code>SELECT</code> clause, <code>DISTINCT</code> is ignored</li> <li>If there is a sort on a <code>UNIQUE</code> column, all subsequent sort expressions are ignored</li> </ul> <h4>DISTINCT</h4> Let's run two identical queries on <code>uval</code> and <code>dval</code> in <code>t_foreign</code>: SELECT SUM(LEN(name)) AS usum FROM ( SELECT DISTINCT uval, name FROM [20091014_constraint].t_foreign ) q
usum | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13888896 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0002s (0.3641s) |
Table 't_foreign'. Scan count 3, logical reads 7744, physical reads 0, read-ahead reads 409, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 407 ms, elapsed time = 359 ms.
|--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [globalagg1006]=(0) THEN NULL ELSE [globalagg1008] END)) |--Stream Aggregate(DEFINE:([globalagg1006]=SUM([partialagg1005]), [globalagg1008]=SUM([partialagg1007]))) |--Parallelism(Gather Streams) |--Stream Aggregate(DEFINE:([partialagg1005]=COUNT_BIG([Expr1004]), [partialagg1007]=SUM([Expr1004]))) |--Compute Scalar(DEFINE:([Expr1004]=len([test].[20091014_constraint].[t_foreign].[name]))) |--Clustered Index Scan(OBJECT:([test].[20091014_constraint].[t_foreign].[PK_foreign_id]))
This query uses DISTINCT
on a column marked as UNIQUE
. Since the DISTINCT
is redundant here, it is ignored and the query uses a single Stream Aggregate
to calculate the resulting aggregate function.
This query completes in 0.35 seconds.
Now, let's run the same query on a non-unique field:
SELECT SUM(LEN(name)) AS dsum FROM ( SELECT DISTINCT dval, name FROM [20091014_constraint].t_foreign ) q
dsum | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13888896 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0002s (4.5311s) |
Table 't_foreign'. Scan count 3, logical reads 7744, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 3563 ms, elapsed time = 4516 ms.
|--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [globalagg1006]=(0) THEN NULL ELSE [globalagg1008] END)) |--Stream Aggregate(DEFINE:([globalagg1006]=SUM([partialagg1005]), [globalagg1008]=SUM([partialagg1007]))) |--Parallelism(Gather Streams) |--Stream Aggregate(DEFINE:([partialagg1005]=COUNT_BIG([Expr1004]), [partialagg1007]=SUM([Expr1004]))) |--Compute Scalar(DEFINE:([Expr1004]=len([test].[20091014_constraint].[t_foreign].[name]))) |--Hash Match(Aggregate, HASH:([test].[20091014_constraint].[t_foreign].[dval], [test].[20091014_constraint].[t_foreign].[name]), RESIDUAL:([test].[20091014_constraint].[t_foreign].[dval] = [test].[20091014_constraint].[t_foreign].[dval] AND [test].[20091014_constraint].[t_foreign].[name] = [test].[20091014_constraint].[t_foreign].[name])) |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([test].[20091014_constraint].[t_foreign].[dval], [test].[20091014_constraint].[t_foreign].[name])) |--Clustered Index Scan(OBJECT:([test].[20091014_constraint].[t_foreign].[PK_foreign_id]))
Now, there is no more UNIQUE
constraint on a column and the engine has to do the DISTINCT
. To do this it uses an additional Hash Match
which of course doesn't add to performance.
The query now runs for 4.5 seconds, or 10 times as slow.
ORDER BY
Again, two identical queries. The first one sorts on uval, dval
, the second one sorts on dval, uval
.
SELECT TOP 10 * FROM [20091014_constraint].t_foreign ORDER BY uval, dval
id | pid | uval | dval | name | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 134 | 134 | Foreign 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 | 1 | 221 | 221 | Foreign 2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 | 1 | 345 | 345 | Foreign 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4 | 1 | 449 | 449 | Foreign 4 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5 | 1 | 505 | 505 | Foreign 5 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6 | 1 | 677 | 677 | Foreign 6 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7 | 1 | 727 | 727 | Foreign 7 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8 | 1 | 857 | 857 | Foreign 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9 | 1 | 962 | 962 | Foreign 9 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 | 1 | 1073 | 1073 | Foreign 10 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 rows fetched in 0.0006s (0.0007s) |
Table 't_foreign'. Scan count 1, logical reads 38, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.
|--Top(TOP EXPRESSION:((10))) |--Nested Loops(Inner Join, OUTER REFERENCES:([test].[20091014_constraint].[t_foreign].[id], [Expr1003]) WITH ORDERED PREFETCH) |--Index Scan(OBJECT:([test].[20091014_constraint].[t_foreign].[UQ_foreign_uval]), ORDERED FORWARD) |--Clustered Index Seek(OBJECT:([test].[20091014_constraint].[t_foreign].[PK_foreign_id]), SEEK:([test].[20091014_constraint].[t_foreign].[id]=[test].[20091014_constraint].[t_foreign].[id]) LOOKUP ORDERED FORWARD)
Since uval
is unique and declared as such, ordering on dval
is useless and can be reduced to a mere ORDER BY uval
. This is servable by the index on uval
(which is implicitly created for any UNIQUE
constraint), and yields the execution plan with a Nested Loops
to join the index and the table and a Top
to return just first 10 values.
This query is within 1 ms, that is instant.
Now, the second query:
SELECT TOP 10 * FROM [20091014_constraint].t_foreign ORDER BY dval, uval
id | pid | uval | dval | name | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 134 | 134 | Foreign 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 | 1 | 221 | 221 | Foreign 2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 | 1 | 345 | 345 | Foreign 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4 | 1 | 449 | 449 | Foreign 4 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5 | 1 | 505 | 505 | Foreign 5 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6 | 1 | 677 | 677 | Foreign 6 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7 | 1 | 727 | 727 | Foreign 7 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8 | 1 | 857 | 857 | Foreign 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9 | 1 | 962 | 962 | Foreign 9 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 | 1 | 1073 | 1073 | Foreign 10 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 rows fetched in 0.0006s (0.7031s) |
Table 't_foreign'. Scan count 3, logical reads 7744, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 749 ms, elapsed time = 696 ms.
|--Top(TOP EXPRESSION:((10))) |--Parallelism(Gather Streams, ORDER BY:([test].[20091014_constraint].[t_foreign].[dval] ASC, [test].[20091014_constraint].[t_foreign].[uval] ASC)) |--Sort(TOP 10, ORDER BY:([test].[20091014_constraint].[t_foreign].[dval] ASC, [test].[20091014_constraint].[t_foreign].[uval] ASC)) |--Clustered Index Scan(OBJECT:([test].[20091014_constraint].[t_foreign].[PK_foreign_id]))
Though an index exists on dval
, it does not contain enough information to do all the sorting. SQL Server would need to join this index back to the table, fetch the correponsing value of uval
and sort on both fields.
This is quite an expensive operation, and SQL Server decided to ignore it. Instead, a full scan on the table itself is performed (to fetch the values of both uval
and dval
in one pass).
This query works for 0.7 seconds.
Summary
UNIQUE
constraints help the optimizer to do the following things:
- Ignore
DISTINCT
on aSELECT
clause containing aUNIQUE
column - Ignore the values after the
UNIQUE
column isORDER BY
This makes the corresponding queries more efficient.
To be continued.
Written by Quassnoi
October 14th, 2009 at 11:00 pm
Posted in SQL Server
Leave a Reply
Subscribe
Subscribe by email
Contacts
Should I?
Yes. Feel free to ask questions and write me. An interesting question is a pleasure to answer, and I really enjoy receiving feedback
Recent articles
- Happy New Year: Diffusion Model image generator in about 700 lines of pure SQL
- Happy New Year: GPT in 500 lines of SQL
- Happy New Year: solving the Rubik’s Cube in SQL
- A good first word for Wordle
- Happy New Year: quantum computer emulator in SQL
Calendar
Archives
- December 2024
- December 2023
- December 2022
- January 2022
- December 2021
- December 2020
- December 2019
- December 2018
- December 2017
- December 2016
- December 2015
- December 2014
- July 2014
- December 2013
- October 2013
- September 2013
- August 2013
- July 2013
- June 2013
- May 2013
- March 2013
- January 2013
- December 2012
- December 2011
- June 2011
- April 2011
- March 2011
- February 2011
- December 2010
- November 2010
- October 2010
- September 2010
- August 2010
- July 2010
- June 2010
- May 2010
- April 2010
- March 2010
- February 2010
- January 2010
- December 2009
- November 2009
- October 2009
- September 2009
- August 2009
- July 2009
- June 2009
- May 2009
- April 2009
- March 2009
- February 2009