Making an index UNIQUE
From Stack Overflow:
In SQL Server, if I know an index will have unique values, how will it affect performance on
INSERT
's orSELECT
's if I declare it as such?If the optimiser knows the index is
UNIQUE
, how will that affect the query plan?I understand that specifying uniqueness can serve to preserve integrity, but leaving that discussion aside for the moment, what are the perfomance consequences?
Let's see how does a UNIQUE
index perform compared to a non-UNIQUE
one.
Indexes mentioned hereafter are considered to be built on a same intrinsically unique dataset. I will leave consistency issues aside, focusing only on performance details.
A UNIQUE
index, contrary to popular opinion, does not have any performance drawbacks on DML
compared to a plain, non-UNIQUE
index.
When the engine tries to insert a new row into the indexed table, being the index UNIQUE
or not, it performs the following steps:
- Traverse the index tree and decide where to insert the new value
- While traversing, see if the inserted key does match an existing key. Fail on
UNIQUE
index if it does - Insert the new value (see Wikipedia article on B-Tree's for details)
The check mentioned on step 2 is free: traversing an index implies comparing the index keys anyway.
If the index is CLUSTERED
(i. e. the rows themselves are contained in the index leaves, not references to rows), then a special hidden column called uniquifier
is added to the table rows, making the index UNIQUE
in fact.
This make the index key larger and traversing the index tree a bit slower.
That's why a clustered UNIQUE
index is in fact a little bit more efficient than a clustered non-UNIQUE
one on the same dataset.
Finally, SQL Server's optimizer is aware of index UNIQUE
'ness and this imformation can help it make some good optimization decisions.
Let's create a sample table:
This table has 1,000,000 rows with DISTINCT values from 1 to 1,000,000 on columns uval
and ival
.
A UNIQUE
index is built on uval
and a plain index is built on ival
.
Fragmentation information on these indexes shows the following:
DBCC SHOWCONTIG ('[20090423_unique].t_indexer', 'IX_indexer_ival') DBCC SHOWCONTIG ('[20090423_unique].t_indexer', 'UX_indexer_uval')
LEAF level scan performed. - Pages Scanned................................: 1359 - Extents Scanned..............................: 174 - Extent Switches..............................: 176 - Avg. Pages per Extent........................: 7.8 - Scan Density [Best Count:Actual Count].......: 96.05% [170:177] - Logical Scan Fragmentation ..................: 0.88% - Extent Scan Fragmentation ...................: 98.85% - Avg. Bytes Free per Page.....................: 1.8 - Avg. Page Density (full).....................: 99.98% DBCC execution completed. If DBCC printed error messages, contact your system administrator. DBCC SHOWCONTIG scanning 't_indexer' table... Table: 't_indexer' (1902629821); index ID: 2, database ID: 8 LEAF level scan performed. - Pages Scanned................................: 1359 - Extents Scanned..............................: 174 - Extent Switches..............................: 175 - Avg. Pages per Extent........................: 7.8 - Scan Density [Best Count:Actual Count].......: 96.59% [170:176] - Logical Scan Fragmentation ..................: 0.74% - Extent Scan Fragmentation ...................: 99.43% - Avg. Bytes Free per Page.....................: 1.8 - Avg. Page Density (full).....................: 99.98% DBCC execution completed. If DBCC printed error messages, contact your system administrator.
As we can see, these indexes are essentially the same.
Now, let's try to find TOP 10
rows ordered by (ival, uval)
:
SET STATISTICS IO ON SET STATISTICS TIME ON SELECT TOP 10 * FROM [20090423_unique].t_indexer ORDER BY ival, uval
id | uval | ival | value | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | Value 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 | 2 | 2 | Value 2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 | 3 | 3 | Value 3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4 | 4 | 4 | Value 4 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5 | 5 | 5 | Value 5 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6 | 6 | 6 | Value 6 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7 | 7 | 7 | Value 7 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8 | 8 | 8 | Value 8 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9 | 9 | 9 | Value 9 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 | 10 | 10 | Value 10 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 rows fetched in 0.0006s (0.3399s) |
Table 't_indexer'. Scan count 3, logical reads 5038, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 657 ms, elapsed time = 340 ms.
This query uses TOP N SORT
, as the table doesn't have an index on (ival, uval)
. To retrieve the ordered valules, it has to sort.
This operation can be parallelized, and it is in fact: note that CPU time
is twise the elapsed time
.
The query still runs for 340 ms.
Now, let's try to sort on (uval, ival)
.
Since we have same values in both columns, the returned resultset should be the same.
SET STATISTICS IO ON SET STATISTICS TIME ON SELECT TOP 10 * FROM [20090423_unique].t_indexer ORDER BY uval, ival
id | uval | ival | value | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | Value 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 | 2 | 2 | Value 2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 | 3 | 3 | Value 3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4 | 4 | 4 | Value 4 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5 | 5 | 5 | Value 5 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6 | 6 | 6 | Value 6 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7 | 7 | 7 | Value 7 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8 | 8 | 8 | Value 8 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9 | 9 | 9 | Value 9 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 | 10 | 10 | Value 10 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 rows fetched in 0.0006s (0.0009s) |
Table 't_indexer'. Scan count 1, logical reads 38, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.
This is literally instant.
Why this difference?
SQL Server is smart enough to understand that sorting on a non-NULL
UNIQUE
column produces the stable order regardless of the other columns in sort.
That's why it just discards ival
in the ORDER BY
condition, and sorts only on uval
.
The query plan uses INDEX SCAN
on uval
with a CLUSTERED INDEX SEEK
to retrieve other values and TOP N
to limit the number of rows returned.
Since there is no sort operation and only 10 rows returned, everything is very very fast.
Summary:
Query performance in SQL Server benefits from creating a UNIQUE
index on intrinsically UNIQUE
data.
The query optimizer takes the index UNIQUE
'ness into consideration, correctly estimates the index selectivity as maximal possible and eliminates unneeded sorting.
This was a helpful post, particularly because of the included output from the runtimes and detailed explanation. This is the only post I’ve seen on uniqueness vs non-uniqueness that goes into any depth of the advantages when you have unique data.
ET
27 Dec 13 at 00:16