SQL Server: performance of UNIQUE index at EXPLAIN EXTENDED

In SQL Server, if I know an index will have unique values, how will it affect performance on INSERT's or SELECT's if I declare it as such?

If the optimiser knows the index is UNIQUE, how will that affect the query plan?

I understand that specifying uniqueness can serve to preserve integrity, but leaving that discussion aside for the moment, what are the perfomance consequences?

Let's see how does a UNIQUE index perform compared to a non-UNIQUE one.

Indexes mentioned hereafter are considered to be built on a same intrinsically unique dataset. I will leave consistency issues aside, focusing only on performance details.

A UNIQUE index, contrary to popular opinion, does not have any performance drawbacks on DML compared to a plain, non-UNIQUE index.

When the engine tries to insert a new row into the indexed table, being the index UNIQUE or not, it performs the following steps:

Traverse the index tree and decide where to insert the new value
While traversing, see if the inserted key does match an existing key. Fail on UNIQUE index if it does
Insert the new value (see Wikipedia article on B-Tree's for details)

The check mentioned on step 2 is free: traversing an index implies comparing the index keys anyway.

If the index is CLUSTERED (i. e. the rows themselves are contained in the index leaves, not references to rows), then a special hidden column called uniquifier is added to the table rows, making the index UNIQUE in fact.

This make the index key larger and traversing the index tree a bit slower.

That's why a clustered UNIQUE index is in fact a little bit more efficient than a clustered non-UNIQUE one on the same dataset.

Finally, SQL Server's optimizer is aware of index UNIQUE'ness and this imformation can help it make some good optimization decisions.

Let's create a sample table:

Table creation details

DROP SCHEMA [20090423_unique]
GO
CREATE SCHEMA [20090423_unique]
CREATE TABLE t_indexer (
id INT NOT NULL PRIMARY KEY,
uval INT NOT NULL,
ival INT NOT NULL,
value VARCHAR(50) NOT NULL
)
GO
CREATE UNIQUE INDEX UX_indexer_uval ON [20090423_unique].t_indexer (uval)
CREATE INDEX IX_indexer_ival ON [20090423_unique].t_indexer (ival)
GO
SET NOCOUNT ON
DECLARE @cnt INT
SET @cnt = 1
BEGIN TRANSACTION
WHILE @cnt &lt; 1000000
BEGIN
        INSERT
        INTO    [20090423_unique].t_indexer
        VALUES  (@cnt, @cnt, @cnt, 'Value ' + CAST(@cnt AS VARCHAR))
        SET @cnt = @cnt + 1
END
COMMIT
GO

This table has 1,000,000 rows with DISTINCT values from 1 to 1,000,000 on columns uval and ival.

A UNIQUE index is built on uval and a plain index is built on ival.

Fragmentation information on these indexes shows the following:

DBCC SHOWCONTIG ('[20090423_unique].t_indexer', 'IX_indexer_ival')
DBCC SHOWCONTIG ('[20090423_unique].t_indexer', 'UX_indexer_uval')

LEAF level scan performed.
- Pages Scanned................................: 1359
- Extents Scanned..............................: 174
- Extent Switches..............................: 176
- Avg. Pages per Extent........................: 7.8
- Scan Density [Best Count:Actual Count].......: 96.05% [170:177]
- Logical Scan Fragmentation ..................: 0.88%
- Extent Scan Fragmentation ...................: 98.85%
- Avg. Bytes Free per Page.....................: 1.8
- Avg. Page Density (full).....................: 99.98%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

DBCC SHOWCONTIG scanning 't_indexer' table...
Table: 't_indexer' (1902629821); index ID: 2, database ID: 8
LEAF level scan performed.
- Pages Scanned................................: 1359
- Extents Scanned..............................: 174
- Extent Switches..............................: 175
- Avg. Pages per Extent........................: 7.8
- Scan Density [Best Count:Actual Count].......: 96.59% [170:176]
- Logical Scan Fragmentation ..................: 0.74%
- Extent Scan Fragmentation ...................: 99.43%
- Avg. Bytes Free per Page.....................: 1.8
- Avg. Page Density (full).....................: 99.98%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

As we can see, these indexes are essentially the same.

Now, let's try to find TOP 10 rows ordered by (ival, uval):

SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT  TOP 10 *
FROM    [20090423_unique].t_indexer
ORDER BY
ival, uval

id	uval	ival	value
1	1	1	Value 1
2	2	2	Value 2
3	3	3	Value 3
4	4	4	Value 4
5	5	5	Value 5
6	6	6	Value 6
7	7	7	Value 7
8	8	8	Value 8
9	9	9	Value 9
10	10	10	Value 10
10 rows fetched in 0.0006s (0.3399s)

Table 't_indexer'. Scan count 3, logical reads 5038, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 657 ms,  elapsed time = 340 ms.

This query uses TOP N SORT, as the table doesn't have an index on (ival, uval). To retrieve the ordered valules, it has to sort.

This operation can be parallelized, and it is in fact: note that CPU time is twise the elapsed time.

The query still runs for 340 ms.

Now, let's try to sort on (uval, ival).

Since we have same values in both columns, the returned resultset should be the same.

SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT  TOP 10 *
FROM    [20090423_unique].t_indexer
ORDER BY
uval, ival

id	uval	ival	value
1	1	1	Value 1
2	2	2	Value 2
3	3	3	Value 3
4	4	4	Value 4
5	5	5	Value 5
6	6	6	Value 6
7	7	7	Value 7
8	8	8	Value 8
9	9	9	Value 9
10	10	10	Value 10
10 rows fetched in 0.0006s (0.0009s)

Table 't_indexer'. Scan count 1, logical reads 38, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

This is literally instant.

Why this difference?

SQL Server is smart enough to understand that sorting on a non-NULL UNIQUE column produces the stable order regardless of the other columns in sort.

That's why it just discards ival in the ORDER BY condition, and sorts only on uval.

The query plan uses INDEX SCAN on uval with a CLUSTERED INDEX SEEK to retrieve other values and TOP N to limit the number of rows returned.

Since there is no sort operation and only 10 rows returned, everything is very very fast.

Summary:

Query performance in SQL Server benefits from creating a UNIQUE index on intrinsically UNIQUE data.

The query optimizer takes the index UNIQUE'ness into consideration, correctly estimates the index selectivity as maximal possible and eliminates unneeded sorting.

Written by Quassnoi

April 23rd, 2009 at 11:00 pm

Posted in SQL Server

One Response to 'Making an index UNIQUE'

Subscribe to comments with RSS

This was a helpful post, particularly because of the included output from the runtimes and detailed explanation. This is the only post I’ve seen on uniqueness vs non-uniqueness that goes into any depth of the advantages when you have unique data.

ET

27 Dec 13 at 00:16

EXPLAIN EXTENDED

Making an index UNIQUE

One Response to 'Making an index UNIQUE'

Leave a Reply

Subscribe

Subscribe by email

Contacts

Should I?

Recent articles

Calendar

Archives

Categories

Stack Overflow

EXPLAIN EXTENDED

Making an index UNIQUE

Share this:

One Response to 'Making an index UNIQUE'

Leave a Reply

Subscribe

Subscribe by email

Contacts

Should I?

Recent articles

Calendar

Archives

Categories

Stack Overflow