EXPLAIN EXTENDED

How to create fast database queries

Making an index UNIQUE

with one comment

From Stack Overflow:

In SQL Server, if I know an index will have unique values, how will it affect performance on INSERT's or SELECT's if I declare it as such?

If the optimiser knows the index is UNIQUE, how will that affect the query plan?

I understand that specifying uniqueness can serve to preserve integrity, but leaving that discussion aside for the moment, what are the perfomance consequences?

Let's see how does a UNIQUE index perform compared to a non-UNIQUE one.

Indexes mentioned hereafter are considered to be built on a same intrinsically unique dataset. I will leave consistency issues aside, focusing only on performance details.

A UNIQUE index, contrary to popular opinion, does not have any performance drawbacks on DML compared to a plain, non-UNIQUE index.

When the engine tries to insert a new row into the indexed table, being the index UNIQUE or not, it performs the following steps:

  1. Traverse the index tree and decide where to insert the new value
  2. While traversing, see if the inserted key does match an existing key. Fail on UNIQUE index if it does
  3. Insert the new value (see Wikipedia article on B-Tree's for details)

The check mentioned on step 2 is free: traversing an index implies comparing the index keys anyway.

If the index is CLUSTERED (i. e. the rows themselves are contained in the index leaves, not references to rows), then a special hidden column called uniquifier is added to the table rows, making the index UNIQUE in fact.

This make the index key larger and traversing the index tree a bit slower.

That's why a clustered UNIQUE index is in fact a little bit more efficient than a clustered non-UNIQUE one on the same dataset.

Finally, SQL Server's optimizer is aware of index UNIQUE'ness and this imformation can help it make some good optimization decisions.

Let's create a sample table:

Table creation details

This table has 1,000,000 rows with DISTINCT values from 1 to 1,000,000 on columns uval and ival.

A UNIQUE index is built on uval and a plain index is built on ival.

Fragmentation information on these indexes shows the following:

DBCC SHOWCONTIG ('[20090423_unique].t_indexer', 'IX_indexer_ival')
DBCC SHOWCONTIG ('[20090423_unique].t_indexer', 'UX_indexer_uval')
LEAF level scan performed.
- Pages Scanned................................: 1359
- Extents Scanned..............................: 174
- Extent Switches..............................: 176
- Avg. Pages per Extent........................: 7.8
- Scan Density [Best Count:Actual Count].......: 96.05% [170:177]
- Logical Scan Fragmentation ..................: 0.88%
- Extent Scan Fragmentation ...................: 98.85%
- Avg. Bytes Free per Page.....................: 1.8
- Avg. Page Density (full).....................: 99.98%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

DBCC SHOWCONTIG scanning 't_indexer' table...
Table: 't_indexer' (1902629821); index ID: 2, database ID: 8
LEAF level scan performed.
- Pages Scanned................................: 1359
- Extents Scanned..............................: 174
- Extent Switches..............................: 175
- Avg. Pages per Extent........................: 7.8
- Scan Density [Best Count:Actual Count].......: 96.59% [170:176]
- Logical Scan Fragmentation ..................: 0.74%
- Extent Scan Fragmentation ...................: 99.43%
- Avg. Bytes Free per Page.....................: 1.8
- Avg. Page Density (full).....................: 99.98%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

As we can see, these indexes are essentially the same.

Now, let's try to find TOP 10 rows ordered by (ival, uval):

SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT  TOP 10 *
FROM    [20090423_unique].t_indexer
ORDER BY
ival, uval

id uval ival value
1 1 1 Value 1
2 2 2 Value 2
3 3 3 Value 3
4 4 4 Value 4
5 5 5 Value 5
6 6 6 Value 6
7 7 7 Value 7
8 8 8 Value 8
9 9 9 Value 9
10 10 10 Value 10
10 rows fetched in 0.0006s (0.3399s)
Table 't_indexer'. Scan count 3, logical reads 5038, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 657 ms,  elapsed time = 340 ms. 

This query uses TOP N SORT, as the table doesn't have an index on (ival, uval). To retrieve the ordered valules, it has to sort.

This operation can be parallelized, and it is in fact: note that CPU time is twise the elapsed time.

The query still runs for 340 ms.

Now, let's try to sort on (uval, ival).

Since we have same values in both columns, the returned resultset should be the same.

SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT  TOP 10 *
FROM    [20090423_unique].t_indexer
ORDER BY
uval, ival

id uval ival value
1 1 1 Value 1
2 2 2 Value 2
3 3 3 Value 3
4 4 4 Value 4
5 5 5 Value 5
6 6 6 Value 6
7 7 7 Value 7
8 8 8 Value 8
9 9 9 Value 9
10 10 10 Value 10
10 rows fetched in 0.0006s (0.0009s)
Table 't_indexer'. Scan count 1, logical reads 38, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms. 

This is literally instant.

Why this difference?

SQL Server is smart enough to understand that sorting on a non-NULL UNIQUE column produces the stable order regardless of the other columns in sort.

That's why it just discards ival in the ORDER BY condition, and sorts only on uval.

The query plan uses INDEX SCAN on uval with a CLUSTERED INDEX SEEK to retrieve other values and TOP N to limit the number of rows returned.

Since there is no sort operation and only 10 rows returned, everything is very very fast.

Summary:

Query performance in SQL Server benefits from creating a UNIQUE index on intrinsically UNIQUE data.

The query optimizer takes the index UNIQUE'ness into consideration, correctly estimates the index selectivity as maximal possible and eliminates unneeded sorting.

Written by Quassnoi

April 23rd, 2009 at 11:00 pm

Posted in SQL Server

One Response to 'Making an index UNIQUE'

Subscribe to comments with RSS

  1. This was a helpful post, particularly because of the included output from the runtimes and detailed explanation. This is the only post I’ve seen on uniqueness vs non-uniqueness that goes into any depth of the advantages when you have unique data.

    ET

    27 Dec 13 at 00:16

Leave a Reply