EXPLAIN EXTENDED

How to create fast database queries

SQL Server 2005: finding foreign keys that have no index

Comments enabled. I *really* need your comment

From Stack Overflow:

How can I programmatically determine which foreign key are missing indexes?

Concise. Let's try to answer it.

A foreign key involves two tables: a referencing one and a referenced one.

In most databases, some kind of a unique key is required on the referenced table (this may be a PRIMARY KEY or just a UNIQUE constraint). This is by design (when we delete a UNIQUE value, we always delete last instance of this value).

However, indexing the FOREIGN KEY fields in the referencing table is not required. Database can function even without this index. However, this can make deletes from the referenced table slow (to validate the relationship we should perform a full table lookup on the referenced table). It almost always makes sense to index the FOREIGN KEY fields.

For an index on the FOREIGN KEY fields to be of use, its leading key part should consist of all columns that comprise the FOREIGN KEY, in any order. Since FOREIGN KEY checks are always equijoins, not range joins, the order of columns does not matter as long as they are leading in the key.

Consider a constraint: FOREIGN KEY (col1, col2) REFERENCES othertable. Which indexes are good for this foreign key?

Index Is good?
col1 No. Though it's better than nothing, only the part of the foreign key can be used in the index range.
col1, col2 Yes. It's a classical case
col2, col1 Yes. The order does not matter, since we are searching for exact values of both col1 and col2
col2, col1, col3 Yes. Despite an extra column, this index can be used to filter on col1 and col2
col3, col1, col2 No. Though both col1 and col2 are indexed, they are not in the leftmost part of the index

So for each foreign key, we should check the following condition: there exists at least one index whose leading part consists of (and only of) all columns that constitute the key.

In SQL Server 2005, the information about indexed columns is stored in sys.index_column. Since indexes can be composite, this is a one-to-many relationship, where index_id and column_id define the index and the column indexed with it, while index_column_id defines column's ordinal position in the index.

For an index to satisfy the foreign key, two conditions should be fulfilled:

  1. The number of columns in the index that are also a part of the foreign key should be equal to the maximal index_column_id. This ensures that all columns constitute the leftmost part of the index.
  2. The number of columns in the index that are also a part of the foreign key should be equal to the number of columns in the foreign key. This ensures that all foreign key columns are indexed.

And here's the query:

SELECT  fk.name, s.name, o.name,
FROM    sys.foreign_keys fk
JOIN    sys.objects o
ON      o.object_id = fk.parent_object_id
JOIN    sys.schemas s
ON      s.schema_id = o.schema_id
WHERE   EXISTS
(
SELECT  *
FROM    sys.index_columns ic
WHERE   EXISTS
(
SELECT  NULL
FROM    sys.foreign_key_columns fkc
WHERE   fkc.constraint_object_id = fk.object_id
AND fkc.parent_object_id = ic.object_id
AND fkc.parent_column_id = ic.column_id
)
GROUP BY
index_id
HAVING  COUNT(*) = MAX(index_column_id)
AND COUNT(*) =
(
SELECT  COUNT(*)
FROM    sys.foreign_key_columns fkc
WHERE   fkc.constraint_object_id = fk.object_id
)
)

Now, let's create some sample tables and check this query:


Table creation details

This script creates a schema with two tables, t_primary and t_foreign. t_primary has a composite PRIMARY KEY, t_foreign has 3 references to that key from different columns.

The first reference is indexed properly (col1, col2), the second one does not consititute the leftmost part of its index (col3, col1, col2), the third one is not indexed on all columns (col1).

Let's run the query:

SELECT  fk.name, s.name, o.name
FROM    sys.foreign_keys fk
JOIN    sys.objects o
ON      o.object_id = fk.parent_object_id
JOIN    sys.schemas s
ON      s.schema_id = o.schema_id
WHERE   s.name = '20090910_foreign'
AND o.name = 't_foreign'
AND EXISTS
(
SELECT  *
FROM    sys.index_columns ic
WHERE   EXISTS
(
SELECT  NULL
FROM    sys.foreign_key_columns fkc
WHERE   fkc.constraint_object_id = fk.object_id
AND fkc.parent_object_id = ic.object_id
AND fkc.parent_column_id = ic.column_id
)
GROUP BY
index_id
HAVING  COUNT(*) = MAX(index_column_id)
AND COUNT(*) =
(
SELECT  COUNT(*)
FROM    sys.foreign_key_columns fkc
WHERE   fkc.constraint_object_id = fk.object_id
)
)

name name name
FK_foreign_col11_col21 20090910_foreign t_foreign
1 row fetched in 0.0002s (0.0015s)

As we can see, the query returned only one foreign key, and this is the one that is indexed properly.

Written by Quassnoi

September 10th, 2009 at 11:00 pm

Posted in SQL Server

Leave a Reply