Constraints and the optimizer in SQL Server: FOREIGN KEY at EXPLAIN EXTENDED

Constraints and the optimizer in SQL Server: FOREIGN KEY

Comments enabled. I *really* need your comment

Continuing on from the yesterday's article.

I was wondering: do the constraints declared on tables in SQL Server affect the decisions made by the optimizer?

SQL Server allows to define the following constraints on the columns:

PRIMARY KEY
UNIQUE
FOREIGN KEY
CHECK
DEFAULT

Today we will see how FOREIGN KEY affect the plans.

A FOREIGN KEY guarantees that every value of the column constrained with it is contained in a PRIMARY KEY or UNIQUE column of another table.

This fact can be used to simpify joins on these columns and using IN clause.

Let's create the sample tables:

Table creation details

CREATE SCHEMA [20091015_constraint]
CREATE TABLE t_primary (
id INT NOT NULL,
name VARCHAR(30) NOT NULL,
CONSTRAINT PK_primary_id PRIMARY KEY (id)
)
CREATE TABLE t_foreign (
id INT NOT NULL,
pid INT NOT NULL,
uval INT NOT NULL,
dval INT NOT NULL,
name VARCHAR(30) NOT NULL,
CONSTRAINT PK_foreign_id PRIMARY KEY (id),
CONSTRAINT FK_foreign_pid_primary FOREIGN KEY (pid) REFERENCES t_primary,
CONSTRAINT UQ_foreign_uval UNIQUE (uval),
CONSTRAINT CK_foreign_uval_dval CHECK (uval = dval)
)
GO
CREATE INDEX IX_foreign_dval ON [20091015_constraint].t_foreign (dval)
GO
BEGIN TRANSACTION
SELECT  RAND(20091015)
DECLARE @cnt INT
DECLARE @ld INT
DECLARE @r INT
SET @cnt = 1
WHILE @cnt &lt;= 100000
BEGIN
        INSERT
        INTO    [20091015_constraint].t_primary
        VALUES  (@cnt, 'Primary ' + CAST(@cnt AS VARCHAR))
        SET @ld = (@cnt - 1) * 10 + 1
        WHILE @ld &lt;= @cnt * 10
        BEGIN
                SET @r = @ld * 100 + FLOOR(RAND() * 100)
                INSERT
                INTO    [20091015_constraint].t_foreign
                VALUES  (
                        @ld,
                        @cnt,
                        @r,
                        @r,
                        'Foreign ' + CAST(@ld AS VARCHAR)
                        )
                SET @ld = @ld + 1
        END
        SET @cnt = @cnt + 1
END
COMMIT
GO

t_primary has 100,000 records, t_foreign has 1,000,000 records.

There are a PRIMARY KEY, a UNIQUE, a FOREIGN KEY and a CHECK defined on t_foreign.

JOIN

A FOREIGN KEY on t_foreign.pid guarantees two things:

Each value of t_foreign.pid will be found in t_primary (otherwise a constraint would fail)
Each value of t_foreign.pid will be found in t_primary exactly once, since it's impossible to reference a column that is not UNIQUE or a PRIMARY KEY

This means that when joining t_foreign with t_primary, no rows from t_foreign can be filtered out or duplicated by the join. Each row from t_foreign will contribute into the resultset as there were no join.

This, on its turn, means that if we use no rows from t_primary, the JOIN can be omitted at all because it's redundant.

Let's try it:

SELECT  TOP 10 f.*
FROM    [20091015_constraint].t_primary p
JOIN    [20091015_constraint].t_foreign f
ON      p.id = f.pid

View query details

id	pid	uval	dval	name
1	1	134	134	Foreign 1
2	1	221	221	Foreign 2
3	1	345	345	Foreign 3
4	1	449	449	Foreign 4
5	1	505	505	Foreign 5
6	1	677	677	Foreign 6
7	1	727	727	Foreign 7
8	1	857	857	Foreign 8
9	1	962	962	Foreign 9
10	1	1073	1073	Foreign 10
10 rows fetched in 0.0006s (0.0007s)

Table 't_foreign'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

  |--Top(TOP EXPRESSION:((10)))
       |--Clustered Index Scan(OBJECT:([test].[20091015_constraint].[t_foreign].[PK_foreign_id] AS [f]))

As we can see, the plan built for the query does not use t_primary at all. Since we select no columns from it and the join does not affect the resultset in any way, it is safe to exclude it from consideration at all.

IN

The same is true for an IN predicate.

As I wrote in one of my earlier articles, IN vs. JOIN vs. EXISTS, IN predicate is equivalent to JOIN with (SELECT DISTINCT).

Since t_primary.id is a PRIMARY KEY, SQL Server will eliminate the redundant distinct (as was shown in the yesterday's article) and IN constuct will be optimized to a plain JOIN whose plan we had just examined.

Let's check it:

SELECT  TOP 10 f.*
FROM    [20091015_constraint].t_foreign f
WHERE   f.pid IN
(
SELECT  id
FROM    [20091015_constraint].t_primary p
)

View query details

id	pid	uval	dval	name
1	1	134	134	Foreign 1
2	1	221	221	Foreign 2
3	1	345	345	Foreign 3
4	1	449	449	Foreign 4
5	1	505	505	Foreign 5
6	1	677	677	Foreign 6
7	1	727	727	Foreign 7
8	1	857	857	Foreign 8
9	1	962	962	Foreign 9
10	1	1073	1073	Foreign 10
10 rows fetched in 0.0006s (0.0105s)

Table 't_foreign'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

  |--Top(TOP EXPRESSION:((10)))
       |--Clustered Index Scan(OBJECT:([test].[20091015_constraint].[t_foreign].[PK_foreign_id] AS [f]))

For the reasons I just described this query uses exactly same plan and doesn't even look into t_primary.

Trusted constraints

The plan optimizations I described above can, thougn, be only made if the constraint can be relied upon.

In SQL Server, however, the constraint can be temporarily switched off in which case it does not check the new and updated records anymore. This can be useful for mass updates to data, like bulk loads: online constraint checking is quite an expensive task and it's better done in batch mode: first, all data are imported, second, all data are checked etc.

It then can be reenable to check new records, but unless a special command was issued, no existing records (changed during the time the constraint was off) are checked for consistency.

In this case of course it is impossible to do optimizations like this and SQL Server will have to revert to less efficient methods.

To track the constraint state SQL Server has a special metadata attribute called is_not_trusted. If a constraint is trusted, that is it had never been disabled since the last data check, the optimizations described above will apply. Otherwise the queries will be optimized as if there were no constraints at all.

Let's try it:

ALTER TABLE [20091015_constraint].t_foreign NOCHECK CONSTRAINT FK_foreign_pid_primary

This command temporary disables the FOREIGN KEY constraint (leaving its definition in the database).

ALTER TABLE [20091015_constraint].t_foreign CHECK CONSTRAINT FK_foreign_pid_primary

This command reenables it back. The constraint, however, does not keep the update history and theoretically there could be inconsistent values inserted while the constraint was off.

This makes the constraint untrusted:

SELECT  SCHEMA_NAME(schema_id) AS schema_name,
OBJECT_NAME(parent_object_id) AS table_name,
name
FROM    sys.foreign_keys
WHERE   SCHEMA_NAME(schema_id) = '20091015_constraint'
AND is_not_trusted = 1

schema_name	table_name	name
20091015_constraint	t_foreign	FK_foreign_pid_primary

Now, the JOIN query produces the following plan:

SELECT  TOP 10 f.*
FROM    [20091015_constraint].t_primary p
JOIN    [20091015_constraint].t_foreign f
ON      p.id = f.pid

  |--Top(TOP EXPRESSION:((10)))
       |--Nested Loops(Inner Join, OUTER REFERENCES:([f].[pid], [Expr1004]) WITH UNORDERED PREFETCH)
            |--Clustered Index Scan(OBJECT:([test].[20091015_constraint].[t_foreign].[PK_foreign_id] AS [f]))
            |--Clustered Index Seek(OBJECT:([test].[20091015_constraint].[t_primary].[PK_primary_id] AS [p]), SEEK:([p].[id]=[test].[20091015_constraint].[t_foreign].[pid] as [f].[pid]) ORDERED FORWARD)

which needs to lookup the values of t_primary just as well.

To make the constraint trusted again we should recheck all values in the table. This is done using WITH CHECK option:

ALTER TABLE [20091015_constraint].t_foreign WITH CHECK CHECK CONSTRAINT FK_foreign_pid_primary

CHECK CONSTRAINT here means starting from now, check new values, WITH CHECK means before enabling, check the existing values.

Summary

When joining a table with a FOREIGN KEY to a table it references, the join is guaranteed not to duplicate or filter out the records from the first row. If no values from the referenced table are used, a JOIN can be optimized so that it does not look into the joined table at all.

An IN predicate on such a query is optimized to a DISTINCT and therefore is subject to the same optimization technique.

For this to work, the constraint should be enabled and trusted. Failure to satisfy either of these conditions disables the optimization. To reenable the constraint and recover the trust, one uses CHECK CONSTRAINT and WITH CHECK in the ALTER TABLE statement, accordingly.

Written by Quassnoi

October 15th, 2009 at 11:00 pm

Posted in SQL Server

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

EXPLAIN EXTENDED