Constraints and the optimizer in SQL Server: FOREIGN KEY
Comments enabled. I *really* need your comment
Continuing on from the yesterday's article.
I was wondering: do the constraints declared on tables in SQL Server affect the decisions made by the optimizer?
SQL Server allows to define the following constraints on the columns:
PRIMARY KEY
UNIQUE
FOREIGN KEY
CHECK
DEFAULT
Today we will see how FOREIGN KEY
affect the plans.
A FOREIGN KEY guarantees that every value of the column constrained with it is contained in a PRIMARY KEY or UNIQUE
column of another table.
This fact can be used to simpify joins on these columns and using IN
clause.
Let's create the sample tables:
Table creation details
t_primary
has 100,000 records, t_foreign
has 1,000,000 records.
There are a PRIMARY KEY
, a UNIQUE
, a FOREIGN KEY
and a CHECK
defined on t_foreign
.
JOIN
A FOREIGN KEY
on t_foreign.pid
guarantees two things:
- Each value of t_foreign.pid will be found in
t_primary
(otherwise a constraint would fail) - Each value of t_foreign.pid will be found in
t_primary
exactly once, since it's impossible to reference a column that is notUNIQUE
or aPRIMARY KEY
This means that when joining t_foreign
with t_primary
, no rows from t_foreign
can be filtered out or duplicated by the join. Each row from t_foreign
will contribute into the resultset as there were no join.
This, on its turn, means that if we use no rows from t_primary
, the JOIN
can be omitted at all because it's redundant.
Let's try it:
SELECT TOP 10 f.* FROM [20091015_constraint].t_primary p JOIN [20091015_constraint].t_foreign f ON p.id = f.pid
As we can see, the plan built for the query does not use t_primary
at all. Since we select no columns from it and the join does not affect the resultset in any way, it is safe to exclude it from consideration at all.
IN
The same is true for an IN
predicate.
As I wrote in one of my earlier articles, IN vs. JOIN vs. EXISTS
, IN
predicate is equivalent to JOIN
with (SELECT DISTINCT)
.
Since t_primary.id
is a PRIMARY KEY
, SQL Server will eliminate the redundant distinct (as was shown in the yesterday's article) and IN
constuct will be optimized to a plain JOIN
whose plan we had just examined.
Let's check it:
SELECT TOP 10 f.* FROM [20091015_constraint].t_foreign f WHERE f.pid IN ( SELECT id FROM [20091015_constraint].t_primary p )
For the reasons I just described this query uses exactly same plan and doesn't even look into t_primary
.
Trusted constraints
The plan optimizations I described above can, thougn, be only made if the constraint can be relied upon.
In SQL Server, however, the constraint can be temporarily switched off in which case it does not check the new and updated records anymore. This can be useful for mass updates to data, like bulk loads: online constraint checking is quite an expensive task and it's better done in batch mode: first, all data are imported, second, all data are checked etc.
It then can be reenable to check new records, but unless a special command was issued, no existing records (changed during the time the constraint was off) are checked for consistency.
In this case of course it is impossible to do optimizations like this and SQL Server will have to revert to less efficient methods.
To track the constraint state SQL Server has a special metadata attribute called is_not_trusted
. If a constraint is trusted, that is it had never been disabled since the last data check, the optimizations described above will apply. Otherwise the queries will be optimized as if there were no constraints at all.
Let's try it:
ALTER TABLE [20091015_constraint].t_foreign NOCHECK CONSTRAINT FK_foreign_pid_primary
This command temporary disables the FOREIGN KEY
constraint (leaving its definition in the database).
ALTER TABLE [20091015_constraint].t_foreign CHECK CONSTRAINT FK_foreign_pid_primary
This command reenables it back. The constraint, however, does not keep the update history and theoretically there could be inconsistent values inserted while the constraint was off.
This makes the constraint untrusted:
SELECT SCHEMA_NAME(schema_id) AS schema_name, OBJECT_NAME(parent_object_id) AS table_name, name FROM sys.foreign_keys WHERE SCHEMA_NAME(schema_id) = '20091015_constraint' AND is_not_trusted = 1
schema_name | table_name | name |
---|---|---|
20091015_constraint | t_foreign | FK_foreign_pid_primary |
Now, the JOIN
query produces the following plan:
SELECT TOP 10 f.* FROM [20091015_constraint].t_primary p JOIN [20091015_constraint].t_foreign f ON p.id = f.pid
|--Top(TOP EXPRESSION:((10))) |--Nested Loops(Inner Join, OUTER REFERENCES:([f].[pid], [Expr1004]) WITH UNORDERED PREFETCH) |--Clustered Index Scan(OBJECT:([test].[20091015_constraint].[t_foreign].[PK_foreign_id] AS [f])) |--Clustered Index Seek(OBJECT:([test].[20091015_constraint].[t_primary].[PK_primary_id] AS [p]), SEEK:([p].[id]=[test].[20091015_constraint].[t_foreign].[pid] as [f].[pid]) ORDERED FORWARD)
which needs to lookup the values of t_primary
just as well.
To make the constraint trusted again we should recheck all values in the table. This is done using WITH CHECK
option:
ALTER TABLE [20091015_constraint].t_foreign WITH CHECK CHECK CONSTRAINT FK_foreign_pid_primary
CHECK CONSTRAINT
here means starting from now, check new values
, WITH CHECK
means before enabling, check the existing values
.
Summary
When joining a table with a FOREIGN KEY
to a table it references, the join is guaranteed not to duplicate or filter out the records from the first row. If no values from the referenced table are used, a JOIN
can be optimized so that it does not look into the joined table at all.
An IN
predicate on such a query is optimized to a DISTINCT
and therefore is subject to the same optimization technique.
For this to work, the constraint should be enabled and trusted. Failure to satisfy either of these conditions disables the optimization. To reenable the constraint and recover the trust, one uses CHECK CONSTRAINT
and WITH CHECK
in the ALTER TABLE
statement, accordingly.