Archive for September 15th, 2009
This series of articles is inspired by multiple questions asked by the site visitors and Stack Overflow users, including Tony, Philip, Rexem and others.
Which method (NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL) is best to select values present in one table but missing in another one?
This:SELECT l.* FROM t_left l LEFT JOIN t_right r ON r.value = l.value WHERE r.value IS NULL
, this:SELECT l.* FROM t_left l WHERE l.value NOT IN ( SELECT value FROM t_right r )
or this:SELECT l.* FROM t_left l WHERE NOT EXISTS ( SELECT NULL FROM t_right r WHERE r.value = l.value )
Differences between the methods
These methods are quite different.
First of all,
LEFT JOIN / IS NULL and
NOT EXISTS are semantically equivalent, while
NOT IN is not. These method differ in how they handle
NULL values in
LEFT JOIN is guaranteed to return every row from
t_left, and then filtering is applied to the values returned from
t_right. If for some row in
t_left there is no corresponding row in
t_right (which means no row with that exact value is present in
t_right), the row from
t_left will be returned once, and the
NULL values will be substituted instead of
t_right's actual values.
NULL values can never satisfy an equality
JOIN condition, the
NULL values returned by the query are guaranteed to be substituted by the
LEFT JOIN, not fetched out of the actual
t_right's row. This means that
LEFT JOIN / IS NULL is guaranteed to return at most one row from
t_left, and these row's
value is not equal to one of those in
The same holds for
NOT EXISTS. Since it's a predicate, not a
JOIN condition, the rows from
t_left can only be returned at most once too.
EXISTS always returns
FALSE and it will return
TRUE as soon as it finds only a single matching row in
FALSE, if it find none.
NOT EXISTS, therefore, will return
TRUE only if no row satisfying the equality condition is found in
t_right (same as for
LEFT JOIN / IS NULL).
NULL values do not safisfy the equality conditions, so both
LEFT JOIN / IS NULL and
NOT EXISTS will always return rows from
t_left that have
value set to
NULL, even is there are rows with
value IS NULL in
NOT IN, however, behaves differently.
IN predicate (unlike
EXISTS) is trivalent, i. e. it can return
TRUEis returned when the non-
NULLvalue in question is found in the list
FALSEis returned when the non-
NULLvalue is not found in the list and the list does not contain
NULLis returned when the value is
NULL, or the non-
NULLvalue is not found in the list and the list contains at least one
IN predicate does not give a definitive answer to whether or not the expression is contained in the list as long as there are
NULL values on either side of the expression, returning
This of course makes no difference when using the positive form of
NULL: predicates returning
NULL are filtered out by the
WHERE clause as well as those returning
NOT IN is different, since negation of
NULL as well.
NOT IN condition will never hold for any list with a
NULL value in it.
- If a row is found in the list,
NOT IN, therefore, will return
- If a row is not found in the list,
NOT INon its turn will also return
Both conditions will of course be filtered out by the
Let's illustrate it with two simple queries that compare
(1, NULL) in t_left with
(2, NULL) in
WITH t_left AS ( SELECT 1 AS value UNION ALL SELECT NULL ), t_right AS ( SELECT 2 AS value UNION ALL SELECT NULL ) SELECT l.* FROM t_left l WHERE NOT EXISTS ( SELECT NULL FROM t_right r WHERE r.value = l.value )
|2 rows fetched in 0.0001s (0.0006s)|
This query, using
NOT EXISTS, returns both values from
t_left, since neither of them is equal to any of the values from
WITH t_left AS ( SELECT 1 AS value UNION ALL SELECT NULL ), t_right AS ( SELECT 2 AS value UNION ALL SELECT NULL ) SELECT l.* FROM t_left l WHERE l.value NOT IN ( SELECT value FROM t_right )
|0 rows fetched in 0.0001s (0.0005s)|
This query, on the other hand, returns nothing. Since there is a
NOT IN returns
NULL rather than
TRUE if the value is not found among the defined values. Just in case.
NOT IN) are too chicken to say something definite about lists with
NULL unless they are completely sure that the value is there.
However, if the values in both tables are non-nullable,
NULL, all three method describe above are semantically identical.