Archive for February 10th, 2010
Answering questions asked on the site.
I have two really large tables with lots of columns, many of them are nullable
I need to remove all rows from
table1which are not present in
table2, but there can be duplicates in both tables and writing more than 70
IS NULLconditions would be a pain, and I want to make sure there’s nothing I’m missing.
Is there a more simple way?
SQL Server supports
EXCEPT clause which returns all records present in the first table and absent in the second one. But this clause eliminates duplicates and cannot be used as a subject to a
ANSI SQL standard describes
EXCEPT ALL which returns all records from the first table which are not present in the second table, leaving the duplicates as is. Unfortunately, SQL Server does not support this operator.
Similar behavior can be achieved using
NOT IN or
NOT EXISTS constructs. But in SQL Server,
IN predicate does not accept more than one field. NOT EXISTS accepts any number of correlated columns, but it requires extra checks in the
WHERE clause, since equality operator does not treat two NULL values as equal. Each pair or nullable columns should be additionally checked for a NULL in both fields. This can only be done using
OR predicates or
COALESCE, neither of which adds to performance.
But there is a way to emulate
EXCEPT ALL in SQL Server quite elegantly and efficiently.
Let’s create two sample tables:
Read the rest of this entry »