Archive for May, 2013
SQL Server: joining NULL values
From Stack Overflow:
UPDATE n SET AddressID = a.AddressID FROM #NewAddress n JOIN dbo.Address a ON (a.[LineA] = n.[LineA] OR (a.[LineA] is null AND n.[LineA] is null)) AND (a.[LineB] = n.[LineB] OR (a.[LineB] is null AND n.[LineB] is null)) AND (a.[LineC] = n.[LineC] OR (a.[LineC] is null AND n.[LineC] is null)) WHERE n.Processed = 0Any ideas on how to UNION or EXISTS this query? This seems to be a pretty common join condition I'm encountering.
Here's a JOIN
condition: we should join two tables on a common key, also treating NULL values as equal. Normal equality conditions don't work here: in SQL, NULL
values are not considered equal (strictly speaking, their equality is a boolean NULL
rather than true or false).
ANSI standard defines a special predicate for that: IS NOT DISTINCT FROM
, which is supported by PostgreSQL. This treats values the same value as DISTINCT
does: two equal values, as well as two NULL
values, are considered not distinct. MySQL also supports similar operator, <=>
. Unfortunately, SQL Server supports neither.
In SQL Server, we can emulate this by using set operators which distinctify the queries. Set operators work on sets, so we would need some kind of a SELECT
query for them. This would also help us to overcome SQL Server's inability to compare tuples directly. Basically, we need to tell if two tuples: (a.lineA, a.lineB, a.lineC)
and (n.lineA, n.lineB, n.lineC)
are distinct or not.
We can do that by selecting those tuples in dummy queries (those without a FROM
clause), intersecting those queries (which would give us a one-record or zero-record resultset, depending on whether the two tuples are distinct or not) and applying EXISTS
to the resultset. EXISTS
is a boolean predicate and as such can serve as a join condition.
Let's check it. We'll create two tables, fill them with random values (including occasional NULLs here and there):