Searching for NULL columns
Comments enabled. I *really* need your comment
From Stack Overflow:
I want to make sure I'm not inserting a duplicate row into my table (i. e. only
PRIMARY KEY
different).All my fields allow
NULL
's as I've decidedNULL
to meanall values.Because of
NULL
's, the following statement in my stored procedure can't work:IF EXISTS ( SELECT * FROM MY_TABLE WHERE MY_FIELD1 = @IN_MY_FIELD1 AND MY_FIELD2 = @IN_MY_FIELD2 AND MY_FIELD3 = @IN_MY_FIELD3 AND MY_FIELD4 = @IN_MY_FIELD4 AND MY_FIELD5 = @IN_MY_FIELD5 AND MY_FIELD6 = @IN_MY_FIELD6 ) BEGIN goto on_duplicate END, since
NULL = NULL
is not true.How can I check for the duplicates without having an
IF ISNULL
statement for every column?
First of all, can't help being a smartass and tell that a UNIQUE
index should be defined over these columns.
SQL Server does index NULL
values and they are first class values as long as uniqueness is concerned, so creating this index will manage the duplicates automatically.
Now, back to business. Of course there are cases when creating such an index is not acceptable.
In these cases a very nice but little known operator INTERSECT
comes handy.
This operator takes two resultsets with the same layout and finds rows common to them (eliminatnig duplicates), as SQL Server's understands common: each value in each column should be either equal to one in the other row or they both should be NULL
's.
Let's create a sample table:
CREATE SCHEMA [20090704_intersect] CREATE TABLE t_fieldset ( id INT NOT NULL PRIMARY KEY, field1 INT, field2 INT, field3 INT, value VARCHAR(100) NOT NULL ) GO BEGIN TRANSACTION SELECT RAND(20090704) DECLARE @cnt INT SET @cnt = 1 WHILE @cnt <= 1000000 BEGIN INSERT INTO [20090704_intersect].t_fieldset VALUES ( @cnt, NULLIF(FLOOR(RAND() * 100), 0), NULLIF(FLOOR(RAND() * 100), 0), NULLIF(FLOOR(RAND() * 100), 0), LEFT('Value ' + CAST(@cnt AS VARCHAR) + REPLICATE('-', 100), 100) ) SET @cnt = @cnt + 1 END COMMIT GO [/sourcecode] A <code>1,000,000</code> rows and three unindexed fields here, filled with random values. First, let's issue an ugly query using <code>IFNULL</code>'s: DECLARE @field1 INT DECLARE @field2 INT DECLARE @field3 INT SET @field1 = 42 SET @field2 = NULL SET @field3 = NULL SELECT * FROM [20090704_intersect].t_fieldset WHERE (field1 = @field1 OR (field1 IS NULL AND @field1 IS NULL)) AND (field2 = @field2 OR (field2 IS NULL AND @field2 IS NULL)) AND (field3 = @field3 OR (field3 IS NULL AND @field3 IS NULL))
id | field1 | field2 | field3 | value | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13961 | 42 | Value 13961----------------------------------------------------------------------------------------- | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0003s (0.7318s) |
Table 't_fieldset'. Scan count 3, logical reads 19673, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 656 ms, elapsed time = 728 ms.
It runs for more than 0.7 seconds with the following plan:
SELECT Clustered Index Scan ([20090704_intersect].[t_fieldset])
Now, let's use INTERSECT
.
It requires a query, so we'll wrap the parameters into a SELECT
statement:
DECLARE @field1 INT DECLARE @field2 INT DECLARE @field3 INT SET @field1 = 42 SET @field2 = NULL SET @field3 = NULL SELECT * FROM [20090704_intersect].t_fieldset WHERE EXISTS ( SELECT field1, field2, field3 INTERSECT SELECT @field1, @field2, @field3 )
id | field1 | field2 | field3 | value | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13961 | 42 | Value 13961----------------------------------------------------------------------------------------- | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0003s (0.3195s) |
Table 't_fieldset'. Scan count 3, logical reads 19673, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 635 ms, elapsed time = 314 ms.
Same result, but the execution plan is slightly different:
SELECT Parallelism (Gather Streams) Clustered Index Scan ([20090704_intersect].[t_fieldset])
We see that INTERSECT
, unlike IFNULL
's, enables parallel execution, and makes this query twice as efficient on my dual-core machine.