Archive for July 13th, 2009
SQL Server: aggregate bitwise OR
From Stack Overflow:
I am creating a script for merging and deleting duplicate rows from a table.
The table contains address information, and uses an integer field for storing information about the email as bit flags (column name
value
). For example, if bit 1 is set invalue
, that means the record is a primary address.There are instances of the same email being entered twice, but sometimes with different
value
s. To resolve this, I need to take thevalue
from all duplicates, assign them to one surviving record and delete the rest.My biggest headache so far as been with the merging of the records. What I want to do is bitwise
OR
allvalue
s of duplicate records together.
From database theory's point of view, this design of course violates the 1NF, since multiple properties are contained in one column (in bit-packed form). It would be easier to split them apart and create a separate column for each bit.
However, it can be a legitimate design if the fields are not parsed on the database side, but instead passed as-is to a client which needs them in this bit-packed form. And anyway, helping is better than criticizing.
We have three problems here:
- Select a first record for each set of duplicates
- Update this record with bitwise
OR
of all values in its set - Delete all other records
Step 1 is easy to do using ROW_NUMBER()
.
Step 3 is also not very hard. Microsoft has a knowledge base article KB139444 that described a really weird way to remove the duplicates, but it may be done much more easily using same ROW_NUMBER() with a CTE or an inline view.
See this article I wrote some time ago on how to do this:
Now, the main problem is step 2.
SQL Server lacks a native way to calculate bitwise aggregates, but with a little effort it can be emulated.
The main idea here is that for bit values, aggregate OR
and AND
can be replaced with MAX
and MIN
, accordingly.
All we need is to split each value into the bits, aggregate each bit and merge the results together.
Let's create a sample table:
Read the rest of this entry »