EXPLAIN EXTENDED

How to create fast database queries

Archive for July 13th, 2009

SQL Server: aggregate bitwise OR

with 2 comments

From Stack Overflow:

I am creating a script for merging and deleting duplicate rows from a table.

The table contains address information, and uses an integer field for storing information about the email as bit flags (column name value). For example, if bit 1 is set in value, that means the record is a primary address.

There are instances of the same email being entered twice, but sometimes with different values. To resolve this, I need to take the value from all duplicates, assign them to one surviving record and delete the rest.

My biggest headache so far as been with the merging of the records. What I want to do is bitwise OR all values of duplicate records together.

From database theory's point of view, this design of course violates the 1NF, since multiple properties are contained in one column (in bit-packed form). It would be easier to split them apart and create a separate column for each bit.

However, it can be a legitimate design if the fields are not parsed on the database side, but instead passed as-is to a client which needs them in this bit-packed form. And anyway, helping is better than criticizing.

We have three problems here:

  1. Select a first record for each set of duplicates
  2. Update this record with bitwise OR of all values in its set
  3. Delete all other records

Step 1 is easy to do using ROW_NUMBER().

Step 3 is also not very hard. Microsoft has a knowledge base article KB139444 that described a really weird way to remove the duplicates, but it may be done much more easily using same ROW_NUMBER() with a CTE or an inline view.

See this article I wrote some time ago on how to do this:

Now, the main problem is step 2.

SQL Server lacks a native way to calculate bitwise aggregates, but with a little effort it can be emulated.

The main idea here is that for bit values, aggregate OR and AND can be replaced with MAX and MIN, accordingly.

All we need is to split each value into the bits, aggregate each bit and merge the results together.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

July 13th, 2009 at 11:00 pm

Posted in SQL Server