August 4, 2009 - EXPLAIN EXTENDED at EXPLAIN EXTENDED

Archive for August 4th, 2009

SQL Server: deleting adjacent values (improved)

Comments enabled. I *really* need your comment

Yesterday, I wrote an article on how to delete adjacent values from a table in SQL Server:

SQL Server: deleting adjacent values

A quick reminder of the problem, taken from Stack Overflow:

I am trying to selectively delete records from an SQL Server 2005 table without looping through a cursor.

The table can contain many records (sometimes more than 500,000) so looping is too slow.

id unitId day interval amount

1 100 10 21 9.345

2 100 10 22 9.367

3 200 11 21 4.150

4 300 11 21 4.350

5 300 11 22 4.734

6 300 11 23 5.106

7 400 13 21 10.257

8 400 13 22 10.428

Values of (UnitID, Day, Interval) are unique.

In this example I wish to delete records 2, 5 and 8, since they are adjacent to an existing record (based on the key).

Note: record 6 would not be deleted because once 5 is gone it is not adjacent any longer.

id	unitId	day	interval	amount
1	100	10	21	9.345
2	100	10	22	9.367
3	200	11	21	4.150
4	300	11	21	4.350
5	300	11	22	4.734
6	300	11	23	5.106
7	400	13	21	10.257
8	400	13	22	10.428

To delete adjacent values, we should select continous ranges of intervals and delete each even row from each continous range.

In the previous article, I used a subquery to do this:

WITH   rows AS
        (
        SELECT  *,
                ROW_NUMBER() OVER
                (
                PARTITION BY
                        (
                        SELECT  TOP 1 qi.id AS mint
                        FROM    [20090803_adjacent].records qi
                        WHERE   qi.unitid = qo.unitid
                                AND qi.day = qo.day
                                AND qi.interval < = qo.interval
                                AND NOT EXISTS
                                (
                                SELECT  NULL
                                FROM    [20090803_adjacent].records t
                                WHERE   t.unitid = qi.unitid
                                        AND t.day = qi.day
                                        AND t.interval = qi.interval - 1
                                )
                        ORDER BY
                                qi.interval DESC
                        )
                ORDER BY interval
                ) AS rnm
        FROM    [20090803_adjacent].records qo
        )
DELETE
FROM    rows
WHERE   rnm % 2 = 0

This subquery traverses adjacent intervals until it find the first interval that has no predecessor.

Its id then serves as a range identifier in a PARTITION BY clause.

However, this query can be improved.

Assume we have the following intervals:

interval
3
4
5
6
10
11
12
20
21

We have three ranges here: 3:6, 10:12 and 20:21.

To identify a continuous range of intervals, we first should calculate a per-partition ROW_NUMBER of these intervals:

interval	ROW_NUMBER
3	1
4	2
5	3
6	4
10	5
11	6
12	7
20	8
21	9

ROW_NUMBERs, by definition, don't have gaps within the partition, while the intervals can have gaps. However, inside each range ROW_NUMBERs increase with intervals.

This means that we can just subtract ROW_NUMBER from the interval:

interval	ROW_NUMBER	ROW_NUMBER - interval
3	1	2
4	2	2
5	3	2
6	4	2
10	5	5
11	6	5
12	7	5
20	8	12
21	9	12

Within each range, this difference is a constant. When a gap occurs, this difference changes.

Therefore, the value of this difference identifies a range within each partition, and can be used for partitioning ranges just as well.

This should be more efficient since it does not use the subqueries.

Let's create a sample table and check:

Read the rest of this entry »

Written by Quassnoi

August 4th, 2009 at 11:00 pm

Posted in SQL Server

EXPLAIN EXTENDED

Archive for August 4th, 2009

SQL Server: deleting adjacent values (improved)

Subscribe

Subscribe by email

Contacts

Should I?

Recent articles

Calendar

Archives

Categories

Stack Overflow

EXPLAIN EXTENDED

Archive for August 4th, 2009

SQL Server: deleting adjacent values (improved)

Share this:

Subscribe

Subscribe by email

Contacts

Should I?

Recent articles

Calendar

Archives

Categories

Stack Overflow