EXPLAIN EXTENDED

How to create fast database queries

Archive for December 30th, 2009

Grouping continuous ranges

with one comment

From Stack Overflow:

I have a ticketing system.

Now I have to select adjacent places when the user asks for 2 or 3 tickets.

Every ticket has a line and column number. The concept of adjacent places is places in the same line with adjacent columns numbers.

These tickets are in an SQL Server database.

Any ideas about this algorithm to search for available adjacent seats?

This is a problem known as grouping continuous ranges: finding the continuous ranges of the records (in a certain order) having the same value of the grouping column.

For the table in question, the groups will look like this:

Seats
Row Column Occupied Group
1 1 1 -
1 2 0 1
1 3 0 1
1 4 0 1
1 5 1 -
1 6 1 -
1 7 0 2
1 8 0 2
1 9 1 -
1 10 0 3

To find the spans of free seats with the required length, we need to group the continuous ranges of rows with occupied = 0 in column order (within the rows), calculate the counts of records in these groups and return the groups having sufficient value of COUNT(*).

To group something in SQL, we need to have an expression that would have the same value for all records belonging to the group.

How do we build such an expression for the continuous ranges?

In a movie theater, seats are numbered sequentially, so we can rely on the values of the column being continuous. Now, let's calculate only the records that describe the free seats, and build their row numbers in the column order:

Seats
Row Column Occupied ROW_NUMBER
1 2 0 1
1 3 0 2
1 4 0 3
1 7 0 4
1 8 0 5
1 10 0 6

We see that filtering only free seats broke the column numbering order (the columns numbers are not continuous anymore), but the ROW_NUMBERs are continuous (by definition). Each occupied seat breaks the continuity of the columns, and the spans of the free seats correspond to the ranges of columns with unbroken continuity.

If the continuity is not broken, column and ROW_NUMBER both increase by 1 with each record. This means that their difference will be constant:

Seats
Row Column Occupied ROW_NUMBER column - ROW_NUMBER
1 2 0 1 1
1 3 0 2 1
1 4 0 3 1
1 7 0 4 3
1 8 0 5 3
1 10 0 6 4

The value of difference between column and ROW_NUMBER, therefore, uniquely defines the continuous group the records belong to.

This means we can group on it and do other things (the rest being pure technical).

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

December 30th, 2009 at 11:00 pm

Posted in SQL Server