EXPLAIN EXTENDED

How to create fast database queries

Building permutations: SQL Server

Comments enabled. I *really* need your comment

From Stack Overflow:

I am doing a testing scenario.

There are 6 people in each site:

Site People
Site 1 A, B, C, D, E, F
Site 2 G, H, I, J, K, L
Site 3 M, N, O, P, Q, R
Site 4 S, T, U, V, W, X

I want to write a query that can suggests me datewise the people who can test the site, two at a time.

Here are the rules:

  1. If a person has tested his site on Day 1, then his turn should come on Day 4 and not before that.
    This, however, is applicable for current week. So if A and D test a site on 22nd, B and E test it on 23rd and C and F test on 24th, then for this week, A and D can test the site only on 25th. Sunday is a holiday.
  2. Every week, the pair should change. Rule 1 is still applicable with the new pair.
  3. A person belonging to a particular site cannot test other sites.

To build these rules, we need to do the following:

  1. For each site, split 6 people into 3 pairs.
  2. Assign each pair to the appropriate day of week (pair 1 for Monday and Thursday, pair 2 for Tuesday and Friday, pair 3 for Wednesday and Saturday)
  3. For each new week, split the 6 people new way so that no pairs persist.

Step 3 is most interesting here, since it requires building permutations.

To build the pairs, we need to arrange 6 people and take the pairs from the arrangements.

There are 6! = 720 distinct ways to arrange 6 people.

But for our task the following pairs:

  • (1, 2) and (2, 1)
  • (3, 4) and (4, 3)
  • (5, 6) and (6, 5)

are considered equal. That's why we can use only 720 / 8 = 90 arrangements which yield non-repeating pairs.

SQL Server does not provide a native way to build all permutations of a set. We will have to emulate it using recursive CTE's.

Let's create a table with required data:

CREATE SCHEMA [20090625_permutation]
CREATE TABLE t_person
(
PersonID INT NOT NULL PRIMARY KEY,
SiteID INT NOT NULL,
PersonName VARCHAR(20) NOT NULL
)
GO
WITH    q AS
(
SELECT  1 AS id
UNION ALL
SELECT  id + 1
FROM    q
WHERE   id < 24
        )
INSERT
INTO    [20090625_permutation].t_person
SELECT  id, (id - 1) / 6 + 1, CHAR(64 + id)
FROM    q
GO

To build a list of permutations, we need to iterate all possible records for the site, skipping the record if had already been chosen on current step.

Since SQL Server does not have a simple way to refer to rows previosly selected in same resultset, we will just store the previously selected values in a space delimited field which will be passed accross recursions:

WITH    sets AS
(
SELECT  SiteID, SUBSTRING(CAST(PersonID AS VARCHAR(MAX)) + SPACE(8), 1, 8)AS result, 1 AS rn
FROM    [20090625_permutation].t_person p
UNION ALL
SELECT  p2.SiteID, s.result + SUBSTRING(CAST(PersonID AS VARCHAR(MAX)) + SPACE(8), 1, 8), rn + 1
FROM    sets s
JOIN    [20090625_permutation].t_person p2
ON      p2.siteid = s.siteid
AND PATINDEX('%' + SUBSTRING(CAST(PersonID AS VARCHAR(MAX)) + SPACE(8), 1, 8) + '%', s.result) = 0
)
SELECT  *
FROM    sets
WHERE   rn = 6

SiteID result rn
4 24 23 22 21 20 19 6
4 24 23 22 21 19 20 6
4 24 23 22 20 21 19 6
4 24 23 22 20 19 21 6
4 24 23 22 19 21 20 6
4 24 23 22 19 20 21 6
4 24 23 21 22 20 19 6
4 24 23 21 22 19 20 6
4 24 23 21 20 22 19 6
4 24 23 21 20 19 22 6
1 1 2 3 5 4 6 6
1 1 2 3 4 6 5 6
1 1 2 3 4 5 6 6

The permutations are built using recursion, that's why the third pair values are swapped on each step, the second pair values are swapped once for each 3! = 6 steps, and the first pair values are swapped once for each 5! = 120 steps.

To make sure the pairs will always have distinct values, we need to add an additional filter that would filter out the steps on which either pair does not exchange values with any of its neighbors:

WITH    sets AS
(
SELECT  SiteID, SUBSTRING(CAST(PersonID AS VARCHAR(MAX)) + SPACE(8), 1, 8)AS result, 1 AS rn
FROM    [20090625_permutation].t_person p
UNION ALL
SELECT  p2.SiteID, s.result + SUBSTRING(CAST(PersonID AS VARCHAR(MAX)) + SPACE(8), 1, 8), rn + 1
FROM    sets s
JOIN    [20090625_permutation].t_person p2
ON      p2.siteid = s.siteid
AND PATINDEX('%' + SUBSTRING(CAST(PersonID AS VARCHAR(MAX)) + SPACE(8), 1, 8) + '%', s.result) = 0
),
pairs AS
(
SELECT  SiteId, result, rn2, ROW_NUMBER() OVER (PARTITION BY siteid ORDER BY rn2 % 61) pn
FROM    (
SELECT  s.*,
ROW_NUMBER() OVER (PARTITION BY siteid ORDER BY siteid) AS rn2
FROM    sets s
WHERE   rn = 6
) q
WHERE   rn2 % 2 > 0
AND rn2 % 12 > 5
AND rn2 % 240 > 119
)
SELECT  *
FROM    pairs

SiteId result rn2 pn
1 3 4 1 6 5 2 427 1
1 1 4 2 3 6 5 671 2
1 3 6 4 5 2 1 367 3
1 1 6 4 2 5 3 611 4
1 3 4 1 5 6 2 429 5
1 3 6 4 2 5 1 369 6
1 3 4 1 2 6 5 431 7
1 5 3 1 6 4 2 187 8
1 5 6 3 4 2 1 127 9
1 3 6 4 1 5 2 371 10
4 19 24 22 23 21 20 607 88
4 19 22 20 23 24 21 669 89
4 19 24 22 21 23 20 609 90

Now we have a set of permutations for each week.

To build a calendar, we need to construct a calendar table (again using a recursive CTE) and join it against the permutation resultset so that the corresponding pair is selected for each day of week.

This is easily done using SUBSTRING.

Finally, we add a filter to make sure no Sunday is selected.

Here is the query:

WITH    sets AS
(
SELECT  SiteID, SUBSTRING(CAST(PersonID AS VARCHAR(MAX)) + SPACE(8), 1, 8)AS result, 1 AS rn
FROM    [20090625_permutation].t_person p
UNION ALL
SELECT  p2.SiteID, s.result + SUBSTRING(CAST(PersonID AS VARCHAR(MAX)) + SPACE(8), 1, 8), rn + 1
FROM    sets s
JOIN    [20090625_permutation].t_person p2
ON      p2.siteid = s.siteid
AND PATINDEX('%' + SUBSTRING(CAST(PersonID AS VARCHAR(MAX)) + SPACE(8), 1, 8) + '%', s.result) = 0
),
cal AS
(
SELECT  1 AS rn
UNION ALL
SELECT  rn + 1
FROM    cal
WHERE   rn < 99
        ),
        pairs AS
        (
        SELECT  SiteId, result, ROW_NUMBER() OVER (PARTITION BY siteid ORDER BY rn2 % 61) pn
        FROM    (
                SELECT  s.*,
                        ROW_NUMBER() OVER (PARTITION BY siteid ORDER BY siteid) AS rn2
                FROM    sets s
                WHERE   rn = 6
                ) q
        WHERE   rn2 % 2 > 0
AND rn2 % 12 > 5
AND rn2 % 240 > 119
)
SELECT  CAST('2009-06-25' AS DATETIME) + rn, pairs.siteid,
pf.PersonName AS first,
ps.PersonName AS second
FROM    cal
JOIN    pairs
ON      pn = DATEPART(wk, CAST('2009-06-25' AS DATETIME) + rn)
AND DATEPART(weekday, CAST('2009-06-25' AS DATETIME) + rn) <> 1
JOIN    [20090625_permutation].t_person pf
ON      pf.PersonId = CAST(SUBSTRING(result, 1 + (DATEPART(weekday, CAST('2009-06-25' AS DATETIME) + rn) % 3) * 16, 8) AS INT)
JOIN    [20090625_permutation].t_person ps
ON      ps.PersonId = CAST(SUBSTRING(result, 1 + (DATEPART(weekday, CAST('2009-06-25' AS DATETIME) + rn) % 3) * 16 + 8, 8) AS INT)
siteid first second
2009-06-26 00:00:00.000 1 E B
2009-06-27 00:00:00.000 1 D F
2009-06-29 00:00:00.000 1 C B
2009-06-30 00:00:00.000 1 E F
2009-07-01 00:00:00.000 1 A D
2009-07-02 00:00:00.000 1 C B
2009-07-03 00:00:00.000 1 E F
2009-07-04 00:00:00.000 1 A D
2009-07-06 00:00:00.000 1 E D
2009-07-07 00:00:00.000 1 C F
2009-07-08 00:00:00.000 1 A B
2009-07-09 00:00:00.000 1 E D
2009-09-30 00:00:00.000 4 V X
2009-10-01 00:00:00.000 4 T S
2009-10-02 00:00:00.000 4 U W

This is the resultset we were after.

For the resultset to be more readable, I deliberately made it not pivoted so that the different sites are selected in rows, not columns.

Written by Quassnoi

June 25th, 2009 at 11:00 pm

Posted in SQL Server

Leave a Reply