EXPLAIN EXTENDED

How to create fast database queries

Dynamic pivot

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a table and want to transpose its rows to columns, similar to a pivot table but without summarising.

For example I have the following tables:

Question (QuestionID, QuestionText)
Response (ResponseID, ResponseText, QuestionID)

Basically I want to be able to create a dynamic table something like:

Question 1 Question 2 Question 3
Response 1.1 Response 1.2 Response 1.3
Response 2.1 Response 2.2 Response 2.3
Response 3.1 Response 3.2 Response 3.3
Response 4.1 Response 4.2 Response 4.3

The main requirement would be I don't know at design time what the question text will be.

First, let's put it straight: SQL is not a right tool to do this.

SQL operates on sets. An SQL query is a functions that returns a set. And the column layout is an implicit property of a set. It should be defined in design time. This is the way SQL works.

However, there are several approaches that can simplify the task. One of them is using SQL Server's XML abilities.

To arrange the answers in a table, we need to make the following:

  1. Select all questions. This rowset will define the columns of our table.
  2. Select the row numbers for all answers for the question that has the most answers. This rowset will define the rows of our table. The number of rows wil be exactly as the number of answers to the most answered question.
  3. Assign the ordinal number to each answer on any question. This is done using ROW_NUMBER(). This value will uniquely define the position of the answer in a table.
  4. For each row (taken from the rowset made on step 2), take all the questions and LEFT JOIN the answers that need to be put into this row, using the ordinal number calculated on the step 3 as a LEFT JOIN condition. This will put either the answer or an empty string into the corresponding cell.
  5. Transform the rowset retrieved on step above into an XML string made out of <td>'s and enclosed into a <tr>
  6. Transform the resulting rowset into an XML made out of the XML's from the previous step and enclosed into a <table>

Let's create a sample table and see how can we do this:

CREATE SCHEMA [20090727_pivot]
CREATE TABLE question
        (
        id INT NOT NULL PRIMARY KEY,
        value VARCHAR(20) NOT NULL
        )
CREATE TABLE answer
        (
        id INT NOT NULL PRIMARY KEY,
        question_id INT NOT NULL,
        value VARCHAR(20) NOT NULL
        )
GO
CREATE INDEX ix_answer_question ON [20090727_pivot].answer (question_id)
GO
BEGIN TRANSACTION
DECLARE @cnt INT
SET @cnt = 1
WHILE @cnt <= 30
BEGIN
        INSERT
        INTO    [20090727_pivot].answer (id, question_id, value)
        SELECT  @cnt, @cnt % 5 + 1, 'Answer ' + CAST(@cnt AS VARCHAR)
        WHERE   @cnt % 7 <> 0
                AND @cnt % 9 <> 0
        SET @cnt = @cnt + 1
END
INSERT
INTO    [20090727_pivot].question (id, value)
SELECT  question_id, 'Question ' + CAST(question_id AS VARCHAR)
FROM    (
        SELECT  DISTINCT question_id
        FROM    [20090727_pivot].answer
        ) a
COMMIT

Table question contains 5 questions.

Table answer contains 30 answers, distributed unevenly. Some questions have more answers than the others.

And here's a query to do arrange the answers:

WITH    a AS
        (
        SELECT  a.*, ROW_NUMBER() OVER (PARTITION BY question_id ORDER BY id) AS rn
        FROM    [20090727_pivot].answer a
        ),
        rows AS (
        SELECT  ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM    [20090727_pivot].answer a
        WHERE   question_id =
                (
                SELECT  TOP 1 question_id
                FROM    [20090727_pivot].answer a
                GROUP BY
                        question_id
                ORDER BY
                        COUNT(*) DESC
                )
        )
SELECT  (
        SELECT  COALESCE(a.value, '')
        FROM    [20090727_pivot].question q
        LEFT JOIN
                a
        ON      a.rn = rows.rn
                AND a.question_id = q.id
        FOR XML PATH ('td'), TYPE
        ) AS tr
FROM    rows
FOR XML PATH(''), ROOT('table')

XML_F52E2B61-18A1-11d1-B105-00805F49916B
<table><tr><td>Answer 5</td><td>Answer 1</td><td>Answer 2</td><td>Answer 3</td><td>Answer 4</td></tr><tr><td>Answer 10</td><td>Answer 6</td><td>Answer 12</td><td>Answer 8</td><td>Answer 19</td></tr><tr><td>Answer 15</td><td>Answer 11</td><td>Answer 17</td><td>Answer 13</td><td>Answer 24</td></tr><tr><td>Answer 20</td><td>Answer 16</td><td>Answer 22</td><td>Answer 23</td><td>Answer 29</td></tr><tr><td>Answer 25</td><td>Answer 26</td><td/><td/><td/></tr><tr><td>Answer 30</td><td/><td/><td/><td/></tr></table>
1 row fetched in 0.0006s (0.0018s)

Being formatted, the result looks like this:

<table>
  <tr>
    <td>Answer 5</td>
    <td>Answer 1</td>
    <td>Answer 2</td>
    <td>Answer 3</td>
    <td>Answer 4</td>
  </tr>
  <tr>
    <td>Answer 10</td>
    <td>Answer 6</td>
    <td>Answer 12</td>
    <td>Answer 8</td>
    <td>Answer 19</td>
  </tr>
  <tr>
    <td>Answer 15</td>
    <td>Answer 11</td>
    <td>Answer 17</td>
    <td>Answer 13</td>
    <td>Answer 24</td>
  </tr>
  <tr>
    <td>Answer 20</td>
    <td>Answer 16</td>
    <td>Answer 22</td>
    <td>Answer 23</td>
    <td>Answer 29</td>
  </tr>
  <tr>
    <td>Answer 25</td>
    <td>Answer 26</td>
    <td />
    <td />
    <td />
  </tr>
  <tr>
    <td>Answer 30</td>
    <td />
    <td />
    <td />
    <td />
  </tr>
</table>

, and makes this nice table:

Answer 5 Answer 1 Answer 2 Answer 3 Answer 4
Answer 10 Answer 6 Answer 12 Answer 8 Answer 19
Answer 15 Answer 11 Answer 17 Answer 13 Answer 24
Answer 20 Answer 16 Answer 22 Answer 23 Answer 29
Answer 25 Answer 26
Answer 30

This approach can be used even without the XML.

We can just make a rowset that could be parsed on the client side:

WITH    a AS
        (
        SELECT  a.*, ROW_NUMBER() OVER (PARTITION BY question_id ORDER BY id) AS rn
        FROM    [20090727_pivot].answer a
        ),
        rows AS (
        SELECT  ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM    [20090727_pivot].answer a
        WHERE   question_id =
                (
                SELECT  TOP 1 question_id
                FROM    [20090727_pivot].answer a
                GROUP BY
                        question_id
                ORDER BY
                        COUNT(*) DESC
                )
        )
SELECT  rows.rn, q.id, a.value
FROM    rows
CROSS JOIN
        [20090727_pivot].question q
LEFT JOIN
        a
ON      a.rn = rows.rn
        AND a.question_id = q.id
ORDER BY
        rows.rn, q.id

rn id value
1 1 Answer 5
1 2 Answer 1
1 3 Answer 2
1 4 Answer 3
1 5 Answer 4
2 1 Answer 10
2 2 Answer 6
2 3 Answer 12
2 4 Answer 8
2 5 Answer 19
3 1 Answer 15
3 2 Answer 11
3 3 Answer 17
3 4 Answer 13
3 5 Answer 24
4 1 Answer 20
4 2 Answer 16
4 3 Answer 22
4 4 Answer 23
4 5 Answer 29
5 1 Answer 25
5 2 Answer 26
5 3
5 4
5 5
6 1 Answer 30
6 2
6 3
6 4
6 5
30 rows fetched in 0.0011s (0.0292s)

This query yields fixed number of column records per row, filling the absent values with NULLs and making it super easy to parse it on the client side.

Written by Quassnoi

July 27th, 2009 at 11:00 pm

Posted in SQL Server

Leave a Reply