EXPLAIN EXTENDED

How to create fast database queries

GROUP_CONCAT in Oracle 10g

with one comment

MySQL has a nice aggregate function called GROUP_CONCAT, which concatenates all strings in a group in given order, separating them with a given separator.

In one of the previous articles, I wrote about emulating this function in PostgreSQL.

Now, we'll try to emulate this function in Oracle 10g.

There are numerous solutions using Oracle's hierarchical function SYS_CONNECT_BY_PATH, but they're bad in the following ways:

  • Not all queries can be rewritten using CONNECT BY
  • Even if they can, CONNECT BY performs poorly on some conditions

Instead, we will emulate this function using Oracle's MODEL clause.

MODEL clause is a nice feature of Oracle 10g that allows you to work with a resultset as if it were an Excel workbook. You can basically do things like this:

SELECT  value, FORMULA('=R[-3]C[-1]') AS value_three_rows_back
FROM    mytable

Note: the code above is not a real query, this is just a human-readable illustration of what can be done, to give you the idea. The real syntax is more complex.

Now, let's generate some sample data. I will not create and fill real tables, instead I'll just use a generated resultset to demonstrate this solution's ability to work on subqueries as well.

WITH q AS (
SELECT  grouper, val,
ROW_NUMBER() OVER (PARTITION BY grouper ORDER BY rno) AS rn
FROM    (
SELECT  'test ' || level AS val,
FLOOR((level - 1) / 5) AS grouper,
rownum AS rno
FROM    dual
CONNECT BY
level <= 100
                ) qi
        )
SELECT  *
FROM    q
GROUPER VAL RN
0 test 1 1
0 test 2 2
0 test 3 3
0 test 4 4
0 test 5 5
1 test 6 1
1 test 7 2
18 test 94 4
18 test 95 5
19 test 96 1
19 test 97 2
19 test 98 3
19 test 99 4
19 test 100 5

We have a set of grouped and ordered text values which we need to join using a given separator.

Here's the Excel algorithm to do it. For the principle to be better understood, we are treating our resultset as an Excel workbook.

  1. Partition the result into several worksheets, putting values from each group into the first and the second columns of the group's worksheet.
    A B C
    text 1 text 1
    text 2 text 2
    text 3 text 3
    text 4 text 4
    text 5 text 5
    A B C
    text 6 text 6
    text 7 text 7
    text 8 text 8
    text 9 text 9
    text 10 text 10
  2. On each worksheet, put a concatenation formula to all but the first row in the second column:
    A B C
    text 1 text1
    text 2 =R[-1]C & ', ' & RC[-1]
    text 3 =R[-1]C & ', ' & RC[-1]
    text 4 =R[-1]C & ', ' & RC[-1]
    text 5 =R[-1]C & ', ' & RC[-1]
  3. Finally, put a formula to each row of the third column that check if it's a last row of the worksheet:
    A B C
    text 1 text1 =ISBLANK(R[1]C[-1])
    text 2 =R[-1]C & ', ' & RC[-1] =ISBLANK(R[1]C[-1])
    text 3 =R[-1]C & ', ' & RC[-1] =ISBLANK(R[1]C[-1])
    text 4 =R[-1]C & ', ' & RC[-1] =ISBLANK(R[1]C[-1])
    text 5 =R[-1]C & ', ' & RC[-1] =ISBLANK(R[1]C[-1])

From all this, we get the following values:

A B C
text 1 text1 1
text 2 text1, text2 1
text 3 text1, text2, text3 1
text 4 text1, text2, text3, text4 1
text 5 text1, text2, text3, text4, text5 0

Now, we just need to select the last row from each worksheet (it's marked with a FALSE in the third column).

So, how do we do it in Oracle?

We use the Oracle's MODEL clause.

PARTITION clause in it defines worksheets, DIMENSION BY defines row numbers, MEASURES defines columns.

If Oracle encounters a row with grouper = 5 and rn = 3, it puts the corresponding values into the 5th worksheet on the 3rd row: val to the first column (called val), val again to the second column (called group_concat), and a 0 to the third one (called mark).

RULES UPDATE define the actions we described in the steps above. They are quite self-explanatory, and can be seen in detail in the Excel tables above.

Finally, we select group_concat (which contains the concatenated values) from the rows that have mark = 1 (that is, last row of each worksheet).

Here's the query and the result:

WITH q AS (
SELECT  qi.*,
ROW_NUMBER() OVER (PARTITION BY grouper ORDER BY rno) AS rn
FROM    (
SELECT  'test ' || level AS val,
FLOOR((level - 1) / 5) AS grouper,
rownum AS rno
FROM    dual
CONNECT BY
level <= 100
                ) qi
        )
SELECT  *
FROM    (
        SELECT  *
        FROM    q
        MODEL
        PARTITION BY
                (grouper)
        DIMENSION BY
                (rn)
        MEASURES
                (val, val AS group_concat, 0 AS mark)
        RULES UPDATE (
                group_concat[rn > 1] ORDER BY rn =  group_concat[CV() - 1] || ', ' || val[CV()],
mark[ANY] ORDER BY rn = PRESENTV(mark[CV() + 1], 0, 1)
)
)
WHERE   mark = 1
ORDER BY
grouper
GROUPER GROUP_CONCAT
0 test 1, test 2, test 3, test 4, test 5
1 test 6, test 7, test 8, test 9, test 10
2 test 11, test 12, test 13, test 14, test 15
3 test 16, test 17, test 18, test 19, test 20
4 test 21, test 22, test 23, test 24, test 25
5 test 26, test 27, test 28, test 29, test 30
6 test 31, test 32, test 33, test 34, test 35
7 test 36, test 37, test 38, test 39, test 40
8 test 41, test 42, test 43, test 44, test 45
9 test 46, test 47, test 48, test 49, test 50
10 test 51, test 52, test 53, test 54, test 55
11 test 56, test 57, test 58, test 59, test 60
12 test 61, test 62, test 63, test 64, test 65
13 test 66, test 67, test 68, test 69, test 70
14 test 71, test 72, test 73, test 74, test 75
15 test 76, test 77, test 78, test 79, test 80
16 test 81, test 82, test 83, test 84, test 85
17 test 86, test 87, test 88, test 89, test 90
18 test 91, test 92, test 93, test 94, test 95
19 test 96, test 97, test 98, test 99, test 100

The query returns the records concatenated group-wise, just like MySQL's GROUP_CONCAT does.

Written by Quassnoi

April 5th, 2009 at 11:00 pm

Posted in Oracle

One Response to 'GROUP_CONCAT in Oracle 10g'

Subscribe to comments with RSS

  1. This is awesome. Brilliant!

    kftgr

    25 Jan 13 at 05:34

Leave a Reply