EXPLAIN EXTENDED

How to create fast database queries

Longest common prefix: Oracle

Comments enabled. I *really* need your comment

Today, the third article of the series on how to strip all strings in a set of their longest common prefix and concatenate the results:

This article will describe how to do it in Oracle.

A quick reminder of the problem (taken from Stack Overflow):

I have some data:

id ref
1 3536757616
1 3536757617
1 3536757618
2 3536757628
2 3536757629
2 3536757630

and want to get the result like this:

id result
1 3536757616/7/8
2 3536757629/28/30

Essentially, the data should be aggregated on id, and the ref's should be concatenated together and separated by a / (slash), but with longest common prefix removed.

As I already wrote earlier, this solution needs two aggregates: first one to calculate the length of the longest common prefix and the second one to concatenate the strings.

Oracle allows creating custom aggregates, but it's usually a pain to implement them. That's why we will use Oracle's MODEL clause to solve this task.

Some time ago I wrote an article on how to use this clause to implement GROUP_CONCAT (an aggregate function in MySQL to concatenate strings):

We will use the same approach and will just a little modify this query to strip the largest common prefix of the strings before concatenating them.

MODEL clause allows to interpret the resultset as an Excel workbook and modify it using rules (similar to Excel formulae).

We will to do the following with the resultset:

  1. In a loop, take the prefix of the first string, and compare it to the prefix of the same length of all other strings, using the loop iteration number to calculate prefix length.We need to stop whenever there is a sinlge string that does not start with the prefix.

    This is easy to do using ITERATE clause: it allows the rules to be executed multiple times and stops whenever UNTIL condition is met:

    1. Put the non-zero value (prefix length) into the special dynamic column called match if the prefixes don't match, or put a zero if the prefixes do match.
    2. Calculate the maximum for the match over all rows but the first and put it into the first row. This value can be referenced as match[1]
    3. Check the value of match[1] in the UNTIL clause and break the loop if it's non-zero

    When the loop completes, match will hold the length of the longest common prefix.

  2. Concatenate the strings, using the calculated prefix length. See the detailed description of the algorithm in this article: GROUP_CONCAT in Oracle 10g

And here's the query:

WITH    data AS
(
SELECT  1 AS id, '3536757616' AS ref
FROM    dual
UNION ALL
SELECT  1 AS id, '3536757617' AS ref
FROM    dual
UNION ALL
SELECT  1 AS id, '3536757618' AS ref
FROM    dual
UNION ALL
SELECT  2 AS id, '3536757628' AS ref
FROM    dual
UNION ALL
SELECT  2 AS id, '3536757629' AS ref
FROM    dual
UNION ALL
SELECT  2 AS id, '3536757630' AS ref
FROM    dual
)
SELECT  id, group_concat
FROM    (
SELECT  *
FROM    (
SELECT  id, CAST(ref AS VARCHAR2(4000)) AS ref, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ref) AS rn
FROM    data
)
MODEL
PARTITION BY
(id)
DIMENSION BY
(rn)
MEASURES
(ref, ref group_concat, 0 match)
RULES SEQUENTIAL ORDER ITERATE (1000) UNTIL (match[1] > 0)
(
match[rn > 1] = CASE WHEN SUBSTR(ref[CV()], 1, ITERATION_NUMBER + 1) = SUBSTR(ref[1], 1, ITERATION_NUMBER + 1) THEN 0 ELSE ITERATION_NUMBER END,
group_concat[rn = 1] = ref[1],
group_concat[rn > 1] = group_concat[CV() - 1] || '/' || SUBSTR(ref[CV()], ITERATION_NUMBER + 1),
group_concat[1] = MAX(group_concat)[rn > 1],
match[1] = MAX(match)[rn > 1]
)
)
WHERE   rn = 1

ID GROUP_CONCAT
1 3536757616/7/8
2 3536757628/29/30
2 rows fetched in 0.0002s (0.0026s)
SELECT STATEMENT 
 VIEW 
  SQL MODEL ORDERED
   VIEW 
    WINDOW SORT
     VIEW 
      UNION-ALL 
       FAST DUAL 
       FAST DUAL 
       FAST DUAL 
       FAST DUAL 
       FAST DUAL 
       FAST DUAL 

To be continued.

Written by Quassnoi

June 6th, 2009 at 11:00 pm

Posted in Oracle

Leave a Reply