Longest common prefix: Oracle
Comments enabled. I *really* need your comment
Today, the third article of the series on how to strip all strings in a set of their longest common prefix and concatenate the results:
- Longest common prefix: SQL Server
- Longest common prefix: PostgreSQL
- Longest common prefix: Oracle
- Longest common prefix: MySQL
This article will describe how to do it in Oracle.
A quick reminder of the problem (taken from Stack Overflow):
I have some data:
id ref 1 3536757616 1 3536757617 1 3536757618 2 3536757628 2 3536757629 2 3536757630 and want to get the result like this:
id result 1 3536757616/7/8 2 3536757629/28/30 Essentially, the data should be aggregated on
id
, and theref
's should be concatenated together and separated by a/
(slash), but with longest common prefix removed.
As I already wrote earlier, this solution needs two aggregates: first one to calculate the length of the longest common prefix and the second one to concatenate the strings.
Oracle allows creating custom aggregates, but it's usually a pain to implement them. That's why we will use Oracle's MODEL
clause to solve this task.
Some time ago I wrote an article on how to use this clause to implement GROUP_CONCAT
(an aggregate function in MySQL to concatenate strings):
We will use the same approach and will just a little modify this query to strip the largest common prefix of the strings before concatenating them.
MODEL
clause allows to interpret the resultset as an Excel workbook and modify it using rules (similar to Excel formulae).
We will to do the following with the resultset:
- In a loop, take the prefix of the first string, and compare it to the prefix of the same length of all other strings, using the loop iteration number to calculate prefix length.We need to stop whenever there is a sinlge string that does not start with the prefix.
This is easy to do using
ITERATE
clause: it allows the rules to be executed multiple times and stops wheneverUNTIL
condition is met:- Put the non-zero value (prefix length) into the special dynamic column called
match
if the prefixes don't match, or put a zero if the prefixes do match. - Calculate the maximum for the
match
over all rows but the first and put it into the first row. This value can be referenced asmatch[1]
- Check the value of
match[1]
in theUNTIL
clause and break the loop if it's non-zero
When the loop completes,
match
will hold the length of the longest common prefix. - Put the non-zero value (prefix length) into the special dynamic column called
- Concatenate the strings, using the calculated prefix length. See the detailed description of the algorithm in this article:
GROUP_CONCAT
in Oracle 10g
And here's the query:
WITH data AS ( SELECT 1 AS id, '3536757616' AS ref FROM dual UNION ALL SELECT 1 AS id, '3536757617' AS ref FROM dual UNION ALL SELECT 1 AS id, '3536757618' AS ref FROM dual UNION ALL SELECT 2 AS id, '3536757628' AS ref FROM dual UNION ALL SELECT 2 AS id, '3536757629' AS ref FROM dual UNION ALL SELECT 2 AS id, '3536757630' AS ref FROM dual ) SELECT id, group_concat FROM ( SELECT * FROM ( SELECT id, CAST(ref AS VARCHAR2(4000)) AS ref, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ref) AS rn FROM data ) MODEL PARTITION BY (id) DIMENSION BY (rn) MEASURES (ref, ref group_concat, 0 match) RULES SEQUENTIAL ORDER ITERATE (1000) UNTIL (match[1] > 0) ( match[rn > 1] = CASE WHEN SUBSTR(ref[CV()], 1, ITERATION_NUMBER + 1) = SUBSTR(ref[1], 1, ITERATION_NUMBER + 1) THEN 0 ELSE ITERATION_NUMBER END, group_concat[rn = 1] = ref[1], group_concat[rn > 1] = group_concat[CV() - 1] || '/' || SUBSTR(ref[CV()], ITERATION_NUMBER + 1), group_concat[1] = MAX(group_concat)[rn > 1], match[1] = MAX(match)[rn > 1] ) ) WHERE rn = 1
ID | GROUP_CONCAT | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 3536757616/7/8 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 | 3536757628/29/30 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 rows fetched in 0.0002s (0.0026s) |
SELECT STATEMENT VIEW SQL MODEL ORDERED VIEW WINDOW SORT VIEW UNION-ALL FAST DUAL FAST DUAL FAST DUAL FAST DUAL FAST DUAL FAST DUAL
To be continued.