Aggregate concatenation
Comments enabled. I *really* need your comment
Aggregate concatenation functions help creating a concatenated list out of a recordset. Useful for reports, hierarchical trees, etc.
MySQL supplies GROUP_CONCAT
for this purpose. SYS_CONNECT_BY PATH
and FOR XML
can be used in Oracle and MS SQL.
In PostgreSQL, we cannot use these tricks, but we can create our own aggregate function. And this function will also accept two more extremely useful parameters: DELIMITER
and IS_DISTINCT
.
DELIMITER
is self-explanatory: it will be used to separate the values. IS_DISTINCT
, on the other hard, is more tricky: it will be evaluated on each step, and will be used to decide if this very value should be filtered out.
Let's create the functions and see how they work:
CREATE TYPE "20090304_groupconcat".tp_concat AS (data TEXT[], delimiter TEXT); CREATE OR REPLACE FUNCTION "20090304_groupconcat".group_concat_iterate(_state "20090304_groupconcat".tp_concat, _value TEXT, delimiter TEXT, is_distinct boolean) RETURNS "20090304_groupconcat".tp_concat AS $BODY$ SELECT CASE WHEN $1 IS NULL THEN ARRAY[$2] WHEN $4 AND $1.data @> ARRAY[$2] THEN $1.data ELSE $1.data || $2 END, $3 $BODY$ LANGUAGE 'sql' VOLATILE; CREATE OR REPLACE FUNCTION "20090304_groupconcat".group_concat_finish(_state "20090304_groupconcat".tp_concat) RETURNS text AS $BODY$ SELECT array_to_string($1.data, $1.delimiter) $BODY$ LANGUAGE 'sql' VOLATILE; CREATE AGGREGATE "20090304_groupconcat".group_concat(text, text, boolean) (SFUNC = "20090304_groupconcat".group_concat_iterate, STYPE = "20090304_groupconcat".tp_concat, FINALFUNC = "20090304_groupconcat".group_concat_finish); CREATE TABLE "20090304_groupconcat".t_exception (exp_id INT NOT NULL PRIMARY KEY, exp_word TEXT NOT NULL);
As you can see, we utilize a custom datatype here. We need to store DELIMITER
along with intermediate aggregated data, as it's the only data FINALFUNC
has access to. We also use array operations to store data and search for duplicates.
IS_DISTINCT
operation is best illustrated with this query:
SELECT "20090303_groupconcat".group_concat(val, ', ', val = 'grouped') FROM ( SELECT 'grouped' AS val UNION ALL SELECT 'grouped' UNION ALL SELECT 'ungrouped' UNION ALL SELECT 'ungrouped' ) g;
group_concat |
---|
grouped, ungrouped, ungrouped |
The aggregate groups all records that have value grouped
and leaves all other records as is.
When using this aggregate against the indexed column on large datasets, it's better to use DISTINCT
clause in an uncorrelated subquery. This aggregate will not utilize any indexes on the aggregated column, while DISTINCT
will:
SELECT "20090304_groupconcat".group_concat(exp_word, ', ', TRUE) FROM ( SELECT DISTINCT exp_word FROM t_exception ) e;
group_concat |
---|
asper, gener, liber, miser, puer, socer, tener, vesper |
Time: 1213.252 ms |
When there is no index or it's not used, the aggregate is of same performance as DISTINCT
clause:
SET ENABLE_INDEXSCAN=OFF; SELECT "20090304_groupconcat".group_concat(exp_word, ', ', TRUE) FROM ( SELECT DISTINCT exp_word FROM "20090304_groupconcat".t_exception ) e; SET ENABLE_INDEXSCAN=ON;
group_concat |
---|
asper, gener, liber, miser, puer, socer, tener, vesper |
Time: 14391.568 ms |
SET ENABLE_INDEXSCAN=OFF; SELECT "20090304_groupconcat".group_concat(exp_word, ', ', TRUE) FROM "20090304_groupconcat".t_exception; SET ENABLE_INDEXSCAN=ON;
group_concat |
---|
puer, socer, vesper, gener, asper, liber, miser, tener |
Time: 14324.554 ms |
One final note: the aggregate outputs values in order it receives the first instance of each value.
It you need any specific order, use ORDER BY
in an uncorrelated subquery.