PostgreSQL 8.4: preserving order for hierarchical query at EXPLAIN EXTENDED

PostgreSQL 8.4: preserving order for hierarchical query

On Jul 1, 2009, PostgreSQL 8.4 was released.

In this series of articles, I'd like to show how to reimplement some tasks I wrote about in the previous blog posts using new PostgreSQL features.

Previously in the series:

Flattening timespans: PostgreSQL 8.4

Now, let's see how we can implement the hierarchical queries using the new features of PostgreSQL 8.4.

In PostgreSQL 8.3, we had to create a recursive function to do that. If you are still bound to 8.3 or an earlier version, you can read this article to see how to do it:

Hierarchical queries in PostgreSQL

In 8.4, we have recursive CTE's (common table expressions).

Let's create a sample hierarchical table and see how can we query it:

CREATE TABLE t_hierarchy (
id INT NOT NULL PRIMARY KEY,
parent INT NOT NULL,
value VARCHAR(50) NOT NULL
);

CREATE INDEX ix_hierarchy_parent ON t_hierarchy (parent);

BEGIN TRANSACTION;
INSERT
INTO    t_hierarchy (id, parent, value)
SELECT  s, s / 5, 'Value ' || s
FROM    generate_series(1, 1000) s;

COMMIT;

ANALYZE t_hierarchy;

There are 1,000 rows in the table, each row being a child to a row with id 5 times as low as the current one (i. e. rows 500, 501, 502, 503 and 504 are all children of row 100)

In the previous versions, we had to create a type and a recursive function:

CREATE TYPE tp_hierarchy AS (node t_hierarchy, level INT);

CREATE OR REPLACE FUNCTION fn_hierarchy_connect_by(INT, INT)
RETURNS SETOF tp_hierarchy
AS
$$
SELECT  CASE
WHEN node = 1 THEN
(t_hierarchy, $2)::tp_hierarchy
ELSE
fn_hierarchy_connect_by((q.t_hierarchy).id, $2 + 1)
END
FROM    (
SELECT  t_hierarchy, node
FROM    (
SELECT  1 AS node
UNION ALL
SELECT  2
) nodes,
t_hierarchy
WHERE   parent = $1
ORDER BY
id, node
) q;
$$
LANGUAGE 'sql';

This function gave us the following results:

SELECT  REPEAT('  ', (q.h).level) || ((q.h).node).id,
((q.h).node).parent,
((q.h).node).value,
(q.h).level
FROM    (
SELECT  fn_hierarchy_connect_by(0, 1) AS h
) AS q

?column?	parent	value	level
1	0	Value 1	1
5	1	Value 5	2
25	5	Value 25	3
125	25	Value 125	4
625	125	Value 625	5
626	125	Value 626	5
627	125	Value 627	5
628	125	Value 628	5
629	125	Value 629	5
126	25	Value 126	4
630	126	Value 630	5
631	126	Value 631	5
619	123	Value 619	4

124	24	Value 124	3
620	124	Value 620	4
621	124	Value 621	4
622	124	Value 622	4
623	124	Value 623	4
624	124	Value 624	4
1000 rows fetched in 0.0380s (0.1636s)

Subquery Scan q  (cost=0.00..0.28 rows=1 width=32)
  ->  Result  (cost=0.00..0.26 rows=1 width=0)

It runs quite fast, only 160 ms, and returns the rows in correct order.

In the current version, we don't need a function. Instead, we can use a recursive CTE:

WITH RECURSIVE q AS
(
SELECT  h, 1 AS level
FROM    t_hierarchy h
WHERE   parent = 0
UNION ALL
SELECT  hi, q.level + 1 AS level
FROM    q
JOIN    t_hierarchy hi
ON      hi.parent = (q.h).id
)
SELECT  REPEAT('  ', level) || (q.h).id,
(q.h).parent,
(q.h).value,
level
FROM    q

?column?	parent	value	level
4	0	Value 4	1
3	0	Value 3	1
2	0	Value 2	1
1	0	Value 1	1
5	1	Value 5	2
6	1	Value 6	2
7	1	Value 7	2
8	1	Value 8	2
9	1	Value 9	2
10	2	Value 10	2
11	2	Value 11	2

998	199	Value 998	5
999	199	Value 999	5
1000	200	Value 1000	5
1000 rows fetched in 0.0386s (0.0166s)

CTE Scan on q  (cost=313.11..387.96 rows=2495 width=36)
  CTE q
    ->  Recursive Union  (cost=0.00..313.11 rows=2495 width=36)
          ->  Index Scan using ix_hierarchy_parent on t_hierarchy h  (cost=0.00..8.34 rows=5 width=32)
                Index Cond: (parent = 0)
          ->  Hash Join  (cost=1.63..25.49 rows=249 width=36)
                Hash Cond: (hi.parent = (q.h).id)
                ->  Seq Scan on t_hierarchy hi  (cost=0.00..17.00 rows=1000 width=36)
                ->  Hash  (cost=1.00..1.00 rows=50 width=36)
                      ->  WorkTable Scan on q  (cost=0.00..1.00 rows=50 width=36)

This query is very fast (only 40 ms).

However, it does not preserve correct hierarchical order.

This is because the resursive CTE does not return the rows in recursive order. Rather, it first select all rows for the current recursion level, and then executes the CTE for each of the rows, UNION'ing the result.

To select rows in correct order, we need some field we could sort on. Since we need a lexicographic sort here, no base type can be easily used to do such a sort.

Fortunately, PostgreSQL offers ARRAY datatype which is lexicographically comparable and hence is just what we need.

We will add an additional int[] field, called breadcrumb, to the results of the CTE, which will accumulate the parent ids in an ARRAY of integers.

Then we will just sort on this field.

Here's the query:

WITH RECURSIVE q AS
(
SELECT  h, 1 AS level, ARRAY[id] AS breadcrumb
FROM    t_hierarchy h
WHERE   parent = 0
UNION ALL
SELECT  hi, q.level + 1 AS level, breadcrumb || id
FROM    q
JOIN    t_hierarchy hi
ON      hi.parent = (q.h).id
)
SELECT  REPEAT('  ', level) || (q.h).id,
(q.h).parent,
(q.h).value,
level,
breadcrumb::VARCHAR AS path
FROM    q
ORDER BY
breadcrumb

?column?	parent	value	level	path
1	0	Value 1	1	{1}
5	1	Value 5	2	{1,5}
25	5	Value 25	3	{1,5,25}
125	25	Value 125	4	{1,5,25,125}
625	125	Value 625	5	{1,5,25,125,625}
626	125	Value 626	5	{1,5,25,125,626}
627	125	Value 627	5	{1,5,25,125,627}
628	125	Value 628	5	{1,5,25,125,628}
629	125	Value 629	5	{1,5,25,125,629}
126	25	Value 126	4	{1,5,25,126}
630	126	Value 630	5	{1,5,25,126,630}
631	126	Value 631	5	{1,5,25,126,631}

124	24	Value 124	3	{4,24,124}
620	124	Value 620	4	{4,24,124,620}
621	124	Value 621	4	{4,24,124,621}
622	124	Value 622	4	{4,24,124,622}
623	124	Value 623	4	{4,24,124,623}
624	124	Value 624	4	{4,24,124,624}
1000 rows fetched in 0.0648s (0.0270s)

Sort  (cost=547.44..553.68 rows=2495 width=68)
  Sort Key: q.breadcrumb
  CTE q
    ->  Recursive Union  (cost=0.00..319.34 rows=2495 width=72)
          ->  Index Scan using ix_hierarchy_parent on t_hierarchy h  (cost=0.00..8.34 rows=5 width=36)
                Index Cond: (parent = 0)
          ->  Hash Join  (cost=1.63..26.11 rows=249 width=72)
                Hash Cond: (hi.parent = (q.h).id)
                ->  Seq Scan on t_hierarchy hi  (cost=0.00..17.00 rows=1000 width=40)
                ->  Hash  (cost=1.00..1.00 rows=50 width=68)
                      ->  WorkTable Scan on q  (cost=0.00..1.00 rows=50 width=68)
  ->  CTE Scan on q  (cost=0.00..87.33 rows=2495 width=68)

This query is also almost instant and returns the hierarchical data from the CTE in correct order.

As a bonus, we get what Oracle call SYS_CONNECT_BY_PATH (a complete descendancy chain for each row) for free.

This new feature is very nice and a real boon for every PostgreSQL developer.

Written by Quassnoi

July 17th, 2009 at 11:00 pm

Posted in PostgreSQL

3 Responses to 'PostgreSQL 8.4: preserving order for hierarchical query'

Subscribe to comments with RSS

How about case when id set for some level is {1,2,…,10}. Result of order by breadcrump will be 1,10,2…?

Alessio

5 Feb 13 at 20:56
@Alessio: no.

Quassnoi

14 Mar 13 at 00:26
This is exellent ! Thank you for putting this together. Very helpful.

SkipDigit

20 Dec 19 at 03:44

EXPLAIN EXTENDED