EXPLAIN EXTENDED

How to create fast database queries

Hierarchical queries in PostgreSQL

Comments enabled. I *really* need your comment

Note: this article concerns PostgreSQL 8.3 and earlier.

For hierarchical queries in PostgreSQL 8.4 and higher, see this article:

In one of the previous articles I wrote about using hierarchical queries in MySQL:

PostgreSQL has a contrib module to implement the same functionality.

However, it's not always possible to install and use contribs. Same is true for procedural languages.

Fortunately, this functionality can be implemented using a plain SQL function.

Let's create a sample table and see how it works:

CREATE TABLE t_hierarchy (
        id INT NOT NULL PRIMARY KEY,
        parent INT NOT NULL,
        value VARCHAR(50) NOT NULL
);

CREATE INDEX ix_hierarchy_parent ON t_hierarchy (parent);

BEGIN TRANSACTION;
INSERT
INTO    t_hierarchy (id, parent, value)
SELECT  s, s / 5, 'Value ' || s
FROM    generate_series(1, 1000) s;

COMMIT;

ANALYZE t_hierarchy;

This table of 1,000 rows has a hierarchical structure, with parent always being equal to id / 5.

This gives us 5 children per node.

To select a tree representation, we need to traverse the tree in the correct order: select the next sibling of a node only after the node's children had been selected.

This can be done using recursive rowset returning functions.

For each node, this function should build a list of its children interleaved with recursive calls to itself that would return the grandchildren etc.

To do this, we will join each list of nodes with a dummy rowset of INT's indicating what should we return on the current step.

If the rowset's value is 1, we return the node; if 2, we invoke the function for the node.

We will create a complex type to return from the function.

This type will contain the whole table row along with level pseudocolumn.

Here's the type and the function:

CREATE TYPE tp_hierarchy AS (node t_hierarchy, level INT);

CREATE OR REPLACE FUNCTION fn_hierarchy_connect_by(INT, INT)
RETURNS SETOF tp_hierarchy
AS
$$
        SELECT  CASE
                WHEN node = 1 THEN
                        (t_hierarchy, $2)::tp_hierarchy
                ELSE
                        fn_hierarchy_connect_by((q.t_hierarchy).id, $2 + 1)
                END
        FROM    (
                SELECT  t_hierarchy, node
                FROM    (
                        SELECT  1 AS node
                        UNION ALL
                        SELECT  2
                        ) nodes,
                        t_hierarchy
                WHERE   parent = $1
                ORDER BY
                        id, node
                ) q;
$$
LANGUAGE 'sql';

And here's what it returns:

SELECT  REPEAT('  ', (q.h).level) || ((q.h).node).id,
        ((q.h).node).parent,
        ((q.h).node).value,
        (q.h).level
FROM    (
        SELECT  fn_hierarchy_connect_by(0, 1) AS h
        ) AS q

?column? parent value level
1 0 Value 1 1
5 1 Value 5 2
25 5 Value 25 3
125 25 Value 125 4
625 125 Value 625 5
626 125 Value 626 5
627 125 Value 627 5
628 125 Value 628 5
629 125 Value 629 5
126 25 Value 126 4
630 126 Value 630 5
631 126 Value 631 5
632 126 Value 632 5
633 126 Value 633 5
634 126 Value 634 5
619 123 Value 619 4
124 24 Value 124 3
620 124 Value 620 4
621 124 Value 621 4
622 124 Value 622 4
623 124 Value 623 4
624 124 Value 624 4
1000 rows fetched in 0.0381s (0.1423s)
Subquery Scan q  (cost=0.00..0.28 rows=1 width=32)
  ->  Result  (cost=0.00..0.26 rows=1 width=0)

The tree structure is just as it should be.

Note the ORDER BY clause inside the function query.

First ordering condition (id in our case) defines what Oracle calls ORDER SIBLINGS BY: that is order in which nodes of same level are returned.

Second ordering condition (node) defines that the nodes and recursive calls are always interleaved. This is required to build correct tree structure.

The filtering condition uses an index on parent, that's why this query is very fast.

Written by Quassnoi

May 29th, 2009 at 11:00 pm

Posted in PostgreSQL

Leave a Reply