Recursive CTE’s: PostgreSQL
Comments enabled. I *really* need your comment
In the previous article on recursive CTE's in SQL Server I demonstrated that they are not really set-based.
SQL Server implements the recursive CTE's syntax, but forbids all operations that do not distribute over UNION ALL
, and each recursive step sees only a single record from the previous step.
Now, let's check the same operations in PostgreSQL 8.4.
To do this, we well write a query that selects only the very first branch of a tree: that is, each item would be the first child of its parent. To do this, we should select the item that would be the first child of the root, the select the first child of that item etc.
This is a set-based operation.
Oracle's CONNECT BY
syntax, despite being set-based, offers some limited set-based capabilities: you can use ORDER SIBLINGS BY
clause to define the order in which the siblings are returned. However, this would require some additional work to efficiently return only the first branch.
In a true set-based system, this is much more simple.
Let's create a sample table:
CREATE TABLE t_recursive ( id INT NOT NULL PRIMARY KEY, parent INT NOT NULL, orderer INT NOT NULL, data VARCHAR(100) NOT NULL ); CREATE INDEX ix_recursive_parent_orderer ON t_recursive (parent, orderer); SELECT SETSEED(0.20091123); INSERT INTO t_recursive SELECT s, (s - 1) / 5, FLOOR(RANDOM() * 10000), 'Item ' || s FROM generate_series(1, 1000000) s;
This table contains 1,000,000 records and implements an adjacency tree hierarchy.
Each item has at most 5 children and a randomly filled column, orderer
, which defines its order along its siblings.
Now, let's try to make a query that would select the first branch, the order being defined by the value provided in orderer
.
To do this, we should make the anchor step to return the first child of the root item, and the recursive steps to return the first child of the previously returned item.
To return the first child of a given parent in the orderer
order, we can use PostgreSQL's DISTINCT ON
functionality.
This is the same as DISTINCT
, but can return the whole row rather than a column DISTINCT
is being applied to.
If two rows share the value of the column DISTINCT ON
is being applied to, only one of the rows will be returned. Which row will it be is defined by the ORDER BY
clause.
This solves the problems like return a single row that holds group-wise maximum
.
This query:
SELECT DISTINCT ON (parent) * FROM t_recursive ORDER BY parent, orderer
is the same as this one:
SELECT (q.r).* FROM ( SELECT r, ROW_NUMBER() OVER (PARTITION BY grouper ORDER BY orderer) AS rn FROM t_recursive r ) q WHERE rn = 1
, but more legible and in some cases more efficient.
To apply this clause to the task we need, we should just use DISTINCT ON (parent)
in both anchor and recursive parts:
WITH RECURSIVE rows AS ( SELECT * FROM ( SELECT DISTINCT ON (parent) * FROM t_recursive WHERE parent = 0 ORDER BY parent, orderer ) q UNION ALL SELECT * FROM ( SELECT DISTINCT ON (c.parent) c.* FROM rows r JOIN t_recursive c ON c.parent = r.id ORDER BY c.parent, c.orderer ) q2 ) SELECT * FROM rows
id | parent | orderer | data | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 0 | 1686 | Item 3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
19 | 3 | 3370 | Item 19 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
98 | 19 | 42 | Item 98 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
492 | 98 | 1762 | Item 492 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2464 | 492 | 2295 | Item 2464 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
12322 | 2464 | 2050 | Item 12322 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
61614 | 12322 | 768 | Item 61614 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
308074 | 61614 | 1925 | Item 308074 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8 rows fetched in 0.0004s (0.0042s) |
CTE Scan on rows (cost=3188.19..3207.83 rows=982 width=230) CTE rows -> Recursive Union (cost=0.00..3188.19 rows=982 width=230) -> Unique (cost=0.00..15.46 rows=2 width=23) -> Index Scan using ix_recursive_parent_orderer on t_recursive (cost=0.00..15.45 rows=5 width=23) Index Cond: (parent = 0) -> Unique (cost=313.84..314.33 rows=98 width=23) -> Sort (cost=313.84..314.08 rows=98 width=23) Sort Key: c.parent, c.orderer -> Nested Loop (cost=0.00..310.60 rows=98 width=23) -> WorkTable Scan on rows r (cost=0.00..0.40 rows=20 width=4) -> Index Scan using ix_recursive_parent_orderer on t_recursive c (cost=0.00..15.45 rows=5 width=23) Index Cond: (c.parent = r.id)
This gives us a single branch containing just the records we need: each one being the first child of its parent.
Now, what if we wanted to return a list of records, each being the second child to its parent?
Since each recursive part takes only one record as an input (and returns one record as an output), we can just replace DISTINCT ON
(which returns the first child of each group) with a OFFSET 1 LIMIT 1
:
WITH RECURSIVE rows AS ( SELECT * FROM ( SELECT * FROM t_recursive WHERE parent = 0 ORDER BY orderer OFFSET 1 LIMIT 1 ) q UNION ALL SELECT * FROM ( SELECT c.* FROM rows r JOIN t_recursive c ON c.parent = r.id ORDER BY c.orderer OFFSET 1 LIMIT 1 ) q2 ) SELECT * FROM rows
id | parent | orderer | data | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 2540 | Item 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6 | 1 | 3405 | Item 6 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
33 | 6 | 2884 | Item 33 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
166 | 33 | 3084 | Item 166 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
833 | 166 | 1848 | Item 833 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4169 | 833 | 993 | Item 4169 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
20850 | 4169 | 3126 | Item 20850 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
104251 | 20850 | 3021 | Item 104251 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
521256 | 104251 | 5492 | Item 521256 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9 rows fetched in 0.0005s (0.0043s) |
CTE Scan on rows (cost=1564.44..1564.66 rows=11 width=230) CTE rows -> Recursive Union (cost=3.09..1564.44 rows=11 width=230) -> Limit (cost=3.09..6.18 rows=1 width=23) -> Index Scan using ix_recursive_parent_orderer on t_recursive (cost=0.00..15.45 rows=5 width=23) Index Cond: (parent = 0) -> Limit (cost=155.79..155.79 rows=1 width=23) -> Sort (cost=155.79..155.91 rows=49 width=23) Sort Key: c.orderer -> Nested Loop (cost=0.00..155.30 rows=49 width=23) -> WorkTable Scan on rows r (cost=0.00..0.20 rows=10 width=4) -> Index Scan using ix_recursive_parent_orderer on t_recursive c (cost=0.00..15.45 rows=5 width=23) Index Cond: (c.parent = r.id)
This query returns a whole branch of the items that are second children to their parents.
Both these queries are available in SQL Server 2005: despite the fact they do not distribute over UNION ALL
, they can be rewritten using a ROW_NUMBER()
.
As was shown in the previous article, for some strange reason, SQL Server 2005 does not forbid this clause, but rather implements it in the incorrect way. This, however, allows writing the query we're after.
Now, let's make a query that would recursively return the first two children.
This requires a set-based solution, since first two children can come from different parents.
A grandchild that is third to its grandfather won't count, even if it was a first child to the first child of the grandfather.
A first child to the third child of the grandfather won't count either, even if it was the first grandchild to its grandfather.
To do this right, on each step the query should accept 2 items, return 2 items and process the items accepted at once.
This is also easy in a true set-based recursive CTE:
WITH RECURSIVE rows AS ( SELECT * FROM ( SELECT * FROM t_recursive r WHERE parent = 0 ORDER BY orderer LIMIT 2 ) q UNION ALL SELECT * FROM ( SELECT c.* FROM rows r JOIN t_recursive c ON c.parent = r.id ORDER BY c.orderer LIMIT 2 ) q ) SELECT * FROM rows r
id | parent | orderer | data | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 0 | 1686 | Item 3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 | 0 | 2540 | Item 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8 | 1 | 2181 | Item 8 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
19 | 3 | 3370 | Item 19 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
98 | 19 | 42 | Item 98 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
99 | 19 | 1351 | Item 99 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
497 | 99 | 1245 | Item 497 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
496 | 99 | 1255 | Item 496 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2486 | 497 | 205 | Item 2486 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2484 | 496 | 362 | Item 2484 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
12431 | 2486 | 29 | Item 12431 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
12423 | 2484 | 311 | Item 12423 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
62119 | 12423 | 1113 | Item 62119 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
62120 | 12423 | 1121 | Item 62120 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
310602 | 62120 | 341 | Item 310602 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
310605 | 62120 | 468 | Item 310605 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
16 rows fetched in 0.0007s (0.0043s) |
CTE Scan on rows r (cost=3122.65..3123.09 rows=22 width=230) CTE rows -> Recursive Union (cost=0.00..3122.65 rows=22 width=230) -> Limit (cost=0.00..6.18 rows=2 width=23) -> Index Scan using ix_recursive_parent_orderer on t_recursive r (cost=0.00..15.45 rows=5 width=23) Index Cond: (parent = 0) -> Limit (cost=311.58..311.58 rows=2 width=23) -> Sort (cost=311.58..311.82 rows=98 width=23) Sort Key: c.orderer -> Nested Loop (cost=0.00..310.60 rows=98 width=23) -> WorkTable Scan on rows r (cost=0.00..0.40 rows=20 width=4) -> Index Scan using ix_recursive_parent_orderer on t_recursive c (cost=0.00..15.45 rows=5 width=23) Index Cond: (c.parent = r.id)
This query takes all children of the previous set and returns the first two of their descendants.
We see that PostgreSQL, unlike SQL Server, implements the recursive CTE's in truly set-based way.
The recursive part of the query can accept a set on input, return a set on output and do the set-based operations on the set received.