PostgreSQL: using recursive functions in nested sets
Comments enabled. I *really* need your comment
In the previous article, I discussed a way to improve nested sets model in PostgreSQL.
The approach shown in the article used an analytical function to filter all immediate children of a node in a recursive CTE.
This allowed us to filter a node's children on the level more efficiently than RTree or BTree approaches do (since they rely on COUNT(*)
).
That solution was pure SQL and it was quite fast, but not optimal.
The drawback of that solution is that it still needs to fetch all children of a node to apply the analytic function to them. This can take much time for the top of the hierarchy. And since the top of the hierarchy is what is what usually shown at the start page, it would be very nice to improve this query yet a little more.
We can do it by creating and using a simple recursive SQL function. This function does not even require PL/pgSQL to be enabled.
Let's create a sample table:
Table creation details
If we run the query introduced in the previous article to fetch all children up to level 2 from a really top node, we get the following results:
WITH RECURSIVE q AS ( SELECT id, lft, rgt, 1 AS lvl FROM t_hierarchy WHERE id = 1 UNION ALL SELECT * FROM ( SELECT DISTINCT ON (MAX(hc.rgt) OVER (PARTITION BY q.id ORDER BY hc.lft)) hc.id, hc.lft, hc.rgt, lvl + 1 FROM q JOIN t_hierarchy hc ON hc.lft > q.lft AND hc.lft < q.rgt WHERE lvl <= 2 ORDER BY MAX(hc.rgt) OVER (PARTITION BY q.id ORDER BY hc.lft), hc.lft ) q2 ) SELECT * FROM q
This runs for almost 15 seconds: too much.
This can be improved by exploiting these two properties of the nested sets model:

The first immediate child of a node is the node holding the first
lft
next to the node'slft

The next sibling of a node is the node holding the first
lft
next to the node'srgt
If we recursively traverse through the nodes, we can find the first child as well as all of its siblings. This is enough to build a hierarchy, and level filter can be implemented merely by limiting the recursion depth.
However, recursive CTE's only allow one recursion level. We cannot nest the WITH
clause.
To work around that, we can use PostgreSQL's ability to run setreturning functions recursively. We will use the functionbased recursion to iterate the parentchild axis, and the CTEbased recursion to iterate siblings axis.
We need to create a function that would take a node's id on input and return a set of its children on output, with the function recursively applied to each of the children. To find a set of children, we will implement a recursive CTE that finds the first child in the anchor part and the next sibling in the recursive part.
Here's the function:
CREATE OR REPLACE FUNCTION fn_get_children(id INT, level INT) RETURNS SETOF INT[] AS $$ WITH RECURSIVE q AS ( SELECT (hc).id, (hc).lft, (hc).rgt, prgt FROM ( SELECT ( SELECT hc FROM t_hierarchy hc WHERE hc.lft > hp.lft AND hc.lft < hp.rgt ORDER BY hc.lft LIMIT 1 ) hc, rgt AS prgt FROM t_hierarchy hp WHERE hp.id = $1 ) q2 UNION ALL SELECT (hc).id, (hc).lft, (hc).rgt, prgt FROM ( SELECT ( SELECT hc FROM t_hierarchy hc WHERE hc.lft > q.rgt AND hc.lft < q.prgt ORDER BY hc.lft LIMIT 1 ) hc, prgt FROM q WHERE q.lft IS NOT NULL ) q2 ) SELECT CASE which WHEN 1 THEN ARRAY[q.id, $2] ELSE fn_get_children(q.id, $2  1) END FROM ( VALUES (1), (2) ) vals(which) CROSS JOIN q WHERE q.id IS NOT NULL AND $2 > 0 ORDER BY id, which; $$ LANGUAGE sql;
The function accepts a node's id
and a level on input, and returns a set of arrays, each corresponding to one of the node's children and its level. The level returned by the function decreases and in fact represents not the level as such, but the number of levels left to reach the filterset bottom. But since the initial level is userset, it is easy to cast it to the actual level.
Let's run the function:
SELECT c[1], 3  c[2] FROM fn_get_children(1, 2) c;
c  ?column?  

6  1  
34  2  
35  2  
33  2  
31  2  
32  2  
7  1  
38  2  
40  2  
39  2  
36  2  
37  2  
8  1  
41  2  
42  2  
43  2  
44  2  
45  2  
9  1  
47  2  
50  2  
46  2  
48  2  
49  2  
10  1  
53  2  
55  2  
51  2  
52  2  
54  2  
30 rows fetched in 0.0023s (0.0658s) 
Function Scan on fn_get_children c (cost=0.00..262.50 rows=1000 width=32)
As we can see, the function returned all children and grandchildren of the node 1 along with their level, and did it in only 65 ms.