PostgreSQL: nested sets and R-Tree at EXPLAIN EXTENDED

PostgreSQL: nested sets and R-Tree

Comments enabled. I *really* need your comment

A feedback on one of my previous articles comparing adjacency list and nested sets models for PostgreSQL.

Jay writes:

In your series on adjacency lists vs nested sets, you discuss geometric types and R-Tree indexes in MySQL, but you don't discuss them when discussing the same subject with PostgreSQL, which also has geometric types and R-Tree indexing (mostly available through GiST indexes).

To make it simple I added the following line after the data insertion part of the script at Nested Sets - Postgresql:
ALTER TABLE t_hierarchy ADD COLUMN sets POLYGON;
UPDATE t_hierarchy SET sets = POLYGON(BOX(POINT(-1,lft), POINT(1, rgt)));
It needed to be a POLYGON instead of a BOX since there is a @>(POLYGON,POLYGON) function but no @>(BOX,BOX) function, and the polygon was cast from the box to create the rectangle shape required.

It outperforms the adjacency list on all descendants; outperforms it on all ancestors (not by much); performs reasonably well on all descendants up to a certain level on items with few descendants (e. g. 31415) and badly on items with many descendants (e. g. 42).

It still completes in less than 20 seconds though, which is an improvement over 1 minute.

PostgreSQL does support R-Tree indexes indeed (through GiST interface), and they can be used to improve the efficiency of the nested sets model.

Let's create a sample table and try some of the queries that Jay proposed:

Table creation details

CREATE TABLE t_hierarchy (
id INT NOT NULL,
parent INT NOT NULL,
lft INT NOT NULL,
rgt INT NOT NULL,
data VARCHAR(100) NOT NULL,
stuffing VARCHAR(100) NOT NULL
);

INSERT
INTO    t_hierarchy
WITH RECURSIVE
ini AS
(
SELECT  8 AS level, 5 AS children
),
range AS
(
SELECT  level, children,
(
SELECT  SUM(POW(children, n)::INTEGER * ((n &lt; level)::INTEGER + 1))
                FROM    generate_series(level, 0, -1) n
                ) width
        FROM    ini
        ),
        q AS
        (
        SELECT  s AS id, 0 AS parent, level, children,
                1 + width * (s - 1) AS lft,
                1 + width * s - 1 AS rgt,
                width / children AS width
        FROM    (
                SELECT  r.*, generate_series(1, children) s
                FROM    range r
                ) q2
        UNION ALL
        SELECT  id * children + position, id, level - 1, children,
                1 + lft + width * (position - 1),
                1 + lft + width * position - 1,
                width / children
        FROM    (
                SELECT  generate_series(1, children) AS position, q.*
                FROM    q
                ) q2
        WHERE   level &gt; 0
)
SELECT  id, parent, lft, rgt, 'Value ' || id, RPAD('', 100, '*')
FROM    q;

ALTER TABLE t_hierarchy ADD CONSTRAINT pk_hierarchy_id PRIMARY KEY (id);
CREATE INDEX ix_hierarchy_lft ON t_hierarchy (lft);
CREATE INDEX ix_hierarchy_rgt ON t_hierarchy (rgt);
CREATE INDEX ix_hierarchy_parent ON t_hierarchy (parent);
CREATE INDEX ix_hierarchy_sets ON t_hierarchy USING GIST(POLYGON(BOX(POINT(-1, lft), POINT(1, rgt))));

To make the management of the table easier, I didn't create an additional column with the geometric representation of the nested sets, but instead just defined an index on a derived expression, so that updating lft and rgt columns would be enough to update the set.

Now, let's see how these queries perform.

All descendants

SELECT  SUM(LENGTH(hc.stuffing)), COUNT(*)
FROM    t_hierarchy hp
JOIN    t_hierarchy hc
ON      POLYGON(BOX(POINT(-1, hc.lft), POINT(1, hc.rgt))) &lt;@ POLYGON(BOX(POINT(-1, hp.lft), POINT(1, hp.rgt)))
WHERE   hp.id = 42

View query details

sum	count
1953100	19531
1 row fetched in 0.0003s (0.2139s)

Aggregate  (cost=8253.58..8253.60 rows=1 width=101)
  ->  Nested Loop  (cost=136.32..8241.37 rows=2441 width=101)
        ->  Index Scan using pk_hierarchy_id on t_hierarchy hp  (cost=0.00..8.54 rows=1 width=8)
              Index Cond: (id = 42)
        ->  Bitmap Heap Scan on t_hierarchy hc  (cost=136.32..8129.10 rows=2441 width=109)
              Recheck Cond: (polygon(box(point((-1)::double precision, (hc.lft)::double precision), point(1::double precision, (hc.rgt)::double precision))) <@ polygon(box(point((-1)::double precision, (hp.lft)::double precision), point(1::double precision, (hp.rgt)::double precision))))
              ->  Bitmap Index Scan on ix_hierarchy_sets  (cost=0.00..135.71 rows=2441 width=0)
                    Index Cond: (polygon(box(point((-1)::double precision, (hc.lft)::double precision), point(1::double precision, (hc.rgt)::double precision))) <@ polygon(box(point((-1)::double precision, (hp.lft)::double precision), point(1::double precision, (hp.rgt)::double precision))))

Quite fast, 213 ms.

All ancestors

SELECT  hc.id, hc.lft, hc.rgt, hc.parent
FROM    t_hierarchy hp
JOIN    t_hierarchy hc
ON      POLYGON(BOX(POINT(-1, hc.lft), POINT(1, hc.rgt))) @&gt; POLYGON(BOX(POINT(-1, hp.lft), POINT(1, hp.rgt)))
WHERE   hp.id = 42

View query details

id	lft	rgt	parent
1	1	585937	0
8	234376	351562	1
42	257814	281250	8
3 rows fetched in 0.0007s (0.0127s)

Nested Loop  (cost=136.32..8241.37 rows=2441 width=16)
  ->  Index Scan using pk_hierarchy_id on t_hierarchy hp  (cost=0.00..8.54 rows=1 width=8)
        Index Cond: (id = 42)
  ->  Bitmap Heap Scan on t_hierarchy hc  (cost=136.32..8129.10 rows=2441 width=16)
        Recheck Cond: (polygon(box(point((-1)::double precision, (hc.lft)::double precision), point(1::double precision, (hc.rgt)::double precision))) @> polygon(box(point((-1)::double precision, (hp.lft)::double precision), point(1::double precision, (hp.rgt)::double precision))))
        ->  Bitmap Index Scan on ix_hierarchy_sets  (cost=0.00..135.71 rows=2441 width=0)
              Index Cond: (polygon(box(point((-1)::double precision, (hc.lft)::double precision), point(1::double precision, (hc.rgt)::double precision))) @> polygon(box(point((-1)::double precision, (hp.lft)::double precision), point(1::double precision, (hp.rgt)::double precision))))

Extremely fast: only 10 ms.

All descendants up to a certain level

SELECT  hc.id, hc.lft, hc.rgt, hc.parent
FROM    t_hierarchy hp
JOIN    t_hierarchy hc
ON      POLYGON(BOX(POINT(-1, hc.lft), POINT(1, hc.rgt))) &lt;@ POLYGON(BOX(POINT(-1, hp.lft), POINT(1, hp.rgt)))
WHERE   hp.id = 42
        AND
        (
        SELECT  COUNT(*)
        FROM    t_hierarchy hcp
        WHERE   POLYGON(BOX(POINT(-1, hc.lft), POINT(1, hc.rgt))) &lt;@ POLYGON(BOX(POINT(-1, hcp.lft), POINT(1, hcp.rgt)))
        ) -
        (
        SELECT  COUNT(*)
        FROM    t_hierarchy hpp
        WHERE   POLYGON(BOX(POINT(-1, hp.lft), POINT(1, hp.rgt))) &lt;@ POLYGON(BOX(POINT(-1, hpp.lft), POINT(1, hpp.rgt)))
        ) &lt;= 2

View query details

id	lft	rgt	parent
212	262502	267188	42
1063	264377	265313	212
1059	260627	261563	211
211	257815	262501	42
1057	258753	259689	211
1066	267190	268126	213
1067	268127	269063	213
42	257814	281250	8
213	267189	271875	42
214	271876	276562	42
1068	269064	270000	213
1069	270001	270937	213
1070	270938	271874	213
1071	271877	272813	214
1072	272814	273750	214
1073	273751	274687	214
1065	266251	267187	212
1064	265314	266250	212
1062	263440	264376	212
1061	262503	263439	212
1060	261564	262500	211
1058	259690	260626	211
1056	257816	258752	211
215	276563	281249	42
1074	274688	275624	214
1075	275625	276561	214
1076	276564	277500	215
1077	277501	278437	215
1078	278438	279374	215
1079	279375	280311	215
1080	280312	281248	215
31 rows fetched in 0.0039s (20.2523s)

Nested Loop  (cost=0.03..40113216.41 rows=814 width=16)
  Join Filter: (((SubPlan 1) - (SubPlan 2)) <= 2)
  ->  Index Scan using pk_hierarchy_id on t_hierarchy hp  (cost=0.00..8.54 rows=1 width=8)
        Index Cond: (id = 42)
  ->  Index Scan using ix_hierarchy_sets on t_hierarchy hc  (cost=0.03..9692.12 rows=2441 width=16)
        Index Cond: (polygon(box(point((-1)::double precision, (hc.lft)::double precision), point(1::double precision, (hc.rgt)::double precision))) <@ polygon(box(point((-1)::double precision, (hp.lft)::double precision), point(1::double precision, (hp.rgt)::double precision))))
  SubPlan 1
    ->  Aggregate  (cost=8214.53..8214.54 rows=1 width=0)
          ->  Bitmap Heap Scan on t_hierarchy hcp  (cost=136.32..8208.43 rows=2441 width=0)
                Recheck Cond: (polygon(box(point((-1)::double precision, ($0)::double precision), point(1::double precision, ($1)::double precision))) <@ polygon(box(point((-1)::double precision, (lft)::double precision), point(1::double precision, (rgt)::double precision))))
                ->  Bitmap Index Scan on ix_hierarchy_sets  (cost=0.00..135.71 rows=2441 width=0)
                      Index Cond: (polygon(box(point((-1)::double precision, ($0)::double precision), point(1::double precision, ($1)::double precision))) <@ polygon(box(point((-1)::double precision, (lft)::double precision), point(1::double precision, (rgt)::double precision))))
  SubPlan 2
    ->  Aggregate  (cost=8214.53..8214.54 rows=1 width=0)
          ->  Bitmap Heap Scan on t_hierarchy hpp  (cost=136.32..8208.43 rows=2441 width=0)
                Recheck Cond: (polygon(box(point((-1)::double precision, ($2)::double precision), point(1::double precision, ($3)::double precision))) <@ polygon(box(point((-1)::double precision, (lft)::double precision), point(1::double precision, (rgt)::double precision))))
                ->  Bitmap Index Scan on ix_hierarchy_sets  (cost=0.00..135.71 rows=2441 width=0)
                      Index Cond: (polygon(box(point((-1)::double precision, ($2)::double precision), point(1::double precision, ($3)::double precision))) <@ polygon(box(point((-1)::double precision, (lft)::double precision), point(1::double precision, (rgt)::double precision))))

This, exactly as was mentioned by Jay, is much faster than using a B-Tree index but still too slow: 20 seconds.

Analysis

The nested sets model, improved by using the R-Tree indexes, provides a way to tell if two records are in the same ancestry chain.

However, even with the R-Tree, the model provides no simple means to tell how deep is a record nested.

To check it, an R-Tree index scan should be made which would return all of the record's ancestors, the the number of the ancestors is to be compared with that of the parent node.

For a record with lots of ancestors (which was the case for the record 42 we used in the test queries), this means that thousands of records should be checked in a nested loop, out of which only a dozen will be returned.

Ironically, for the real-world models, this type of query is most often used, and used against the records with lots of descendants it is.

Usually, when hierarchical data are stored in a database, they are presented to a user in the form of a tree view. When the user opens the catalog, the first-level entries are show; when the user clicks on expand button of an entry, all immediate children of the entry should be shown.

Since users usually start browsing from the beginning, clicking the expand buttons on the first-level or second-level entries is what happens most often. And, unfortunately, it takes the most time to execute these queries.

Adjacency list model provides a constant time solution to this problem, since fetching all immediate children requires a single index scan. This is extremely fast on showing the immediate children.

A user can also click on expand all which should just return all children of the given entry.

However, clicking on expand all on a high-level entry will return too many records, so a time to download them or represent them in the GUI will be much more than that required to fetch them out of the table. A properly written GUI usually limits the level of the records returned so that GUI remains responsive, which, it its turn, implies the same problem of filtering on level.

The low-level entries (for which it makes sense to implement expand all without any limitations) can be queried for their descendants with the R-Tree query in the nested sets model or with a recursive query in the adjacency list model almost equally fast, since low-level entries contain few records.

The same applies to selecting all ancestors. Despite the fact that the nested sets model outperforms slightly the adjacency list model on this type of query, the absolute numbers are very small and the times that both queries take are almost imperceptible to the bare eye. A hierarchy is seldom more than a dozen levels deep, and fetching each ancestor even with a recursive query requires but one unique index scan per ancestor.

However, one may still be forced to use the nested tree model. This may be the way an ORM stores its data in the database; a heavily used legacy schema too old and scary to touch; or just some obscure model which mostly requires fetching all descendants fast with an occasional need to filter on the level.

Here are some methods to deal with it.

Analytic functions

Though there is no efficient way to filter all descendants on the level, there still is a way to fetch all immediate children of a record.

If we select all records within lft and rgt of a given entry and order them by lft, the first record will be the first immediate child of the entry.

All descendants of the first child will be returned before the second child and have rgt less than that of the first child.

This means that if we record the MAX(rgt) fetched so far, it will be that of the last immediate child of the entry fetched so far:

id	lft	rgt	MAX(rgt)
2	2	11	11
3	3	4	11
4	5	8	11
5	6	7	11
6	9	10	11
7	12	15	15
8	13	14	15

This means that each value of MAX(rgt) will correspond to exactly one immediate child; and the first entry in the recordset holding the value of MAX(rgt) will be that first child.

Several method exist to select records holding group-wise maximum in PostgreSQL. In this case, is will be best to use PostgreSQL's DISTINCT ON.

Here's the query:

SELECT  DISTINCT ON (MAX(hc.rgt) OVER (ORDER BY hc.lft)) hc.id, hc.lft, hc.rgt
FROM    t_hierarchy hp
JOIN    t_hierarchy hc
ON      hc.lft &gt; hp.lft
AND hc.lft &lt; hp.rgt
WHERE   hp.id = 42
ORDER BY
        MAX(hc.rgt) OVER (ORDER BY hc.lft), hc.lft

View query details

id	lft	rgt
211	257815	262501
212	262502	267188
213	267189	271875
214	271876	276562
215	276563	281249
5 rows fetched in 0.0008s (0.1642s)

Unique  (cost=116073.33..117429.66 rows=271267 width=12)
  ->  Sort  (cost=116073.33..116751.50 rows=271267 width=12)
        Sort Key: (max(hc.rgt) OVER (?)), hc.lft
        ->  WindowAgg  (cost=86845.19..91592.36 rows=271267 width=12)
              ->  Sort  (cost=86845.19..87523.35 rows=271267 width=12)
                    Sort Key: hc.lft
                    ->  Nested Loop  (cost=5761.00..62364.22 rows=271267 width=12)
                          ->  Index Scan using pk_hierarchy_id on t_hierarchy hp  (cost=0.00..8.54 rows=1 width=8)
                                Index Cond: (id = 42)
                          ->  Bitmap Heap Scan on t_hierarchy hc  (cost=5761.00..58286.67 rows=271267 width=12)
                                Recheck Cond: ((hc.lft > hp.lft) AND (hc.lft < hp.rgt))
                                ->  Bitmap Index Scan on ix_hierarchy_lft  (cost=0.00..5693.19 rows=271267 width=0)
                                      Index Cond: ((hc.lft > hp.lft) AND (hc.lft < hp.rgt))

, which is reasonably fast (only 160 ms).

Using PostgreSQL 8.4 recursive abilities, this approach can be extended to select the descendants up to any level (provided as a parameter to the query).

Here's the query to select all children and grandchildren:

WITH    RECURSIVE
q AS
(
SELECT  id, lft, rgt, 1 AS lvl
FROM    t_hierarchy
WHERE   id = 42
UNION ALL
SELECT  *
FROM    (
SELECT  DISTINCT ON (MAX(hc.rgt) OVER (PARTITION BY q.id ORDER BY hc.lft)) hc.id, hc.lft, hc.rgt, lvl + 1
FROM    q
JOIN    t_hierarchy hc
ON      hc.lft &gt; q.lft
AND hc.lft &lt; q.rgt
                WHERE   lvl &lt;= 2
                ORDER BY
                        MAX(hc.rgt) OVER (PARTITION BY q.id ORDER BY hc.lft), hc.lft
                ) q2
        )
SELECT  *
FROM    q

View query details

id	lft	rgt	lvl
42	257814	281250	1
211	257815	262501	2
212	262502	267188	2
213	267189	271875	2
214	271876	276562	2
215	276563	281249	2
1056	257816	258752	3
1057	258753	259689	3
1058	259690	260626	3
1059	260627	261563	3
1060	261564	262500	3
1061	262503	263439	3
1062	263440	264376	3
1063	264377	265313	3
1064	265314	266250	3
1065	266251	267187	3
1066	267190	268126	3
1067	268127	269063	3
1068	269064	270000	3
1069	270001	270937	3
1070	270938	271874	3
1071	271877	272813	3
1072	272814	273750	3
1073	273751	274687	3
1074	274688	275624	3
1075	275625	276561	3
1076	276564	277500	3
1077	277501	278437	3
1078	278438	279374	3
1079	279375	280311	3
1080	280312	281248	3
31 rows fetched in 0.0042s (0.4342s)

CTE Scan on q  (cost=3923702.09..4086462.51 rows=8138021 width=16)
  CTE q
    ->  Recursive Union  (cost=0.00..3923702.09 rows=8138021 width=16)
          ->  Index Scan using pk_hierarchy_id on t_hierarchy  (cost=0.00..8.54 rows=1 width=12)
                Index Cond: (id = 42)
          ->  Subquery Scan q2  (cost=363886.28..376093.31 rows=813802 width=16)
                ->  Unique  (cost=363886.28..367955.29 rows=813802 width=20)
                      ->  Sort  (cost=363886.28..365920.79 rows=813802 width=20)
                            Sort Key: (max(hc.rgt) OVER (?)), hc.lft
                            ->  WindowAgg  (cost=265683.50..283994.05 rows=813802 width=20)
                                  ->  Sort  (cost=265683.50..267718.01 rows=813802 width=20)
                                        Sort Key: q.id, hc.lft
                                        ->  Nested Loop  (cost=5335.67..185791.26 rows=813802 width=20)
                                              ->  WorkTable Scan on q  (cost=0.00..0.22 rows=3 width=16)
                                                    Filter: (lvl <= 2)
                                              ->  Bitmap Heap Scan on t_hierarchy hc  (cost=5335.67..57861.34 rows=271267 width=12)
                                                    Recheck Cond: ((hc.lft > q.lft) AND (hc.lft < q.rgt))
                                                    ->  Bitmap Index Scan on ix_hierarchy_lft  (cost=0.00..5267.85 rows=271267 width=0)
                                                          Index Cond: ((hc.lft > q.lft) AND (hc.lft < q.rgt))

This is also reasonably fast, only 432 ms. It is slower than the same adjacency list query (which completes in several milliseconds), but still is much faster than R-Tree and of course the least efficient B-Tree solutions involving COUNT(*) and can ease your life if you have to deal with a nested sets model.

To be continued.

Written by Quassnoi

March 1st, 2010 at 11:00 pm

Posted in PostgreSQL

EXPLAIN EXTENDED