Adjacency list vs. nested sets: Oracle at EXPLAIN EXTENDED

Adjacency list vs. nested sets: Oracle

Comments enabled. I *really* need your comment

Continuing the series:

What is better to store hierarchical data: nested sets model or adjacency list (parent-child) model?

For detailed explanations of the terms, see the first article in the series:

Adjacency list vs. nested sets: PostgreSQL

Today is Oracle time.

Adjacent sets require recursion, and Oracle has a native way to implement recursion in SQL statements. To do this it uses a special clause, CONNECT BY.

Unlike other systems, Oracle always supported recursion in SQL statements since version 2 (this is the name the first version of Oracle was released under).

This was done because by that time there already were several hierarchical databases on the market which are quite good for storing hierarchical data. If not for this clause, transition from a hierarchical DBMS to the relational DBMS would be too hard.

Oracle's way to query self-relations is somewhat different from recursive CTE's that other systems use:

Recursive CTEs can use an arbitrary set in recursive steps. The recursion base and recursion stack are not visible in further steps. The recursive operation is a set operation (usually a JOIN) on the recursion parameter (a set) and the result of the previous operation:
CONNECT BY queries can only use the value of a single row in recursive steps.
The recursion base and recursion stack are visible, the values outside the recursion stack are not.

The recursive operation is a set operation on each row of the current row and the base rowset.

Unlike recursive CTE, CONNECT BY operates row-wise, that is each row of the result spans its own recursion branch with its own stack:

The main difference it that the CONNECT BY operation cannot produce anything that was not in the original rowset, while a recursive CTE can.

However, CONNECT BY has numerous benefits too: it's very fast, allows to track the hierarchy path easily (using a built-in funtion named SYS_CONNECT_BY_PATH) and implicitly sorts the recordset in tree order (with additional clause ORDER SIBLINGS BY to sort the siblings).

This method, of course, allows traversing adjacency lists easily.

To compare efficiency of adjacency lists and that of nested sets, let's create a sample table:

Table creation details

CREATE TABLE t_hierarchy (id NOT NULL, parent NOT NULL, lft NOT NULL, rgt NOT NULL, data NOT NULL, stuffing NOT NULL)
AS
WITH    nums AS
(
SELECT  level AS num, 100 AS value
FROM    dual
CONNECT BY
level &lt;= 5
        ),
        digits AS
        (
        SELECT  SYS_CONNECT_BY_PATH(num, '/') AS digit, level AS len
        FROM    nums
        CONNECT BY
                level &lt;= 9
        )
SELECT  id,
        FLOOR((id - 1) / 5),
        lft,
        lft + FLOOR(2929687 / POWER(5, len)) - 1,
        CAST('Value ' || id AS VARCHAR2(100)),
        CAST(RPAD('*', 100, '*') AS VARCHAR2(100))
FROM    (
        SELECT  digit, len,
                (
                SELECT  SUM(SUBSTR(digit, level * 2, 1) * POWER(5, len - level))
                FROM    dual
                CONNECT BY
                        level &lt;= len
                ) AS id,
                (
                SELECT  SUM((SUBSTR(digit, level * 2, 1) - 1) * FLOOR(2929687 / POWER(5, level)) + 1)
                FROM    dual
                CONNECT BY
                        level &lt;= len
                ) AS lft
        FROM    digits
        )
/
ALTER TABLE t_hierarchy ADD CONSTRAINT pk_hierarchy_id PRIMARY KEY (id)
/
CREATE INDEX ix_hierarchy_parent ON t_hierarchy (parent)
/
CREATE INDEX ix_hierarchy_lft ON t_hierarchy (lft)
/
CREATE INDEX ix_hierarchy_rgt ON t_hierarchy (rgt)
/

The table contains both adjacency list data and nested sets data, with 8 levels of nesting, 5 children of each parent node and 2,441,405 records.

As in the previous articles, we'll test performance of the three most used queries:

Find all descendants of a given node
Find all ancestors of a given node
Find all descendants of a given node up to a certain depth

All descendants

Nested sets

SELECT  SUM(LENGTH(hc.stuffing))
FROM    t_hierarchy hp
JOIN    t_hierarchy hc
ON      hc.lft BETWEEN hp.lft AND hp.rgt
WHERE   hp.id = 42

View query details

SUM(LENGTH(HC.STUFFING))
1953100
1 row fetched in 0.0001s (0.0144s)

SELECT STATEMENT 
 SORT AGGREGATE
  NESTED LOOPS 
   TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
    INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
   TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
    INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_LFT

This is super fast, faster than in any system before that: only 10 ms. The plan is quite predictable: a UNIQUE scan to find the parent record and a range scan to find all children.

Adjacency list

SELECT  SUM(LENGTH(stuffing))
FROM    t_hierarchy
START WITH
id = 42
CONNECT BY
parent = PRIOR id

View query details

SUM(LENGTH(STUFFING))
1953100
1 row fetched in 0.0001s (0.1479s)

SELECT STATEMENT 
 SORT AGGREGATE
  CONNECT BY WITH FILTERING
   TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
    INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
   NESTED LOOPS 
    BUFFER SORT
     CONNECT BY PUMP 
    TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
     INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_PARENT
   TABLE ACCESS FULL, 20090928_nested.T_HIERARCHY

This is quite fast too: 140 ms. The plan loops over the parent ranges for each id.

Note the TABLE ACCESS FULL in the end of the plan. Oracle reserves a right to do a FULL SCAN instead of a RANGE SCAN if it considers a certain parent range too large to use the index. However, this is recursion, not a single operation, and it's impossible to tell in advance which id values will be returned on each recursion step. That's why Oracle shows both methods, just in case, and chooses the most efficient method in runtime.

In our table, at most 5 rows share a parent (of more then 2 million), so it's very unlikely this plan branch will be ever chosen.

Find all ancestors

Nested sets

SELECT  hp.id, hp.parent, hp.lft, hp.rgt, hp.data
FROM    t_hierarchy hc
JOIN    t_hierarchy hp
ON      hc.lft BETWEEN hp.lft AND hp.rgt
WHERE   hc.id = 1000000
ORDER BY
hp.lft

View query details

ID	PARENT	LFT	RGT	DATA
2	0	585938	1171874	Value 2
12	2	703126	820312	Value 12
63	12	750001	773437	Value 63
319	63	764063	768749	Value 319
1599	319	766875	767811	Value 1599
7999	1599	767437	767623	Value 7999
39999	7999	767549	767585	Value 39999
199999	39999	767571	767577	Value 199999
1000000	199999	767576	767576	Value 1000000
9 rows fetched in 0.0005s (3.2186s)

SELECT STATEMENT 
 NESTED LOOPS 
  TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
   INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
  TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
   INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_LFT

The plan used is exactly the same as for the query to choose all descendants.

However, the range on lft is too broad and the query works for more than 3 seconds.

Adjacency list

SELECT  id, parent, lft, rgt, data, level
FROM    t_hierarchy
START WITH
id = 1000000
CONNECT BY
id = PRIOR parent
ORDER BY
level DESC

View query details

ID	PARENT	LFT	RGT	DATA	LEVEL
2	0	585938	1171874	Value 2	9
12	2	703126	820312	Value 12	8
63	12	750001	773437	Value 63	7
319	63	764063	768749	Value 319	6
1599	319	766875	767811	Value 1599	5
7999	1599	767437	767623	Value 7999	4
39999	7999	767549	767585	Value 39999	3
199999	39999	767571	767577	Value 199999	2
1000000	199999	767576	767576	Value 1000000	1
9 rows fetched in 0.0005s (0.0009s)

SELECT STATEMENT 
 SORT ORDER BY
  CONNECT BY WITH FILTERING
   TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
    INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
   NESTED LOOPS 
    BUFFER SORT
     CONNECT BY PUMP 
    TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
     INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
   TABLE ACCESS FULL, 20090928_nested.T_HIERARCHY

This query completes in less than 1 ms, which is instant.

Note that unlike recursive CTE's, Oracle supplies a built-in preudocolumn, level, that returns the current recursion level. This comes handy in cases like this.

Descendants up to a given level

Nested sets

SELECT  hc.id, hc.parent, hc.lft, hc.rgt, hc.data
FROM    t_hierarchy hp
JOIN    t_hierarchy hc
ON      hc.lft BETWEEN hp.lft AND hp.rgt
WHERE   hp.id = ?
AND
(
SELECT  COUNT(*)
FROM    t_hierarchy hn
WHERE   hc.lft BETWEEN hn.lft AND hn.rgt
AND hn.lft BETWEEN hp.lft AND hp.rgt
) &lt;= 3

View the query for node 42

SELECT  hc.id, hc.parent, hc.lft, hc.rgt, hc.data
FROM    t_hierarchy hp
JOIN    t_hierarchy hc
ON      hc.lft BETWEEN hp.lft AND hp.rgt
WHERE   hp.id = 42
AND
(
SELECT  COUNT(*)
FROM    t_hierarchy hn
WHERE   hc.lft BETWEEN hn.lft AND hn.rgt
AND hn.lft BETWEEN hp.lft AND hp.rgt
) &lt;= 3

ID	PARENT	LFT	RGT	DATA
42	8	257814	281250	Value 42
211	42	257815	262501	Value 211
1056	211	257816	258752	Value 1056
1057	211	258753	259689	Value 1057
1058	211	259690	260626	Value 1058
1059	211	260627	261563	Value 1059
1060	211	261564	262500	Value 1060
212	42	262502	267188	Value 212
1061	212	262503	263439	Value 1061
1062	212	263440	264376	Value 1062
1063	212	264377	265313	Value 1063
1064	212	265314	266250	Value 1064
1065	212	266251	267187	Value 1065
213	42	267189	271875	Value 213
1066	213	267190	268126	Value 1066
1067	213	268127	269063	Value 1067
1068	213	269064	270000	Value 1068
1069	213	270001	270937	Value 1069
1070	213	270938	271874	Value 1070
214	42	271876	276562	Value 214
1071	214	271877	272813	Value 1071
1072	214	272814	273750	Value 1072
1073	214	273751	274687	Value 1073
1074	214	274688	275624	Value 1074
1075	214	275625	276561	Value 1075
215	42	276563	281249	Value 215
1076	215	276564	277500	Value 1076
1077	215	277501	278437	Value 1077
1078	215	278438	279374	Value 1078
1079	215	279375	280311	Value 1079
1080	215	280312	281248	Value 1080
31 rows fetched in 0.0000s (112.5589s)

SELECT STATEMENT 
 NESTED LOOPS 
  TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
   INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
  TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
   INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_LFT
    SORT AGGREGATE
     FILTER 
      TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
       INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_LFT

View the query for node 31,415

SELECT  hc.id, hc.parent, hc.lft, hc.rgt, hc.data
FROM    t_hierarchy hp
JOIN    t_hierarchy hc
ON      hc.lft BETWEEN hp.lft AND hp.rgt
WHERE   hp.id = 31415
AND
(
SELECT  COUNT(*)
FROM    t_hierarchy hn
WHERE   hc.lft BETWEEN hn.lft AND hn.rgt
AND hn.lft BETWEEN hp.lft AND hp.rgt
) &lt;= 3

ID	PARENT	LFT	RGT	DATA
31415	6282	445651	445687	Value 31415
157076	31415	445652	445658	Value 157076
785381	157076	445653	445653	Value 785381
785382	157076	445654	445654	Value 785382
785383	157076	445655	445655	Value 785383
785384	157076	445656	445656	Value 785384
785385	157076	445657	445657	Value 785385
157077	31415	445659	445665	Value 157077
785386	157077	445660	445660	Value 785386
785387	157077	445661	445661	Value 785387
785388	157077	445662	445662	Value 785388
785389	157077	445663	445663	Value 785389
785390	157077	445664	445664	Value 785390
157078	31415	445666	445672	Value 157078
785391	157078	445667	445667	Value 785391
785392	157078	445668	445668	Value 785392
785393	157078	445669	445669	Value 785393
785394	157078	445670	445670	Value 785394
785395	157078	445671	445671	Value 785395
157079	31415	445673	445679	Value 157079
785396	157079	445674	445674	Value 785396
785397	157079	445675	445675	Value 785397
785398	157079	445676	445676	Value 785398
785399	157079	445677	445677	Value 785399
785400	157079	445678	445678	Value 785400
157080	31415	445680	445686	Value 157080
785401	157080	445681	445681	Value 785401
785402	157080	445682	445682	Value 785402
785403	157080	445683	445683	Value 785403
785404	157080	445684	445684	Value 785404
785405	157080	445685	445685	Value 785405
31 rows fetched in 0.0016s (0.0015s)

SELECT STATEMENT 
 NESTED LOOPS 
  TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
   INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
  TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
   INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_LFT
    SORT AGGREGATE
     FILTER 
      TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
       INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_LFT

As in other systems, performance heavily depends on how many descendants does the record have.

For the node 42 which is close to the root node and has lots of descendants, the query runs for 122 seconds. For the node 31,415, the query takes 1.5 ms (next to instant).

Adjacency list

SELECT  id, parent, lft, rgt, data
FROM    t_hierarchy
START WITH
id = ?
CONNECT BY
parent = PRIOR id
AND level &lt;= 3

Limiting the recursion depth query is very simple in Oracle: we just add the additional condition to CONNECT BY clause constraining the limit.

View the query for node 42

SELECT  id, parent, lft, rgt, data
FROM    t_hierarchy
START WITH
id = 42
CONNECT BY
parent = PRIOR id
AND level &lt;= 3

ID	PARENT	LFT	RGT	DATA
42	8	257814	281250	Value 42
211	42	257815	262501	Value 211
1056	211	257816	258752	Value 1056
1057	211	258753	259689	Value 1057
1058	211	259690	260626	Value 1058
1059	211	260627	261563	Value 1059
1060	211	261564	262500	Value 1060
212	42	262502	267188	Value 212
1061	212	262503	263439	Value 1061
1062	212	263440	264376	Value 1062
1063	212	264377	265313	Value 1063
1064	212	265314	266250	Value 1064
1065	212	266251	267187	Value 1065
213	42	267189	271875	Value 213
1066	213	267190	268126	Value 1066
1067	213	268127	269063	Value 1067
1068	213	269064	270000	Value 1068
1069	213	270001	270937	Value 1069
1070	213	270938	271874	Value 1070
214	42	271876	276562	Value 214
1071	214	271877	272813	Value 1071
1072	214	272814	273750	Value 1072
1073	214	273751	274687	Value 1073
1074	214	274688	275624	Value 1074
1075	214	275625	276561	Value 1075
215	42	276563	281249	Value 215
1076	215	276564	277500	Value 1076
1077	215	277501	278437	Value 1077
1078	215	278438	279374	Value 1078
1079	215	279375	280311	Value 1079
1080	215	280312	281248	Value 1080
31 rows fetched in 0.0015s (0.0011s)

SELECT STATEMENT 
 CONNECT BY WITH FILTERING
  TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
   INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
  NESTED LOOPS 
   BUFFER SORT
    CONNECT BY PUMP 
   FILTER 
    TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
     INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_PARENT
  TABLE ACCESS FULL, 20090928_nested.T_HIERARCHY

View the query for node 31,415

SELECT  id, parent, lft, rgt, data
FROM    t_hierarchy
START WITH
id = 31415
CONNECT BY
parent = PRIOR id
AND level &lt;= 3

ID	PARENT	LFT	RGT	DATA
31415	6282	445651	445687	Value 31415
157076	31415	445652	445658	Value 157076
785381	157076	445653	445653	Value 785381
785382	157076	445654	445654	Value 785382
785383	157076	445655	445655	Value 785383
785384	157076	445656	445656	Value 785384
785385	157076	445657	445657	Value 785385
157077	31415	445659	445665	Value 157077
785386	157077	445660	445660	Value 785386
785387	157077	445661	445661	Value 785387
785388	157077	445662	445662	Value 785388
785389	157077	445663	445663	Value 785389
785390	157077	445664	445664	Value 785390
157078	31415	445666	445672	Value 157078
785391	157078	445667	445667	Value 785391
785392	157078	445668	445668	Value 785392
785393	157078	445669	445669	Value 785393
785394	157078	445670	445670	Value 785394
785395	157078	445671	445671	Value 785395
157079	31415	445673	445679	Value 157079
785396	157079	445674	445674	Value 785396
785397	157079	445675	445675	Value 785397
785398	157079	445676	445676	Value 785398
785399	157079	445677	445677	Value 785399
785400	157079	445678	445678	Value 785400
157080	31415	445680	445686	Value 157080
785401	157080	445681	445681	Value 785401
785402	157080	445682	445682	Value 785402
785403	157080	445683	445683	Value 785403
785404	157080	445684	445684	Value 785404
785405	157080	445685	445685	Value 785405
31 rows fetched in 0.0015s (0.0011s)

SELECT STATEMENT 
 CONNECT BY WITH FILTERING
  TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
   INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
  NESTED LOOPS 
   BUFFER SORT
    CONNECT BY PUMP 
   FILTER 
    TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
     INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_PARENT
  TABLE ACCESS FULL, 20090928_nested.T_HIERARCHY

Both queries complete in 1 ms, that is instant.

Summary

Oracle implements a special clause, CONNECT BY, that makes traversing adjacency lists in general and parent-child relationships in particular very easy, since this is exactly what this clause was intended for.

Except for finding all descendants of a single node (without finding out or limiting the depth), nested sets model is less efficient in Oracle than adjacency lists.

Adjacency lists are not only more efficient but the queries using them are more elegant and easy to maintain due to the native support.

That's why adjacency list model should be used instead of nested sets model in Oracle too, as well as in SQL Server and PostgreSQL 8.4.

Written by Quassnoi

September 28th, 2009 at 11:00 pm

Posted in Oracle

EXPLAIN EXTENDED