EXPLAIN EXTENDED

How to create fast database queries

Archive for September 28th, 2009

Adjacency list vs. nested sets: Oracle

Comments enabled. I *really* need your comment

Continuing the series:

What is better to store hierarchical data: nested sets model or adjacency list (parent-child) model?

For detailed explanations of the terms, see the first article in the series:

Today is Oracle time.

Adjacent sets require recursion, and Oracle has a native way to implement recursion in SQL statements. To do this it uses a special clause, CONNECT BY.

Unlike other systems, Oracle always supported recursion in SQL statements since version 2 (this is the name the first version of Oracle was released under).

This was done because by that time there already were several hierarchical databases on the market which are quite good for storing hierarchical data. If not for this clause, transition from a hierarchical DBMS to the relational DBMS would be too hard.

Oracle's way to query self-relations is somewhat different from recursive CTE's that other systems use:

  • Recursive CTEs can use an arbitrary set in recursive steps. The recursion base and recursion stack are not visible in further steps. The recursive operation is a set operation (usually a JOIN) on the recursion parameter (a set) and the result of the previous operation:

    Recursive CTE

  • CONNECT BY queries can only use the value of a single row in recursive steps.

    The recursion base and recursion stack are visible, the values outside the recursion stack are not.

    The recursive operation is a set operation on each row of the current row and the base rowset.

    Unlike recursive CTE, CONNECT BY operates row-wise, that is each row of the result spans its own recursion branch with its own stack:

    CONNECT BY

The main difference it that the CONNECT BY operation cannot produce anything that was not in the original rowset, while a recursive CTE can.

However, CONNECT BY has numerous benefits too: it's very fast, allows to track the hierarchy path easily (using a built-in funtion named SYS_CONNECT_BY_PATH) and implicitly sorts the recordset in tree order (with additional clause ORDER SIBLINGS BY to sort the siblings).

This method, of course, allows traversing adjacency lists easily.

To compare efficiency of adjacency lists and that of nested sets, let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

September 28th, 2009 at 11:00 pm

Posted in Oracle