EXPLAIN EXTENDED

How to create fast database queries

Hierarchical queries in MySQL: finding loops

with 4 comments

Today, we will check our structure for loops.

Loops in hierarchical queries occur when a row contains itself in its ancestry chain.

In general, it's a sign of a database logic flaw and should be avioded, but sometimes it's deliberately used.

The simplest and most easily detectable case is having id equal to parent (meaning that the row is both its father and child). This may be tested by using a simple comparison condition.

But a loop of course can be more complex: a row can be its own grandfather, grand-grandfather etc.

To detect such loops, we will improve our functions a little.

We'll add a loop to our data:

SELECT  *
FROM    t_hierarchy
WHERE   id >= 97657
id parent
97657 97658
97658 97659
97659 97657
97660 97657
3 rows fetched in 0.0001s (0.0016s)

Our old function will be fooled by this loop and it will return the values over and over again each time it will be called.

To avoid this, we will rewrite our functions a little.

First, we need CONNECT BY to break when a loop is detected. On each iteration, we test if the current @id is equal to @start_with. If yes, then we have a loop in our data and it's just the right time to stop iterating.

It's quite straightforward:

CREATE FUNCTION hierarchy_connect_by_parent_eq_prior_id_with_level_and_loop(value INT, maxlevel INT) RETURNS INT
NOT DETERMINISTIC
READS SQL DATA
BEGIN
DECLARE _id INT;
DECLARE _parent INT;
DECLARE _next INT;
DECLARE _i INT;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET @id = NULL;

SET _parent = @id;
SET _id = -1;
SET _i = 0;

IF @id IS NULL THEN
RETURN NULL;
END IF;

LOOP
SELECT  MIN(id)
INTO    @id
FROM    t_hierarchy
WHERE   parent = _parent
AND id > _id
-- Checking for @start_with in descendants
AND id <> @start_with
AND COALESCE(@level < maxlevel, TRUE);
                IF @id IS NOT NULL OR _parent = @start_with THEN
                        SET @level = @level + 1;
                        RETURN @id;
                END IF;
                SET @level := @level - 1;
                SELECT  id, parent
                INTO    _id, _parent
                FROM    t_hierarchy
                WHERE   id = _parent;
                SET _i = _i + 1;
        END LOOP;
        RETURN NULL;
END

Second, we need to check if a current row is inside a hierarchy loop. The check should return true if the row itself is among its descendants. We probably could iterate all descendants and see if we encounter the row's id.

But a row can have more than one descendant, so this algorithm would be quite inefficient.

We can exploit the fact that if a row is among its descendants, then it should also be among its ancestors too.

Ancestors, unlike descendants, are unique for any given generation (i. e. a row can have exactly one father, exactly one grandfather etc). That means that if we encounter any ancestor more than once then we have a loop somewhere in the ancestry chain.

But having a loop in the ancestry chain does not necessarily mean being inside the loop. Rows, as we can remember, can have more than one child. If a row has two children, it may have itself somewhere among the first child's descendants, but not among the second child's ones.

In the sample data above, rows 97658 and 97660 both have 97557 as a parent, but the first one is inside the cycle (it has its parent among its descendants) while the second one is not.

To check for the cycle, we will do the following:

  1. We iterate up the ancestry chain until we find an orphan row or encounter @start_with twice.
  2. Each time we find the current row in it's own ancestry chain, we increment a counter.
  3. Having an orphan row in the ancestry chain mean not having a cycle, in this case we just return 0.
  4. Encountering @start_with twice means completing a cycle. If the current row is contained in the cycle, then we should have encountered it exactly one time. If it is not in the cycle, we shouldn't have encountered it at all. That's why we just return the counter value as a boolean result.

The function will look like this:

CREATE FUNCTION hierarchy_connect_by_iscycle(node INT) RETURNS INT
NOT DETERMINISTIC
READS SQL DATA
BEGIN
DECLARE _id INT;
DECLARE _loop INT;
DECLARE _node INT;
DECLARE EXIT HANDLER FOR NOT FOUND RETURN 0;
SET _id = COALESCE(node, @id);
SET _loop = 0;
SET _node = 0;
LOOP
SELECT  parent
INTO    _id
FROM    t_hierarchy
WHERE   id = _id;
IF _id = @start_with THEN
SET _loop := _loop + 1;
END IF;
IF _id = COALESCE(node, @id) THEN
SET _node = _node + 1;
END IF;
IF _loop >= 2 THEN
RETURN _node;
END IF;
END LOOP;
END

Finally, we change the way we check the rows for being leaves, namely, we not just select the first child of a row, but we select the first child not equal to @start_with. If @start_with is the only child of a row, the rows is counted as a leaf, as @start_with will never be selected as a child.

Combining all said above, we come out with the following query:

SELECT  CONCAT(REPEAT('    ', lvl - 1), hi.id) AS treeitem,
hierarchy_sys_connect_by_path('/', hi.id) AS path,
parent, lvl,
CASE
WHEN lvl >= @maxlevel THEN 1
ELSE COALESCE(
(
SELECT  0
FROM    t_hierarchy hl
WHERE   hl.parent = ho.id
AND hl.id <> @start_with
LIMIT 1
), 1)
END AS is_leaf,
hierarchy_connect_by_iscycle(hi.id) AS is_cycle
FROM    (
SELECT  hierarchy_connect_by_parent_eq_prior_id_with_level_and_loop(id, @maxlevel) AS id,
CAST(@level AS SIGNED) AS lvl
FROM    (
SELECT  @start_with := 97657,
@id := @start_with,
@level := 0,
@maxlevel := NULL
) vars, t_hierarchy
WHERE   @id IS NOT NULL
) ho
JOIN    t_hierarchy hi
ON      hi.id = ho.id
treeitem path parent lvl is_leaf is_cycle
97657 97658/97657 97658 1 0 1
    97659 97658/97657/97659 97657 2 1 1
    97660 97658/97657/97660 97657 2 1 0
3 rows fetched in 0.0003s (0.0590s)

Note row 97659 appearing as a leaf despite being parent for 97658. That's because we use 97658 in a @start_with condition.

Written by Quassnoi

March 21st, 2009 at 11:00 pm

Posted in MySQL

4 Responses to 'Hierarchical queries in MySQL: finding loops'

Subscribe to comments with RSS

  1. I know this is an old thread, just wanted to thank you for this series, they are still awesome!

    Niklas B

    2 Feb 13 at 17:32

  2. This is an old post but still usefull !
    thanks a lot for the serie !

    crozet

    3 Aug 15 at 17:45

  3. Thank you for this great article. It really helps me to save some time. I am going to learn your another articles also.

    Denis I

    6 Sep 17 at 15:17

  4. I wonder if there is a way to manage a multiple root node forest, when “start with” clause extract a set of records and not a single one.
    In Oracle, for example, we can specify a “start with” clause like “start with parent is null” to find all the root nodes of a forest.

    falco

    27 Feb 18 at 12:35

Leave a Reply