EXPLAIN EXTENDED

How to create fast database queries

Hierarchical queries in MySQL: adding level

with 4 comments

Today I will add some more features to these queries.

We have almost 100,000 rows in the table, 8 hierarchy levels, and our query is written so that it always selects all rows.

Now let's try to present our data in a Yahoo-style catalog. We need only top 2 levels: parent categories and their subcategories.

Our function fills the session variable called @level which we can use to filter out level three and below:

SELECT  CONCAT(REPEAT('    ', level - 1), CAST(hi.id AS CHAR)) AS treeitem, parent, level
FROM    (
SELECT  hierarchy_connect_by_parent_eq_prior_id(id) AS id, @level AS level
FROM    (
SELECT  @start_with := 0,
@id := @start_with,
@level := 0
) vars, t_hierarchy
WHERE   @id IS NOT NULL
) ho
JOIN    t_hierarchy hi
ON      hi.id = ho.id
WHERE   level <= 2
treeitem parent level
2 1 1
    7 2 2
    8 2 2
    9 2 2
    10 2 2
    11 2 2
3 1 1
    12 3 2
    13 3 2
    14 3 2
    15 3 2
    16 3 2
4 1 1
    17 4 2
    18 4 2
    19 4 2
    20 4 2
    21 4 2
5 1 1
    22 5 2
    23 5 2
    24 5 2
    25 5 2
    26 5 2
6 1 1
    27 6 2
    28 6 2
    29 6 2
    30 6 2
    31 6 2
30 rows fetched in 0,0031s (14,1724s)

The tree is now limited to the first two levels and we can format the tree for using on a catalog page.

This query, though, is not very efficient, as we there are almost 100,000 function calls which are quite expensive in MySQL.

We need to change the function so that it stops traversing the tree as soon as it reaches the certain depth. When it reaches the end of the tree, @id will become NULL and the function will not be called anymore.

We will create a new function that accepts the tree depth as a parameter:

CREATE FUNCTION hierarchy_connect_by_parent_eq_prior_id_with_level(value INT, maxlevel INT) RETURNS INT
NOT DETERMINISTIC
READS SQL DATA
BEGIN
DECLARE _id INT;
DECLARE _parent INT;
DECLARE _next INT;
DECLARE _i INT;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET @id = NULL;

SET _parent = @id;
SET _id = -1;
SET _i = 0;

IF @id IS NULL THEN
RETURN NULL;
END IF;

LOOP
SELECT  MIN(id)
INTO    @id
FROM    t_hierarchy
WHERE   parent = _parent
AND id > _id
AND COALESCE(@level < maxlevel, TRUE);
                IF @id IS NOT NULL OR _parent = @start_with THEN
                        SET @level = @level + 1;
                        RETURN @id;
                END IF;
                SET @level := @level - 1;
                SELECT  id, parent
                INTO    _id, _parent
                FROM    t_hierarchy
                WHERE   id = _parent;
                SET _i = _i + 1;
        END LOOP;
        RETURN NULL;
END

The new function, being used inside the same query, yields the following results:

SELECT  CONCAT(REPEAT('    ', level - 1), CAST(hi.id AS CHAR)) AS treeitem, parent, level
FROM    (
        SELECT  hierarchy_connect_by_parent_eq_prior_id_with_level(id, 2) AS id, @level AS level
        FROM    (
                SELECT  @start_with := 1,
                        @id := @start_with,
                        @level := 0
                ) vars, t_hierarchy
        WHERE   @id IS NOT NULL
        ) ho
JOIN    t_hierarchy hi
ON      hi.id = ho.id
treeitem parent level
2 1 1
    7 2 2
    8 2 2
    9 2 2
    10 2 2
    11 2 2
3 1 1
    12 3 2
    13 3 2
    14 3 2
    15 3 2
    16 3 2
4 1 1
    17 4 2
    18 4 2
    19 4 2
    20 4 2
    21 4 2
5 1 1
    22 5 2
    23 5 2
    24 5 2
    25 5 2
    26 5 2
6 1 1
    27 6 2
    28 6 2
    29 6 2
    30 6 2
    31 6 2
30 rows fetched in 0,0050s (0,0539s)

There are only 30 functions calls this time instead of 100,000 and the query performs much better.

If maxlevel is NULL, the new function behaves exactly as the previous one.

In an efficient programming language such as C++ it would be reasonable to re-implement the old function as a wrapper for the new one with maxlevel set to NULL.

But in MySQL, unfortunately, the function call itself consumes lots of resources. Such a solution, though looking elegant, would severely impact performance.

That's why we will just keep two versions of the functions.

To be continued.

Written by Quassnoi

March 18th, 2009 at 11:00 pm

Posted in MySQL

4 Responses to 'Hierarchical queries in MySQL: adding level'

Subscribe to comments with RSS

  1. How about working with trees which allow multiple parents, e.g. like a family tree, as well as multiple children?

    I have a data set where there are potentially dozens of parents, and where the depth can be >30. It doesn’t help performance that the table has 2.3M rows…

    Kevin

    26 Oct 12 at 03:48

  2. @Kevin: what kind of an answer do you expect to “how about”?

    Please describe your model and the query you want to run.

    Quassnoi

    26 Oct 12 at 12:01

  3. Hi,
    in the 3rd request of this page, I had to change
    SELECT @start_with := 1 SELECT @start_with := 0

    To make it works

    GISSINGER

    10 Oct 13 at 20:51

  4. These queries are great! Thank you for publishing this series on hierarchical queries in mysql.

    plong0

    23 Mar 16 at 22:47

Leave a Reply