EXPLAIN EXTENDED

How to create fast database queries

My latest article on SQL in general: Shared Plan and Algorithm Network Cache (SPANC). You're welcome to read and comment on it.

Hierarchical queries in MySQL: adding level

Today I will add some more features to these queries.

We have almost 100,000 rows in the table, 8 hierarchy levels, and our query is written so that it always selects all rows.

Now let’s try to present our data in a Yahoo-style catalog. We need only top 2 levels: parent categories and their subcategories.

Our function fills the session variable called @level which we can use to filter out level three and below:

SELECT  CONCAT(REPEAT('    ', level - 1), CAST(hi.id AS CHAR)) AS treeitem, parent, level
FROM    (
        SELECT  hierarchy_connect_by_parent_eq_prior_id(id) AS id, @level AS level
        FROM    (
                SELECT  @start_with := 0,
                        @id := @start_with,
                        @level := 0
                ) vars, t_hierarchy
        WHERE   @id IS NOT NULL
        ) ho
JOIN    t_hierarchy hi
ON      hi.id = ho.id
WHERE   level <= 2
treeitem parent level
2 1 1
    7 2 2
    8 2 2
    9 2 2
    10 2 2
    11 2 2
3 1 1
    12 3 2
    13 3 2
    14 3 2
    15 3 2
    16 3 2
4 1 1
    17 4 2
    18 4 2
    19 4 2
    20 4 2
    21 4 2
5 1 1
    22 5 2
    23 5 2
    24 5 2
    25 5 2
    26 5 2
6 1 1
    27 6 2
    28 6 2
    29 6 2
    30 6 2
    31 6 2
30 rows fetched in 0,0031s (14,1724s)

The tree is now limited to the first two levels and we can format the tree for using on a catalog page.

This query, though, is not very efficient, as we there are almost 100,000 function calls which are quite expensive in MySQL.

We need to change the function so that it stops traversing the tree as soon as it reaches the certain depth. When it reaches the end of the tree, @id will become NULL and the function will not be called anymore.

We will create a new function that accepts the tree depth as a parameter:

CREATE FUNCTION hierarchy_connect_by_parent_eq_prior_id_with_level(value INT, maxlevel INT) RETURNS INT
NOT DETERMINISTIC
READS SQL DATA
BEGIN
        DECLARE _id INT;
        DECLARE _parent INT;
        DECLARE _next INT;
        DECLARE _i INT;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET @id = NULL;

        SET _parent = @id;
        SET _id = -1;
        SET _i = 0;

        IF @id IS NULL THEN
                RETURN NULL;
        END IF;

        LOOP
                SELECT  MIN(id)
                INTO    @id
                FROM    t_hierarchy
                WHERE   parent = _parent
                        AND id > _id
                        AND COALESCE(@level < maxlevel, TRUE);
                IF @id IS NOT NULL OR _parent = @start_with THEN
                        SET @level = @level + 1;
                        RETURN @id;
                END IF;
                SET @level := @level - 1;
                SELECT  id, parent
                INTO    _id, _parent
                FROM    t_hierarchy
                WHERE   id = _parent;
                SET _i = _i + 1;
        END LOOP;
        RETURN NULL;
END

The new function, being used inside the same query, yields the following results:

SELECT  CONCAT(REPEAT('    ', level - 1), CAST(hi.id AS CHAR)) AS treeitem, parent, level
FROM    (
        SELECT  hierarchy_connect_by_parent_eq_prior_id_with_level(id, 2) AS id, @level AS level
        FROM    (
                SELECT  @start_with := 1,
                        @id := @start_with,
                        @level := 0
                ) vars, t_hierarchy
        WHERE   @id IS NOT NULL
        ) ho
JOIN    t_hierarchy hi
ON      hi.id = ho.id
treeitem parent level
2 1 1
    7 2 2
    8 2 2
    9 2 2
    10 2 2
    11 2 2
3 1 1
    12 3 2
    13 3 2
    14 3 2
    15 3 2
    16 3 2
4 1 1
    17 4 2
    18 4 2
    19 4 2
    20 4 2
    21 4 2
5 1 1
    22 5 2
    23 5 2
    24 5 2
    25 5 2
    26 5 2
6 1 1
    27 6 2
    28 6 2
    29 6 2
    30 6 2
    31 6 2
30 rows fetched in 0,0050s (0,0539s)

There are only 30 functions calls this time instead of 100,000 and the query performs much better.

If maxlevel is NULL, the new function behaves exactly as the previous one.

In an efficient programming language such as C++ it would be reasonable to re-implement the old function as a wrapper for the new one with maxlevel set to NULL.

But in MySQL, unfortunately, the function call itself consumes lots of resources. Such a solution, though looking elegant, would severely impact performance.

That’s why we will just keep two versions of the functions.

To be continued.

Written by Quassnoi

March 18th, 2009 at 11:00 pm

Posted in MySQL

Comments are closed.