EXPLAIN EXTENDED

How to create fast database queries

My latest article on SQL in general: Shared Plan and Algorithm Network Cache (SPANC). You're welcome to read and comment on it.

Hierarchical query in MySQL: limiting parents

Answering questions asked on the site.

James asks:

Your series on hierarchical queries in MySQL is tremendous! I’m using it to create a series of threaded conversations.

I’m wondering if there is a way to paginate these results.

Specifically, let’s say I want to limit the conversations to return 10 root nodes (parent=0) and all of their children in a query.

I can’t just limit the final query, because that will clip off children. I’ve tried to add LIMITs to your stored functions, but I’m not getting the magic just right.

How would you go about doing this?

A quick reminder: MySQL does not support recursion (either CONNECT BY style or recursive CTE style), so using an adjacency list model is a somewhat complicated task.

However, it is still possible. The main idea is storing the recursion state in a session variable and call a user-defined function repeatedly to iterate over the tree, thus emulating recursion. The article mentioned in the question shows how to do that.

Normally, reading and assigning session variables in the same query is discouraged in MySQL, since the order of evaluation is not guaranteed. However, in the case we only use the table as a dummy recordset and no values of the records are actually used in the function, so the actual values returned by the function are completely defined by the function itself. The table is only used to ensure that the function is called enough times, and to present its results in form of a native resultset (which can be returned or joined with).

To do something with the logic of the function (like, imposing a limit on the parent nodes without doing the same on the child nodes), we, therefore, should tweak the function code, not the query that calls the functions. The only thing that matters in such a query is the number of records returned and we don’t know it in design time.

Limiting the parent nodes is quite simple: we just use another session variable to track the number of parent branches yet to be returned and stop processing as soon as the limit is hit, that is the variable becomes zero.

Let’s create a sample table and see how to do this:

Table creation details

There are 100,000 hierarchical records in multiple trees (8 records in each tree).

To limit the number of trees returned, we create the function similar to the one created in the earlier posts and add a little condition that would decrease the session variable, @parent_limit, each time a parent entry is returned. When this variable hits zero, it’s a signal to stop processing the records:

CREATE FUNCTION hierarchy_connect_by_parent_eq_prior_id(value INT) RETURNS INT
NOT DETERMINISTIC
READS SQL DATA
BEGIN
        DECLARE _id INT;
        DECLARE _parent INT;
        DECLARE _next INT;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET @id = NULL;

        SET _parent = @id;
        SET _id = -1;

        IF @id IS NULL THEN
                RETURN NULL;
        END IF;

        LOOP
                SELECT  MIN(id)
                INTO    @id
                FROM    t_hierarchy
                WHERE   parent = _parent
                        AND id > _id;
                IF @id IS NOT NULL OR _parent = @start_with THEN
                        SET @level = @level + 1;
                        IF _parent = @start_with AND @parent_limit > 0 THEN
                                SET @parent_limit = @parent_limit - 1;
                        END IF;
                        IF @parent_limit = 0 THEN
                                SET @id = NULL;
                        END IF;
                        RETURN @id;
                END IF;
                SET @level := @level - 1;
                SELECT  id, parent
                INTO    _id, _parent
                FROM    t_hierarchy
                WHERE   id = _parent;
        END LOOP;
END;

Let’s check it:

SELECT  CONCAT(REPEAT('    ', level - 1), CAST(hi.id AS CHAR)) AS treeitem, parent, level, lmt
FROM    (
        SELECT  hierarchy_connect_by_parent_eq_prior_id(id) AS id, @level AS level, @parent_limit as lmt
        FROM    (
                SELECT  @start_with := 0,
                        @parent_limit := 4,
                        @id := @start_with,
                        @level := 0
                ) vars, t_hierarchy
        WHERE   @id IS NOT NULL
        ) ho
JOIN    t_hierarchy hi
ON      hi.id = ho.id
treeitem parent level lmt
1 0 1 3
2 1 2 3
4 2 3 3
8 4 4 3
5 2 3 3
3 1 2 3
6 3 3 3
7 3 3 3
9 0 1 2
10 9 2 2
12 10 3 2
16 12 4 2
13 10 3 2
11 9 2 2
14 11 3 2
15 11 3 2
17 0 1 1
18 17 2 1
20 18 3 1
24 20 4 1
21 18 3 1
19 17 2 1
22 19 3 1
23 19 3 1
24 rows fetched in 0.0008s (0.0711s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL 25 100.00
1 PRIMARY hi eq_ref PRIMARY PRIMARY 4 ho.id 1 100.00 Using where
2 DERIVED <derived3> system 1 100.00
2 DERIVED t_hierarchy index PRIMARY 4 100650 100.00 Using where; Using index
3 DERIVED No tables used
select concat(repeat('    ',(`ho`.`level` - 1)),cast(`20100418_limit`.`hi`.`id` as char charset latin1)) AS `treeitem`,`20100418_limit`.`hi`.`parent` AS `parent`,`ho`.`level` AS `level`,`ho`.`lmt` AS `lmt` from (select `hierarchy_connect_by_parent_eq_prior_id`(`20100418_limit`.`t_hierarchy`.`id`) AS `id`,(@level) AS `level`,(@parent_limit) AS `lmt` from (select (@start_with:=0) AS `@start_with := 0`,(@parent_limit:=4) AS `@parent_limit := 4`,(@id:=(@start_with)) AS `@id := @start_with`,(@level:=0) AS `@level := 0`) `vars` join `20100418_limit`.`t_hierarchy` where ((@id) is not null)) `ho` join `20100418_limit`.`t_hierarchy` `hi` where (`20100418_limit`.`hi`.`id` = `ho`.`id`)

The first 3 branches in a proper hierarchy, almost instantly. Note that we need set @parent_limit to 4, i. e. one greater than the value we need.

Note that it is possible to achieve the similar behavior without using any recursive functions at all.

On most forums the filtering is performed on the topic starters, so it’s often a good idea to store the id of the topic starter along with each reply. In the sample table a did that as well.

With this, the filtering required becomes very simple:

SELECT  h.*
FROM    (
        SELECT  id
        FROM    t_hierarchy
        WHERE   parent = 0
        ORDER BY
                id
        LIMIT 3
        ) q
JOIN    t_hierarchy h
ON      h.root = q.id
id parent root
1 0 1
2 1 1
3 1 1
4 2 1
5 2 1
6 3 1
7 3 1
8 4 1
9 0 9
10 9 9
11 9 9
12 10 9
13 10 9
14 11 9
15 11 9
16 12 9
17 0 17
18 17 17
19 17 17
20 18 17
21 18 17
22 19 17
23 19 17
24 20 17
24 rows fetched in 0.0007s (0.0024s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL 3 100.00
1 PRIMARY h ref ix_hierarchy_root ix_hierarchy_root 4 q.id 4 100.00 Using where
2 DERIVED t_hierarchy ref ix_hierarchy_parent ix_hierarchy_parent 4 16830 100.00 Using where; Using index
select `20100418_limit`.`h`.`id` AS `id`,`20100418_limit`.`h`.`parent` AS `parent`,`20100418_limit`.`h`.`root` AS `root` from (select `20100418_limit`.`t_hierarchy`.`id` AS `id` from `20100418_limit`.`t_hierarchy` where (`20100418_limit`.`t_hierarchy`.`parent` = 0) order by `20100418_limit`.`t_hierarchy`.`id` limit 3) `q` join `20100418_limit`.`t_hierarchy` `h` where (`20100418_limit`.`h`.`root` = `q`.`id`)

This solution (that had been used on numerous forum engines for ages) is more efficient, since no function calls are involved, and more simple too.

However, it does not preserve the hierarchical order and does not allow sorting on anything but the topic starter, so if you need anything of these, the recursive function is still a way to go.

Hope that helps.


I’m always glad to answer the questions regarding database queries.

Ask me a question

Written by Quassnoi

April 18th, 2010 at 11:00 pm

Posted in MySQL

Comments are closed.