EXPLAIN EXTENDED

How to create fast database queries

Hierarchical queries in MySQL

with 42 comments

There is no need in explaining how convenient hierarchical queries are.

A quick reminder: hierarchical data is a parent-child relationship contained in one table.

A typical task is to return values from the table in the following way:

  1. Resultset should be sorted like a tree, that is lexical sort by ancestry chains
  2. Depth level should be returned along with each row

It may sound confusing, but it's very simple in fact, like shown on this Oracle query:

SELECT  LPAD(' ', level * 4, ' ') || id, parent, level
FROM    t_hierarchy
START WITH
        parent = 0
CONNECT BY
        parent = PRIOR id
id parent level
1 0 1
2 1 2
7 2 3
8 2 3
9 2 3
10 2 3
11 2 3
3 1 2
12 3 3
13 3 3
14 3 3
15 3 3
16 3 3
4 1 2
17 4 3
18 4 3
19 4 3
20 4 3
21 4 3
5 1 2
22 5 3
23 5 3
24 5 3
25 5 3
26 5 3
6 1 2
27 6 3
28 6 3
29 6 3
30 6 3
31 6 3

We have a nice tree sorted as a tree, with rows indented according to the depth level.

In the query above, START WITH defines the root of the tree, and CONNECT BY defines join condition between parent and child rows. Parent columns are defined by adding PRIOR keyword to them.

In MySQL there is no such construct, but it can be emulated.

We may try construct a sorting function somehow, but it will be too complex to implement and probably inefficient.

Instead, we will develop our function in the following way:

  1. The function will be called for each row in a rowset
  2. We will not use the actual values of the rowset, but instead keep all intermediary data in the session variables. Rowset will serve just as a dummy source of rows for subsequent joins
  3. We will not use recursion, as it's impossible to keep the recursion state between function calls. Instead, we will just calculate next id given only current id

How do we find the next id given a current one?

  1. Initially, id is set to the id of the root we start with.
  2. If there is a row in the table with the same parent but greater id, we return that greater id, i. e. we return the next sibling of our row.
  3. If there current row is last of its parent, the parent becomes the current id, and we try repeat the step above.
  4. If the our row is the last child of the root, we return NULL for this call and all subsequent calls.

This is implemented as following:

Table creation details

CREATE FUNCTION hierarchy_connect_by_parent_eq_prior_id(value INT) RETURNS INT
NOT DETERMINISTIC
READS SQL DATA
BEGIN
        DECLARE _id INT;
        DECLARE _parent INT;
        DECLARE _next INT;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET @id = NULL;

        SET _parent = @id;
        SET _id = -1;

        IF @id IS NULL THEN
                RETURN NULL;
        END IF;

        LOOP
                SELECT  MIN(id)
                INTO    @id
                FROM    t_hierarchy
                WHERE   parent = _parent
                        AND id > _id;
                IF @id IS NOT NULL OR _parent = @start_with THEN
                        SET @level = @level + 1;
                        RETURN @id;
                END IF;
                SET @level := @level - 1;
                SELECT  id, parent
                INTO    _id, _parent
                FROM    t_hierarchy
                WHERE   id = _parent;
        END LOOP;       
END

As you can see, this function takes a value as an input agrument, but doesn't actually use it. This is because MySQL ignores NON DETERMINISTIC clause in the function definition, caches it's return value and doesn't actually call it if it's called with the same arguments.

To avoid this, we need some non-repeating value, that is id.

We use this function in a query as following:

SELECT  CONCAT(REPEAT('    ', level - 1), CAST(hi.id AS CHAR)) AS treeitem, parent, level
FROM    (
        SELECT  hierarchy_connect_by_parent_eq_prior_id(id) AS id, @level AS level
        FROM    (
                SELECT  @start_with := 0,
                        @id := @start_with,
                        @level := 0
                ) vars, t_hierarchy
        WHERE   @id IS NOT NULL
        ) ho
JOIN    t_hierarchy hi
ON      hi.id = ho.id

In the first subquery, we initialize the variables and set our root item.

Note that it's unadvisable to use NULL as a root, because it's not comparable with an equality condition and therefore cannot be easily searched for without syntax change.

We use a 0 instead, because real id's traditionally start with 1.

We check for @id IS NOT NULL in the WHERE clause to optimize the query. Our function is written so that @id, being once set to NULL, will always remain NULL, and we can skip remaining rows.

For each row of our rowset (which is used only as an iterator loop), the function returns next id in a tree. We then join another instance of t_hierarchy on this id.

As a result of the query, we get:

treeitem parent level
1 0 1
2 1 2
7 2 3
32 7 4
157 32 5
782 157 6
3907 782 7
19532 3907 8
19533 3907 8
19534 3907 8
19535 3907 8
19536 3907 8
3908 782 7
19537 3908 8
19538 3908 8
97651 19530 8
19531 3906 7
97652 19531 8
97653 19531 8
97654 19531 8
97655 19531 8
97656 19531 8
97656 rows fetched in 1,1108s (13,7909s)

To be continued.

Written by Quassnoi

March 17th, 2009 at 11:00 pm

Posted in MySQL

42 Responses to 'Hierarchical queries in MySQL'

Subscribe to comments with RSS or TrackBack to 'Hierarchical queries in MySQL'.

  1. Hi,

    I Really like your blog post about these hierarchical queries.

    I like to know one more thing. Is it possible to order by a list order within the childs belonging to one parent?

    THanxs!

    Theo

    24 Oct 12 at 18:31

  2. @Theo: this is one of the most asked questions.

    Yes, it is possible, and it’s the first thing I’m writing about when I’m back to blogging (may be even this year!).

    Watch this space.

    Quassnoi

    24 Oct 12 at 18:33

  3. Another question.

    Could we rebuild this function so that I could trow in a Child_id and it would find all it’s parent / and their parent (and so on) ?

    Theo

    25 Oct 12 at 14:24

  4. @Theo: use the approach shown here: http://explainextended.com/2009/03/25/sorting-lists/

    Quassnoi

    25 Oct 12 at 14:26

  5. You are simply outstanding :)

    Gufran

    4 Nov 12 at 18:56

  6. Can this query and the function you wrote work for any database implementation of database in the market? i badly need database independent hierarchial query according to ansi standards which work for any database

    zeeshan khan

    8 Nov 12 at 15:55

  7. @Zeeshan: no, it’s MySQL specific. You’re not the only one who needs it.

    Quassnoi

    8 Nov 12 at 15:57

  8. It is working so well!!!

    George

    3 Jan 13 at 14:09

  9. This is some brilliant work – I’m using a slightly modified version of your code for pulling data hierarchically and it’s AMAZING – easy and fast. I’ve cut the lines of code needed to one-quarter what I had before, more more scary recursive code, and the solution is much more elegant while being no noticeably slower (the DB query is about 0.05 seconds longer but I’m saving time/memory in my code now which likely makes up for that). Thank you again for sharing this jewel.

    PanzerBjorn

    7 Feb 13 at 13:07

  10. Hi.

    Great work! It helps me a lot. Thx.
    But I was wandering, why it’s so hard to handle NULL values in table… If You are using foreign keys for parent, You need to have null as root.
    So my working example is to enclose twice of COALESCE() function in hierarchy_connect_by_parent_eq_prior_id function.

    First time in WHERE clause in line 21. To handle null values in table this line should look like this:

    WHERE COALESCE(parent, 0) = _parent

    To work properly, You need to ensure, that _parent will never be set to null, so the line no. 28 of mentioned function should looks like this:

    SELECT id, COALESCE(parent, 0)

    As of now, You can store NULL valuses in parent cell, and You are able to define foreign key for column parent referencing to id.

    Enjoy and best regards :)

    PS: sorry, for any mistakes in spelling ;)

    Martin.

    Martinezio

    7 Mar 13 at 14:26

  11. Support request: trying to use the idea, changed the function to use my table, my ID and parent fields. Doing simple
    SELECT hierarchy_connect_by_parent_eq_prior_id_with_level(uid) AS uid, @level AS level
    FROM (
    SELECT @start_with := 0,
    @id := @start_with,
    @level := 0
    ) vars, my_table
    WHERE @uid IS NOT NULL

    returns no data even before I join anything. My parent field is INT NOT NULL and has 0 by default. I’m clueless, any ideas?

    Slawa

    13 Mar 13 at 19:35

  12. @Slawa: please deploy your sample setup on http://sqlfiddle.com and post the link.

    Quassnoi

    14 Mar 13 at 00:18

  13. Slawa: I think, You should remove the @ sign from the where condition. Currently where condition checks if there is variable named @uid and checks if it isn’t null. But You haven’t such variable defined, so this condition is allways false.

    Martinezio

    14 Mar 13 at 11:36

  14. Slava: or change the internal query vars. Change @id := @start_with to @uid := @start_with

    ;) Good luck.

    Martinezio

    14 Mar 13 at 11:40

  15. Thank you, Martinezio. That did the trick.

    Combined with a leaf detection and some where conditions it runs for 12.723 seconds(!) on a dataset with 1153 rows in MyISAM table. Strange, but after switching to InnoDB it only takes 0.479 sec.

    Slawa

    14 Mar 13 at 18:24

  16. This is a really inefficient solution. Look into http://en.wikipedia.org/wiki/Nested_set_model

    It uses the same number of columns, however each one is an integer, significantly increasing the indexing and operation speeds, without needing padding etc.

    Dale

    23 Mar 13 at 06:36

  17. Quassnoi

    23 Mar 13 at 15:01

  18. Hello,

    This blog helped me. Thank you very much. But i have another question and its making me crazy :)

    actually Theo asked it at top of comments but your answer didnt help me. i asked on here : http://stackoverflow.com/questions/15852045/mysql-sorting-hierarchical-data

    can you give me an idea how can i do this? And i also need to select with where for department_id.

    Mucahit Buyukyilmaz

    6 Apr 13 at 19:06

  19. Howdy, a lot of people have pooled their brains on creating a refined hierarchical function for MySQL here. Many thanks to Quassnoi. It sounds like Slawa has the most refined version (correct me if I am wrong). @Slawa, can you blog about your solution or possibly offer it to others? I’d really like to see what you did for “leaf detection and some where conditions”. We need this to be able to walk a table of taxonomic tree data (organism names) and need to handle about 50,000 records in the tree, but entry could be at any level and we need to all children below the entry level.

    John Zastrow

    7 May 13 at 23:41

  20. > I’d really like to see what you did for “leaf detection and some where conditions”.

    Leaf detection is just another page in the series: http://explainextended.com/2009/03/20/hierarchical-queries-in-mysql-finding-leaves/

    Where conditions were basic “AND NOT hidden AND NOT deleted” as far as I remember.

    And “my solution” was to switch to InnoDB which improved the performance 100 fold.

    Slawa

    8 May 13 at 03:46

  21. This query has a ordering problem.
    If childId & parentID were put in an arbitrary order, then there’re some error like ‘Result consisted more than 1 row’.

    Ex of parent id & child id in arbitrary order:

    id – parent
    5 – 6
    6 – 11
    3 – 2
    11 – 1
    1 – 0
    2 – 0
    9 – 11

    Then this query won’t work

    So my question is how to modify the hierarchy_connect_by_parent_eq_prior_id function so that it can work even if parent id & child id in arbitrary order.

    Tom

    6 Jun 13 at 18:41

  22. @Tom: please prepare a test setup on http://sqlfiddle.com to reproduce the problem.

    Quassnoi

    6 Jun 13 at 18:59

  23. Hi Quassnoi,

    I am very happy to hear from you. I couldn’t create Stored proceduce in SQL Fiddle.

    However, i asked the same question in Stackoverflow

    http://stackoverflow.com/questions/16954293/hierarchical-mysql-how-to-modify-hierarchy-connect-by-parent-eq-prior-id-func

    Can you reproduce a simple table with the data i provided & try to run your SQL, you will see there an error saying that ‘Result consisted more than 1 row’

    Thank you very much

    Tom

    6 Jun 13 at 19:13

  24. Hi Quassnoi,

    http://sqlfiddle.com/#!2/07a67/2

    I think ur query work ok, i don’t understand why if i use other table then something wierd happened.

    Tom

    6 Jun 13 at 19:49

  25. @Tom: please do not post long data here in the comments. Instead, prepare a setup on http://sqlfiddle.com and post the link.

    Quassnoi

    6 Jun 13 at 23:04

  26. Hi Quassnoi,

    I am sorry of posting long data.

    Anyway, i found why i couldn’t run.

    That is if 1 child has more than 1 parent then it will produce ‘Result consisted of more than one row’ Error.

    See this http://sqlfiddle.com/#!2/f048b/1

    Note: child 6 belongs to parent 11 & parent 17

    Can you modify your code so that it can allow 1 child to have more than 1 parent?

    Tom

    7 Jun 13 at 04:39

  27. Hi Quassnoi,

    I want to add some logic info info the case of 1 child belongs to more than 1 parent.

    Case 1: a descendant (or child) belongs to 2 separated ancestors. Ex:

    id – parent
    2 – 1
    3 – 2
    4 – 6
    5 – 4
    3 – 5

    when we searches for “1″, then your query should print out both separated ancestors [1,6] who have same descendant [3]:
    1
    2
    3

    6
    4
    5
    3

    Case 2: a descendant (or child) belongs to 2 ancestors who have the same ancestor. Ex:

    Id – parent
    2 – 1
    3 – 2
    4 – 1
    5 – 4
    3 – 5

    when we searches for “1″, then your query should print out this (Note: 3 belongs to 2 ancestors [2 & 5] who have same ancestor 1):

    1
    2
    3
    4
    5
    3

    Can you modify your code so that it can reflex the above logic?

    Tom

    7 Jun 13 at 11:27

  28. @Tom: we can’t keep a real stack for the MySQL functions. The only thing we can do is to store the previous parent in a session variable.

    So unfortunately this function would only work with trees, not any adjacency list.

    Quassnoi

    7 Jun 13 at 12:27

  29. Hi Quassnoi,

    I knew it’s hard to let 1 child belongs to more than 1 parent, but can u at least modify your code so that it will ignore the duplicate child ID in the Case that a descendant (or child) belongs to 2 separated ancestors. Ex:

    id – parent
    2 – 1
    3 – 2
    4 – 6
    5 – 4
    3 – 5

    when we searches for “1″, it will show
    1 – 2
    - – 2 – 3

    & when w search for “6″, it will show
    6 – 4
    - – 4 – 5
    - – - – 5 – 3

    I have very limited knowledge about stored procedures, but I think u can do it easily since you are expert in Stored procedure.

    Tom

    7 Jun 13 at 20:33

  30. @Tom: as I said, it’s infeasible to do in MySQL.

    Quassnoi

    7 Jun 13 at 22:07

  31. This is helpful but the problem is it is not allowed in some webhost provider such as godaddy. So sad.

    Reggie Banal

    26 Dec 13 at 18:51

  32. @Reggie: never used godaddy but their support mentions it’s possible to use stored procedures on their db: http://support.godaddy.com/help/article/2449/are-stored-procedures-supported-in-mysql

    Quassnoi

    26 Dec 13 at 19:59

  33. hi Quassnoi ,
    You done really great but from your third query where you have used hi and ho is quite confusing. so can you make it simple by using more appropriate name for alias that you have created in third query.
    And one thing how we can add multiple parents in above table ?

    wasim

    31 Jan 14 at 08:08

  34. Wonderful series of articles, thanks very much.
    Quick question:
    SET @level = @level + 1;
    SET @level := @level – 1;
    Why do you use ‘=’ on one line and ‘:=’ on the other? As far as I can tell the effect is the same, but haven’t been able to find any documentation on it.

    Michael

    27 Mar 14 at 13:31

  35. @Michael: no difference.

    Quassnoi

    27 Mar 14 at 20:20

  36. That’s great, cheers!

    Michael

    27 Mar 14 at 20:27

  37. How is the performance?

    If the query is executed very frequently, is there any performance impact?

    George

    George Lee

    14 May 14 at 06:34

  38. Thanks for the wonderful post.

    I need to change the query in a way that it returns a list of all descendants of a given parent.
    Also in my case id is a string and is not in any sort order so cannot apply id > _id.

    Please help.

    priyanka

    27 May 14 at 16:22

  39. Same idea but just only one request with no function call needed, simulating iterative tree parsing :

    SELECT
    CONCAT(REPEAT(‘ ‘, level), CAST(node AS CHAR)) AS treeItem,
    parent,
    child,
    level
    FROM
    (
    SELECT
    @idx := @idx + 1 AS rowIndex,
    @node AS node,
    @prev AS parent,
    @child AS child,
    @level AS level,
    @first AS first,
    @first := IF(@next = @child, true, false) AS nextFirst,
    @level := IF(@first, @level + 1, @level – 1) AS nextLevel,
    @child := IF(@first, (SELECT min(id) FROM t_hierarchy WHERE parent = @child), (SELECT min(id) FROM t_hierarchy WHERE parent = @prev AND id > @node)) AS nextChild,
    @prev := IF(@first, @node, (SELECT parent FROM t_hierarchy WHERE id = @prev)) AS nextParent,
    @node := @next AS nextNode,
    @next := IFNULL(@child, @prev) AS nextNext
    FROM
    (SELECT
    @node := (SELECT min(id) FROM t_hierarchy WHERE parent = 0),
    @prev := (SELECT parent FROM t_hierarchy WHERE id = @node),
    @child := (SELECT min(id) FROM t_hierarchy WHERE parent = @node),
    @level := 0,
    @first := true,
    @next := IFNULL(@child, @prev),
    @idx := 0
    ) hinit
    CROSS JOIN
    (SELECT
    0 AS id
    UNION ALL
    SELECT
    h.id
    FROM
    t_hierarchy h
    UNION ALL
    SELECT
    h2.id
    FROM
    (SELECT 0 AS id) h1 INNER JOIN t_hierarchy h2 ON h1.id = h2.parent
    UNION ALL
    SELECT
    h2.id
    FROM
    t_hierarchy h1 INNER JOIN t_hierarchy h2 ON h1.id = h2.parent) hbase
    ) hierarchy
    WHERE
    first = 1
    ORDER BY rowIndex

    SamY

    23 Jul 14 at 14:01

  40. @Tom : I think you can do what you want exactly the same way explained in this blog.
    The difference is you need two tables, the first to store your associations betwen parent and child which allows multiple parents, the second which links with the first and describes authorized paths with only one parent allowed.
    If you have multiple paths with the same parent-child association you will obviously have multiple rows with the same parent-child associations so you need a third column as primary key in your first table which will be used as values for the parent-child associations in your second table.
    You will then do your path select generation in the second table linked with the first and display selected rows values from the first table.

    @Quassnoi : I didn’t thank you for this blog very interesting which inspired me a lot. I haven’t tested performance differences between your explanations and my request, but if you have some advice I will appreciate ;)

    SamY

    24 Jul 14 at 10:54

  41. @SamY: that’s a nice query. I’ve used similar approach elsewhere too (can’t find the article yet).

    But the thing is changing session variables right in the query is kinda black magic in MySQL: it’s not documented, not supported and not guaranteed it won’t change in a future.

    It really gives you much more power than SQL does, and this is great for a one-off query, but I would not risk placing it on a production system, especially on a hosted one, which can apply updates without prior notification.

    Using the function at least guarantees the statements are applied in the same order always, and even that relies on the implementation details.

    Say what happens if MySQL ever goes multithread or multiprocess (I don’t believe I’ll live to see that but just for the sake of the argument)?

    If there are several processes which partition the table and start calling function in parallel, out of order (that’s what SQL Server and Oracle do out of box), we’re screwed.

    Same applies to the session variables in a query, with even more assumptions about undocumented behavior we are implicitly relying on.

    Still, as I said, it’s a nice query, and if you can rely on a certain version of MySQL, it would do its job more than alright.

    Quassnoi

    28 Jul 14 at 18:05

  42. Thanks Quassnoi for your reply and compliment. I come from sql server environment and I’m a little newbie with mysql. The best approach will be to use the WITH RECURSIVE statement when it will be available with mysql. To answer the hypothesis of multithreaded/multiprocess mysql engine I agree it’s not a definitive solution to adopt. But for the moment we have to do with available tools, we’ll change with subsequent versions. It’s how IT works (2000 year bug ;)). Anyway thanks again for this serie of articles. At least it provides intermediate styled solutions, and I repeat it was a source of inspiration for me!!!

    SamY

    29 Jul 14 at 10:37

Leave a Reply