Hierarchical queries in MySQL
This is a series of articles on hierarchical queries in MySQL:
- Hierarchical queries in MySQL
- Hierarchical queries in MySQL: adding level
- Hierarchical queries in MySQL: adding ancestry chains.
- Hierarchical queries in MySQL: finding leaves
- Hierarchical queries in MySQL: finding loops
See also:
There is no need in explaining how convenient hierarchical queries are.
A quick reminder: hierarchical data is a parent-child relationship contained in one table.
A typical task is to return values from the table in the following way:
- Resultset should be sorted like a tree, that is lexical sort by ancestry chains
- Depth level should be returned along with each row
It may sound confusing, but it’s very simple in fact, like shown on this Oracle query:
SELECT LPAD(' ', level * 4, ' ') || id, parent, level
FROM t_hierarchy
START WITH
parent = 0
CONNECT BY
parent = PRIOR id
| id | parent | level |
|---|---|---|
| 1 | 0 | 1 |
| 2 | 1 | 2 |
| 7 | 2 | 3 |
| 8 | 2 | 3 |
| 9 | 2 | 3 |
| 10 | 2 | 3 |
| 11 | 2 | 3 |
| 3 | 1 | 2 |
| 12 | 3 | 3 |
| 13 | 3 | 3 |
| 14 | 3 | 3 |
| 15 | 3 | 3 |
| 16 | 3 | 3 |
| 4 | 1 | 2 |
| 17 | 4 | 3 |
| 18 | 4 | 3 |
| 19 | 4 | 3 |
| 20 | 4 | 3 |
| 21 | 4 | 3 |
| 5 | 1 | 2 |
| 22 | 5 | 3 |
| 23 | 5 | 3 |
| 24 | 5 | 3 |
| 25 | 5 | 3 |
| 26 | 5 | 3 |
| 6 | 1 | 2 |
| 27 | 6 | 3 |
| 28 | 6 | 3 |
| 29 | 6 | 3 |
| 30 | 6 | 3 |
| 31 | 6 | 3 |
We have a nice tree sorted as a tree, with rows indented according to the depth level.
In the query above, START WITH defines the root of the tree, and CONNECT BY defines join condition between parent and child rows. Parent columns are defined by adding PRIOR keyword to them.
In MySQL there is no such construct, but it can be emulated.
We may try construct a sorting function somehow, but it will be too complex to implement and probably inefficient.
Instead, we will develop our function in the following way:
- The function will be called for each row in a rowset
- We will not use the actual values of the rowset, but instead keep all intermediary data in the session variables. Rowset will serve just as a dummy source of rows for subsequent joins
- We will not use recursion, as it’s impossible to keep the recursion state between function calls. Instead, we will just calculate next
idgiven only currentid
How do we find the next id given a current one?
- Initially,
idis set to theidof the root we start with. - If there is a row in the table with the same
parentbut greaterid, we return that greaterid, i. e. we return the next sibling of our row. - If there current row is last of its parent, the parent becomes the current
id, and we try repeat the step above. - If the our row is the last child of the root, we return
NULLfor this call and all subsequent calls.
This is implemented as following:
CREATE FUNCTION hierarchy_connect_by_parent_eq_prior_id(value INT) RETURNS INT
NOT DETERMINISTIC
READS SQL DATA
BEGIN
DECLARE _id INT;
DECLARE _parent INT;
DECLARE _next INT;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET @id = NULL;
SET _parent = @id;
SET _id = -1;
IF @id IS NULL THEN
RETURN NULL;
END IF;
LOOP
SELECT MIN(id)
INTO @id
FROM t_hierarchy
WHERE parent = _parent
AND id > _id;
IF @id IS NOT NULL OR _parent = @start_with THEN
SET @level = @level + 1;
RETURN @id;
END IF;
SET @level := @level - 1;
SELECT id, parent
INTO _id, _parent
FROM t_hierarchy
WHERE id = _parent;
END LOOP;
END
As you can see, this function takes a value as an input agrument, but doesn’t actually use it. This is because MySQL ignores NON DETERMINISTIC clause in the function definition, caches it’s return value and doesn’t actually call it if it’s called with the same arguments.
To avoid this, we need some non-repeating value, that is id.
We use this function in a query as following:
SELECT CONCAT(REPEAT(' ', level - 1), CAST(hi.id AS CHAR)) AS treeitem, parent, level
FROM (
SELECT hierarchy_connect_by_parent_eq_prior_id(id) AS id, @level AS level
FROM (
SELECT @start_with := 0,
@id := @start_with,
@level := 0
) vars, t_hierarchy
WHERE @id IS NOT NULL
) ho
JOIN t_hierarchy hi
ON hi.id = ho.id
In the first subquery, we initialize the variables and set our root item.
Note that it’s unadvisable to use NULL as a root, because it’s not comparable with an equality condition and therefore cannot be easily searched for without syntax change.
We use a 0 instead, because real id‘s traditionally start with 1.
We check for @id IS NOT NULL in the WHERE clause to optimize the query. Our function is written so that @id, being once set to NULL, will always remain NULL, and we can skip remaining rows.
For each row of our rowset (which is used only as an iterator loop), the function returns next id in a tree. We then join another instance of t_hierarchy on this id.
As a result of the query, we get:
| treeitem | parent | level | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2 | 1 | 2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 7 | 2 | 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 32 | 7 | 4 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 157 | 32 | 5 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 782 | 157 | 6 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3907 | 782 | 7 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 19532 | 3907 | 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 19533 | 3907 | 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 19534 | 3907 | 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 19535 | 3907 | 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 19536 | 3907 | 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3908 | 782 | 7 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 19537 | 3908 | 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 19538 | 3908 | 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 97651 | 19530 | 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 19531 | 3906 | 7 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 97652 | 19531 | 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 97653 | 19531 | 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 97654 | 19531 | 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 97655 | 19531 | 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 97656 | 19531 | 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 97656 rows fetched in 1,1108s (13,7909s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
To be continued.
Subscribe in a reader
Hi,
I Really like your blog post about these hierarchical queries.
I like to know one more thing. Is it possible to order by a list order within the childs belonging to one parent?
THanxs!
Theo
24 Oct 12 at 18:31
@Theo: this is one of the most asked questions.
Yes, it is possible, and it’s the first thing I’m writing about when I’m back to blogging (may be even this year!).
Watch this space.
Quassnoi
24 Oct 12 at 18:33
Another question.
Could we rebuild this function so that I could trow in a Child_id and it would find all it’s parent / and their parent (and so on) ?
Theo
25 Oct 12 at 14:24
@Theo: use the approach shown here: http://explainextended.com/2009/03/25/sorting-lists/
Quassnoi
25 Oct 12 at 14:26
You are simply outstanding :)
Gufran
4 Nov 12 at 18:56
Can this query and the function you wrote work for any database implementation of database in the market? i badly need database independent hierarchial query according to ansi standards which work for any database
zeeshan khan
8 Nov 12 at 15:55
@Zeeshan: no, it’s MySQL specific. You’re not the only one who needs it.
Quassnoi
8 Nov 12 at 15:57
It is working so well!!!
George
3 Jan 13 at 14:09
This is some brilliant work – I’m using a slightly modified version of your code for pulling data hierarchically and it’s AMAZING – easy and fast. I’ve cut the lines of code needed to one-quarter what I had before, more more scary recursive code, and the solution is much more elegant while being no noticeably slower (the DB query is about 0.05 seconds longer but I’m saving time/memory in my code now which likely makes up for that). Thank you again for sharing this jewel.
PanzerBjorn
7 Feb 13 at 13:07
Hi.
Great work! It helps me a lot. Thx.
But I was wandering, why it’s so hard to handle NULL values in table… If You are using foreign keys for parent, You need to have null as root.
So my working example is to enclose twice of COALESCE() function in hierarchy_connect_by_parent_eq_prior_id function.
First time in WHERE clause in line 21. To handle null values in table this line should look like this:
WHERE COALESCE(parent, 0) = _parent
To work properly, You need to ensure, that _parent will never be set to null, so the line no. 28 of mentioned function should looks like this:
SELECT id, COALESCE(parent, 0)
As of now, You can store NULL valuses in parent cell, and You are able to define foreign key for column parent referencing to id.
Enjoy and best regards :)
PS: sorry, for any mistakes in spelling ;)
Martin.
Martinezio
7 Mar 13 at 14:26
Support request: trying to use the idea, changed the function to use my table, my ID and parent fields. Doing simple
SELECT hierarchy_connect_by_parent_eq_prior_id_with_level(uid) AS uid, @level AS level
FROM (
SELECT @start_with := 0,
@id := @start_with,
@level := 0
) vars, my_table
WHERE @uid IS NOT NULL
returns no data even before I join anything. My parent field is INT NOT NULL and has 0 by default. I’m clueless, any ideas?
Slawa
13 Mar 13 at 19:35
@Slawa: please deploy your sample setup on http://sqlfiddle.com and post the link.
Quassnoi
14 Mar 13 at 00:18
Slawa: I think, You should remove the @ sign from the where condition. Currently where condition checks if there is variable named @uid and checks if it isn’t null. But You haven’t such variable defined, so this condition is allways false.
Martinezio
14 Mar 13 at 11:36
Slava: or change the internal query vars. Change @id := @start_with to @uid := @start_with
;) Good luck.
Martinezio
14 Mar 13 at 11:40
Thank you, Martinezio. That did the trick.
Combined with a leaf detection and some where conditions it runs for 12.723 seconds(!) on a dataset with 1153 rows in MyISAM table. Strange, but after switching to InnoDB it only takes 0.479 sec.
Slawa
14 Mar 13 at 18:24
This is a really inefficient solution. Look into http://en.wikipedia.org/wiki/Nested_set_model
It uses the same number of columns, however each one is an integer, significantly increasing the indexing and operation speeds, without needing padding etc.
Dale
23 Mar 13 at 06:36
@Dale: http://explainextended.com/2009/09/29/adjacency-list-vs-nested-sets-mysql/
Quassnoi
23 Mar 13 at 15:01
Hello,
This blog helped me. Thank you very much. But i have another question and its making me crazy :)
actually Theo asked it at top of comments but your answer didnt help me. i asked on here : http://stackoverflow.com/questions/15852045/mysql-sorting-hierarchical-data
can you give me an idea how can i do this? And i also need to select with where for department_id.
Mucahit Buyukyilmaz
6 Apr 13 at 19:06
Howdy, a lot of people have pooled their brains on creating a refined hierarchical function for MySQL here. Many thanks to Quassnoi. It sounds like Slawa has the most refined version (correct me if I am wrong). @Slawa, can you blog about your solution or possibly offer it to others? I’d really like to see what you did for “leaf detection and some where conditions”. We need this to be able to walk a table of taxonomic tree data (organism names) and need to handle about 50,000 records in the tree, but entry could be at any level and we need to all children below the entry level.
John Zastrow
7 May 13 at 23:41
> I’d really like to see what you did for “leaf detection and some where conditions”.
Leaf detection is just another page in the series: http://explainextended.com/2009/03/20/hierarchical-queries-in-mysql-finding-leaves/
Where conditions were basic “AND NOT hidden AND NOT deleted” as far as I remember.
And “my solution” was to switch to InnoDB which improved the performance 100 fold.
Slawa
8 May 13 at 03:46