This is a series of articles on hierarchical queries in MySQL:
- Hierarchical queries in MySQL
- Hierarchical queries in MySQL: adding level
- Hierarchical queries in MySQL: adding ancestry chains.
- Hierarchical queries in MySQL: finding leaves
- Hierarchical queries in MySQL: finding loops
There is no need in explaining how convenient hierarchical queries are.
A quick reminder: hierarchical data is a parent-child relationship contained in one table.
A typical task is to return values from the table in the following way:
- Resultset should be sorted like a tree, that is lexical sort by ancestry chains
- Depth level should be returned along with each row
It may sound confusing, but it's very simple in fact, like shown on this Oracle query:
SELECT LPAD(' ', level * 4, ' ') || id, parent, level FROM t_hierarchy START WITH parent = 0 CONNECT BY parent = PRIOR id
We have a nice tree sorted as a tree, with rows indented according to the depth level.
In the query above,
START WITH defines the root of the tree, and
CONNECT BY defines join condition between parent and child rows. Parent columns are defined by adding
PRIOR keyword to them.
In MySQL there is no such construct, but it can be emulated.
We may try construct a sorting function somehow, but it will be too complex to implement and probably inefficient.
Instead, we will develop our function in the following way:
- The function will be called for each row in a rowset
- We will not use the actual values of the rowset, but instead keep all intermediary data in the session variables. Rowset will serve just as a dummy source of rows for subsequent joins
- We will not use recursion, as it's impossible to keep the recursion state between function calls. Instead, we will just calculate next
idgiven only current
How do we find the next
id given a current one?
idis set to the
idof the root we start with.
- If there is a row in the table with the same
id, we return that greater
id, i. e. we return the next sibling of our row.
- If there current row is last of its parent, the parent becomes the current
id, and we try repeat the step above.
- If the our row is the last child of the root, we return
NULLfor this call and all subsequent calls.
This is implemented as following:
CREATE TABLE t_hierarchy ( id int(10) unsigned NOT NULL AUTO_INCREMENT, parent int(10) unsigned NOT NULL, PRIMARY KEY (id), KEY ix_hierarchy_parent (parent, id) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; DELIMITER $$ CREATE PROCEDURE prc_fill_hierarchy (level INT, fill INT) BEGIN DECLARE _level INT; DECLARE _fill INT; INSERT INTO t_hierarchy (id, parent) VALUES (1, 0); SET _fill = 0; WHILE _fill < fill DO INSERT INTO t_hierarchy (parent) VALUES (1); SET _fill = _fill + 1; END WHILE; SET _fill = 1; SET _level = 0; WHILE _level < level DO INSERT INTO t_hierarchy (parent) SELECT hn.id FROM t_hierarchy ho, t_hierarchy hn WHERE ho.parent = 1 AND hn.id > _fill; SET _level = _level + 1; SET _fill = _fill + POWER(fill, _level); END WHILE; END $$ DELIMITER ; START TRANSACTION; CALL prc_fill_hierarchy(6, 5); COMMIT;
CREATE FUNCTION hierarchy_connect_by_parent_eq_prior_id(value INT) RETURNS INT NOT DETERMINISTIC READS SQL DATA BEGIN DECLARE _id INT; DECLARE _parent INT; DECLARE _next INT; DECLARE CONTINUE HANDLER FOR NOT FOUND SET @id = NULL; SET _parent = @id; SET _id = -1; IF @id IS NULL THEN RETURN NULL; END IF; LOOP SELECT MIN(id) INTO @id FROM t_hierarchy WHERE parent = _parent AND id > _id; IF @id IS NOT NULL OR _parent = @start_with THEN SET @level = @level + 1; RETURN @id; END IF; SET @level := @level - 1; SELECT id, parent INTO _id, _parent FROM t_hierarchy WHERE id = _parent; END LOOP; END
As you can see, this function takes a value as an input agrument, but doesn't actually use it. This is because MySQL ignores
NON DETERMINISTIC clause in the function definition, caches it's return value and doesn't actually call it if it's called with the same arguments.
To avoid this, we need some non-repeating value, that is
We use this function in a query as following:
SELECT CONCAT(REPEAT(' ', level - 1), CAST(hi.id AS CHAR)) AS treeitem, parent, level FROM ( SELECT hierarchy_connect_by_parent_eq_prior_id(id) AS id, @level AS level FROM ( SELECT @start_with := 0, @id := @start_with, @level := 0 ) vars, t_hierarchy WHERE @id IS NOT NULL ) ho JOIN t_hierarchy hi ON hi.id = ho.id
In the first subquery, we initialize the variables and set our root item.
Note that it's unadvisable to use
NULL as a root, because it's not comparable with an equality condition and therefore cannot be easily searched for without syntax change.
We use a 0 instead, because real
id's traditionally start with 1.
We check for
@id IS NOT NULL in the
WHERE clause to optimize the query. Our function is written so that
@id, being once set to
NULL, will always remain
NULL, and we can skip remaining rows.
For each row of our rowset (which is used only as an iterator loop), the function returns next
id in a tree. We then join another instance of
t_hierarchy on this
As a result of the query, we get:
|97656 rows fetched in 1,1108s (13,7909s)|
To be continued.