EXPLAIN EXTENDED

How to create fast database queries

Archive for February 16th, 2010

Tags in nested sets: efficient indexing

Comments enabled. I *really* need your comment

Answering questions asked on the site.

Travis asks:

I just read your excellent post: Hierarchical data in MySQL: parents and children in one query.

I am currently trying to do something very similar to this. The main difference is, I am working with data that is in a nested set.

I would like to construct a query that will return a resultset of nodes matched with LIKE, the ancestors of each of these nodes, and the immediate children of each of these nodes.

This is a very interesting question: it allows to demonstrate the usage of three types of indexes usable by MyISAM tables: BTREE, SPATIAL and FULLTEXT.

Nested sets is one of two models that are most often used to store hierarchical data in a relational table (the other one being adjacency list model). Unlike adjacency list, nested sets does not require ability to write recursive queries, which SQL originally lacked and still lacks now in MySQL (albeit it can be emulated to some extent). It is widely used in MySQL world.

I described both methods and their comparison in the article I wrote some time ago:

The main problem with the nested sets model is that though it is extremely fast with selecting descendants of a given node, it is very slow in selecting ancestors.

This is because of the way the B-Tree indexes work. They can be used to query for values of a column within the range of two constants, but not for the values of two columns holding a single constant between them. One needs the first condition to select the children (found between the lft and rgt of the given node), and the second condition to select the ancestor (with lft and rgt containing the lft and rgt of the given node).

That's why selecting the children is fast and selecting the ancestors is slow.

To work around this, the sets that form the hierarchy can be described as geometrical objects, with the larger sets containing the smaller sets. These sets can be indexed with a SPATIAL index which is designed specially for this purpose and both children and ancestors can be queried for very efficiently.

Unfortunately, finding the depth level is quite a hard task for the nested sets model even with the SPATIAL indexes.

It would be quite an easy task is MySQL supported recursion: we could just run a query to find the siblings of each record by skipping their whole domains recursively.

However, MySQL's recursion support is very limited and it relies on the session variables, which are not recommended to use in the complex queries.

To cope with this, we need to mix the nested sets and the adjacency list models. Hierarchy will be stored in two seemingly redundant ways: the unique parent and the LineString representing the nested sets.

This will help us to use the R-Tree index to find all ancestors of a given node and also use B-Tree index to find its immediate children.

Finally, the question mentions using LIKE to find the initial nodes. LIKE predicate with the leading wildcards is not sargable in MySQL. However, it seems that the leading wildcards are only used to split the words. In this case, a FULLTEXT index and the MATCH query would be much more efficient, since FULLTEXT index allows indexing a single record with several keys (each one corresponding to a single word in the column's text), so a search for the word in the space separated or a comma separated list uses the index and is much faster than scanning the whole table.

Hence, the query would use all three main types of indexes: BTREE, SPATIAL and FULLTEXT.

To illustrate everything said above, let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

February 16th, 2010 at 11:00 pm

Posted in MySQL