EXPLAIN EXTENDED

How to create fast database queries

Fallback language names: MySQL


This is one of a series of articles on efficiently querying for a localized name, using a default (fallback) language when there is no localized name.

A quick reminder of the problem taken from Stack Overflow:

I have a table item and another table language, which contains names for the items in different languages.


How do I select a French name for an item if it exists, or a fallback English name if there is no French one?

We basically have three options here:

  1. Use COALESCE on two SELECT list subqueries
  2. Use COALESCE on the results of two LEFT JOINs
  3. Use a combination of the methods above: a LEFT JOIN for French names and a subquery for English ones

The efficiency of each of these methods depends on the fallback probability (the share of items not covered by the localization).

If the localization is poor and only a few terms are translated into the local language, the fallback probability is high. I took Latin as an example of this.

If almost all terms are translated, the fallback probability is low. For this case I took French as an example (as in the original question), since it is widely used and its localizations are likely to cover most terms.

In Oracle, SQL Server and PostgreSQL, the second method (two LEFT JOINs) is more efficient for querying poorly localized languages, while for well-localized languages the third query should be used, i.e. a LEFT JOIN for the local language and a subquery for the fallback one.

To gain efficiency, all these systems used a join method that performs better on large portions of a rowset, namely HASH JOIN or MERGE JOIN.

MySQL, however, is only capable of doing nested loops, so its performance should differ from that of the engines tested earlier.

Let's create sample tables and see:



t_item contains 1,000,000 items.

t_language contains 1,000,000 English names, 999,000 French names and 2,000 Latin names.

French is an example of a language with a low fallback probability (good localization); Latin is an example of poor localization.
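A minimal sketch of such tables, scaled down by a factor of 1,000 while keeping the same proportions, might look like the following. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere; the column names (id, item, language, name), the language codes and the composite primary key on (item, language) are all assumptions, not the original creation scripts.

```python
import sqlite3


def create_sample_db(n_items: int = 1000) -> sqlite3.Connection:
    """Build hypothetical t_item / t_language tables in memory.

    Proportions follow the article (1,000,000 items, 999,000 French names,
    2,000 Latin names), scaled down so n_items=1000 gives 999 French and
    2 Latin names. Column names and language codes are assumptions.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE t_item (id INTEGER PRIMARY KEY);
        CREATE TABLE t_language (
            item     INTEGER NOT NULL,
            language TEXT    NOT NULL,
            name     TEXT    NOT NULL,
            PRIMARY KEY (item, language)  -- composite key used for lookups
        );
        """
    )
    n_fr = n_items - n_items // 1000  # all but 1 in 1,000 items are translated
    n_la = n_items // 500             # only 2 in 1,000 items are translated
    conn.executemany(
        "INSERT INTO t_item (id) VALUES (?)",
        ((i,) for i in range(1, n_items + 1)),
    )
    rows = [(i, "en", f"Item {i}") for i in range(1, n_items + 1)]
    rows += [(i, "fr", f"Article {i}") for i in range(1, n_fr + 1)]
    rows += [(i, "la", f"Res {i}") for i in range(1, n_la + 1)]
    conn.executemany(
        "INSERT INTO t_language (item, language, name) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


conn = create_sample_db()
query = "SELECT language, COUNT(*) FROM t_language GROUP BY language ORDER BY language"
for language, count in conn.execute(query):
    print(language, count)
# Expected output:
# en 1000
# fr 999
# la 2
```

Every item has an English name, so English is always a safe fallback; only the local-language coverage changes between the two test cases.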

Two subqueries
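The shape of the first method can be sketched on a three-item miniature of the same hypothetical schema (column names and the codes 'en', 'fr', 'la' are assumptions). Python's sqlite3 again stands in for MySQL, so the results can be checked, but the timings discussed below do not carry over to SQLite.

```python
import sqlite3

# Hypothetical miniature of t_item / t_language (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_item (id INTEGER PRIMARY KEY);
    CREATE TABLE t_language (
        item     INTEGER NOT NULL,
        language TEXT    NOT NULL,
        name     TEXT    NOT NULL,
        PRIMARY KEY (item, language)
    );
    INSERT INTO t_item (id) VALUES (1), (2), (3);
    INSERT INTO t_language (item, language, name) VALUES
        (1, 'en', 'one'), (1, 'fr', 'un'),
        (2, 'en', 'two'),
        (3, 'en', 'three'), (3, 'la', 'tres');
""")

# Method 1: COALESCE over two correlated subqueries in the SELECT list.
# The English subquery is only needed when the local one returns NULL.
TWO_SUBQUERIES = """
SELECT i.id,
       COALESCE(
           (SELECT l.name FROM t_language l
            WHERE l.item = i.id AND l.language = :local),
           (SELECT l.name FROM t_language l
            WHERE l.item = i.id AND l.language = 'en')
       ) AS name
FROM t_item i
ORDER BY i.id
"""


def localized_names(local: str):
    return conn.execute(TWO_SUBQUERIES, {"local": local}).fetchall()


print(localized_names("fr"))  # [(1, 'un'), (2, 'two'), (3, 'three')]
print(localized_names("la"))  # [(1, 'one'), (2, 'two'), (3, 'tres')]
```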



We see that the Latin query is much less efficient than the French one (26.6 seconds against 7 seconds).

This is because in MySQL, with unbalanced composite keys, key misses are much less efficient than key hits, and the Latin lookup misses for almost every item.

Two LEFT JOINs
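The second method can be sketched the same way, on the same hypothetical three-item miniature (schema, column names and language codes are assumptions; sqlite3 stands in for MySQL). Here both lookups are written as LEFT JOINs and COALESCE picks the first non-NULL name.

```python
import sqlite3

# Hypothetical miniature of t_item / t_language (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_item (id INTEGER PRIMARY KEY);
    CREATE TABLE t_language (
        item     INTEGER NOT NULL,
        language TEXT    NOT NULL,
        name     TEXT    NOT NULL,
        PRIMARY KEY (item, language)
    );
    INSERT INTO t_item (id) VALUES (1), (2), (3);
    INSERT INTO t_language (item, language, name) VALUES
        (1, 'en', 'one'), (1, 'fr', 'un'),
        (2, 'en', 'two'),
        (3, 'en', 'three'), (3, 'la', 'tres');
""")

# Method 2: two LEFT JOINs; the English row is joined for every item,
# whether or not a local name exists.
TWO_LEFT_JOINS = """
SELECT i.id, COALESCE(loc.name, en.name) AS name
FROM t_item i
LEFT JOIN t_language loc
       ON loc.item = i.id AND loc.language = :local
LEFT JOIN t_language en
       ON en.item = i.id AND en.language = 'en'
ORDER BY i.id
"""


def localized_names(local: str):
    return conn.execute(TWO_LEFT_JOINS, {"local": local}).fetchall()


print(localized_names("fr"))  # [(1, 'un'), (2, 'two'), (3, 'three')]
print(localized_names("la"))  # [(1, 'one'), (2, 'two'), (3, 'tres')]
```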



We see that, unlike in the other systems, the French query gets even slower when the subqueries are replaced with JOINs, while the Latin query gets slightly more efficient.

This is because a LEFT JOIN in MySQL behaves just like a subquery, except that both tables are always evaluated, even when the local name has already been found. MySQL always uses nested loops, so none of the HASH JOIN or MERGE JOIN improvements seen in the other systems apply.
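One rough way to see this is to count index probes into t_language under nested loops. The model below is a deliberate simplification (every probe costs the same, hits and misses alike, and other overheads are ignored) and not a description of MySQL internals; the helper names and the scaled-down data (1,000 items, 999 French names, 2 Latin names) are hypothetical.

```python
def build_names(n_items: int = 1000) -> set:
    """(item, language) pairs present in a hypothetical t_language table,
    using the article's proportions scaled down by 1,000."""
    names = {(i, "en") for i in range(1, n_items + 1)}
    names |= {(i, "fr") for i in range(1, n_items - n_items // 1000 + 1)}
    names |= {(i, "la") for i in range(1, n_items // 500 + 1)}
    return names


def count_probes(names: set, items, local: str, always_probe_fallback: bool) -> int:
    """Count index probes a nested-loops plan would make (simplified model).

    always_probe_fallback=True models two LEFT JOINs: the English row is
    looked up for every item. False models a LEFT JOIN plus a fallback
    subquery: the English row is looked up only when the local name is missing.
    """
    probes = 0
    for item in items:
        probes += 1  # probe (item, local) in the composite index
        found = (item, local) in names
        if always_probe_fallback or not found:
            probes += 1  # probe (item, 'en')
    return probes


names = build_names()
items = range(1, 1001)
for local in ("fr", "la"):
    joins = count_probes(names, items, local, always_probe_fallback=True)
    join_and_subquery = count_probes(names, items, local, always_probe_fallback=False)
    print(local, joins, join_and_subquery)
# Expected output:
# fr 2000 1001
# la 2000 1998
```

With two LEFT JOINs every item costs two probes regardless of the language. If the fallback is probed only on a miss, the French query needs barely more than one probe per item, while the Latin one still needs nearly two.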

The French query, however, is still more efficient than the Latin one.

One subquery and one join
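The third method, sketched on the same hypothetical three-item miniature (schema, column names and language codes are assumptions; sqlite3 stands in for MySQL): the local name comes from a LEFT JOIN, and the English name from a correlated subquery inside COALESCE.

```python
import sqlite3

# Hypothetical miniature of t_item / t_language (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_item (id INTEGER PRIMARY KEY);
    CREATE TABLE t_language (
        item     INTEGER NOT NULL,
        language TEXT    NOT NULL,
        name     TEXT    NOT NULL,
        PRIMARY KEY (item, language)
    );
    INSERT INTO t_item (id) VALUES (1), (2), (3);
    INSERT INTO t_language (item, language, name) VALUES
        (1, 'en', 'one'), (1, 'fr', 'un'),
        (2, 'en', 'two'),
        (3, 'en', 'three'), (3, 'la', 'tres');
""")

# Method 3: LEFT JOIN for the local language, correlated subquery for
# the English fallback, which is only needed when loc.name is NULL.
JOIN_AND_SUBQUERY = """
SELECT i.id,
       COALESCE(
           loc.name,
           (SELECT en.name FROM t_language en
            WHERE en.item = i.id AND en.language = 'en')
       ) AS name
FROM t_item i
LEFT JOIN t_language loc
       ON loc.item = i.id AND loc.language = :local
ORDER BY i.id
"""


def localized_names(local: str):
    return conn.execute(JOIN_AND_SUBQUERY, {"local": local}).fetchall()


print(localized_names("fr"))  # [(1, 'un'), (2, 'two'), (3, 'three')]
print(localized_names("la"))  # [(1, 'one'), (2, 'two'), (3, 'tres')]
```

All three sketches return identical rows; the methods differ only in how often the English row has to be looked up.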



The French query is more efficient in this case (4.85 seconds, against 7 seconds with two subqueries).

The Latin query, on the other hand, shows about the same efficiency.

Summary


Since MySQL uses only one join method, namely nested loops, and key misses are less efficient than key hits, the fallback probability has a reverse effect: queries with a high fallback probability are slower than those with a low one.

Two LEFT JOINs (which are more efficient for the Latin query in Oracle, SQL Server and PostgreSQL) are in all cases less efficient in MySQL than a LEFT JOIN combined with a subquery.

Written by Quassnoi

August 10th, 2009 at 11:00 pm

Posted in MySQL
