Fallback language names: MySQL

This is a series of articles on efficient querying for a localized name, using a default (fallback) language if there is no localized name:

A quick reminder of the problem taken from Stack Overflow:

I have table item and another table language which contains names for the items in different languages:

item language data

How do I select a French name for an item if it exists, or a fallback English name if there is no French one?

We basically have three options here:

  1. Use COALESCE on two SELECT list subqueries
  2. Use COALESCE on the results of two LEFT JOINs
  3. Use the combination of methods above: a LEFT JOIN for French names and a subquery for English ones
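
The three options above might be sketched as follows. The schema is an assumption, since the question does not spell it out: item is assumed to have an id column, language is assumed to have the columns item, language and data (the localized name) with at most one row per item and language, and 'fr' and 'en' are assumed language codes.

```sql
-- Assumed schema (not given in the question):
--   item (id)
--   language (item, language, data), unique on (item, language)

-- 1. COALESCE on two SELECT list subqueries
SELECT  i.id,
        COALESCE(
        (
        SELECT  l.data
        FROM    language l
        WHERE   l.item = i.id
                AND l.language = 'fr'
        ),
        (
        SELECT  l.data
        FROM    language l
        WHERE   l.item = i.id
                AND l.language = 'en'
        )
        ) AS name
FROM    item i;

-- 2. COALESCE on the results of two LEFT JOINs
SELECT  i.id, COALESCE(lf.data, le.data) AS name
FROM    item i
LEFT JOIN
        language lf
ON      lf.item = i.id
        AND lf.language = 'fr'
LEFT JOIN
        language le
ON      le.item = i.id
        AND le.language = 'en';

-- 3. LEFT JOIN for the French name, subquery for the English fallback
SELECT  i.id,
        COALESCE(
        lf.data,
        (
        SELECT  l.data
        FROM    language l
        WHERE   l.item = i.id
                AND l.language = 'en'
        )
        ) AS name
FROM    item i
LEFT JOIN
        language lf
ON      lf.item = i.id
        AND lf.language = 'fr';
```

All three return the same rows. They differ only in how often the fallback lookup has to be performed, which is what the comparison is about.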

The efficiency of each of these methods depends on the fallback probability (how many items are covered by the localization).

If the localization is poor and only a few terms are translated into the local language, the probability of fallback is high. I took Latin as an example of this.

If almost all terms are translated, the probability of fallback is low. In this case, I took French as an example (as in the original question), since it is widely used and localizations are likely to cover most terms.

In Oracle, SQL Server and PostgreSQL, the second method (two LEFT JOINs) is more efficient to query poorly localized languages, while for well-localized languages the third query should be used, i. e. a LEFT JOIN for the local language and a subquery for the fallback one.

To gain efficiency, all these systems used some kind of a join method that performs better on the large portions of the rowset, i. e. HASH JOIN or MERGE JOIN.

MySQL, however, is only capable of doing nested loops, so its performance should differ from that of the engines tested earlier.

Let's create sample tables and see:

Written by Quassnoi

August 10th, 2009 at 11:00 pm

Posted in MySQL


From Stack Overflow:

Here's my situation: I want to select first ten entries from a database with id = $id.

But I want the results to be listed in a certain priority: if criteria = $criteria, then I want those results displayed first.

Otherwise, I just want to keep displaying the rest of the rows.

This can be done with a very simple query:

SELECT  *
FROM    t_preference
ORDER BY
        category = 1 DESC, id DESC
LIMIT 10

In this query, category = 1 is a boolean expression which returns TRUE when category is equal to 1, FALSE otherwise.

Since TRUE is 1 and FALSE is 0, TRUE is greater than FALSE from ORDER BY's point of view. We should apply DESC for the TRUE results (i. e. rows with category = 1) to be returned first.

However, this query is not very index friendly, and it will need to select all rows and sort them.

To improve performance, we should split this query into two index-friendly queries and merge them with a UNION.
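
A minimal sketch of such a split, assuming that category is declared NOT NULL and that suitable indexes exist (say, the PRIMARY KEY on id and a hypothetical composite index on (category, id)). The priority column is a helper introduced only for this illustration:

```sql
-- Each branch can stop after ten rows, so the outer sort
-- only ever sees twenty rows at most.
SELECT  *
FROM    (
        (
        SELECT  1 AS priority, p.*
        FROM    t_preference p
        WHERE   category = 1
        ORDER BY
                id DESC
        LIMIT 10
        )
        UNION ALL
        (
        SELECT  0 AS priority, p.*
        FROM    t_preference p
        WHERE   category <> 1
        ORDER BY
                id DESC
        LIMIT 10
        )
        ) q
ORDER BY
        priority DESC, id DESC
LIMIT 10;
```

If category were nullable, rows with a NULL category would be missed by both branches and would need a third one.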

Let's create a sample table and see how we do it:

August 5th, 2009 at 11:00 pm

Posted in MySQL


From Stack Overflow:

I currently have the following query:

SELECT  group_concat (DISTINCT usrFirst, usrLast) as receiver_name
FROM    …

It outputs my list of names as follows:


, while the desired output would be:

John Doe, Bill Smith, and Dave Jones

Basically, I need help figuring out three things:

  1. How can I put a space, between the first and last name?
  2. How can I insert a space after each comma?
  3. How can I add an and right before the last name displayed?

This kind of output should really be processed on the client (since the comma and the word and are very culture-specific).

From MySQL, we just need a recordset of distinct full names.
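
A rough sketch of both variants follows. The table name t_user is hypothetical, since the FROM clause is elided in the question, and the name columns are assumed to be NOT NULL (CONCAT returns NULL if any argument is NULL):

```sql
-- t_user is a hypothetical table name, not taken from the question.

-- A recordset of distinct full names, to be joined on the client:
SELECT  DISTINCT CONCAT(usrFirst, ' ', usrLast) AS receiver_name
FROM    t_user;

-- If the list still has to be built on the server, a space between the names
-- and a space after each comma can be added like this:
SELECT  GROUP_CONCAT(
        DISTINCT CONCAT(usrFirst, ' ', usrLast)
        SEPARATOR ', '
        ) AS receiver_name
FROM    t_user;
```

Neither query adds the final and, which is left to the client as suggested above.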

Let's create a sample table and see how we do it:

August 1st, 2009 at 11:00 pm

Posted in MySQL

MySQL: counting items in pairs

From Stack Overflow:

I currently have an item table, and a pair table.

The pair table simply contains two columns, which contain the PRIMARY KEY from the item table. There is an index on id1, id2 on pair.

A common query is to find a number of items that are featured in the least number of pairs:

SELECT  id, COUNT(*) AS count
FROM    item i
JOIN    pair p
ON      (p.id1 = i.id OR p.id2 = i.id)
GROUP BY
        id
ORDER BY
        count, RAND()

, but the query is horrible performance-wise.

Is there a better query, and/or data structure for this type of thing?

MySQL is not very good at optimizing OR conditions.

It is capable of doing an index_merge, which would UNION the results of two indexes. However, as the documentation states, this access method works only when comparing the fields against constants.

Therefore, this method cannot be used to serve an OR condition in either a JOIN or a subquery.
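
One common way around an OR in a join condition is to split it into two equality conditions and combine the results with UNION ALL. The sketch below is only an illustration of that idea and not necessarily the solution the article arrives at. It assumes that id1 and id2 never hold the same value within one row of pair, since otherwise such a row would be counted twice:

```sql
-- Every pair contributes two rows to the derived table: one per member.
SELECT  i.id, COUNT(*) AS cnt
FROM    (
        SELECT  id1 AS id
        FROM    pair
        UNION ALL
        SELECT  id2 AS id
        FROM    pair
        ) p
JOIN    item i
ON      i.id = p.id
GROUP BY
        i.id
ORDER BY
        cnt, RAND();
```

Each branch of the UNION ALL uses a plain equality on a single column, which a nested loops join can serve with an ordinary index lookup.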

Let's create sample tables and see the execution plan for the query:

July 31st, 2009 at 11:00 pm

Posted in MySQL

MySQL: selecting rows before and after filtered one

From Stack Overflow:

I have a basic MySQL table, terms, comprised of an id and term field.

I want to create an alphabetically sorted dictionary index (in the literal sense) that would list 10 terms above the selected term, and 20 below it.

An example of this could be found on Urban Dictionary where on the left column you see the current term highlighted, and a number of terms above it, and some below, all sorted alphabetically.

As we all know, MySQL doesn't support a ROW_NUMBER() or a similar function so we end up resorting to user variables and sub-selects.

I also cannot create a view with user defined variables because MySQL doesn't allow that.

MySQL indeed doesn't support ROW_NUMBER(), and this function can indeed be emulated using session variables.

However, that would be overkill in this case, since a simpler and more efficient set-based solution using LIMIT can be used instead.
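
A minimal sketch of the LIMIT approach, assuming that term is unique and indexed. The value assigned to @term is just a hypothetical selected term:

```sql
SET @term = 'hierarchy';  -- hypothetical selected term

-- Ten terms above the selected one
(
SELECT  id, term
FROM    terms
WHERE   term < @term
ORDER BY
        term DESC
LIMIT 10
)
UNION ALL
-- The selected term itself plus twenty terms below it
(
SELECT  id, term
FROM    terms
WHERE   term >= @term
ORDER BY
        term
LIMIT 21
)
ORDER BY
        term;
```

Each branch is a range scan over the index on term in one direction, so neither needs to number the rows.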

Let's create a sample table and see how:

July 25th, 2009 at 11:00 pm

Posted in MySQL

MySQL: grouping continuous ranges

From Stack Overflow:

Given the following table:

ID State Date
12 1 2009-07-16 10:00
45 2 2009-07-16 13:00
67 2 2009-07-16 14:40
77 1 2009-07-16 15:00
89 1 2009-07-16 15:30
99 1 2009-07-16 16:00

, how can I group by the field state, while still maintaining the borders between the state changes?

I need MIN(id), MIN(date) and COUNT(*) for each group, and this is expected:

ID State Date Count
12 1 2009-07-16 10:00 1
45 2 2009-07-16 13:00 2
77 1 2009-07-16 15:00 3

Though a pure set-based solution exists, in MySQL, this can be easily done using session variables, which is more efficient.
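
For comparison, the pure set-based variant could look roughly like this. The table name t_states is hypothetical, and the dates are assumed to be distinct. For every row, the subquery counts the earlier rows that have a different state. This number stays the same within a continuous range and grows between two ranges of the same state, so it can serve as a grouping key:

```sql
-- t_states (id, state, date) is an assumed name for the table in the question.
SELECT  MIN(id) AS first_id, state, MIN(`date`) AS first_date, COUNT(*) AS cnt
FROM    (
        SELECT  s.*,
                (
                SELECT  COUNT(*)
                FROM    t_states p
                WHERE   p.`date` < s.`date`
                        AND p.state <> s.state
                ) AS grp
        FROM    t_states s
        ) q
GROUP BY
        state, grp
ORDER BY
        first_date;
```

On the sample data above, grp is 0 for id 12, 1 for ids 45 and 67, and 2 for ids 77, 89 and 99, which yields the three expected groups. The correlated subquery runs once per row, which is why a single pass with session variables is cheaper.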

Let's create a sample table:

July 24th, 2009 at 11:00 pm

Posted in MySQL

Hierarchical queries in MySQL: identifying trees

Continuing on the theme of hierarchical queries in MySQL:

Assume we have a table with hierarchical structure like this:

18 rows fetched in 0.0003s (0.0137s)

We have two trees here: one starting from 1, another one starting from 2.

The problem is: given any item, we should identify the whole tree this item belongs to, and return the whole tree in the hierarchical order.

This also can be easily done using hierarchical queries in MySQL.

In the article Hierarchical queries in MySQL, I showed how to implement a function that returns tree items in the correct order when called sequentially.

This function is reentrant and keeps its state in session variables, one of which, @start_with, defines the parent element for the tree we want to build.

We have two problems here:

  1. Given an item, define a root of the tree it belongs to
  2. Build a whole tree, starting from the root

The first problem can be solved by iterating the linked list backwards, starting from the variable given as an input.

This article:

describes how to do it in great detail, that's why I'll just put a query here.

Let's create the table described above:

July 22nd, 2009 at 11:00 pm

Posted in MySQL

Hierarchical data in MySQL: parents and children in one query

Answering questions asked on the site.

Michael asks:

I was wondering how to implement a hierarchical query in MySQL (using the ancestry chains version) for a single row, such that it picks up the parents (if any) and any children (if any).

The idea is, I want to be able to jump in at any point, provide an Id of some sort, and be able to draw out the entire hierarchy for that Id, both upwards and downwards.

We need to combine two queries here:

  1. Original hierarchical query that returns all descendants of a given id (a descendancy chain)
  2. A query that would return all ancestors of a given id (an ancestry chain)

An id can have only one parent, that's why we can employ a linked list technique to build an ancestry chain, like shown in this article:

Here's the query to do this (no functions required):

SELECT  @r AS _id,
        (
        SELECT  @r := parent
        FROM    t_hierarchy
        WHERE   id = _id
        ) AS parent,
        @l := @l + 1 AS lvl
FROM    (
        SELECT  @r := 1218,
                @l := 0,
                @cl := 0
        ) vars,
        t_hierarchy h
WHERE   @r <> 0

To combine two queries, we can employ a simple UNION ALL.

The only problem left is to preserve the correct level, since the ancestry chain query counts levels backwards, while the hierarchical query counts them starting from zero.

Let's create a sample table and see what we get:
July 20th, 2009 at 11:00 pm

Posted in MySQL

Selecting last forum posts

From Stack Overflow:

I built a custom forum for my site using MySQL.

The listing page is essentially a table with the following columns: Topic, Last Updated, and Replies.

The DB table has the following columns:

id name body date topic_id email

A topic has the topic_id of 0, and replies have the topic_id of their parent topic.

How do I query 20 last updated topics efficiently?

We have two kinds of topics here: those that had been answered and those that had not.

The queries for these two kinds of topics should be optimized using different methods and it's better to query for them using two different queries merged later using UNION ALL:

  • To find last 20 replies, we will employ the technique described in this article:

    This technique uses a composite index on (date, id) to find 20 latest replies to distinct topic_id's:

    1. First, it scans the index on (date, id) to retrieve records with a non-zero topic_id in descending date order.
    2. For each record found on step 1, it scans the index on (date, id) again in descending order and finds the first record with the same topic_id.
    3. If the id's of records found on steps 1 and 2 match, this means that the record found on step 1 is the latest for this topic_id and we should return it.
  • To find last 20 unreplied topics, we just use a good old NOT EXISTS.

    In this case a composite index on (topic_id, date) will come in handy: we will need to select the last 20 unreplied topics, and this index will be useful to implement both filtering on topic_id and ordering by date DESC.

  • Finally, we will merge these resultsets using UNION ALL and limit the result.
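
Put together, the steps above could be sketched like this. The table name t_posts and the alias last_date are hypothetical. The columns are those listed in the question, and the indexes on (date, id) and (topic_id, date) are assumed to exist:

```sql
-- t_posts is an assumed name for the forum table described in the question.
SELECT  id, name, last_date
FROM    (
        (
        -- Answered topics: the 20 latest replies to distinct topics
        SELECT  t.id, t.name, r.date AS last_date
        FROM    t_posts r
        JOIN    t_posts t
        ON      t.id = r.topic_id
        WHERE   r.topic_id <> 0
                AND r.id =
                (
                SELECT  ri.id
                FROM    t_posts ri
                WHERE   ri.topic_id = r.topic_id
                ORDER BY
                        ri.date DESC, ri.id DESC
                LIMIT 1
                )
        ORDER BY
                r.date DESC, r.id DESC
        LIMIT 20
        )
        UNION ALL
        (
        -- Unanswered topics: the 20 latest topics with no replies
        SELECT  t.id, t.name, t.date AS last_date
        FROM    t_posts t
        WHERE   t.topic_id = 0
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    t_posts r
                WHERE   r.topic_id = t.id
                )
        ORDER BY
                t.date DESC
        LIMIT 20
        )
        ) q
ORDER BY
        last_date DESC
LIMIT 20;
```

Each branch returns at most 20 rows, so the final sort over the merged resultset is cheap whatever the size of the table.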

Now, let's create a sample table:
July 7th, 2009 at 11:00 pm

Posted in MySQL

Overlapping ranges: MySQL

Continuing yesterday's post.

From Stack Overflow:

I have an event table that specifies a date range with start_date and end_date fields.

I have another date range, specified in code, that defines the current week as week_start and week_end.

I'd like to query all events for the week.

Now, let's see how to do it in MySQL.

In the previous post we saw that SQL Server is not very good at optimizing this construct:

start_date < @week_end AND end_date > @week_start

and using it leads to a full table scan. A more complex condition is more efficient in SQL Server:

(start_date > @week_start AND start_date < @week_end)
OR
(@week_start BETWEEN start_date AND end_date)

, since two different indexes can be applied to both parts of the condition.

However, the second condition is intrinsically not very efficient, which is a problem for almost any datetime range analysis.

A plain B-Tree index is efficient for queries like this:

constant1 < column < constant2

, but not for the queries like this:

column1 < constant < column2

The latter condition is in fact rewritten by the optimizer as follows:

column1 < constant AND column2 > constant

and the optimizer is forced to choose between two non-optimal options: either stick to one index and do the filtering (using table lookups) or perform two index lookups and merge the results.

If 1,000,000 rows satisfy the first condition and 1,000,000 satisfy the second one, the engine needs to inspect either 1,000,000 rows with table lookups or 2,000,000 rows using indexes, only to find out that the intersection contains but 10 rows.

Fortunately, MySQL supports R-Tree indexes (called SPATIAL in MySQL because they are primarily used to query spatial data).

Indexes of this kind are particularly good for the queries similar to the one above, and with a little effort they can be used to query time data too.
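
The idea could be sketched roughly as follows. Everything here is an assumption made for illustration: the table and column names, the choice of MyISAM, and the encoding of each event as a segment whose Y extent runs from the start timestamp to the end timestamp. A week is then encoded as a segment at X = 0, and the two bounding boxes intersect exactly when the time ranges overlap (with the borders included):

```sql
-- Hypothetical table; span duplicates the date range as a geometry.
CREATE TABLE t_event (
        id INT NOT NULL PRIMARY KEY,
        start_date DATETIME NOT NULL,
        end_date DATETIME NOT NULL,
        span LINESTRING NOT NULL,
        SPATIAL KEY sx_event_span (span)
) ENGINE=MyISAM;

INSERT
INTO    t_event (id, start_date, end_date, span)
VALUES  (
        1, '2009-06-29 10:00:00', '2009-07-02 18:00:00',
        LineString(
                Point(-1, UNIX_TIMESTAMP('2009-06-29 10:00:00')),
                Point(1, UNIX_TIMESTAMP('2009-07-02 18:00:00'))
                )
        );

SET @week_start = '2009-06-29 00:00:00';
SET @week_end = '2009-07-06 00:00:00';

-- The week is a vertical segment at X = 0; MBRIntersects compares
-- minimum bounding rectangles, which the R-Tree index stores.
SELECT  id, start_date, end_date
FROM    t_event
WHERE   MBRIntersects(
        LineString(
                Point(0, UNIX_TIMESTAMP(@week_start)),
                Point(0, UNIX_TIMESTAMP(@week_end))
                ),
        span
        );
```

The span column has to be kept in sync with start_date and end_date by the application or a trigger. On MySQL 8.0 the column would additionally need an SRID attribute for the optimizer to consider the index.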

Let's create a sample table:
July 1st, 2009 at 11:00 pm

Posted in MySQL