EXPLAIN EXTENDED

This article continues a series on efficient querying for a localized name, using a default (fallback) language if there is no localized name.

A quick reminder of the problem taken from Stack Overflow:

I have table item and another table language which contains names for the items in different languages:

item language data

How do I select a French name for an item if it exists, or a fallback English name if there is no French one?

We basically have three options here:

Use COALESCE on two SELECT list subqueries
Use COALESCE on the results of two LEFT JOINS
Use the combination of methods above: a LEFT JOIN for French names and a subquery for English ones
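
As a rough sketch, the three options could look like this. The question does not show the exact schema, so the columns used here (item.id, language.item_id, language.lang, language.name) and the language codes 'fr' and 'en' are assumptions for illustration only. The subqueries also assume that there is at most one name per item and language:

```sql
-- Assumed schema: item (id), language (item_id, lang, name),
-- with (item_id, lang) unique in language

-- 1. COALESCE on two SELECT list subqueries
SELECT  i.id,
        COALESCE(
        (SELECT lf.name FROM `language` lf WHERE lf.item_id = i.id AND lf.lang = 'fr'),
        (SELECT le.name FROM `language` le WHERE le.item_id = i.id AND le.lang = 'en')
        ) AS name
FROM    item i;

-- 2. COALESCE on the results of two LEFT JOINs
SELECT  i.id,
        COALESCE(lf.name, le.name) AS name
FROM    item i
LEFT JOIN
        `language` lf
ON      lf.item_id = i.id
        AND lf.lang = 'fr'
LEFT JOIN
        `language` le
ON      le.item_id = i.id
        AND le.lang = 'en';

-- 3. LEFT JOIN for the French name, subquery for the English fallback
SELECT  i.id,
        COALESCE(
        lf.name,
        (SELECT le.name FROM `language` le WHERE le.item_id = i.id AND le.lang = 'en')
        ) AS name
FROM    item i
LEFT JOIN
        `language` lf
ON      lf.item_id = i.id
        AND lf.lang = 'fr';
```

In the first and the third query, COALESCE evaluates the fallback subquery only when the preceding argument is NULL, which is why the fallback probability matters for their cost.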

The efficiency of each of these methods depends on the fallback probability (how many items are covered by the localization).

If the localization is poor and only a few terms are translated into the local language, the probability of the fallback is high. I took Latin as an example of this.

If almost all terms are translated, the probability of the fallback is low. In this case, I took French as an example (as in the original question), since it is widely used and localizations are likely to cover most terms.

In Oracle, SQL Server and PostgreSQL, the second method (two LEFT JOINs) is more efficient to query poorly localized languages, while for well-localized languages the third query should be used, i. e. a LEFT JOIN for the local language and a subquery for the fallback one.

To gain efficiency, all these systems used some kind of a join method that performs better on the large portions of the rowset, i. e. HASH JOIN or MERGE JOIN.

MySQL, however, is only capable of doing nested loops, so MySQL's performance should differ from that of the engines tested earlier.

Let's create sample tables and see:

Written by Quassnoi

August 10th, 2009 at 11:00 pm

Posted in MySQL

MySQL: LIMIT on LIMIT

Here's my situation: I want to select first ten entries from a database with id = $id.

But I want the results to be listed in a certain priority: if criteria = $criteria, then I want those results displayed first.

Otherwise, I just want to keep displaying the rest of the rows.

This can be done with a very simple query:

SELECT  *
FROM    t_preference
ORDER BY
        category = 1 DESC, id DESC
LIMIT 10

In this query, category = 1 is a boolean expression which returns TRUE when category is equal to 1, FALSE otherwise.

Since TRUE is 1 and FALSE is 0, TRUE is greater than FALSE from ORDER BY's point of view. We should apply DESC for the TRUE results (i. e. rows with category = 1) to be returned first.

However, this query is not very index friendly, and it will need to select all rows and sort them.

To improve performance, we should split this query into two index-friendly queries and merge them with a UNION.
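
A minimal sketch of such a split is shown below. It assumes that category is declared NOT NULL, that id is the PRIMARY KEY, and that there is a composite index on (category, id); none of this is stated in the question, and whether the optimizer actually uses these indexes should be checked with EXPLAIN:

```sql
-- Assumptions: category is NOT NULL, id is the PRIMARY KEY,
-- and there is an index on (category, id)
SELECT  *
FROM    (
        (
        -- The prioritized rows: at most 10 are ever needed
        SELECT  *
        FROM    t_preference
        WHERE   category = 1
        ORDER BY
                id DESC
        LIMIT 10
        )
        UNION ALL
        (
        -- The rest of the rows: again, at most 10 are ever needed
        SELECT  *
        FROM    t_preference
        WHERE   category <> 1
        ORDER BY
                id DESC
        LIMIT 10
        )
        ) q
ORDER BY
        category = 1 DESC, id DESC
LIMIT 10;
```

Each part can stop after reading 10 rows in index order, so the final sort has to deal with 20 rows at most instead of the whole table.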

Let's create a sample table and see how to do it:

Written by Quassnoi

August 5th, 2009 at 11:00 pm

Posted in MySQL

MySQL: DISTINCT on DISTINCT

I currently have the following query:
SELECT  group_concat (DISTINCT usrFirst, usrLast) as receiver_name
FROM    …
It outputs my list of names as follows:
JohnDoe,BillSmith,DaveJones
, while the desired output would be:
John Doe, Bill Smith, and Dave Jones
Basically, I need help figuring out three things:

How can I put a space, between the first and last name?

How can I insert a space after each comma?

How can I add an and right before the last name displayed?

This kind of output should really be processed on the client (since the comma and the word and are very culture-specific).

From MySQL, we just need a recordset of distinct full names.
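
A sketch of such a query is shown below. The question elides its FROM clause, so t_user is a hypothetical table name used only to make the example complete:

```sql
-- t_user is a hypothetical table name: the original FROM clause is not shown
SELECT  DISTINCT CONCAT(usrFirst, ' ', usrLast) AS receiver_name
FROM    t_user;

-- If a single string is still wanted from the server, GROUP_CONCAT accepts
-- an expression and a custom separator (the final "and" is left to the client)
SELECT  GROUP_CONCAT(DISTINCT CONCAT(usrFirst, ' ', usrLast) SEPARATOR ', ') AS receiver_names
FROM    t_user;
```

Note that CONCAT returns NULL if either usrFirst or usrLast is NULL, so such rows would need separate handling.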

Let's create a sample table and see how to do it:

Written by Quassnoi

August 1st, 2009 at 11:00 pm

Posted in MySQL

MySQL: counting items in pairs

I currently have an item table, and a pair table.

The pair table simply contains two columns, which contain the PRIMARY KEY from the item table. There is an index on id1, id2 on pair.

A common query is to find a number of items that are featured in the least number of pairs:
SELECT  id, COUNT(*) AS count
FROM    item i
LEFT JOIN
        pair p
ON      (p.id1 = i.id OR p.id2 = i.id)
GROUP BY
        id
ORDER BY
        count, RAND()
LIMIT 100
, but the query is horrible performance-wise.

Is there a better query, and/or data structure for this type of thing?

MySQL is not very good at optimizing OR conditions.

It is capable of doing index_merge, which would UNION the results of two indexes. However, as the documentation states, this access method works only when comparing the fields against constants.

This method, therefore, cannot be used to serve an OR condition in either a JOIN or a subquery.
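
One way to avoid the OR condition is to count the matches on each column separately, so that each lookup is a plain equality on one column. The sketch below is not necessarily the solution the article arrives at. It assumes an additional index on pair (id2), which the question does not mention, since the existing index on (id1, id2) cannot serve a search on id2 alone:

```sql
-- Assumption: besides the index on (id1, id2), there is an index on pair (id2)
SELECT  i.id,
        (
        SELECT  COUNT(*)
        FROM    pair p1
        WHERE   p1.id1 = i.id
        ) +
        (
        SELECT  COUNT(*)
        FROM    pair p2
        WHERE   p2.id2 = i.id
        ) AS cnt
FROM    item i
ORDER BY
        cnt, RAND()
LIMIT 100;
```

The results differ from the original query in two corner cases: an item that is in no pairs gets 0 here (COUNT(*) over a LEFT JOIN returns 1 for it), and a pair with id1 = id2 is counted twice.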

Let's create sample tables and see the execution plan for the query:

Written by Quassnoi

July 31st, 2009 at 11:00 pm

Posted in MySQL

MySQL: selecting rows before and after filtered one

I have a basic MySQL table, terms, comprised of an id and term field.

I want to create an alphabetically sorted dictionary index (in the literal sense), that would list 10 terms above the selected term, and 20 below it.

An example of this could be found on Urban Dictionary where on the left column you see the current term highlighted, and a number of terms above it, and some below, all sorted alphabetically.

As we all know, MySQL doesn't support a ROW_NUMBER() or a similar function so we end up resorting to user variables and sub-selects.

I also cannot create a view with user defined variables because MySQL doesn't allow that.

MySQL indeed doesn't support ROW_NUMBER() and this function indeed can be emulated using session variables:

Emulating ROW_NUMBER() in MySQL

However, that would be overkill in this case, since a simpler and more efficient set-based solution using LIMIT can be used instead.
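
A sketch of the LIMIT-based approach is shown below. It assumes that term is unique and indexed, and it uses a session variable @term (a name chosen for this example) to hold the selected term:

```sql
-- Assumptions: terms.term is unique and indexed;
-- @term holds the selected term (the value here is just an example)
SET @term := 'mysql';

(
-- 10 terms right above the selected one
SELECT  id, term
FROM    terms
WHERE   term < @term
ORDER BY
        term DESC
LIMIT 10
)
UNION ALL
(
-- The selected term itself and 20 terms below it
SELECT  id, term
FROM    terms
WHERE   term >= @term
ORDER BY
        term
LIMIT 21
)
ORDER BY
        term;
```

Each part reads a short range of the index next to the selected term, so the query does not need to number all rows in the table.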

Let's create a sample table and see how:

Written by Quassnoi

July 25th, 2009 at 11:00 pm

Posted in MySQL

MySQL: grouping continuous ranges

Given the following table:

ID	State	Date
12	1	2009-07-16 10:00
45	2	2009-07-16 13:00
67	2	2009-07-16 14:40
77	1	2009-07-16 15:00
89	1	2009-07-16 15:30
99	1	2009-07-16 16:00

, how can I group by the field state, while still maintaining the borders between the state changes?

I need MIN(id), MIN(date) and COUNT(*) for each group, and this is expected:

ID	State	Date	Count
12	1	2009-07-16 10:00	1
45	2	2009-07-16 13:00	2
77	1	2009-07-16 15:00	3

Though a pure set-based solution exists, in MySQL, this can be easily done using session variables, which is more efficient.
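
A sketch of the session variable approach is shown below. The table name t_state and the variable names are made up for this example. The query also relies on MySQL processing the rows of the inner query in date order and evaluating the SELECT list from left to right. The documentation does not guarantee either, so treat this as an illustration rather than a robust implementation:

```sql
-- t_state (id, state, `date`) is a hypothetical name for the table above.
-- @grp is incremented each time the state differs from the previous row's.
SELECT  MIN(id) AS id,
        MIN(state) AS state,
        MIN(`date`) AS `date`,
        COUNT(*) AS cnt
FROM    (
        SELECT  @grp := @grp + (@prev IS NULL OR @prev <> s.state) AS grp,
                @prev := s.state AS prev_state,
                s.id, s.state, s.`date`
        FROM    (
                SELECT  @grp := 0,
                        @prev := NULL
                ) vars,
                t_state s
        ORDER BY
                s.`date`
        ) q
GROUP BY
        grp
ORDER BY
        id;
```

All rows within one group share the same state, so MIN(state) simply returns that state.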

Let's create a sample table:

Written by Quassnoi

July 24th, 2009 at 11:00 pm

Posted in MySQL

Hierarchical queries in MySQL: identifying trees

Continuing on the theme of hierarchical queries in MySQL:

Assume we have a table with hierarchical structure like this:

treeitem
1
3
9
10
11
4
12
13
14
5
15
16
17
2
6
18
7
8
18 rows fetched in 0.0003s (0.0137s)

We have two trees here: one starting from 1, another one starting from 2.

The problem is: given any item, we should identify the whole tree this item belongs to, and return the whole tree in the hierarchical order.

This also can be easily done using hierarchical queries in MySQL.

In the article Hierarchical queries in MySQL, I showed how to implement a function that returns tree items in the correct order when called sequentially.

This function is reentrant and keeps its state in session variables, one of which, @start_with, defines the parent element for the tree we want to build.

We have two problems here:

Given an item, find the root of the tree it belongs to
Build the whole tree, starting from the root

The first problem can be solved by iterating the linked list backwards, starting from the variable given as an input.

This article:

Sorting lists

describes how to do it in great detail, so I'll just put the query here.

Let's create the table described above:

Written by Quassnoi

July 22nd, 2009 at 11:00 pm

Posted in MySQL

Hierarchical data in MySQL: parents and children in one query

Answering questions asked on the site.

Michael asks:

I was wondering how to implement a hierarchical query in MySQL (using the ancestry chains version) for a single row, such that it picks up the parents (if any) and any children (if any).

The idea is, I want to be able to jump in at any point, provide an Id of some sort, and be able to draw out the entire hierarchy for that Id, both upwards and downwards.

We need to combine two queries here:

Original hierarchical query that returns all descendants of a given id (a descendancy chain)
A query that would return all ancestors of a given id (an ancestry chain)

An id can have only one parent, which is why we can employ a linked list technique to build an ancestry chain, as shown in this article:

Sorting lists

Here's the query to do this (no functions required):

SELECT  @r AS _id,
        (
        SELECT  @r := parent
        FROM    t_hierarchy
        WHERE   id = _id
        ) AS parent,
        @l := @l + 1 AS lvl
FROM    (
        SELECT  @r := 1218,
                @l := 0,
                @cl := 0
        ) vars,
        t_hierarchy h
WHERE   @r <> 0

To combine two queries, we can employ a simple UNION ALL.

The only problem left is to preserve the correct level, since the ancestry chain query counts the level backwards, while the hierarchical query counts it starting from zero.

Let's create a sample table and see what we get:

Written by Quassnoi

July 20th, 2009 at 11:00 pm

Posted in MySQL

Selecting last forum posts

From Stack Overflow:

I built a custom forum for my site using MySQL.

The listing page is essentially a table with the following columns: Topic, Last Updated, and Replies.

The DB table has the following columns:

id name body date topic_id email

A topic has the topic_id of 0, and replies have the topic_id of their parent topic.

How do I query 20 last updated topics efficiently?

We have two kinds of topics here: those that had been answered and those that had not.

The queries for these two kinds of topics should be optimized using different methods and it's better to query for them using two different queries merged later using UNION ALL:

To find last 20 replies, we will employ the technique described in this article:
- Latest DISTINCT records: efficient subquery
This technique uses a composite index on (date, id) to find 20 latest replies to distinct topic_id's:
1. First, it scans the index on (date, id) to retrieve records with a non-zero topic_id in descending date order.
2. For each record found on step 1, it scans the index on (date, id) again in descending order and finds the first record with the same topic_id.
3. If the id's of records found on steps 1 and 2 match, this means that the record found on step 1 is the latest for this topic_id and we should return it.
To find the last 20 unreplied topics, we just use a good old NOT EXISTS.
In this case, a composite index on (topic_id, date) will come in handy: we need to select the last 20 unreplied topics, and this index can serve both the filtering on topic_id and the ordering by date DESC.
Finally, we will merge these resultsets using UNION ALL and limit the result.
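
The NOT EXISTS part for the unreplied topics can be sketched as follows. The question does not name the table, so t_post is a hypothetical name; the columns are the ones listed in the question:

```sql
-- t_post is a hypothetical table name; the columns come from the question.
-- Assumes a composite index on (topic_id, date), as described above.
SELECT  t.id, t.name, t.`date`
FROM    t_post t
WHERE   t.topic_id = 0
        AND NOT EXISTS
        (
        -- Any reply at all disqualifies the topic
        SELECT  NULL
        FROM    t_post r
        WHERE   r.topic_id = t.id
        )
ORDER BY
        t.`date` DESC
LIMIT 20;
```

With the index on (topic_id, date), both the outer filter on topic_id = 0 and the lookup of replies by topic_id can be served from the index, and the rows for topic_id = 0 are already ordered by date.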

Now, let's create a sample table:

Written by Quassnoi

July 7th, 2009 at 11:00 pm

Posted in MySQL

Overlapping ranges: MySQL

Continuing yesterday's post.