Archive for the ‘MySQL’ Category
Fallback language names: MySQL
Comments enabled. I *really* need your comment
This is a series of articles on efficient querying for a localized name, using a default (fallback) language if there is no localized name:
- Fallback language names: Oracle
- Fallback language names: SQL Server
- Fallback language names: PostgreSQL
- Fallback language names: MySQL
A quick reminder of the problem taken from Stack Overflow:
I have table item and another table language which contains names for the items in different languages:
item | language | data
How do I select a French name for an item if it exists, or a fallback English name if there is no French one?
We basically have three options here:
- Use COALESCE on two SELECT list subqueries
- Use COALESCE on the results of two LEFT JOINs
- Use the combination of methods above: a LEFT JOIN for French names and a subquery for English ones
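As a rough sketch, the three options could look like this. The column names item, language and data come from the question; the id column of item, the language codes 'fr' and 'en', and the assumption that (item, language) is unique are mine.

```sql
-- Option 1: COALESCE on two SELECT list subqueries
SELECT  i.id,
        COALESCE(
        (SELECT lf.data FROM language lf WHERE lf.item = i.id AND lf.language = 'fr'),
        (SELECT le.data FROM language le WHERE le.item = i.id AND le.language = 'en')
        ) AS name
FROM    item i;

-- Option 2: COALESCE on the results of two LEFT JOINs
SELECT  i.id,
        COALESCE(lf.data, le.data) AS name
FROM    item i
LEFT JOIN
        language lf
ON      lf.item = i.id
        AND lf.language = 'fr'
LEFT JOIN
        language le
ON      le.item = i.id
        AND le.language = 'en';

-- Option 3: a LEFT JOIN for French names, a subquery for English ones.
-- The subquery is only needed for rows where the French name is missing.
SELECT  i.id,
        COALESCE(
        lf.data,
        (SELECT le.data FROM language le WHERE le.item = i.id AND le.language = 'en')
        ) AS name
FROM    item i
LEFT JOIN
        language lf
ON      lf.item = i.id
        AND lf.language = 'fr';
```

All three return the same rows; they differ only in how much work is spent on the fallback language.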
The efficiency of each of these methods depends on the fallback probability (how many items are covered by the localization).
If the localization is poor and only a few terms are translated into the local language, the probability of the fallback is high. I took Latin as an example of this.
If almost all terms are translated, the probability of fallback is low. In this case, I took French as an example (as in the original question), since it is widely used and localizations are likely to cover most terms.
In Oracle, SQL Server and PostgreSQL, the second method (two LEFT JOINs) is more efficient for querying poorly localized languages, while for well-localized languages the third query should be used, i.e. a LEFT JOIN for the local language and a subquery for the fallback one.
To gain efficiency, all these systems used some kind of join method that performs better on large portions of the rowset, i.e. HASH JOIN or MERGE JOIN.
MySQL, however, is only capable of doing nested loops, so MySQL's performance should differ from that of the engines tested earlier.
Let's create sample tables and see:
MySQL: LIMIT on LIMIT
From Stack Overflow:
Here's my situation: I want to select first ten entries from a database with id = $id.
But I want the results to be listed in a certain priority: if criteria = $criteria, then I want those results displayed first.
Otherwise, I just want to keep displaying the rest of the rows.
This can be done with a very simple query:
SELECT * FROM t_preference ORDER BY category = 1 DESC, id DESC LIMIT 10
In this query, category = 1 is a boolean expression which returns TRUE when category is equal to 1, FALSE otherwise.
Since TRUE is 1 and FALSE is 0, TRUE is greater than FALSE from ORDER BY's point of view. We should apply DESC for the TRUE results (i.e. rows with category = 1) to be returned first.
However, this query is not very index friendly, and it will need to select all rows and sort them.
To improve performance, we should split this query into two index-friendly queries and merge them with a UNION.
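A minimal sketch of such a split, assuming that category is NOT NULL, that id is the primary key and that there is a composite index on (category, id). The is_preferred column is a helper introduced here only for the final ordering.

```sql
-- Each branch can stop after reading 10 index records in descending id order.
(
SELECT  p.*, 1 AS is_preferred
FROM    t_preference p
WHERE   p.category = 1
ORDER BY
        p.id DESC
LIMIT 10
)
UNION ALL
(
SELECT  p.*, 0 AS is_preferred
FROM    t_preference p
WHERE   p.category <> 1
ORDER BY
        p.id DESC
LIMIT 10
)
-- At most 20 rows reach this point, so the final sort is cheap.
ORDER BY
        is_preferred DESC, id DESC
LIMIT 10;
```

The first branch can never contribute more than 10 rows, and the second one is needed only to fill the remaining places, so taking 10 rows from each is always enough.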
Let's create a sample table and see how we do it:
MySQL: DISTINCT on DISTINCT
From Stack Overflow:
I currently have the following query:
SELECT group_concat (DISTINCT usrFirst, usrLast) as receiver_name FROM …
It outputs my list of names as follows:
JohnDoe,BillSmith,DaveJones
, while desired output would be:
John Doe, Bill Smith, and Dave Jones
Basically, I need help figuring out three things:
- How can I put a space between the first and last name?
- How can I insert a space after each comma?
- How can I add an and right before the last name displayed?
This kind of output should really be processed on the client (since the comma and the word and are very culture-specific).
From MySQL, we just need a recordset of distinct full names.
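In its simplest form this could be a plain SELECT DISTINCT. The FROM clause of the original query is elided in the question, so t_user below is a purely hypothetical table name; usrFirst and usrLast are taken from the question.

```sql
-- DISTINCT is applied to the pair of columns, not to the glued string,
-- so ('Mary Ann', 'Lee') and ('Mary', 'Ann Lee') stay separate rows.
SELECT  DISTINCT
        usrFirst, usrLast
FROM    t_user   -- hypothetical table name
ORDER BY
        usrLast, usrFirst;
```

The client can then join each pair with a space and add the commas and the final conjunction according to its locale.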
Let's create a sample table and see how we do it:
MySQL: counting items in pairs
From Stack Overflow:
I currently have an item table, and a pair table.
The pair table simply contains two columns, which contain the PRIMARY KEY from the item table. There is an index on id1, id2 on pair.
A common query is to find a number of items that are featured in the least number of pairs:
SELECT id, COUNT(*) AS count FROM item i LEFT JOIN pair p ON (p.id1 = i.id OR p.id2 = i.id) GROUP BY id ORDER BY count, RAND() LIMIT 100
, but the query is horrible performance wise.
Is there a better query, and/or data structure for this type of thing?
MySQL is not very good at optimizing OR conditions.
It's capable of doing index_merge, which would UNION the results of two indexes; however, as the documentation states, this access method works only when comparing the fields against constants.
This method, therefore, cannot be used to serve an OR condition in either a JOIN or a subquery.
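One common way around an OR in a join condition is to count each column separately and add the results up. This is only a sketch of that general idea, not necessarily the query the article arrives at. It assumes an additional index on pair (id2), which the question does not mention.

```sql
SELECT  i.id,
        COALESCE(SUM(q.cnt), 0) AS cnt
FROM    item i
LEFT JOIN
        (
        -- pairs where the item is the first member
        SELECT  id1 AS id, COUNT(*) AS cnt
        FROM    pair
        GROUP BY
                id1
        UNION ALL
        -- pairs where the item is the second member (assumed index on id2)
        SELECT  id2 AS id, COUNT(*) AS cnt
        FROM    pair
        GROUP BY
                id2
        ) q
ON      q.id = i.id
GROUP BY
        i.id
ORDER BY
        cnt, RAND()
LIMIT 100;
```

Note two differences from the original query: items without any pairs get a count of 0 rather than 1, and a pair with id1 = id2 would be counted twice.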
Let's create sample tables and see the execution plan for the query:
MySQL: selecting rows before and after filtered one
From Stack Overflow:
I have a basic MySQL table, terms, comprised of an id and term field.
I want to create an alphabetically sorted dictionary index (in the literal sense), that would list 10 terms above the selected term, and 20 below it.
An example of this could be found on Urban Dictionary where on the left column you see the current term highlighted, and a number of terms above it, and some below, all sorted alphabetically.
As we all know, MySQL doesn't support a ROW_NUMBER() or a similar function so we end up resorting to user variables and sub-selects.
I also cannot create a view with user defined variables because MySQL doesn't allow that.
MySQL indeed doesn't support ROW_NUMBER(), and this function indeed can be emulated using session variables.
However, that would be overkill in this case, since a simpler and more efficient set-based solution using LIMIT can be used instead.
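A sketch of the LIMIT-based approach, assuming term is unique and indexed. The @term variable and its value are placeholders for the selected term.

```sql
SET @term = 'mysql';   -- placeholder for the selected term

-- 10 terms right before the selected one: walk the index backwards
(
SELECT  id, term
FROM    terms
WHERE   term < @term
ORDER BY
        term DESC
LIMIT 10
)
UNION ALL
-- the selected term itself plus 20 terms after it: walk the index forwards
(
SELECT  id, term
FROM    terms
WHERE   term >= @term
ORDER BY
        term
LIMIT 21
)
ORDER BY
        term;
```

Each branch reads only as many index records as it returns, and the outer ORDER BY has to sort 31 rows at most.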
Let's create a sample table and see how:
MySQL: grouping continuous ranges
From Stack Overflow:
Given the following table:
ID | State | Date
---|---|---
12 | 1 | 2009-07-16 10:00
45 | 2 | 2009-07-16 13:00
67 | 2 | 2009-07-16 14:40
77 | 1 | 2009-07-16 15:00
89 | 1 | 2009-07-16 15:30
99 | 1 | 2009-07-16 16:00
, how can I group by the field state, while still maintaining the borders between the state changes?
I need MIN(id), MIN(date) and COUNT(*) for each group, and this is expected:
ID | State | Date | Count
---|---|---|---
12 | 1 | 2009-07-16 10:00 | 1
45 | 2 | 2009-07-16 13:00 | 2
77 | 1 | 2009-07-16 15:00 | 3
Though a pure set-based solution exists, in MySQL, this can be easily done using session variables, which is more efficient.
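Here is a rough sketch of the session variable approach. The table name t_states is hypothetical; the columns id, state and date come from the question. It relies on MySQL evaluating the SELECT list expressions left to right for each row in the ORDER BY order, which is not guaranteed by the documentation, so treat it as an illustration of the idea rather than a definitive implementation.

```sql
SELECT  MIN(id) AS id,
        MIN(state) AS state,      -- state is constant within a group
        MIN(date) AS date,
        COUNT(*) AS cnt
FROM    (
        SELECT  -- increment the group number whenever the state changes
                @r := @r + (@state IS NULL OR @state <> s.state) AS gn,
                -- remember the current state for the next row
                @state := s.state AS sn,
                s.id, s.state, s.date
        FROM    (
                SELECT  @r := 0,
                        @state := NULL
                ) vars,
                t_states s        -- hypothetical table name
        ORDER BY
                s.date, s.id
        ) q
GROUP BY
        gn
ORDER BY
        gn;
```

The inner query makes a single pass over the table in date order, which is what makes this approach cheaper than a self-join.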
Let's create a sample table:
Hierarchical queries in MySQL: identifying trees
Continuing on the theme of hierarchical queries in MySQL:
- Hierarchical queries in MySQL
- Hierarchical queries in MySQL: adding level
- Hierarchical queries in MySQL: adding ancestry chains.
- Hierarchical queries in MySQL: finding leaves
- Hierarchical queries in MySQL: finding loops
- Hierarchical data in MySQL: parents and children in one query
Assume we have a table with hierarchical structure like this:
treeitem
---
1
3
9
10
11
4
12
13
14
5
15
16
17
2
6
18
7
8
18 rows fetched in 0.0003s (0.0137s)
We have two trees here: one starting from 1, another one starting from 2.
The problem is: given any item, we should identify the whole tree this item belongs to, and return the whole tree in the hierarchical order.
This also can be easily done using hierarchical queries in MySQL.
In this article: Hierarchical queries in MySQL I showed how to implement a function that returns tree items in the correct order when called sequentially.
This function is reentrant and keeps its state in session variables, one of which, @start_with, defines the parent element for the tree we want to build.
We have two problems here:
- Given an item, define a root of the tree it belongs to
- Build a whole tree, starting from the root
The first problem can be solved by iterating the linked list backwards, starting from the variable given as an input.
This article:
describes how to do it in great detail, that's why I'll just put a query here.
Let's create the table described above:
Hierarchical data in MySQL: parents and children in one query
Answering questions asked on the site.
Michael asks:
I was wondering how to implement a hierarchical query in MySQL (using the ancestry chains version) for a single row, such that it picks up the parents (if any) and any children (if any).
The idea is, I want to be able to jump in at any point, provide an Id of some sort, and be able to draw out the entire hierarchy for that Id, both upwards and downwards.
We need to combine two queries here:
- Original hierarchical query that returns all descendants of a given id (a descendancy chain)
- A query that would return all ancestors of a given id (an ancestry chain)
An id can have only one parent, which is why we can employ a linked list technique to build an ancestry chain, as shown in this article:
Here's the query to do this (no functions required):
SELECT  @r AS _id,
        (
        SELECT  @r := parent
        FROM    t_hierarchy
        WHERE   id = _id
        ) AS parent,
        @l := @l + 1 AS lvl
FROM    (
        SELECT  @r := 1218,
                @l := 0,
                @cl := 0
        ) vars,
        t_hierarchy h
WHERE   @r <> 0
To combine the two queries, we can employ a simple UNION ALL.
The only problem left is to preserve the correct level, since the ancestry chain query counts the level backwards, and the hierarchical query will count it starting from zero.
Let's create a sample table and see what we get:
Selecting last forum posts
From Stack Overflow:
I built a custom forum for my site using MySQL.
The listing page is essentially a table with the following columns: Topic, Last Updated, and Replies.
The DB table has the following columns:
id | name | body | date | topic_id
A topic has the topic_id of 0, and replies have the topic_id of their parent topic.
How do I query 20 last updated topics efficiently?
We have two kinds of topics here: those that had been answered and those that had not.
The queries for these two kinds of topics should be optimized using different methods and it's better to query for them using two different queries merged later using UNION ALL:
- To find last 20 replies, we will employ the technique described in this article:
  This technique uses a composite index on (date, id) to find 20 latest replies to distinct topic_id's:
  - First, it scans the index on (date, id) to retrieve records with a non-zero topic_id in descending date order.
  - For each record found on step 1, it scans the index on (date, id) again in descending order and finds the first record with the same topic_id.
  - If the id's of records found on steps 1 and 2 match, this means that the record found on step 1 is the latest for this topic_id and we should return it.
- To find last 20 unreplied topics, we just use a good old NOT EXISTS.
  In this case a composite index on (topic_id, date) will come in handy: we will need to select the last 20 unreplied topics, and this index will be useful to implement both filtering on topic_id and ordering by date DESC.
- Finally, we will merge these resultsets using UNION ALL and limit the result.
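The steps above can be sketched as a single query. The table name t_post and the alias last_updated are hypothetical; the columns id, date and topic_id come from the question. Whether the optimizer actually picks the indexes described above depends on the data and the MySQL version, so this is an outline of the shape of the query, not a tuned solution.

```sql
-- Replied topics: the latest reply of each topic
(
SELECT  r.topic_id AS id, r.date AS last_updated
FROM    t_post r
WHERE   r.topic_id <> 0
        -- keep the reply only if it is the latest one in its topic
        AND r.id =
        (
        SELECT  ri.id
        FROM    t_post ri
        WHERE   ri.topic_id = r.topic_id
        ORDER BY
                ri.date DESC, ri.id DESC
        LIMIT 1
        )
ORDER BY
        r.date DESC, r.id DESC
LIMIT 20
)
UNION ALL
-- Unreplied topics: topics with no rows pointing at them
(
SELECT  t.id, t.date AS last_updated
FROM    t_post t
WHERE   t.topic_id = 0
        AND NOT EXISTS
        (
        SELECT  NULL
        FROM    t_post r
        WHERE   r.topic_id = t.id
        )
ORDER BY
        t.date DESC
LIMIT 20
)
ORDER BY
        last_updated DESC
LIMIT 20;
```

Each branch returns at most 20 candidates, so the final ORDER BY sorts no more than 40 rows.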
Now, let's create a sample table:
Overlapping ranges: MySQL
Continuing yesterday's post.
From Stack Overflow:
I have an event table that specifies a date range with start_date and end_date fields.
I have another date range, specified in code, that defines the current week as week_start and week_end.
I'd like to query all events for the week.
Now, let's see how to do it in MySQL.
In the previous post we saw that SQL Server is not very good at optimizing this construct:
start_date < @week_end AND end_date > @week_start
and using it leads to a full table scan. A more complex condition is more efficient in SQL Server:
(start_date > @week_start AND start_date < @week_end)
OR
(@week_start BETWEEN start_date AND end_date)
, since two different indexes can be applied to both parts of the condition.
However, the second condition is intrinsically not very efficient which is a problem for almost any datetime range analysis.
A plain B-Tree index is efficient for queries like this:
constant1 < column < constant2
, but not for queries like this:
column1 < constant < column2
The latter condition is in fact rewritten as follows by the optimizer:
column1 < constant AND column2 > constant
and the optimizer is forced to choose between two non-optimal options: either stick to one index and do the filtering (using table lookups) or perform two index lookups and merge the results.
If 1,000,000 rows satisfy the first condition and 1,000,000 satisfy the second one, the engine needs to inspect either 1,000,000 rows with table lookups or 2,000,000 rows using indexes, only to find out that the intersection contains but 10 rows.
Fortunately, MySQL supports R-Tree indexes (called SPATIAL in MySQL because they are primarily used to query spatial data).
Indexes of this kind are particularly good for the queries similar to the one above, and with a little effort they can be used to query time data too.
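To give an idea of what that effort might look like, here is a sketch in which each date range is stored as a line segment whose Y coordinates are the UNIX timestamps of its ends. The table name t_event, the column range_geom, the index name and the sample values are all assumptions made for illustration. MyISAM is used because it supported SPATIAL indexes first; newer MySQL versions may additionally require an SRID attribute on the column for the index to be used.

```sql
CREATE TABLE t_event (
        id INT NOT NULL PRIMARY KEY,
        start_date DATETIME NOT NULL,
        end_date DATETIME NOT NULL,
        -- the range as a segment: X is a dummy axis, Y is time
        range_geom LINESTRING NOT NULL,
        SPATIAL KEY sx_event_range (range_geom)
) ENGINE=MyISAM;

INSERT
INTO    t_event (id, start_date, end_date, range_geom)
VALUES  (
        1,
        '2009-07-13 10:00:00',
        '2009-07-14 12:00:00',
        LineString(
                Point(-1, UNIX_TIMESTAMP('2009-07-13 10:00:00')),
                Point(1, UNIX_TIMESTAMP('2009-07-14 12:00:00'))
                )
        );

SET @week_start = '2009-07-13 00:00:00',
    @week_end = '2009-07-20 00:00:00';

SELECT  id, start_date, end_date
FROM    t_event
WHERE   -- coarse filter: bounding boxes intersect when the ranges overlap
        MBRIntersects(
                LineString(
                        Point(-1, UNIX_TIMESTAMP(@week_start)),
                        Point(1, UNIX_TIMESTAMP(@week_end))
                        ),
                range_geom
                )
        -- fine filter: the exact (strict) overlap condition
        AND start_date < @week_end
        AND end_date > @week_start;
```

The bounding box check is inclusive at the borders, which is why the original strict condition is kept as a second filter.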
Let's create a sample table: