Archive for the ‘MySQL’ Category
Fallback language names: MySQL
Comments enabled. I *really* need your comment
This is a series of articles on efficient querying for a localized name, using a default (fallback) language if there is no localized name:
- Fallback language names: Oracle
- Fallback language names: SQL Server
- Fallback language names: PostgreSQL
- Fallback language names: MySQL
A quick reminder of the problem taken from Stack Overflow:
I have table item and another table language which contains names for the items in different languages:
item language data
How do I select a French name for an item if it exists, or a fallback English name if there is no French one?
We basically have three options here:
- Use COALESCE on two SELECT list subqueries
- Use COALESCE on the results of two LEFT JOINs
- Use the combination of methods above: a LEFT JOIN for French names and a subquery for English ones
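As a minimal sketch, the three options could be written as follows. The schema here is an assumption made for illustration and is not taken from the original question: a hypothetical table t_item with an id column, and a hypothetical table t_item_name with item_id, lang and name columns and a unique key on (item_id, lang).

```sql
-- Hypothetical schema, for illustration only:
--   t_item (id)
--   t_item_name (item_id, lang, name), UNIQUE (item_id, lang)

-- Option 1: COALESCE on two SELECT list subqueries
SELECT  i.id,
        COALESCE(
        (
        SELECT  n.name
        FROM    t_item_name n
        WHERE   n.item_id = i.id
                AND n.lang = 'fr'
        ),
        (
        SELECT  n.name
        FROM    t_item_name n
        WHERE   n.item_id = i.id
                AND n.lang = 'en'
        )
        ) AS name
FROM    t_item i;

-- Option 2: COALESCE on the results of two LEFT JOINs
SELECT  i.id,
        COALESCE(fr.name, en.name) AS name
FROM    t_item i
LEFT JOIN
        t_item_name fr
ON      fr.item_id = i.id
        AND fr.lang = 'fr'
LEFT JOIN
        t_item_name en
ON      en.item_id = i.id
        AND en.lang = 'en';

-- Option 3: a LEFT JOIN for French names, a subquery for English ones
SELECT  i.id,
        COALESCE(
        fr.name,
        (
        SELECT  en.name
        FROM    t_item_name en
        WHERE   en.item_id = i.id
                AND en.lang = 'en'
        )
        ) AS name
FROM    t_item i
LEFT JOIN
        t_item_name fr
ON      fr.item_id = i.id
        AND fr.lang = 'fr';
```

All three should return the same rows. They differ only in how much work is spent looking up the English name for items that already have a French one.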
The efficiency of each of these methods depends on the fallback probability (the share of items not covered by the localization).
If the localization is poor and only a few terms are translated into the local language, the probability of the fallback is high. I took Latin as an example of this.
If almost all terms are translated, the probability of fallback is low. In this case, I took French as an example (as in the original question), since it is widely used and localizations are likely to cover most terms.
In Oracle, SQL Server and PostgreSQL, the second method (two LEFT JOINs) is more efficient to query poorly localized languages, while for well-localized languages the third query should be used, i. e. a LEFT JOIN for the local language and a subquery for the fallback one.
To gain efficiency, all these systems used some kind of a join method that performs better on the large portions of the rowset, i. e. HASH JOIN or MERGE JOIN.
MySQL, however, is only capable of doing nested loops, so MySQL's performance should differ from that of the engines tested earlier.
Let's create sample tables and see:
MySQL: LIMIT on LIMIT
From Stack Overflow:
Here's my situation: I want to select first ten entries from a database with id = $id.
But I want the results to be listed in a certain priority: if criteria = $criteria, then I want those results displayed first.
Otherwise, I just want to keep displaying the rest of the rows.
This can be done with a very simple query:
SELECT *
FROM t_preference
ORDER BY
category = 1 DESC, id DESC
LIMIT 10
In this query, category = 1 is a boolean expression which returns TRUE when category is equal to 1, FALSE otherwise.
Since TRUE is 1 and FALSE is 0, TRUE is greater than FALSE from ORDER BY's point of view. We should apply DESC for the TRUE results (i. e. rows with category = 1) to be returned first.
However, this query is not very index friendly, and it will need to select all rows and sort them.
To improve performance, we should split this query into two index-friendly queries and merge them with a UNION.
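One possible shape of such a split is sketched below. It assumes that category is declared NOT NULL and that the table has an index on (category, id) in addition to the primary key on id. The priority column is a hypothetical helper added for this illustration, so the result has one more column than the original query.

```sql
-- Assumptions: category is NOT NULL, an index on (category, id) exists,
-- and id is the primary key. The priority column is illustrative only.
(
SELECT  1 AS priority, p.*
FROM    t_preference p
WHERE   p.category = 1
ORDER BY
        p.id DESC
LIMIT 10
)
UNION ALL
(
SELECT  0 AS priority, p.*
FROM    t_preference p
WHERE   p.category <> 1
ORDER BY
        p.id DESC
LIMIT 10
)
ORDER BY
        priority DESC, id DESC
LIMIT 10;
```

Each branch needs to read at most ten rows in index order, and the outer ORDER BY only has to sort twenty rows at most.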
Let's create a sample table and see how we do it:
MySQL: DISTINCT on DISTINCT
From Stack Overflow:
I currently have the following query:
SELECT group_concat (DISTINCT usrFirst, usrLast) as receiver_name FROM …
It outputs my list of names as follows:
JohnDoe,BillSmith,DaveJones
, while desired output would be:
John Doe, Bill Smith, and Dave Jones
Basically, I need help figuring out three things:
- How can I put a space, between the first and last name?
- How can insert a space after each comma?
- How can I add an and right before the last name displayed?
This kind of output should really be processed on the client, since the comma and the word and are very culture-specific.
From MySQL, we just need a recordset of distinct full names.
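A minimal sketch of such a query follows. The table name t_user is hypothetical, since the original query elides its FROM clause. The second statement is an optional variant for the case when the spacing still has to be done on the server.

```sql
-- t_user is a hypothetical table name used for illustration only.

-- A recordset of distinct full names, one row per name,
-- to be joined with commas and a final "and" on the client:
SELECT  DISTINCT usrFirst, usrLast
FROM    t_user;

-- Optional server-side variant: a space between the first and last name
-- and a space after each comma (the final "and" is still left to the client):
SELECT  GROUP_CONCAT(
        DISTINCT CONCAT_WS(' ', usrFirst, usrLast)
        SEPARATOR ', '
        ) AS receiver_name
FROM    t_user;
```

Note that the result of GROUP_CONCAT is truncated to group_concat_max_len bytes, which is one more reason to prefer the plain recordset.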
Let's create a sample table and see how we do it:
MySQL: counting items in pairs
From Stack Overflow:
I currently have an item table, and a pair table.
The pair table simply contains two columns, which contain the PRIMARY KEY from the item table. There is an index on id1, id2 on pair.
A common query is to find a number of items that are featured in the least number of pairs:
SELECT id, COUNT(*) AS count FROM item i LEFT JOIN pair p ON (p.id1 = i.id OR p.id2 = i.id) GROUP BY id ORDER BY count, RAND() LIMIT 100
, but the query is horrible performance-wise.
Is there a better query, and/or data structure for this type of thing?
MySQL is not very good at optimizing OR conditions.
It's capable of doing index_merge, which would UNION the results of two indexes. However, as the documentation states, this access method works only when comparing the fields against constants.
This method, therefore, cannot be used to serve an OR condition in either a JOIN or a subquery.
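One way to avoid the OR condition altogether is to count each side of the pair separately, so that every lookup is a plain equality search on a single column. The query below is only a sketch of this idea and not necessarily the best plan. It assumes a hypothetical additional index on pair (id2), since the existing index on (id1, id2) cannot serve a search on id2 alone.

```sql
-- Assumption: besides the index on pair (id1, id2) from the question,
-- there is a hypothetical additional index on pair (id2).
SELECT  i.id,
        (
        SELECT  COUNT(*)
        FROM    pair p
        WHERE   p.id1 = i.id
        ) +
        (
        SELECT  COUNT(*)
        FROM    pair p
        WHERE   p.id2 = i.id
        ) AS cnt
FROM    item i
ORDER BY
        cnt, RAND()
LIMIT 100;
```

The counts differ from the original query in two corner cases: an item that is featured in no pairs gets 0 here (the LEFT JOIN version reports 1 for it), and a pair with id1 = id2 is counted twice.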
Let's create sample tables and see the execution plan for the query:
MySQL: selecting rows before and after filtered one
From Stack Overflow:
I have a basic MySQL table, terms, comprised of an id and term field.
I want to create an alphabetically sorted dictionary index (in the literal sense), that would list ten 10 terms above the selected term, and 20 below it
An example of this could be found on Urban Dictionary where on the left column you see the current term highlighted, and a number of terms above it, and some below, all sorted alphabetically.
As we all know, MySQL doesn't support a ROW_NUMBER() or a similar function so we end up resorting to user variables and sub-selects.
I also cannot create a view with user defined variables because MySQL doesn't allow that.
MySQL indeed doesn't support ROW_NUMBER(), and this function indeed can be emulated using session variables.
However, that would be overkill in this case, since a simpler and more efficient set-based solution using LIMIT can be used instead.
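A minimal sketch of the LIMIT approach is shown below. It assumes that term is unique and indexed, and uses a hypothetical session variable @term to hold the selected term.

```sql
-- Assumptions: terms.term is unique and indexed;
-- @term is a hypothetical variable holding the selected term.
SET @term = 'mysql';

(
-- ten terms that sort right before the selected one
SELECT  id, term
FROM    terms
WHERE   term < @term
ORDER BY
        term DESC
LIMIT 10
)
UNION ALL
(
-- the selected term itself plus twenty terms after it
SELECT  id, term
FROM    terms
WHERE   term >= @term
ORDER BY
        term
LIMIT 21
)
ORDER BY
        term;
```

Each branch is a short range scan over the index on term, and the final ORDER BY sorts at most 31 rows.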
Let's create a sample table and see how:
MySQL: grouping continuous ranges
From Stack Overflow:
Given the following table:
| ID | State | Date |
|---|---|---|
| 12 | 1 | 2009-07-16 10:00 |
| 45 | 2 | 2009-07-16 13:00 |
| 67 | 2 | 2009-07-16 14:40 |
| 77 | 1 | 2009-07-16 15:00 |
| 89 | 1 | 2009-07-16 15:30 |
| 99 | 1 | 2009-07-16 16:00 |
, how can I group by the field state, while still maintaining the borders between the state changes?
I need MIN(id), MIN(date) and COUNT(*) for each group, and this is expected:
| ID | State | Date | Count |
|---|---|---|---|
| 12 | 1 | 2009-07-16 10:00 | 1 |
| 45 | 2 | 2009-07-16 13:00 | 2 |
| 77 | 1 | 2009-07-16 15:00 | 3 |
Though a pure set-based solution exists, in MySQL, this can be easily done using session variables, which is more efficient.
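The session variable approach can be sketched roughly as follows. The table name t_states is hypothetical, and state is assumed to be NOT NULL. The query relies on MySQL evaluating the variable assignments row by row in the order of the inner ORDER BY, which is commonly observed but not guaranteed by the documentation, so treat it as an illustration rather than a definitive implementation.

```sql
-- t_states (id, state, date) is a hypothetical name for the table above.
-- Caveat: this depends on the variables being evaluated in ORDER BY order,
-- which MySQL does not formally guarantee.
SELECT  MIN(id) AS id,
        MIN(state) AS state,
        MIN(date) AS date,
        COUNT(*) AS cnt
FROM    (
        SELECT  -- increment the group number whenever the state changes
                @grp := @grp + (@prev IS NULL OR @prev <> s.state) AS grp,
                @prev := s.state AS prev_state,
                s.id, s.state, s.date
        FROM    (
                SELECT  @grp := 0,
                        @prev := NULL
                ) vars,
                t_states s
        ORDER BY
                s.date, s.id
        ) q
GROUP BY
        grp
ORDER BY
        grp;
```

Every row within a group has the same state, so MIN(state) simply picks that value while keeping the query valid under ONLY_FULL_GROUP_BY.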
Let's create a sample table:
Hierarchical queries in MySQL: identifying trees
Continuing on the theme of hierarchical queries in MySQL:
- Hierarchical queries in MySQL
- Hierarchical queries in MySQL: adding level
- Hierarchical queries in MySQL: adding ancestry chains.
- Hierarchical queries in MySQL: finding leaves
- Hierarchical queries in MySQL: finding loops
- Hierarchical data in MySQL: parents and children in one query
Assume we have a table with hierarchical structure like this:
| treeitem |
|---|
| 1 |
| 3 |
| 9 |
| 10 |
| 11 |
| 4 |
| 12 |
| 13 |
| 14 |
| 5 |
| 15 |
| 16 |
| 17 |
| 2 |
| 6 |
| 18 |
| 7 |
| 8 |
| 18 rows fetched in 0.0003s (0.0137s) |
We have two trees here: one starting from 1, another one starting from 2.
The problem is: given any item, we should identify the whole tree this item belongs to, and return the whole tree in the hierarchical order.
This also can be easily done using hierarchical queries in MySQL.
In the article Hierarchical queries in MySQL, I showed how to implement a function that returns tree items in the correct order when called sequentially.
This function is reentrant and keeps its state in session variables, one of which, @start_with, defines the parent element for the tree we want to build.
We have two problems here:
- Given an item, find the root of the tree it belongs to
- Build the whole tree, starting from the root
The first problem can be solved by iterating the linked list backwards, starting from the item given as input.
An earlier article describes how to do it in great detail, so I'll just put a query here.
Let's create the table described above:
Hierarchical data in MySQL: parents and children in one query
Answering questions asked on the site.
Michael asks:
I was wondering how to implement a hierarchical query in MySQL (using the ancestry chains version) for a single row, such that it picks up the parents (if any) and any children (if any).
The idea is, I want to be able to jump in at any point, provide an Id of some sort, and be able to draw out the entire hierarchy for that Id, both upwards and downwards.
We need to combine two queries here:
- Original hierarchical query that returns all descendants of a given id (a descendancy chain)
- A query that would return all ancestors of a given id (an ancestry chain)
An id can have only one parent, which is why we can employ a linked list technique to build an ancestry chain, as shown in an earlier article.
Here's the query to do this (no functions required):
SELECT @r AS _id,
(
SELECT @r := parent
FROM t_hierarchy
WHERE id = _id
) AS parent,
@l := @l + 1 AS lvl
FROM (
SELECT @r := 1218,
@l := 0,
@cl := 0
) vars,
t_hierarchy h
WHERE @r <> 0
To combine two queries, we can employ a simple UNION ALL.
The only problem left is to preserve the correct level, since the ancestry chain query counts levels backwards, while the hierarchical query counts them starting from zero.
Let's create a sample table and see what we get:
Selecting last forum posts
From Stack Overflow:
I built a custom forum for my site using MySQL.
The listing page is essentially a table with the following columns: Topic, Last Updated, and Replies.
The DB table has the following columns:
id name body date topic_id
A topic has the topic_id of 0, and replies have the topic_id of their parent topic.
How do I query 20 last updated topics efficiently?
We have two kinds of topics here: those that have been answered and those that have not.
The queries for these two kinds of topics should be optimized using different methods and it's better to query for them using two different queries merged later using UNION ALL:
- To find the last 20 replies, we will employ the technique described in an earlier article.
  This technique uses a composite index on (date, id) to find the 20 latest replies to distinct topic_id's:
  - First, it scans the index on (date, id) to retrieve records with a non-zero topic_id in descending date order.
  - For each record found on step 1, it scans the index on (date, id) again in descending order and finds the first record with the same topic_id.
  - If the id's of records found on steps 1 and 2 match, this means that the record found on step 1 is the latest for this topic_id and we should return it.
- To find the last 20 unreplied topics, we just use a good old NOT EXISTS.
  In this case a composite index on (topic_id, date) will come in handy: we will need to select the last 20 unreplied topics, and this index will be useful to implement both filtering on topic_id and ordering by date DESC.
- Finally, we will merge these resultsets using UNION ALL and limit the result.
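The plan above can be sketched as a single query. The table name t_post and the alias last_updated are hypothetical. The columns come from the question, and the two composite indexes are the ones named in the steps above. Whether the optimizer actually picks these indexes depends on the data and the MySQL version, so this is an outline rather than a tuned solution.

```sql
-- Hypothetical table t_post (id, name, body, date, topic_id)
-- with indexes on (date, id) and (topic_id, date).
(
-- answered topics: keep a reply only if it is the latest one in its topic
SELECT  r.topic_id AS id, r.date AS last_updated
FROM    t_post r
WHERE   r.topic_id <> 0
        AND r.id =
        (
        SELECT  l.id
        FROM    t_post l
        WHERE   l.topic_id = r.topic_id
        ORDER BY
                l.date DESC, l.id DESC
        LIMIT 1
        )
ORDER BY
        r.date DESC, r.id DESC
LIMIT 20
)
UNION ALL
(
-- unreplied topics: no post refers to them as its parent topic
SELECT  t.id, t.date
FROM    t_post t
WHERE   t.topic_id = 0
        AND NOT EXISTS
        (
        SELECT  NULL
        FROM    t_post r
        WHERE   r.topic_id = t.id
        )
ORDER BY
        t.date DESC
LIMIT 20
)
ORDER BY
        last_updated DESC
LIMIT 20;
```

The result contains topic ids with their last update time. The topic name and the number of replies can then be fetched for just these 20 rows.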
Now, let's create a sample table:
Overlapping ranges: MySQL
Continuing yesterday's post.
From Stack Overflow:
I have an event table that specifies a date range with start_date and end_date fields.
I have another date range, specified in code, that defines the current week as week_start and week_end.
I'd like to query all events for the week.
Now, let's see how to do it in MySQL.
In the previous post we saw that SQL Server is not very good at optimizing this construct:
start_date < @week_end AND end_date > @week_start
and using it leads to a full table scan. A more complex condition is more efficient in SQL Server:
(start_date > @week_start AND start_date < @week_end)
OR
(@week_start BETWEEN start_date AND end_date)
, since two different indexes can be applied to both parts of the condition.
However, the second condition is intrinsically not very efficient, which is a problem for almost any datetime range analysis.
A plain B-Tree index is efficient for queries like this:
constant1 < column < constant2
, but not for the queries like this:
column1 < constant < column2
The latter condition is in fact rewritten by the optimizer as follows:
column1 < constant AND column2 > constant
and the optimizer is forced to choose between two non-optimal options: either stick to one index and do the filtering (using table lookups) or perform two index lookups and merge the results.
If 1,000,000 rows satisfy the first condition and 1,000,000 satisfy the second one, the engine needs to inspect either 1,000,000 rows with table lookups or 2,000,000 rows using indexes, only to find out that the intersection contains but 10 rows.
Fortunately, MySQL supports R-Tree indexes (called SPATIAL in MySQL because they are primarily used to query spatial data).
Indexes of this kind are particularly good for the queries similar to the one above, and with a little effort they can be used to query time data too.
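Here is a rough sketch of how a time range can be stored as a geometry. The table t_event, the column range_ls and the index name are hypothetical. Each event is stored as a line from (-1, start day) to (1, end day), so that its bounding box spans the event's days, and the week is represented by a vertical line at x = 0. Details such as the storage engine and SRID requirements for spatial indexes differ between MySQL versions, so adjust accordingly.

```sql
-- Hypothetical table: range_ls duplicates the date range as a geometry.
CREATE TABLE t_event (
        id INT NOT NULL PRIMARY KEY,
        start_date DATETIME NOT NULL,
        end_date DATETIME NOT NULL,
        range_ls LINESTRING NOT NULL,
        SPATIAL KEY sx_event_range (range_ls)
) ENGINE=MyISAM;

INSERT
INTO    t_event (id, start_date, end_date, range_ls)
VALUES  (
        1, '2009-07-14 10:00:00', '2009-07-16 18:00:00',
        LineString(
                Point(-1, TO_DAYS('2009-07-14 10:00:00')),
                Point(1, TO_DAYS('2009-07-16 18:00:00'))
        )
        );

SET @week_start = '2009-07-13 00:00:00';
SET @week_end = '2009-07-20 00:00:00';

SELECT  id, start_date, end_date
FROM    t_event
WHERE   -- coarse filter on whole days, can be served by the SPATIAL index
        MBRIntersects(
        LineString(
                Point(0, TO_DAYS(@week_start)),
                Point(0, TO_DAYS(@week_end))
        ),
        range_ls
        )
        -- fine filter: the exact condition from the question
        AND start_date < @week_end
        AND end_date > @week_start;
```

The bounding boxes intersect whenever the day ranges overlap, which is a superset of the exact condition, so the two remaining comparisons only have to weed out a few borderline rows.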
Let's create a sample table: