Archive for the ‘MySQL’ Category
MySQL: LIMIT on LIMIT
From Stack Overflow:
Here’s my situation: I want to select first ten entries from a database with
id = $id.But I want the results to be listed in a certain priority: if
criteria = $criteria, then I want those results displayed first.Otherwise, I just want to keep displaying the rest of the rows.
This can be done with a very simple query:
SELECT *
FROM t_preference
ORDER BY
category = 1 DESC, id DESC
LIMIT 10
In this query, category = 1 is a boolean expression which returns TRUE when category is equal to 1, FALSE otherwise.
Since TRUE is 1 and FALSE is 0, TRUE is greater than FALSE from ORDER BY‘s point of view. We should apply DESC for the TRUE results (i. e. rows with category = 1) to be returned first.
However, this query is not very index friendly, and it will need to select all rows and sort them.
To improve performance, we should split this query into two index-friendly queries and merge them with a UNION.
Let’s create a sample table and see how do we do it:
MySQL: DISTINCT on DISTINCT
From Stack Overflow:
I currently have the following query:
SELECT group_concat (DISTINCT usrFirst, usrLast) as receiver_name FROM …It outputs my list of names as follows:
JohnDoe,BillSmith,DaveJones, while desired ouput would be:
John Doe, Bill Smith, and Dave JonesBasically, I need help figuring out three things:
- How can I put a space, between the first and last name?
- How can insert a space after each comma?
- How can I add an
andright before the last name displayed?
This kind of output should be really processed on the client (since comma and the and
word are very cultire specific).
From MySQL, we just need a recordset of distinct full names.
Let’s create a sample table and see how do we do it:
MySQL: counting items in pairs
From Stack Overflow:
I currently have an
itemtable, and apairtable.The
pairtable simply contains two columns, which contain thePRIMARY KEYfrom theitemtable. There is an index onid1, id2onpair.A common query is to find a number of items that are featured in the least number of pairs:
SELECT id, COUNT(*) AS count FROM item i LEFT JOIN pair p ON (p.id1 = i.id OR p.id2 = i.id) GROUP BY id ORDER BY count, RAND() LIMIT 100, but the query is horible performance wise.
Is there a better query, and/or data structure for this type of thing?
MySQL is not very good in optimizing OR conditions.
It’s capable of doing index_merge, which would UNION the results of two indexes, however, as documentation states, this access method works only when comparing the fields against the constants.
This method, therefore, can be used to serve an OR condition neither in a JOIN nor in a subquery.
Let’s create sample tables and see the execution plan for the query:
MySQL: selecting rows before and after filtered one
From Stack Overflow:
I have a basic MySQL table,
terms, comprised of anidandtermfield.I want to create an alphabetically sorted dictionary index (in the literal sense), that would list ten 10 terms above the selected term, and 20 below it
An example of this could be found on Urban Dictionary where on the left column you see the current term highlighted, and a number of terms above it, and some below, all sorted alphabetically.
As we all know, MySQL doesn’t support a
ROW_NUMBER()or a similar function so we end up resorting to user variables and sub-selects.I also cannot create a view with user defined variables because MySQL doesn’t allow that.
MySQL indeed doesn’t support ROW_NUMBER() and this function indeed can be emulated using session variables:
However, it will be an overkill in this case, since a more simple and more efficient set-based solution using LIMIT can be used instead.
Let’s create a sample table and see how:
MySQL: grouping continuous ranges
From Stack Overflow:
Given the following table:
ID State Date 12 1 2009-07-16 10:00 45 2 2009-07-16 13:00 67 2 2009-07-16 14:40 77 1 2009-07-16 15:00 89 1 2009-07-16 15:30 99 1 2009-07-16 16:00 , how can I group by the field
state, while still maintaining the borders between the state changes?I need
MIN(id),MIN(date)andCOUNT(*)for each group, and this is expected:
ID State Date Count 12 1 2009-07-16 10:00 1 45 2 2009-07-16 13:00 2 77 1 2009-07-16 15:00 3
Though a pure set-based solution exists, in MySQL, this can be easily done using session variables, which is more efficient.
Let’s create a sample table:
Hierarchial queries in MySQL: identifying trees
Continuing on the theme of hierarchical queries in MySQL:
- Hierarchical queries in MySQL
- Hierarchical queries in MySQL: adding level
- Hierarchical queries in MySQL: adding ancestry chains.
- Hierarchical queries in MySQL: finding leaves
- Hierarchical queries in MySQL: finding loops
- Hierarchical data in MySQL: parents and children in one query
Assume we have a table with hierarchical structure like this:
| treeitem | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 9 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 10 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 11 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 4 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 12 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 13 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 14 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 5 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 15 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 16 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 17 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 6 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 18 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 7 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 8 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 18 rows fetched in 0.0003s (0.0137s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
We have two trees here: one starting from 1, another one starting from 2.
The problem is: given any item, we should identify the whole tree this item belongs to, and return the whole tree in the hierarchical order.
This also can be easily done using hierarchical queries in MySQL.
In this article: Hierarchical queries in MySQL I shown how to implement a function that returnes tree items in correct order, being called sequentially.
This function is reentrable and keeps its state in session variables, one of which, @start_with, defines the parent element for the tree we want to build.
We have two problems here:
- Given an item, define a root of the tree it belongs to
- Build a whole tree, starting from the root
The first problem can be solved by iterating the linked list backwards, starting from the variable given as an input.
This article:
describes how to do it in great detail, that’s why I’ll just put a query here.
Let’s create the table described above:
Hierarchical data in MySQL: parents and children in one query
Answering questions asked on the site.
Michael asks:
I was wondering how to implement a hierarchical query in MySQL (using the ancestry chains version) for a single row, such that it picks up the parents (if any) and any children (if any).
The idea is, I want to be able to jump in at any point, provide an Id of some sort, and be able to draw out the entire hierarchy for that Id, both upwards and downwards.
We need to combine two queries here:
- Original hierarchical query that returns all descendants of a given
id(a descendancy chain) - A query that would return all ancestors of a given
id(an ancestry chain)
An id can have only one parent, that’s why we can employ a linked list technique to build an ancestry chain, like shown in this article:
Here’s the query to to this (no functions required):
SELECT @r AS _id,
(
SELECT @r := parent
FROM t_hierarchy
WHERE id = _id
) AS parent,
@l := @l + 1 AS lvl
FROM (
SELECT @r := 1218,
@l := 0,
@cl := 0
) vars,
t_hierarchy h
WHERE @r <> 0
To combine two queries, we can employ a simple UNION ALL.
The only problem that is left to preserve the correct level, since the ancestry chain query conts level backwards, and the hierarchical query will count it starting from zero.
Let’s create a sample table and see what we get:
Read the rest of this entry »
Selecting last forum posts
From Stack Overflow:
I built a custom forum for my site using MySQL.
The listing page is essentially a table with the following columns: Topic, Last Updated, and Replies.
The DB table has the following columns:
id name body date topic_id A topic has the
topic_idof 0, and replies have thetopic_idof their parent topic.How do I query 20 last updated topics efficiently?
We have two kinds of topics here: those that had been answered and those that had not.
The queries for these two kinds of topics should be optimized using different methods and it’s better to query for them using two different queries merged later using UNION ALL:
- To find last 20 replies, we will employ the technique described in this article:
This technique uses a composite index on
(date, id)to find 20 latest replies to distincttopic_id‘s:- First, it scans the index on
(date, id)to retrieve records with a non-zerotopic_idin descending date order. - For each record found on step 1, it scans the index on
(date, id)again in descending order and finds the first record with the sametopic_id. - If the
id‘s of records found on steps 1 and 2 match, this means that the record found on step 1 is the latest for thistopic_idand we should return it.
- First, it scans the index on
- To find last 20 unreplied topics, we just use a good old
NOT EXISTS.In this case a composite index on
(topic_id, date)will come handy: we will need to select the last20unreplied topics, and this index will be useful to implement both filtering ontopic_idand ordering bydate DESC. - Finally, we will merge these resultsets using
UNION ALLand limit the result.
Now, let’s create a sample table:
Read the rest of this entry »
Overlapping ranges: MySQL
Continuing yesterday’s post.
From Stack Overflow:
I have an event table that specifies a date range with
start_dateandend_datefields.I have another date range, specified in code, that defines the current week as
week_startandweek_end.I’d like to query all events for the week.
Now, let’s see how do it in MySQL.
In the previous post we saw that SQL Server is not very good in optimizing this construct:
start_date < @week_end AND end_date > @week_start
and using it leads to a full table scan. A more complex condition is more efficient in SQL Server:
(start_date > @week_start AND start_date < @week_end)
OR
(@week_start BETWEEN start_date AND end_date)
, since two different indexes can be applied to both parts of the condition.
However, the second condition is intrinsically not very efficient which is a problem for almost any datetime range analysis.
A plain B-Tree index is efficient for queries like this:
constant1 < column < constant1
, but not for the queries like this:
column1 < constant < column2
The latter condition is in fact rewritten as following by the optimizer:
column1 < constant AND column2 > constant
and the optimizer is forced to choose between two non-optimal options: either stick to one index and do the filtering (using table lookups) or perform two index lookups and merge the results.
If 1,000,000 rows satisfy the first condition and 1,000,000 satisfy the second one, the engine needs to inspect either 1,000,000 rows with table lookups or 2,000,000 rows using indexes, only to find out that the intersection contains but 10 rows.
Fortunately, MySQL supports R-Tree indexes (called SPATIAL in MySQL because they are primarily used to query spatial data).
Indexes of this kind are particularly good for the queries similar to the one above, and with a little effort they can be used to query time data too.
Let’s create a sample table:
Read the rest of this entry »
Efficient INSERT ON DUPLICATE KEY UPDATE
From Stack Overflow:
I’m implementing a memory cache for a table in MySQL that looks like this:
item1 item2 cnt The original table includes millions of pairs like this and updates rapidly.
To make it all more efficient I want to write new pairs to an identical memory table and update the real table on disk periodically by cron.
The cron should do the following: for each pair, if there is similar pair in the non-memory table, increase the count by the count from the memory table.
If no such pair exists, create it with count from the memory table.
How can I make the flush (from memory table to real table) most efficient?
This is of course best achieved with INSERT ON DUPLICATE KEY UPDATE.
It’s a MySQL flavor of INSERT operator that automatically handles key violations and updates the row that causes a violation instead of inserting.
If we provide a SELECT statement as a row source for INSERT ON DUPLICATE KEY UPDATE, multiple input rows may cause the key violation on the target row. In this case, the target row will be updated many times.
To achieve the desired result, we have two options:
- Use the memory table as a row source for
INSERT ON DUPLICATE KEY UPDATEas is. The target rows will be updated multiple times. - Aggregate the counters in the memory table and use the aggregated values as the row source. The target rows will be updated at most once.
The first solution will update the rows many times (which can be slow), while the second one involves some overhead on GROUP BY.
Let’s create the sample tables and see which one is more efficient:
Read the rest of this entry »
Subscribe in a reader