Archive for the ‘MySQL’ Category
Efficient INSERT ON DUPLICATE KEY UPDATE
Comments enabled. I *really* need your comment
From Stack Overflow:
I'm implementing a memory cache for a table in MySQL that looks like this:
item1 item2 cnt The original table includes millions of pairs like this and updates rapidly.
To make it all more efficient I want to write new pairs to an identical memory table and update the real table on disk periodically by cron.
The cron should do the following: for each pair, if there is similar pair in the non-memory table, increase the count by the count from the memory table.
If no such pair exists, create it with count from the memory table.
How can I make the flush (from memory table to real table) most efficient?
This is of course best achieved with INSERT ON DUPLICATE KEY UPDATE
.
It's a MySQL flavor of INSERT
operator that automatically handles key violations and updates the row that causes a violation instead of inserting.
If we provide a SELECT
statement as a row source for INSERT ON DUPLICATE KEY UPDATE
, multiple input rows may cause the key violation on the target row. In this case, the target row will be updated many times.
To achieve the desired result, we have two options:
- Use the memory table as a row source for
INSERT ON DUPLICATE KEY UPDATE
as is. The target rows will be updated multiple times. - Aggregate the counters in the memory table and use the aggregated values as the row source. The target rows will be updated at most once.
The first solution will update the rows many times (which can be slow), while the second one involves some overhead on GROUP BY
.
Let's create the sample tables and see which one is more efficient:
Read the rest of this entry »
Constant vs. NULL to mark missing values in OUTER JOINs
Comments enabled. I *really* need your comment
From Stack Overflow (shortened for clarity):
I have two tables:
homes
andhome_photos
:It's very possible that a home does not have a photo associated with the home.
In that case, the column
primary_photo_group_id
will have value 0.Otherwise,
primary_photo_group_id
equals thegroup_id
of the photo.How do I return a
NULL
if the photo if not defined for a home?
This is of course a classic task for an OUTER JOIN
:
SELECT * FROM homes h LEFT JOIN photos p ON p.home_photo_group_id = h.primary_photo_group_id AND …
This construct, however, can be optimized, and I'll show how.
Let's create and fill the sample tables:
Read the rest of this entry »
Counting bans
Comments enabled. I *really* need your comment
From Stack Overflow:
I have a MyISAM table in MySQL 5.0 like this:
id login ip banned I would like to find all users not banned (
banned = 0
) if at least 5 other users with the sameip
have already been banned (banned = 1
).
To do this, we should first find which IP
's have more than 5 bans and then find all non-banned users for these IP
's.
To find the IP
's we can use two methods:
- Find all banned
IP
's, group them and filter out those that haveCOUNT(*) < 5
- For each
IP
possible, find if there are 5 banned records for thisIP
using aLIMIT
clause
The first method doesn't require doing an extra JOIN
on the table, but it will do count all values just to figure out if the COUNT(*)
is more or less than 5.
The second method stops counting whenever it find the 5th value, but it requires an extra JOIN
.
Counting is bad if there are many values to count, and the extra join is bad if there are many JOIN
loops.
This means that the first method should be faster if there are lots of banned records, the, and the second method is faster for few banned records and few distinct IP
's.
Let's create two sample tables and see:
Read the rest of this entry »
Flattening timespans: MySQL
Comments enabled. I *really* need your comment
In the previous articles:
, I described how to merge intersecting timespans into several contiguous ranges.
In SQL Server, this task required a table-valued function (TVF) and a cursor.
In Oracle, we had to employ window functions to do this.
Now let's see how to solve this task in MySQL.
A quick reminder of the problem, taken from Stack Overflow:
I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan.
To make things a bit clearer, take a look at the sample data for 03.06.2009:
The following timespans are overlapping or contiunous and need to merge into one timespan:
date start stop 2009.06.03 05:54:48:000 10:00:13:000 2009.06.03 09:26:45:000 09:59:40:000 The resulting timespan would be from 05:54:48 to 10:00:13.
Since there's a gap between 10:00:13 and 10:12:50, we also have the following timespans:
date start stop 2009.06.03 10:12:50:000 10:27:25:000 2009.06.03 10:13:12:000 11:14:56:000 2009.06.03 10:27:25:000 10:27:31:000 2009.06.03 10:27:39:000 13:53:38:000 2009.06.03 11:14:56:000 11:15:03:000 2009.06.03 11:15:30:000 14:02:14:000 2009.06.03 13:53:38:000 13:53:43:000 2009.06.03 14:02:14:000 14:02:31:000 which result in one merged timespan from 10:12:50 to 14:02:31, since they're overlapping or adjacent.
Any solution, be it SQL or not, is appreciated.
MySQL offers easy use of session variables. These are loosely typed variables defined within the session scope which you can assign and evaluate right inside the query.
Using these variables may come in handy for tasks like this.
In MySQL, just like in previous solutions, we will need to order the timespans by start
anc calculate the running maximum of the stop
.
This is easily done using session variable @edate
which we will use to store running maximum. We will update in in the SELECT
clause.
We will also declare another variable @r
, initialize it with a zero and use it to keep a range number
.
If we have a timespan whose start
exceeds the previous value of the running maximum then all timespans that had started eariler had also been stopped by this time, and we have a gap.
The current timespan, therefore, belongs to another range, and we need to increment the value of @r
.
Finally, we will just group the timespans on the range number and return MIN(start)
and MAX(stop)
for all timespans in a range. Easy to see that these will be the start
and the stop
for the whole range.
Here is the sample table:
Read the rest of this entry »
Latest DISTINCT records: efficient subquery
Comments enabled. I *really* need your comment
In my yesterday's article:
I described an efficient query to select 10 latest distinct IP
's from a pagehit table in MySQL.
Here's the query:
SELECT INET_NTOA(ip), ts FROM t_latest2 lo WHERE NOT EXISTS ( SELECT 1 FROM t_latest2 li WHERE li.ip = lo.ip AND li.ts > lo.ts ) ORDER BY ts DESC LIMIT 10
, which works all right if there are lots of distinct IP
's (as it is in a real table).
However, due to the bug in MySQL, this query is not as efficient as it should be.
Let's create the sample table and see what the problem is:
Read the rest of this entry »
Latest DISTINCT records
Comments enabled. I *really* need your comment
From Stack Overflow:
I have a table containing pagehit data in MySQL.
How do I grab the 10 latest unique
IP
's from that table?
This is quite a common task and there are two approaches to solve it.
Let's create two sample tables that differ in IP
distribution:
Read the rest of this entry »
Longest common prefix: MySQL
Comments enabled. I *really* need your comment
This is the fourth and final article of the series on how to strip all strings in a set of their longest common prefix and concatenate the results:
- Longest common prefix: SQL Server
- Longest common prefix: PostgreSQL
- Longest common prefix: Oracle
- Longest common prefix: MySQL
, and today we will solve this task for MySQL.
A quick reminder of the problem (taken from Stack Overflow):
I have some data:
id ref 1 3536757616 1 3536757617 1 3536757618 2 3536757628 2 3536757629 2 3536757630 and want to get the result like this:
id result 1 3536757616/7/8 2 3536757629/28/30 Essentially, the data should be aggregated on
id
, and theref
's should be concatenated together and separated by a/
(slash), but with longest common prefix removed.
MySQL solution differs from the previous ones a little.
On the one hand, MySQL is the only RDBMS of all listed above that supplies a built-in GROUP_CONCAT
(aggregate function to concatenate strings).
On the other hand, MySQL lacks a way to generate an arbitrary resultset which we relied upon in all solutions above.
This makes it easier to concatenate the results, but a lot harder to find the longest common prefix.
Fortunately, there is a way to do it.
Since 5.1, MySQL offers an XPath
manipulation function, namely ExtractValue
, that will help us to solve this task.
Unfortunately, ExtractValue
is not set generating, i. e. it cannot create an SQL
resultset out of an XML
nodeset. But XPath
itself is quite a powerful tool, and we can implement the longest common prefix algorithm completely in XPath
.
Read the rest of this entry »
Numbering groups
Comments enabled. I *really* need your comment
From Stack Overflow:
I have the following table:
Id Value 1 3 1 12 1 67 2 7 2 99 5 30 5 33 5 4 5 87 5 12 5 1 I'd like to update it:
Id UniqueIdBySubSet Value 1 1 3 1 2 12 1 3 67 2 1 7 2 2 99 5 1 30 5 2 33 5 3 4 5 4 87 5 5 12 5 6 1 Does anyone know how can I do that in MySQL?
These kinds of things are easily done with ROW_NUMBER()
in systems that support it.
This function can be emulated in MySQL too. Some time ago I wrote an article on how to do that:
This task, however, is a little more complex. Since we update a single field, we need to change two variables at once in one SET
clause.
We cannot use JOIN
's here too since we don't yet have a PRIMARY KEY
to join on.
However, session variable support in MySQL
is flexible enough to do that.
Let's create the table and see how do we do it:
Read the rest of this entry »
Selecting options
Comments enabled. I *really* need your comment
From Stack Overflow:
I have three checkboxes in my application.
If the user ticks a combination of the boxes, I want to return matches for the boxes ticked, and in the case no box is checked I just want to return everything.
Can I do this with a single
SQL
query?
Assuming MySQL.
The principle is simple: we need to enable filtering only if there is some data to filter. If there are no data, we should return all values.
This is best achieved by using two mutually exclusive SELECT
statements with a UNION ALL
.
Let's create a sample table and try to construct the query:
Read the rest of this entry »
Choosing column order
From Stack Overflow:
I know you can
ALTER
the column order in MySQL withFIRST
andAFTER
, but why would you want to bother?Since good queries explicitly name columns when inserting data, is there really any reason to care what order your columns are in in the table?
Well, there is reason.
MySQL stores some types (out of which VARCHAR
is probably the most important) with variable length.
That means that each variable field stored in the row is prepended with its actual length, like in Pascal string type.
Since the rows have no TOC, it means that the engine should traverse all previous values to find out where the required value begins.
If we need the value of the first column (first non-PRIMARY KEY
column, of course), the engine will do it once per row.
But if we need the value of, say, 20-th column, the engine should find out the lengths of previous 19 columns and add them together.
Let's create a sample table and see how big the impact is:
Read the rest of this entry »