EXPLAIN EXTENDED

How to create fast database queries

Archive for the ‘MySQL’ Category

Efficient INSERT ON DUPLICATE KEY UPDATE

Comments enabled. I *really* need your comment

From Stack Overflow:

I'm implementing a memory cache for a table in MySQL that looks like this:

item1 item2 cnt

The original table includes millions of pairs like this and updates rapidly.

To make it all more efficient I want to write new pairs to an identical memory table and update the real table on disk periodically by cron.

The cron should do the following: for each pair, if there is similar pair in the non-memory table, increase the count by the count from the memory table.

If no such pair exists, create it with count from the memory table.

How can I make the flush (from memory table to real table) most efficient?

This is of course best achieved with INSERT ON DUPLICATE KEY UPDATE.

It's a MySQL flavor of INSERT operator that automatically handles key violations and updates the row that causes a violation instead of inserting.

If we provide a SELECT statement as a row source for INSERT ON DUPLICATE KEY UPDATE, multiple input rows may cause the key violation on the target row. In this case, the target row will be updated many times.

To achieve the desired result, we have two options:

  1. Use the memory table as a row source for INSERT ON DUPLICATE KEY UPDATE as is. The target rows will be updated multiple times.
  2. Aggregate the counters in the memory table and use the aggregated values as the row source. The target rows will be updated at most once.

The first solution will update the rows many times (which can be slow), while the second one involves some overhead on GROUP BY.

Let's create the sample tables and see which one is more efficient:
Read the rest of this entry »

Written by Quassnoi

June 27th, 2009 at 11:00 pm

Posted in MySQL

Constant vs. NULL to mark missing values in OUTER JOINs

Comments enabled. I *really* need your comment

From Stack Overflow (shortened for clarity):

I have two tables: homes and home_photos:

It's very possible that a home does not have a photo associated with the home.

In that case, the column primary_photo_group_id will have value 0.

Otherwise, primary_photo_group_id equals the group_id of the photo.

How do I return a NULL if the photo if not defined for a home?

This is of course a classic task for an OUTER JOIN:

SELECT  *
FROM    homes h
LEFT JOIN
        photos p
ON      p.home_photo_group_id = h.primary_photo_group_id
        AND …

This construct, however, can be optimized, and I'll show how.

Let's create and fill the sample tables:
Read the rest of this entry »

Written by Quassnoi

June 24th, 2009 at 11:00 pm

Posted in MySQL

Counting bans

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a MyISAM table in MySQL 5.0 like this:

id login ip banned

I would like to find all users not banned (banned = 0) if at least 5 other users with the same ip have already been banned (banned = 1).

To do this, we should first find which IP's have more than 5 bans and then find all non-banned users for these IP's.

To find the IP's we can use two methods:

  1. Find all banned IP's, group them and filter out those that have COUNT(*) < 5
  2. For each IP possible, find if there are 5 banned records for this IP using a LIMIT clause

The first method doesn't require doing an extra JOIN on the table, but it will do count all values just to figure out if the COUNT(*) is more or less than 5.

The second method stops counting whenever it find the 5th value, but it requires an extra JOIN.

Counting is bad if there are many values to count, and the extra join is bad if there are many JOIN loops.

This means that the first method should be faster if there are lots of banned records, the, and the second method is faster for few banned records and few distinct IP's.

Let's create two sample tables and see:
Read the rest of this entry »

Written by Quassnoi

June 18th, 2009 at 11:00 pm

Posted in MySQL

Flattening timespans: MySQL

Comments enabled. I *really* need your comment

In the previous articles:

, I described how to merge intersecting timespans into several contiguous ranges.

In SQL Server, this task required a table-valued function (TVF) and a cursor.

In Oracle, we had to employ window functions to do this.

Now let's see how to solve this task in MySQL.

A quick reminder of the problem, taken from Stack Overflow:

I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan.

To make things a bit clearer, take a look at the sample data for 03.06.2009:

The following timespans are overlapping or contiunous and need to merge into one timespan:

date start stop
2009.06.03 05:54:48:000 10:00:13:000
2009.06.03 09:26:45:000 09:59:40:000

The resulting timespan would be from 05:54:48 to 10:00:13.

Since there's a gap between 10:00:13 and 10:12:50, we also have the following timespans:

date start stop
2009.06.03 10:12:50:000 10:27:25:000
2009.06.03 10:13:12:000 11:14:56:000
2009.06.03 10:27:25:000 10:27:31:000
2009.06.03 10:27:39:000 13:53:38:000
2009.06.03 11:14:56:000 11:15:03:000
2009.06.03 11:15:30:000 14:02:14:000
2009.06.03 13:53:38:000 13:53:43:000
2009.06.03 14:02:14:000 14:02:31:000

which result in one merged timespan from 10:12:50 to 14:02:31, since they're overlapping or adjacent.

Any solution, be it SQL or not, is appreciated.

MySQL offers easy use of session variables. These are loosely typed variables defined within the session scope which you can assign and evaluate right inside the query.

Using these variables may come in handy for tasks like this.

In MySQL, just like in previous solutions, we will need to order the timespans by start anc calculate the running maximum of the stop.

This is easily done using session variable @edate which we will use to store running maximum. We will update in in the SELECT clause.

We will also declare another variable @r, initialize it with a zero and use it to keep a range number.

If we have a timespan whose start exceeds the previous value of the running maximum then all timespans that had started eariler had also been stopped by this time, and we have a gap.

The current timespan, therefore, belongs to another range, and we need to increment the value of @r.

Finally, we will just group the timespans on the range number and return MIN(start) and MAX(stop) for all timespans in a range. Easy to see that these will be the start and the stop for the whole range.

Here is the sample table:
Read the rest of this entry »

Written by Quassnoi

June 13th, 2009 at 11:00 pm

Posted in MySQL

Latest DISTINCT records: efficient subquery

Comments enabled. I *really* need your comment

In my yesterday's article:

I described an efficient query to select 10 latest distinct IP's from a pagehit table in MySQL.

Here's the query:

SELECT  INET_NTOA(ip), ts
FROM    t_latest2 lo
WHERE   NOT EXISTS (
        SELECT  1
        FROM    t_latest2 li
        WHERE   li.ip = lo.ip
                AND li.ts > lo.ts
        )
ORDER BY
        ts DESC
LIMIT 10

, which works all right if there are lots of distinct IP's (as it is in a real table).

However, due to the bug in MySQL, this query is not as efficient as it should be.

Let's create the sample table and see what the problem is:
Read the rest of this entry »

Written by Quassnoi

June 9th, 2009 at 11:00 pm

Posted in MySQL

Latest DISTINCT records

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a table containing pagehit data in MySQL.

How do I grab the 10 latest unique IP's from that table?

This is quite a common task and there are two approaches to solve it.

Let's create two sample tables that differ in IP distribution:
Read the rest of this entry »

Written by Quassnoi

June 8th, 2009 at 11:00 pm

Posted in MySQL

Longest common prefix: MySQL

Comments enabled. I *really* need your comment

This is the fourth and final article of the series on how to strip all strings in a set of their longest common prefix and concatenate the results:

, and today we will solve this task for MySQL.

A quick reminder of the problem (taken from Stack Overflow):

I have some data:

id ref
1 3536757616
1 3536757617
1 3536757618
2 3536757628
2 3536757629
2 3536757630

and want to get the result like this:

id result
1 3536757616/7/8
2 3536757629/28/30

Essentially, the data should be aggregated on id, and the ref's should be concatenated together and separated by a / (slash), but with longest common prefix removed.

MySQL solution differs from the previous ones a little.

On the one hand, MySQL is the only RDBMS of all listed above that supplies a built-in GROUP_CONCAT (aggregate function to concatenate strings).

On the other hand, MySQL lacks a way to generate an arbitrary resultset which we relied upon in all solutions above.

This makes it easier to concatenate the results, but a lot harder to find the longest common prefix.

Fortunately, there is a way to do it.

Since 5.1, MySQL offers an XPath manipulation function, namely ExtractValue, that will help us to solve this task.

Unfortunately, ExtractValue is not set generating, i. e. it cannot create an SQL resultset out of an XML nodeset. But XPath itself is quite a powerful tool, and we can implement the longest common prefix algorithm completely in XPath.
Read the rest of this entry »

Written by Quassnoi

June 7th, 2009 at 11:00 pm

Posted in MySQL

Numbering groups

Comments enabled. I *really* need your comment

From Stack Overflow:

I have the following table:

Id Value
1 3
1 12
1 67
2 7
2 99
5 30
5 33
5 4
5 87
5 12
5 1

I'd like to update it:

Id UniqueIdBySubSet Value
1 1 3
1 2 12
1 3 67
2 1 7
2 2 99
5 1 30
5 2 33
5 3 4
5 4 87
5 5 12
5 6 1

Does anyone know how can I do that in MySQL?

These kinds of things are easily done with ROW_NUMBER() in systems that support it.

This function can be emulated in MySQL too. Some time ago I wrote an article on how to do that:

This task, however, is a little more complex. Since we update a single field, we need to change two variables at once in one SET clause.

We cannot use JOIN's here too since we don't yet have a PRIMARY KEY to join on.

However, session variable support in MySQL is flexible enough to do that.

Let's create the table and see how do we do it:
Read the rest of this entry »

Written by Quassnoi

June 1st, 2009 at 11:00 pm

Posted in MySQL

Selecting options

Comments enabled. I *really* need your comment

From Stack Overflow:

I have three checkboxes in my application.

If the user ticks a combination of the boxes, I want to return matches for the boxes ticked, and in the case no box is checked I just want to return everything.

Can I do this with a single SQL query?

Assuming MySQL.

The principle is simple: we need to enable filtering only if there is some data to filter. If there are no data, we should return all values.

This is best achieved by using two mutually exclusive SELECT statements with a UNION ALL.

Let's create a sample table and try to construct the query:
Read the rest of this entry »

Written by Quassnoi

May 26th, 2009 at 11:00 pm

Posted in MySQL

Choosing column order

with one comment

From Stack Overflow:

I know you can ALTER the column order in MySQL with FIRST and AFTER, but why would you want to bother?

Since good queries explicitly name columns when inserting data, is there really any reason to care what order your columns are in in the table?

Well, there is reason.

MySQL stores some types (out of which VARCHAR is probably the most important) with variable length.

That means that each variable field stored in the row is prepended with its actual length, like in Pascal string type.

Since the rows have no TOC, it means that the engine should traverse all previous values to find out where the required value begins.

If we need the value of the first column (first non-PRIMARY KEY column, of course), the engine will do it once per row.

But if we need the value of, say, 20-th column, the engine should find out the lengths of previous 19 columns and add them together.

Let's create a sample table and see how big the impact is:
Read the rest of this entry »

Written by Quassnoi

May 21st, 2009 at 11:00 pm

Posted in MySQL