EXPLAIN EXTENDED

How to create fast database queries

Archive for June, 2009

Latest DISTINCT records: efficient subquery

Comments enabled. I *really* need your comment

In my yesterday's article:

I described an efficient query to select 10 latest distinct IP's from a pagehit table in MySQL.

Here's the query:

SELECT  INET_NTOA(ip), ts
FROM    t_latest2 lo
WHERE   NOT EXISTS (
        SELECT  1
        FROM    t_latest2 li
        WHERE   li.ip = lo.ip
                AND li.ts > lo.ts
        )
ORDER BY
        ts DESC
LIMIT 10

, which works all right if there are lots of distinct IP's (as it is in a real table).

However, due to the bug in MySQL, this query is not as efficient as it should be.

Let's create the sample table and see what the problem is:
Read the rest of this entry »

Written by Quassnoi

June 9th, 2009 at 11:00 pm

Posted in MySQL

Latest DISTINCT records

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a table containing pagehit data in MySQL.

How do I grab the 10 latest unique IP's from that table?

This is quite a common task and there are two approaches to solve it.

Let's create two sample tables that differ in IP distribution:
Read the rest of this entry »

Written by Quassnoi

June 8th, 2009 at 11:00 pm

Posted in MySQL

Longest common prefix: MySQL

Comments enabled. I *really* need your comment

This is the fourth and final article of the series on how to strip all strings in a set of their longest common prefix and concatenate the results:

, and today we will solve this task for MySQL.

A quick reminder of the problem (taken from Stack Overflow):

I have some data:

id ref
1 3536757616
1 3536757617
1 3536757618
2 3536757628
2 3536757629
2 3536757630

and want to get the result like this:

id result
1 3536757616/7/8
2 3536757629/28/30

Essentially, the data should be aggregated on id, and the ref's should be concatenated together and separated by a / (slash), but with longest common prefix removed.

MySQL solution differs from the previous ones a little.

On the one hand, MySQL is the only RDBMS of all listed above that supplies a built-in GROUP_CONCAT (aggregate function to concatenate strings).

On the other hand, MySQL lacks a way to generate an arbitrary resultset which we relied upon in all solutions above.

This makes it easier to concatenate the results, but a lot harder to find the longest common prefix.

Fortunately, there is a way to do it.

Since 5.1, MySQL offers an XPath manipulation function, namely ExtractValue, that will help us to solve this task.

Unfortunately, ExtractValue is not set generating, i. e. it cannot create an SQL resultset out of an XML nodeset. But XPath itself is quite a powerful tool, and we can implement the longest common prefix algorithm completely in XPath.
Read the rest of this entry »

Written by Quassnoi

June 7th, 2009 at 11:00 pm

Posted in MySQL

Longest common prefix: Oracle

Comments enabled. I *really* need your comment

Today, the third article of the series on how to strip all strings in a set of their longest common prefix and concatenate the results:

This article will describe how to do it in Oracle.

A quick reminder of the problem (taken from Stack Overflow):

I have some data:

id ref
1 3536757616
1 3536757617
1 3536757618
2 3536757628
2 3536757629
2 3536757630

and want to get the result like this:

id result
1 3536757616/7/8
2 3536757629/28/30

Essentially, the data should be aggregated on id, and the ref's should be concatenated together and separated by a / (slash), but with longest common prefix removed.

As I already wrote earlier, this solution needs two aggregates: first one to calculate the length of the longest common prefix and the second one to concatenate the strings.

Oracle allows creating custom aggregates, but it's usually a pain to implement them. That's why we will use Oracle's MODEL clause to solve this task.
Read the rest of this entry »

Written by Quassnoi

June 6th, 2009 at 11:00 pm

Posted in Oracle

Longest common prefix: PostgreSQL

Comments enabled. I *really* need your comment

This is a series of articles on how to strip all strings in a set of their longest common prefix and concatenate the results:

Today, I'll show how to do it in PostgreSQL.

A quick reminder of the problem (taken from Stack Overflow):

I have some data:

id ref
1 3536757616
1 3536757617
1 3536757618
2 3536757628
2 3536757629
2 3536757630

and want to get the result like this:

id result
1 3536757616/7/8
2 3536757629/28/30

Essentially, the data should be aggregated on id, and the ref's should be concatenated together and separated by a / (slash), but with longest common prefix removed.

Like with SQL Server, it is possible to do this in PostgreSQL using a single SQL query.

But since PostgreSQL offers a nice ability to create custom aggregates, I'll better demonstrate how to solve this task using ones.

In my opinion, custom aggregates fit here just perfectly.

Since PostgreSQL lacks native support for aggregate concatenation, we will need to create two custom aggregates here:
Read the rest of this entry »

Written by Quassnoi

June 5th, 2009 at 11:00 pm

Posted in PostgreSQL

Longest common prefix: SQL Server

Comments enabled. I *really* need your comment

From Stack Overflow:

I have some data:

id ref
1 3536757616
1 3536757617
1 3536757618
2 3536757628
2 3536757629
2 3536757630

and want to get the result like this:

id result
1 3536757616/7/8
2 3536757629/28/30

Essentially, the data should be aggregated on id, and the ref's should be concatenated together and separated by a / (slash), but with longest common prefix removed.

I've already encoutered this problem several times, so I'll try to cover solutions for all RDBMS's my blog is about:

I hope this will be interesting as approaches will differ significantly for all four systems.

Today is SQL Server time.

I won't create sample tables here, since I'm demonstrating the principle. Instead, I'll just use dynamically generated data.

What do we need to do here is:

  1. Find least common prefix for each group and its length
  2. Cut off the prefix of each but the first ref, using SUBSTRING
  3. Concatenate the strings using FOR XML

Steps 2 and 3 are quite simple, but the first one needs some effort.

SQL Server doesn't provide a function to find the longest common prefix (LCP), so we'll have to implement it.

With some limitations, it may be done using pure SQL, no UDF's.
Read the rest of this entry »

Written by Quassnoi

June 4th, 2009 at 11:00 pm

Posted in SQL Server

Selecting records unique on each column

Comments enabled. I *really* need your comment

From Stack Overflow:

I need to select 3 columns from a table, but I need each value from any column to be unique in the resultset for this column.

This query:

SELECT DISTINCT TOP 10 a, b, c
FROM   x

will return 10 distinct sets.

How do I do it?

Here we will need to make sure that for each column in a record we are going to return, no previous record returned so far does not contain the same value of the column.

Let's create a sample table and construct the query:
Read the rest of this entry »

Written by Quassnoi

June 3rd, 2009 at 11:00 pm

Posted in SQL Server

Distinct pairs

Comments enabled. I *really* need your comment

From Stack Overflow:

I have this query:

SELECT  DISTINCT id, name
FROM    table1

For a given id, the name will always be the same.

Both fields are indexed. There's no separate table that maps the id to the name.

The table is very large (10,000,000 rows), so the query could take some time.

This query is very fast, since it's indexed:

SELECT  DISTINCT id
FROM    table1

Likewise for this query:

SELECT  DISTINCT id
FROM    table1

Assuming I can't get the database structure changed (a very safe assumption), what's a better way to structure the first query for performance?

The fastest way would of course be building an index on (id, name).

But since we (for whatever reason) cannot change the database structure, it's not an option.

Author, nevertheless, mentioned that for any given id, the name will always be the same. This can help us.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

June 2nd, 2009 at 11:00 pm

Posted in Oracle

Numbering groups

Comments enabled. I *really* need your comment

From Stack Overflow:

I have the following table:

Id Value
1 3
1 12
1 67
2 7
2 99
5 30
5 33
5 4
5 87
5 12
5 1

I'd like to update it:

Id UniqueIdBySubSet Value
1 1 3
1 2 12
1 3 67
2 1 7
2 2 99
5 1 30
5 2 33
5 3 4
5 4 87
5 5 12
5 6 1

Does anyone know how can I do that in MySQL?

These kinds of things are easily done with ROW_NUMBER() in systems that support it.

This function can be emulated in MySQL too. Some time ago I wrote an article on how to do that:

This task, however, is a little more complex. Since we update a single field, we need to change two variables at once in one SET clause.

We cannot use JOIN's here too since we don't yet have a PRIMARY KEY to join on.

However, session variable support in MySQL is flexible enough to do that.

Let's create the table and see how do we do it:
Read the rest of this entry »

Written by Quassnoi

June 1st, 2009 at 11:00 pm

Posted in MySQL