EXPLAIN EXTENDED

How to create fast database queries

Archive for April, 2009

Counting missing rows: using DISTINCT

Comments enabled. I *really* need your comment

This is the 2nd of 5 articles covering implementation of NOT IN predicate in several RDBMS'es:

Today, I will tell about some useful tricks that help to optimize these subqueries even more.

Let's create some sample tables:
Read the rest of this entry »

Written by Quassnoi

April 19th, 2009 at 11:00 pm

Posted in MySQL

Counting missing rows

Comments enabled. I *really* need your comment

From Stack Overflow:

In my first programming job I was told that a query within the IN() predicate gets executed for every row contained in the parent query, and therefore using IN should be avoided.

For example, given the query:

SELECT  COUNT(*)
FROM    Table1
WHERE   Table1Id NOT IN
	(
	SELECT	Table1Id
	FROM	Table2
	WHERE	id_user = 1
	)
Rows Inner subquery executions
10 10
100 100
1000 1000
10000 10000

Is this correct? How does the IN predicate actually work?

This is the 1st of 5 articles covering implementation of NOT IN predicate in several RDBMS'es:

In this article I will describe how MySQL implements this predicate.

Let's create the sample tables:
Read the rest of this entry »

Written by Quassnoi

April 18th, 2009 at 11:00 pm

Posted in MySQL

Matching 3 of 4

Comments enabled. I *really* need your comment

From Stack Overflow:

Say I have a query like this:

SELECT  *
FROM    my_table
WHERE   name = "john doe"
        AND phone = "8183321234"
        AND email = "johndoe@yahoo.com"
        AND address = "330 some lane"

But say I only need 3 conditions out of the 4 to match.

I know I can write a very long query with several OR's, but I was wondering if there was a feature for this?

A nice and elegant solution proposed by Tomalak says:

SELECT  * 
FROM    my_table 
WHERE   CASE WHEN name = "john doe"           THEN 1 ELSE 0 END +
        CASE WHEN phone = "8183321234"        THEN 1 ELSE 0 END +
        CASE WHEN email = "johndoe@yahoo.com" THEN 1 ELSE 0 END +
        CASE WHEN address = "330 some lane"   THEN 1 ELSE 0 END
        >= 3

, but, as noted by the author, it does not use the indexes.

Since the values seem to be quite selective, it will be better to use indexes on them.

Let's create sample tables:
Read the rest of this entry »

Written by Quassnoi

April 17th, 2009 at 11:00 pm

Posted in MySQL

Missing entries

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a case where I want to choose any database entry that have an invalid Country, Region, or Area ID.

By invalid, I mean an ID for a country or region or area that no longer exists in my tables.

I have four tables: Properties, Countries, Regions, Areas.

I was thinking to do it like this:

SELECT  *
FROM    Properties
WHERE   CountryID NOT IN 
        (
        SELECT  CountryID
        FROM    Countries
        )
        OR
        RegionID NOT IN
        (
        SELECT  RegionID
        FROM    Regions
        )
        OR
        AreaID NOT IN
        (
        SELECT  AreaID
        FROM    Areas
        )

Now, is my query right?

We have two options here:

  1. Use a LEFT OUTER JOIN with the three dimension tables (Countries, Regions and Areas), filtering out non-NULL values
  2. Use a NOT IN / NOT EXISTS approach shown above

LEFT JOIN intuitively seems to be more elegant and more efficient than subqueries. But is it really?
Read the rest of this entry »

Written by Quassnoi

April 14th, 2009 at 11:00 pm

Posted in SQL Server

Keeping rows

with one comment

It's very simple to delete, say, 10 top rows in MySQL:

DELETE
FROM   t_deleter
ORDER BY
          id
LIMIT 10

What if we want to delete all rows except the first 10?
Read the rest of this entry »

Written by Quassnoi

April 7th, 2009 at 11:00 pm

Posted in MySQL

Emulating FULL OUTER JOIN in MySQL

Comments enabled. I *really* need your comment

It is widely known that MySQL lacks support for FULL OUTER JOIN.

A common solution to work around this is using a UNION ALL to union two result sets from a LEFT JOIN and a RIGHT JOIN of two tables, adding join_column IS NULL condition to the latter.

Though it works, it's quite inefficient on large tables when used with ORDER BY … LIMIT queries, as it uses a filesort.

Let's create two sample tables and OUTER JOIN them together.

The tables are quite simple: each of them contains a million of multiples of 13 and 17, respectively, thus making each of LEFT JOIN, RIGHT JOIN and INNER JOIN of these tables non-empty:
Read the rest of this entry »

Written by Quassnoi

April 6th, 2009 at 11:00 pm

Posted in MySQL

GROUP_CONCAT in Oracle 10g

with one comment

MySQL has a nice aggregate function called GROUP_CONCAT, which concatenates all strings in a group in given order, separating them with a given separator.

In one of the previous articles, I wrote about emulating this function in PostgreSQL.

Now, we'll try to emulate this function in Oracle 10g.

There are numerous solutions using Oracle's hierarchical function SYS_CONNECT_BY_PATH, but they're bad in the following ways:

  • Not all queries can be rewritten using CONNECT BY
  • Even if they can, CONNECT BY performs poorly on some conditions

Instead, we will emulate this function using Oracle's MODEL clause.
Read the rest of this entry »

Written by Quassnoi

April 5th, 2009 at 11:00 pm

Posted in Oracle

Banning IP’s

with 3 comments

Many webmasters and network service providers keep a database of IP addresses that are not allowed to access their resources.

Such a database can be efficiently queried using MySQL's spatial abilities.

Let's create and fill the sample tables:
Read the rest of this entry »

Written by Quassnoi

April 4th, 2009 at 11:00 pm

Posted in MySQL

Endless loops

Comments enabled. I *really* need your comment

Imagine that we are developing an email server that keeps the outgoing messages in a database.

The server is always busy sending the messages if the outgoing queue is not empty. It may have up to 10 connections at once.

If the message is failed to be sent, it's reinserted to the end of the queue until the retry count is exceeded.

As soon as all messages from a batch succeed or fail to be sent, next 10 messages from the queue are fetched.

How do we write a query to fetch a batch of messages from the queue?

This all can be done in a single query run in a loop. No range checking, no reinserting.
Read the rest of this entry »

Written by Quassnoi

April 3rd, 2009 at 11:00 pm

Posted in MySQL

Emulating SKIP SCAN

Comments enabled. I *really* need your comment

In the previous article I wrote about ranged conditions (<, >, <=, >= or BETWEEN) on one field combined with ORDER BY another field.

We have to choose an index either for filtering or for sorting, as a single index cannot be used for both purposes.

Well, actually, it can.

There is a special access method called SKIP SCAN, designed just for these cases. This method can be used when a filtering condition applied to a secondary column of a composite index. Like, we have and index on (orderer, ranger) and search for ranger < 10.

This method scans the index bottom to top, selecting each possible orderer values. For each orderer, it applies the filtering condition on ranger, thus obtaining a valid range condition on the index that can be iterated. As soon and this range is iterated, the method defines the range for the next orderer, skipping all other values (hence the name).

SKIP SCAN does not perform as well as a simple index scan, but a query can benefit from it is there are few orderer's in the table.

Oracle supports this method directly, but MySQL, of course, not. SKIP SCAN, though, can be easily emulated.
Read the rest of this entry »

Written by Quassnoi

April 2nd, 2009 at 11:00 pm

Posted in MySQL