EXPLAIN EXTENDED

How to create fast database queries

Archive for 2009

Matching 3 of 4

Comments enabled. I *really* need your comment

From Stack Overflow:

Say I have a query like this:

SELECT  *
FROM    my_table
WHERE   name = "john doe"
        AND phone = "8183321234"
        AND email = "johndoe@yahoo.com"
        AND address = "330 some lane"

But say I only need 3 conditions out of the 4 to match.

I know I can write a very long query with several OR's, but I was wondering if there was a feature for this?

A nice and elegant solution proposed by Tomalak says:

SELECT  * 
FROM    my_table 
WHERE   CASE WHEN name = "john doe"           THEN 1 ELSE 0 END +
        CASE WHEN phone = "8183321234"        THEN 1 ELSE 0 END +
        CASE WHEN email = "johndoe@yahoo.com" THEN 1 ELSE 0 END +
        CASE WHEN address = "330 some lane"   THEN 1 ELSE 0 END
        >= 3

, but, as noted by the author, it does not use the indexes.

Since the values seem to be quite selective, it will be better to use indexes on them.

Let's create sample tables:
Read the rest of this entry »

Written by Quassnoi

April 17th, 2009 at 11:00 pm

Posted in MySQL

Missing entries

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a case where I want to choose any database entry that have an invalid Country, Region, or Area ID.

By invalid, I mean an ID for a country or region or area that no longer exists in my tables.

I have four tables: Properties, Countries, Regions, Areas.

I was thinking to do it like this:

SELECT  *
FROM    Properties
WHERE   CountryID NOT IN 
        (
        SELECT  CountryID
        FROM    Countries
        )
        OR
        RegionID NOT IN
        (
        SELECT  RegionID
        FROM    Regions
        )
        OR
        AreaID NOT IN
        (
        SELECT  AreaID
        FROM    Areas
        )

Now, is my query right?

We have two options here:

  1. Use a LEFT OUTER JOIN with the three dimension tables (Countries, Regions and Areas), filtering out non-NULL values
  2. Use a NOT IN / NOT EXISTS approach shown above

LEFT JOIN intuitively seems to be more elegant and more efficient than subqueries. But is it really?
Read the rest of this entry »

Written by Quassnoi

April 14th, 2009 at 11:00 pm

Posted in SQL Server

Keeping rows

with one comment

It's very simple to delete, say, 10 top rows in MySQL:

DELETE
FROM   t_deleter
ORDER BY
          id
LIMIT 10

What if we want to delete all rows except the first 10?
Read the rest of this entry »

Written by Quassnoi

April 7th, 2009 at 11:00 pm

Posted in MySQL

Emulating FULL OUTER JOIN in MySQL

Comments enabled. I *really* need your comment

It is widely known that MySQL lacks support for FULL OUTER JOIN.

A common solution to work around this is using a UNION ALL to union two result sets from a LEFT JOIN and a RIGHT JOIN of two tables, adding join_column IS NULL condition to the latter.

Though it works, it's quite inefficient on large tables when used with ORDER BY … LIMIT queries, as it uses a filesort.

Let's create two sample tables and OUTER JOIN them together.

The tables are quite simple: each of them contains a million of multiples of 13 and 17, respectively, thus making each of LEFT JOIN, RIGHT JOIN and INNER JOIN of these tables non-empty:
Read the rest of this entry »

Written by Quassnoi

April 6th, 2009 at 11:00 pm

Posted in MySQL

GROUP_CONCAT in Oracle 10g

with one comment

MySQL has a nice aggregate function called GROUP_CONCAT, which concatenates all strings in a group in given order, separating them with a given separator.

In one of the previous articles, I wrote about emulating this function in PostgreSQL.

Now, we'll try to emulate this function in Oracle 10g.

There are numerous solutions using Oracle's hierarchical function SYS_CONNECT_BY_PATH, but they're bad in the following ways:

  • Not all queries can be rewritten using CONNECT BY
  • Even if they can, CONNECT BY performs poorly on some conditions

Instead, we will emulate this function using Oracle's MODEL clause.
Read the rest of this entry »

Written by Quassnoi

April 5th, 2009 at 11:00 pm

Posted in Oracle

Banning IP’s

with 3 comments

Many webmasters and network service providers keep a database of IP addresses that are not allowed to access their resources.

Such a database can be efficiently queried using MySQL's spatial abilities.

Let's create and fill the sample tables:
Read the rest of this entry »

Written by Quassnoi

April 4th, 2009 at 11:00 pm

Posted in MySQL

Endless loops

Comments enabled. I *really* need your comment

Imagine that we are developing an email server that keeps the outgoing messages in a database.

The server is always busy sending the messages if the outgoing queue is not empty. It may have up to 10 connections at once.

If the message is failed to be sent, it's reinserted to the end of the queue until the retry count is exceeded.

As soon as all messages from a batch succeed or fail to be sent, next 10 messages from the queue are fetched.

How do we write a query to fetch a batch of messages from the queue?

This all can be done in a single query run in a loop. No range checking, no reinserting.
Read the rest of this entry »

Written by Quassnoi

April 3rd, 2009 at 11:00 pm

Posted in MySQL

Emulating SKIP SCAN

Comments enabled. I *really* need your comment

In the previous article I wrote about ranged conditions (<, >, <=, >= or BETWEEN) on one field combined with ORDER BY another field.

We have to choose an index either for filtering or for sorting, as a single index cannot be used for both purposes.

Well, actually, it can.

There is a special access method called SKIP SCAN, designed just for these cases. This method can be used when a filtering condition applied to a secondary column of a composite index. Like, we have and index on (orderer, ranger) and search for ranger < 10.

This method scans the index bottom to top, selecting each possible orderer values. For each orderer, it applies the filtering condition on ranger, thus obtaining a valid range condition on the index that can be iterated. As soon and this range is iterated, the method defines the range for the next orderer, skipping all other values (hence the name).

SKIP SCAN does not perform as well as a simple index scan, but a query can benefit from it is there are few orderer's in the table.

Oracle supports this method directly, but MySQL, of course, not. SKIP SCAN, though, can be easily emulated.
Read the rest of this entry »

Written by Quassnoi

April 2nd, 2009 at 11:00 pm

Posted in MySQL

Choosing index

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a table with roughly 100,000 blog postings, linked to a table with 50 feeds via an 1:n relationship.

When I query both tables with a SELECT statement, ordered by a datetime field of the postings table, MySQL always uses filesort, resulting in very slow query times (more than 1 second).

SELECT
`postings`.`id`,
UNIX_TIMESTAMP(postings.post_date) as post_date,
`postings`.`link`,
`postings`.`title`,
`postings`.`author`,
`postings`.`excerpt`,
`postings`.`long_excerpt`,
`feeds`.`title` AS feed_title,
`feeds`.`website` AS feed_website
FROM
`postings`
JOIN
`feeds`
ON
`feeds`.`id` = `postings`.`feed_id`
WHERE
`feeds`.`type` = 1 AND
`postings`.`user_offtopic_count` &lt; 10 AND 
    `postings`.`is_active` = 1
ORDER BY 
    `postings`.`post_date` desc
LIMIT 
    15

This is a nice question that illustrates how to choose which indexes to create.

At first blush it seems to be a job for a composite index on (is_active, user_offtopic_count, post_date). Really, we could use the first two fields to filter on and the third field to order by.

But the problem here is that the query uses a range condition on postings.user_offtopic_count. In this case, the index cannot be used for ordering.
Read the rest of this entry »

Written by Quassnoi

April 1st, 2009 at 11:00 pm

Posted in MySQL

Matching whole sets

Comments enabled. I *really* need your comment

From Stack Overflow:

Given the following:

declare @a table
(
    pkid int,
    value int
)

declare @b table
(
    otherID int,
    value int
)

insert into @a values (1, 1000)
insert into @a values (1, 1001)
insert into @a values (2, 1000)
insert into @a values (2, 1001)
insert into @a values (2, 1002)

insert into @b values (-1, 1000)
insert into @b values (-1, 1001)
insert into @b values (-1, 1002)

, how do I query for all the values in @a that completely match up with @b?

{@a.pkid = 1, @b.otherID = -1} would not be returned (only 2 of 3 values match)

{@a.pkid = 2, @b.otherID = -1} would be returned (3 of 3 values match)

We need to compare each set of values from @a to each set of @b and return the id's of non-matching sets.

Two sets are considered matching when for each row from the first set there is an equal row from the second set and vise versa. That is, both the values and the count of the rows should match in both sets.

First thing that comes to mind is to calculate COUNT(*)'s of both sets, then calculate the COUNT(*) for the result of both sets joined on value and make sure all three are equal.

But we don't actually need to COUNT(*) all values. We just need to find out the first row from either of the subsets that has no match in another subset. As soon as we find such a row, we can tell that the sets don't match; if we don't find one, then the sets do match.
Read the rest of this entry »

Written by Quassnoi

March 31st, 2009 at 11:00 pm

Posted in SQL Server