EXPLAIN EXTENDED

How to create fast database queries

Archive for September 30th, 2009

IN vs. JOIN vs. EXISTS: Oracle

with 11 comments

Answering questions asked on the site.

Jason asks:

I have a co-worker who swears that Oracle IN queries are slow and refuses to use them. For example:

SELECT  foo
FROM    bar
WHERE   bar.stuff IN
(
SELECT  stuff
FROM    asdf
)

Typically, this kind of performance advice strikes me as overgeneralized and my instinct is to ignore it. But I figure I'll give him the benefit of the doubt and ask about it.

So, in general is an IN query very expensive? I'm having trouble putting together many non-trivial queries to run on EXPLAIN PLAN.

Does the IN predicate always have inferior efficiency compared to it's counterparts, EXISTS and JOIN?

Let's check.

First of all, there are at least three ways to check each row in a table against a list of (possible duplicate) values and return each row at most once:

  • IN predicate:
    SELECT  foo
    FROM    bar
    WHERE   bar.stuff IN
    (
    SELECT  stuff
    FROM    asdf
    )
    
  • EXISTS predicate:
    SELECT  foo
    FROM    bar
    WHERE   EXISTS
    (
    SELECT  NULL
    FROM    asdf
    WHERE   asdf.stuff = bar.stuff
    )
    
  • JOIN / DISTINCT:
    SELECT  b.foo
    FROM    (
    SELECT  DISTINCT stuff
    FROM    asdf
    ) a
    JOIN    bar b
    ON      b.stuff = a.stuff
    

All these queries are semantically the same.

Common wisdom advices against using IN predicates because it just doesn't look a right thing to do.

This is because less experienced SQL developers tend to translate the SQL statements into pseudocode just like they see it, which in this case looks something like this:

foreach ($bar as $bar_record) {
foreach ($asdf as $asdf_record) {
if ($bar_record->stuff == $asdf_record->stuff)
output ($bar_record);
break;
}
}

This just radiates inefficiency, since the inner rowset should be iterated for each row returned.

EXISTS looks more nice since it at least gives a hint of possibility to use an index. However, it still looks like the same nested loops.

Finally, the JOIN looks the most promising here since everybody knows joins are optimized (though few know how exactly). But an IN? No thanks, they say, it will execute once for every row, it's slow, it's bad, it's inefficient!

People, come on. Oracle developers have thought it over ages ago and guess what: they turned out to be smart enough to implement an efficient algorithm for an IN construct.

Now, let's make two sample tables and see how it's done:
Read the rest of this entry »

Written by Quassnoi

September 30th, 2009 at 11:00 pm

Posted in Oracle