Archive for September 30th, 2009
IN vs. JOIN vs. EXISTS: Oracle
Answering questions asked on the site.
Jason asks:
I have a co-worker who swears that Oracle
IN
queries are slow and refuses to use them. For example:SELECT foo FROM bar WHERE bar.stuff IN ( SELECT stuff FROM asdf )Typically, this kind of performance advice strikes me as overgeneralized and my instinct is to ignore it. But I figure I'll give him the benefit of the doubt and ask about it.
So, in general is an
IN
query very expensive? I'm having trouble putting together many non-trivial queries to run onEXPLAIN PLAN
.
Does the IN
predicate always have inferior efficiency compared to it's counterparts, EXISTS
and JOIN
?
Let's check.
First of all, there are at least three ways to check each row in a table against a list of (possible duplicate) values and return each row at most once:
IN
predicate:SELECT foo FROM bar WHERE bar.stuff IN ( SELECT stuff FROM asdf )
EXISTS
predicate:SELECT foo FROM bar WHERE EXISTS ( SELECT NULL FROM asdf WHERE asdf.stuff = bar.stuff )
JOIN
/DISTINCT
:SELECT b.foo FROM ( SELECT DISTINCT stuff FROM asdf ) a JOIN bar b ON b.stuff = a.stuff
All these queries are semantically the same.
Common wisdom advices against using IN
predicates because it just doesn't look a right thing to do.
This is because less experienced SQL developers tend to translate the SQL statements into pseudocode just like they see it, which in this case looks something like this:
foreach ($bar as $bar_record) { foreach ($asdf as $asdf_record) { if ($bar_record->stuff == $asdf_record->stuff) output ($bar_record); break; } }
This just radiates inefficiency, since the inner rowset should be iterated for each row returned.
EXISTS
looks more nice since it at least gives a hint of possibility to use an index. However, it still looks like the same nested loops.
Finally, the JOIN
looks the most promising here since everybody knows joins are optimized (though few know how exactly). But an IN
? No thanks, they say, it will execute once for every row, it's slow, it's bad, it's inefficient!
People, come on. Oracle developers have thought it over ages ago and guess what: they turned out to be smart enough to implement an efficient algorithm for an IN
construct.
Now, let's make two sample tables and see how it's done:
Read the rest of this entry »