EXPLAIN EXTENDED

How to create fast database queries

Archive for June 16th, 2009

IN vs. JOIN vs. EXISTS

with 16 comments

From Stack Overflow:

I'm wanting to select rows in a table where the primary key is in another table.

I'm not sure if I should use a JOIN or the IN operator in SQL Server 2005:

SELECT  *
FROM    a
WHERE   a.c IN
        (
        SELECT  d
        FROM    b
        )
SELECT  a.*
FROM    a
JOIN    b
ON      a.c = b.d

Is there any significant performance difference between these two SQL queries with a large dataset (i. e. millions of rows)?

Before we begin I should note that the queries provided by the author are not similar in fact and could produce different resultsets.

If the table used in the subquery returns a value twice, a JOIN will also return the matching rows twice, while an IN condition will return them only once.

That's why it's more correct to compare to the following query:

SELECT  a.*
FROM    a
JOIN    (
        SELECT  DISTINCT d
        FROM    b
        ) bo
ON      a.c = bo.d

Now, à nos moutons.

The folk wisdom advises against using IN constructs in favor of JOIN's because they say that the subquery will be executed many times in a loop.

Let's create some sample tables and see if it's really so:
Read the rest of this entry »

Written by Quassnoi

June 16th, 2009 at 11:00 pm

Posted in SQL Server