EXPLAIN EXTENDED

How to create fast database queries

IN vs. JOIN vs. EXISTS

with 6 comments

From Stack Overflow:

I'm wanting to select rows in a table where the primary key is in another table.

I'm not sure if I should use a JOIN or the IN operator in SQL Server 2005:

SELECT  *
FROM    a
WHERE   a.c IN
        (
        SELECT  d
        FROM    b
        )
SELECT  a.*
FROM    a
JOIN    b
ON      a.c = b.d

Is there any significant performance difference between these two SQL queries with a large dataset (i. e. millions of rows)?

Before we begin I should note that the queries provided by the author are not similar in fact and could produce different resultsets.

If the table used in the subquery returns a value twice, a JOIN will also return the matching rows twice, while an IN condition will return them only once.

That's why it's more correct to compare to the following query:

SELECT  a.*
FROM    a
JOIN    (
        SELECT  DISTINCT d
        FROM    b
        ) bo
ON      a.c = bo.d

Now, à nos moutons.

The folk wisdom advises against using IN constructs in favor of JOIN's because they say that the subquery will be executed many times in a loop.

Let's create some sample tables and see if it's really so:

CREATE SCHEMA [20090616_exists]
CREATE TABLE t_outer (
        id INT NOT NULL PRIMARY KEY,
        val1 INT NOT NULL,
        val2 INT NOT NULL
)
CREATE TABLE t_inner (
        id INT NOT NULL PRIMARY KEY,
        val1 INT NOT NULL,
        val2 INT NOT NULL
)
CREATE TABLE t_smallinner (
        id INT NOT NULL PRIMARY KEY,
        val1 INT NOT NULL,
        val2 INT NOT NULL
)
GO
CREATE INDEX ix_outer_val1 ON [20090616_exists].t_outer (val1)
CREATE INDEX ix_inner_val1 ON [20090616_exists].t_inner (val1)
CREATE INDEX ix_smallinner_val1 ON [20090616_exists].t_smallinner (val1)
BEGIN TRANSACTION
SELECT  RAND(20090616)
DECLARE @cnt INT
SET @cnt = 1
WHILE @cnt <= 100000
BEGIN
        INSERT
        INTO    [20090616_exists].t_inner
        VALUES  (@cnt, RAND() * 100000000, RAND() * 100000000)
        INSERT
        INTO    [20090616_exists].t_outer
        VALUES  (@cnt, RAND() * 100000000, RAND() * 100000000)
        SET @cnt = @cnt + 1
END

INSERT
INTO    [20090616_exists].t_smallinner
SELECT  TOP 100 
        ROW_NUMBER() OVER (ORDER BY id DESC),
        val1,
        val2
FROM    [20090616_exists].t_outer
ORDER BY
        id DESC

COMMIT
GO

We created two tables, t_outer and t_inner, with 100,000 rows in each of them.

They both have same layout: a PRIMARY KEY column and two random-filled columns, val1 and val2.

val1 is indexed in both tables, val2 is not.

We also created a third table, t_smallinner, which has similar layout and contains the last 100 rows from t_outer.

The IN predicate query looks like this:

SELECT  val1
FROM    [20090616_exists].t_outer o
WHERE   val1 IN
        (
        SELECT  val1
        FROM    [20090616_exists].t_inner
        )
val1
142438
610082
639820
2196768
2206360
2735275
3772483
3852989
4832562
5196377
97089841
97284896
97606445
115 rows fetched in 0.0020s (0.0678s)
Table 't_outer'. Scan count 1, logical reads 227, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_inner'. Scan count 1, logical reads 228, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 67 ms. 

67 ms is way too fast for 100,000 reevaluations of any query.

Let's look into the plan:

SELECT
  Merge Join
    Index Scan (t_inner)
    Stream Aggregate
      Index Scan (t_outer)

Since val1 is indexed in both tables, the optimizer chose Stream Aggregate to filter out the possible duplicates of t_outer.val.

This method is very simple: it records the last value of the column selected, and if this value is found again it just won't be passed further.

It's very fast and almost free, provided that the input rowset is preordered.

Output rowset, of course, remains ordered.

Now we have two rowsets ordered by val1 and the optimizer chooses Merge Join method to join these resultsets.

Merge Join keeps a kind of a cursor variable which initially points to the least of the leading values of the two columns being joined.

It then returns rows from both resultsets matching this value and advances the cursor to the next matching value which exists in both indexes, skipping non-matching ones.

Then it returns the matching rows again, etc., until either of the resultsets cannot be further advanced.

This is an extremely fast method provided that both resultsets are sorted, and in our case they are.

Now, let's try to repeat the same query using a join:

SELECT  o.val1
FROM    [20090616_exists].t_outer o
JOIN    (
        SELECT  DISTINCT val1
        FROM    [20090616_exists].t_inner
        ) i
ON      i.val1 = o.val1

val1
142438
610082
639820
2196768
2206360
2735275
3772483
3852989
4832562
5196377
97089841
97284896
97606445
115 rows fetched in 0.0020s (0.0664s)

Guess what?

Table 't_outer'. Scan count 1, logical reads 227, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_inner'. Scan count 1, logical reads 228, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 63 ms,  elapsed time = 66 ms. 
SELECT
  Merge Join
    Index Scan (t_inner)
    Stream Aggregate
      Index Scan (t_outer)

Exactly same plan, exactly same statistics.

And what about EXISTS?

SELECT  val1
FROM    [20090616_exists].t_outer o
WHERE   EXISTS (
        SELECT  1
        FROM    [20090616_exists].t_inner s
        WHERE   s.val1 = o.val1
        )

val1
142438
610082
639820
2196768
2206360
2735275
3772483
3852989
4832562
5196377
97089841
97284896
97606445
115 rows fetched in 0.0022s (0.0653s)
Table 't_outer'. Scan count 1, logical reads 227, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_inner'. Scan count 1, logical reads 228, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 63 ms,  elapsed time = 65 ms. 
SELECT
  Merge Join
    Index Scan (t_inner)
    Stream Aggregate
      Index Scan (t_outer)

Believe it or not, but the plan and the statistics are again the same.

Contrary to popular opinion, guys and gals that developed SQL Server managed to notice that these queries are synonyms and code their optimizer to produce the same efficient plan for these queries.

Now, what if we JOIN / SEMI JOIN on non-indexed columns?

SELECT  val2
FROM    [20090616_exists].t_outer
WHERE   val2 IN
        (
        SELECT  val2
        FROM    [20090616_exists].t_inner
        )

val2
95859966
98637292
45432525
83285402
2021066
83569592
6316854
19816746
62048056
29721475
34270807
98103178
16758862
78 rows fetched in 0.0013s (0.1054s)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_inner'. Scan count 1, logical reads 262, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_outer'. Scan count 1, logical reads 262, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 110 ms,  elapsed time = 105 ms. 
SELECT
  Hash Match (Left Semi Join)
    Clustered Index Scan (t_outer)
    Clustered Index Scan (t_inner)

This execution plan uses Hash Match (Left Semi Join).

A hash table is built over the values of t_outer and each row from t_inner is probed against this hash table.

Since it's a Left Semi Join, the matched values are immediately removed from the hash table so that it shrinks gradually while the query is being executed.

Needless to say the EXISTS uses the same plan:

SELECT  val2
FROM    [20090616_exists].t_outer o
WHERE   EXISTS (
        SELECT  1
        FROM    [20090616_exists].t_inner s
        WHERE   s.val2 = o.val2
        )

val2
95859966
98637292
45432525
83285402
2021066
83569592
6316854
19816746
62048056
29721475
34270807
98103178
16758862
78 rows fetched in 0.0013s (0.1062s)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_inner'. Scan count 1, logical reads 262, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_outer'. Scan count 1, logical reads 262, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 109 ms,  elapsed time = 106 ms. 
SELECT
  Hash Match (Left Semi Join)
    Clustered Index Scan (t_outer)
    Clustered Index Scan (t_inner)

The plan for the JOIN, however, is a little bit different:

SELECT  o.val2
FROM    [20090616_exists].t_outer o
JOIN    (
        SELECT  DISTINCT val2
        FROM    [20090616_exists].t_inner
        ) i
ON      i.val2 = o.val2

val2
88456016
43161221
93634582
42877370
83285402
9055631
96128333
98103178
62522213
19816746
82512513
16758862
45432525
78 rows fetched in 0.0013s (0.1455s)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_outer'. Scan count 1, logical reads 262, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_inner'. Scan count 1, logical reads 262, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 141 ms,  elapsed time = 145 ms. 
SELECT
  Hash Match (Inner Join)
    Hash Match (Aggregate)
      Clustered Index Scan (t_inner)
    Clustered Index Scan (t_outer)

In this case, SQL Server cannot use the Left Semi Join method.

Instead, it first removes all duplicates of val2 from t_inner using Hash Match (Aggregate). This method build a UNIQUE hash table: duplicate values are easily detectable (since they always cause hash collisions) and they just don't get into the table while the latter is being built.

Then the engine probes t_outer against the hash table and returns the matching records.

For some reason it's a little less efficient than IN / EXISTS and it's hard to say why exactly.

Most probably it's because the engine doesn't reuse the hash table built on Hash Match (Aggregate) stage and builds a new table instead.

Finally, let's check what's going on when searching against a small table:

SELECT  val1
FROM    [20090616_exists].t_outer o
WHERE   val1 IN
        (
        SELECT  val1
        FROM    [20090616_exists].t_smallinner
        )

val1
2743837
3142720
3154309
3679309
4533134
5061895
7677846
8205854
9055748
9447811
94201341
96363403
98069188
100 rows fetched in 0.0018s (0.0015s)
Table 't_outer'. Scan count 100, logical reads 230, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_smallinner'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms. 
SELECT
  Nested Loops (Inner Join)
    Stream Aggregate
      Index Scan (t_smallinner)
    Index Seek (t_outer)

The plan is different, since a Merge Join would be an overkill here: it requires traversing the whole index.

Instead, the engine just scans all values from t_smallinner, filters the duplicates out, and just searches the index over t_outer.val1 for the matched values using an Index Seek.

Since there are only 100 values in t_smallinner, it's much faster than traversing the whole index with Merge Join (1 ms, which is within the measurement error range).

The same plan is used for EXISTS and JOIN.

If we search on non-indexed columns:

SELECT  val2
FROM    [20090616_exists].t_outer o
WHERE   val2 IN
        (
        SELECT  val2
        FROM    [20090616_exists].t_smallinner
        )

val2
89853723
20937415
30897128
55955805
32114860
84317984
89322303
74105831
55365972
15316888
1434201
71806116
83206959
101 rows fetched in 0.0020s (0.0251s)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_outer'. Scan count 1, logical reads 262, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_smallinner'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 15 ms,  elapsed time = 25 ms. 
SELECT
  Hash Match (Right Semi Join)
    Clustered Index Scan (t_smalinner)
    Clustered Index Scan (t_outer)

This is quite the same as for the big table, with a little difference: the hash table is built on the subquery table, not the external one.

It does not shrink during the execution: instead, it is shrunk (cleared of the duplicates) while being built.

It's in fact a Hash Match (Aggregate) combined with Hash Match (Inner Join), but in more efficient way which reuses the hash table.

And here is a JOIN query:

SELECT  o.val2
FROM    [20090616_exists].t_outer o
JOIN    (
        SELECT  DISTINCT val2
        FROM    [20090616_exists].t_smallinner
        ) i
ON      i.val2 = o.val2

val2
89853723
20937415
30897128
55955805
32114860
84317984
89322303
74105831
55365972
15316888
1434201
71806116
83206959
101 rows fetched in 0.0019s (0.0244s)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_outer'. Scan count 1, logical reads 262, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_smallinner'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 32 ms,  elapsed time = 24 ms. 
SELECT
  Hash Match (Inner Join)
    Distinct Sort
      Clustered Index Scan (t_smalinner)
    Clustered Index Scan (t_outer)

This is also less efficient, because it uses Distinct Sort to remove the duplicates and since it does not produce a hash table, it should rebuild it again.

We see now that contrary to the popular opinion, IN / EXISTS queries are not less efficient than a JOIN query in SQL Server.

In fact, JOIN queries are less efficient on non-indexed tables, since Semi Join methods allow aggregation and matching against a single hash table, while a JOIN needs to do these two operations in two steps.

Finally, I'd like to debunk one more myth.

Many think that EXISTS is more efficient that IN, because EXISTS returns only one row.

This is not true for SQL Server. As we can see from the examples above, EXISTS and IN produces exactly same plans.

This is because EXISTS is more flexible than IN. An IN can always be rewritten as EXISTS (using a simple WHERE condition with an equijoin) but not vice versa.

In fact, a poorly written EXISTS could indeed lead to the query being reevaluated for each row. That's why so many developers are reluctant to use IN, and, ironically, it's EXISTS that is prone to it.

This query, for instance, is very inefficient:

SELECT  val2
FROM    [20090616_exists].t_outer o
WHERE   EXISTS (
        SELECT  1
        FROM    [20090616_exists].t_smallinner s
        WHERE   s.val2 - o.val2 = 0
        )

val2
89853723
20937415
30897128
55955805
32114860
84317984
89322303
74105831
55365972
15316888
1434201
71806116
83206959
101 rows fetched in 0.0019s (1.3438s)
Table 't_smallinner'. Scan count 2, logical reads 200002, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_outer'. Scan count 3, logical reads 781, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 2718 ms,  elapsed time = 1353 ms. 

Since this condition is not sargable and not even hashable, this query uses Nested Loops (Left Semi Join) over full table scans and runs for almost 1.5 seconds.

Written by Quassnoi

June 16th, 2009 at 11:00 pm

Posted in SQL Server

6 Responses to 'IN vs. JOIN vs. EXISTS'

Subscribe to comments with RSS

  1. Thanks for this. Saved me from rewriting a bunch of views

    Luke

    24 Oct 12 at 06:45

  2. Thank you so much !
    Nice post !
    I wonder are there any differences when using MySQL or PostgreSQL ?

    Coding Tip

    6 Aug 13 at 10:34

  3. Thanks

    Lorenzo

    2 Apr 14 at 19:10

  4. Thanks a lot, we needed some of this to improve our queries.

    Lorenzo

    2 Apr 14 at 19:12

  5. Thanks a lot. Nice information.

    Jes Keshly Cruz

    6 Jun 14 at 13:03

  6. perfect explanation! thanks a lot!

    Fatih Yarbaşı

    6 Sep 14 at 15:46

Leave a Reply