Archive for July 16th, 2009
INNER JOIN vs. CROSS APPLY
From Stack Overflow:
Can anyone give me a good example of when
CROSS APPLYmakes a difference in those cases whereINNER JOINwill work as well?
This is of course SQL Server.
A quick reminder on the terms.
INNER JOIN is the most used construct in SQL: it joins two tables together, selecting only those row combinations for which a JOIN condition is true.
This query:
SELECT * FROM table1 JOIN table2 ON table2.b = table1.a
reads:
For each row from
table1, select all rows fromtable2where the value of fieldbis equal to that of fielda
Note that this condition can be rewritten as this:
SELECT * FROM table1, table2 WHERE table2.b = table1.a
, in which case it reads as following:
Make a set of all possible combinations of rows from
table1andtable2and of this set select only the rows where the value of fieldbis equal to that of fielda
These conditions are worded differently, but they yield the same result and database systems are aware of that. Usually both these queries are optimized to use the same execution plan.
The former syntax is called ANSI syntax, and it is generally considered more readable and is recommended to use.
However, it didn't make it into Oracle until recently, that's why there are many hardcore Oracle developers that are just used to the latter syntax.
Actually, it's a matter of taste.
To use JOINs (with whatever syntax), both sets you are joining must be self-sufficient, i. e. the sets should not depend on each other. You can query both sets without ever knowing the contents on another set.
But for some tasks the sets are not self-sufficient. For instance, let's consider the following query:
We have
table1andtable2.table1has a column calledrowcount.For each row from
table1we need to select firstrowcountrows fromtable2, ordered bytable2.id
We cannot come up with a join condition here. The join condition, should it exist, would involve the row number, which is not present in table2, and there is no way to calculate a row number only from the values of columns of any given row in table2.
That's where the CROSS APPLY can be used.
CROSS APPLY is a Microsoft's extension to SQL, which was originally intended to be used with table-valued functions (TVF's).
The query above would look like this:
SELECT * FROM table1 CROSS APPLY ( SELECT TOP (table1.rowcount) * FROM table2 ORDER BY id ) t2
For each from
table1, select firsttable1.rowcountrows fromtable2ordered byid
The sets here are not self-sufficient: the query uses values from table1 to define the second set, not to JOIN with it.
The exact contents of t2 are not known until the corresponding row from table1 is selected.
I previously said that there is no way to join these two sets, which is true as long as we consider the sets as is. However, we can change the second set a little so that we get an additional computed field we can later join on.
The first option to do that is just count all preceding rows in a subquery:
SELECT *
FROM table1 t1
JOIN (
SELECT t2o.*,
(
SELECT COUNT(*)
FROM table2 t2i
WHERE t2i.id <= t2o.id
) AS rn
FROM table2 t2o
) t2
ON t2.rn <= t1.rowcount
The second option is to use a window function, also available in SQL Server since version 2005:
SELECT *
FROM table1 t1
JOIN (
SELECT t2o.*, ROW_NUMBER() OVER (ORDER BY id) AS rn
FROM table2 t2o
) t2
ON t2.rn <= t1.rowcount
This function returns the ordinal number a row would have, be the ORDER BY condition used in the function applied to the whole query.
This is essentially the same result as the subquery used in the previous query.
Now, let's create the sample tables and check all these solutions for efficiency:
Subscribe in a reader