This is a series of articles on efficient querying for a localized name, using a default (fallback) language if there is no localized name:
From Stack Overflow:
I have table item
and another table language
which contains names for the items
in different languages:
How do I select a French name for an item if it exists, or a fallback English name if there is no French one?
We basically have three options here:
- Use
COALESCE
on two SELECT
list subqueries
- Use
COALESCE
on the results of two LEFT JOINS
- Use the combination of methods above: a
LEFT JOIN
for French names and a subquery for English ones
Yesterday, we saw that the second option (two LEFT JOIN
s) is best in Oracle if the probability of the fallback is high (i. e. the localization is poor and but few terms are translated into the local language). I took Latin language as an example for this.
Third query (a LEFT JOIN
with the local language rowset and a subquery in a COALESCE
to retrive the fallback language names) is best if there is a low probability of a fallback, i. e. almost all terms are translated. In this case, I took French as an example, since it is widely used and localizations are likely to cover most terms.
Today we will see how same queries work in SQL Server.
Let's create sample tables:
Table creation details
CREATE SCHEMA [20090807_fallback]
CREATE TABLE t_item (id INT NOT NULL PRIMARY KEY, stuffing VARCHAR(100) NOT NULL)
CREATE TABLE t_language (item INT NOT NULL, language CHAR(3) NOT NULL, name VARCHAR(50) NOT NULL)
GO
ALTER TABLE [20090807_fallback].t_language ADD CONSTRAINT pk_language_language_item PRIMARY KEY (language, item)
GO
BEGIN TRANSACTION
DECLARE @cnt INT
SET @cnt = 1
WHILE @cnt <= 1000000
BEGIN
INSERT
INTO [20090807_fallback].t_item
SELECT @cnt, RIGHT('Item ' + CAST(@cnt AS VARCHAR) + SPACE(100), 100)
SET @cnt = @cnt + 1
END
INSERT
INTO [20090807_fallback].t_language
SELECT id, CAST(lcode AS CHAR(3)), lname + ' ' + CAST(id AS VARCHAR)
FROM (
SELECT 'en' AS lcode, 'English' AS lname
UNION ALL
SELECT 'fr' AS lcode, 'French' AS lname
UNION ALL
SELECT 'la' AS lcode, 'Latin' AS lname
) l
CROSS JOIN
[20090807_fallback].t_item
WHERE NOT (id % 1000 = 1 AND lcode = 'fr')
AND NOT (id % 500 <> 1 AND lcode = 'la')
COMMIT
t_item
contains 1,000,000 items.
t_language
contains 1,000,000 English names, 999,000 French names and 2,000 Latin names.
French is an example of low fallback probability, Latin is the one of high fallback probability.
Two subqueries
Click on the link below to expand the queries
French
SELECT SUM(LEN(cname))
FROM (
SELECT COALESCE
(
(
SELECT name
FROM [20090807_fallback].t_language ln
WHERE item = id
AND language = 'fr'
),
(
SELECT name
FROM [20090807_fallback].t_language ln
WHERE item = id
AND language = 'en'
)
) AS cname
FROM [20090807_fallback].t_item
) q
|
12889896 |
1 row fetched in 0.0002s (9.0623s) |
Table 't_language'. Scan count 1, logical reads 3005822, physical reads 457, read-ahead reads 3181, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_item'. Scan count 1, logical reads 16456, physical reads 0, read-ahead reads 11035, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 6672 ms, elapsed time = 9065 ms.
SELECT
Compute Scalar
Stream Aggregate
Nested Loops (Left Outer Join)
Nested Loops (Left Outer Join)
Merge Join (Left Outer Join)
Clustered Index Scan [test].[20090807_fallback].[t_item]
Clustered Index Seek [test].[20090807_fallback].[t_language]
Clustered Index Seek [test].[20090807_fallback].[t_language]
Clustered Index Seek [test].[20090807_fallback].[t_language]
Latin
SELECT SUM(LEN(cname))
FROM (
SELECT COALESCE
(
(
SELECT name
FROM [20090807_fallback].t_language ln
WHERE item = id
AND language = 'la'
),
(
SELECT name
FROM [20090807_fallback].t_language ln
WHERE item = id
AND language = 'en'
)
) AS cname
FROM [20090807_fallback].t_item
) q
|
13884896 |
1 row fetched in 0.0002s (2.7656s) |
Table 't_language'. Scan count 3, logical reads 3000018, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_item'. Scan count 3, logical reads 18059, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5390 ms, elapsed time = 2767 ms.
SELECT
Compute Scalar
Stream Aggregate
Parallelism (Gather Streams)
Stream Aggregate
Nested Loops (Left Outer)
Nested Loops (Left Outer)
Hash Match (Right Outer Join)
Parallelism (Repartition Streams)
Clustered Index Seek [test].[20090807_fallback].[t_language]
Parallelism (Repartition Streams)
Clustered Index Scan [test].[20090807_fallback].[t_item]
Clustered Index Seek [test].[20090807_fallback].[t_language]
Clustered Index Seek [test].[20090807_fallback].[t_language]
We see that the Latin and French queries yield completely different plans, and the Latin query is much more efficient (2.6 seconds for Latin, 9.0 seconds for French)
This is because SQL Server was able to use more efficient Hash Match
for producing the resultset with the Latin subquery.
However, the English subquery is alwasy build on demand, that's why the Nested Loops
are the only choise.
Two LEFT JOINs
Click on the link below to expand the queries
French
SELECT SUM(LEN(cname))
FROM (
SELECT COALESCE (ln.name, len.name) AS cname
FROM [20090807_fallback].t_item i
LEFT JOIN
[20090807_fallback].t_language ln
ON ln.item = i.id
AND ln.language = 'fr'
LEFT JOIN
[20090807_fallback].t_language len
ON len.item = i.id
AND len.language = 'en'
) q
|
12889896 |
1 row fetched in 0.0002s (1.7499s) |
Table 't_language'. Scan count 2, logical reads 11789, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_item'. Scan count 1, logical reads 16456, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1735 ms, elapsed time = 1743 ms.
SELECT
Compute Scalar
Stream Aggregate
Merge Join (Left Outer Join)
Merge Join (Left Outer Join)
Clustered Index Scan [test].[20090807_fallback].[t_item]
Clustered Index Seek [test].[20090807_fallback].[t_language]
Clustered Index Seek [test].[20090807_fallback].[t_language]
Latin
SELECT SUM(LEN(cname))
FROM (
SELECT COALESCE (ln.name, len.name) AS cname
FROM [20090807_fallback].t_item i
LEFT JOIN
[20090807_fallback].t_language ln
ON ln.item = i.id
AND ln.language = 'la'
LEFT JOIN
[20090807_fallback].t_language len
ON len.item = i.id
AND len.language = 'en'
) q
|
13884896 |
1 row fetched in 0.0002s (1.2031s) |
Table 't_language'. Scan count 2, logical reads 5983, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_item'. Scan count 1, logical reads 16456, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1219 ms, elapsed time = 1216 ms.
SELECT
Compute Scalar
Stream Aggregate
Merge Join (Left Outer Join)
Merge Join (Left Outer Join)
Clustered Index Scan [test].[20090807_fallback].[t_item]
Clustered Index Seek [test].[20090807_fallback].[t_language]
Clustered Index Seek [test].[20090807_fallback].[t_language]
Both these queries produce exactly same plan, namely using two Merge Joins
for both tables.
The Latin one is more efficient since there are fewer values to join here.
One subquery and one join
Click on the link below to expand the queries
French
SELECT SUM(LEN(cname))
FROM (
SELECT COALESCE (
ln.name,
(
SELECT name
FROM [20090807_fallback].t_language len
WHERE len.item = i.id
AND len.language = 'en'
)
) AS cname
FROM [20090807_fallback].t_item i
LEFT JOIN
[20090807_fallback].t_language ln
ON ln.item = i.id
AND ln.language = 'fr'
) q
|
12889896 |
1 row fetched in 0.0002s (1.6718s) |
Table 't_language'. Scan count 1, logical reads 8822, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_item'. Scan count 1, logical reads 16456, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1671 ms, elapsed time = 1672 ms.
SELECT
Compute Scalar
Stream Aggregate
Nested Loops (Left Outer Join)
Merge Join (Left Outer Join)
Clustered Index Scan [test].[20090807_fallback].[t_item]
Clustered Index Seek [test].[20090807_fallback].[t_language]
Clustered Index Seek [test].[20090807_fallback].[t_language]
Latin
SELECT SUM(LEN(cname))
FROM (
SELECT COALESCE (
ln.name,
(
SELECT name
FROM [20090807_fallback].t_language len
WHERE len.item = i.id
AND len.language = 'en'
)
) AS cname
FROM [20090807_fallback].t_item i
LEFT JOIN
[20090807_fallback].t_language ln
ON ln.item = i.id
AND ln.language = 'la'
) q
|
13884896 |
1 row fetched in 0.0002s (2.6562s) |
Table 't_language'. Scan count 3, logical reads 2994018, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_item'. Scan count 3, logical reads 18059, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5281 ms, elapsed time = 2659 ms.
SELECT
Compute Scalar
Stream Aggregate
Parallelism (Gather Streams)
Stream Aggregate
Nested Loops (Left Outer)
Hash Match (Right Outer Join)
Parallelism (Repartition Streams)
Clustered Index Seek [test].[20090807_fallback].[t_language]
Parallelism (Repartition Streams)
Clustered Index Scan [test].[20090807_fallback].[t_item]
Clustered Index Seek [test].[20090807_fallback].[t_language]
The different languages generate different execution plans, but in this case the French is much more efficient than Latin.
The reason is the same as in Oracle: the fallback subquery is executed much more often for the missing Latin names (which are numerious) than for missing French names (that are rare).
Summary
Like in Oracle, in SQL Server there are two different efficient methods to select fallback values:
- If the probability of fallback is high, i. e. there are few main values, it's more efficient to select both the main value and the fallback value in a
LEFT JOIN
, like this:
SELECT SUM(LEN(cname))
FROM (
SELECT COALESCE (ln.name, len.name) AS cname
FROM [20090807_fallback].t_item i
LEFT JOIN
[20090807_fallback].t_language ln
ON ln.item = i.id
AND ln.language = 'la'
LEFT JOIN
[20090807_fallback].t_language len
ON len.item = i.id
AND len.language = 'en'
) q
-
If the probability of fallback is low, i. e. there are many main values, it's more efficient to select the main value in a LEFT JOIN
, and the fallback value in a subquery, which is a second argument to COALESCE
, like this:
SELECT SUM(LEN(cname))
FROM (
SELECT COALESCE (
ln.name,
(
SELECT name
FROM [20090807_fallback].t_language len
WHERE len.item = i.id
AND len.language = 'en'
)
) AS cname
FROM [20090807_fallback].t_item i
LEFT JOIN
[20090807_fallback].t_language ln
ON ln.item = i.id
AND ln.language = 'fr'
) q
SQL Server always rewrites the join with the main rowset in most efficient way. However, the second argument to COALESCE
is evaluated only if the first one is NULL
. The subquery will be evaluated only in the rare cases when the main value will fail to retrieve.
To be continued.