Fallback language names: SQL Server at EXPLAIN EXTENDED

Fallback language names: SQL Server

Comments enabled. I *really* need your comment

This is a series of articles on efficient querying for a localized name, using a default (fallback) language if there is no localized name:

From Stack Overflow:

I have table item and another table language which contains names for the items in different languages:

item language data

How do I select a French name for an item if it exists, or a fallback English name if there is no French one?

We basically have three options here:

Use COALESCE on two SELECT list subqueries
Use COALESCE on the results of two LEFT JOINS
Use the combination of methods above: a LEFT JOIN for French names and a subquery for English ones

Yesterday, we saw that the second option (two LEFT JOINs) is best in Oracle if the probability of the fallback is high (i. e. the localization is poor and but few terms are translated into the local language). I took Latin language as an example for this.

Third query (a LEFT JOIN with the local language rowset and a subquery in a COALESCE to retrive the fallback language names) is best if there is a low probability of a fallback, i. e. almost all terms are translated. In this case, I took French as an example, since it is widely used and localizations are likely to cover most terms.

Today we will see how same queries work in SQL Server.

Let's create sample tables:

Table creation details

CREATE SCHEMA [20090807_fallback]
CREATE TABLE t_item (id INT NOT NULL PRIMARY KEY, stuffing VARCHAR(100) NOT NULL)
CREATE TABLE t_language (item INT NOT NULL, language CHAR(3) NOT NULL, name VARCHAR(50) NOT NULL)
GO
ALTER TABLE [20090807_fallback].t_language ADD CONSTRAINT pk_language_language_item PRIMARY KEY (language, item)
GO
BEGIN TRANSACTION
DECLARE @cnt INT
SET @cnt = 1
WHILE @cnt <= 1000000
BEGIN
        INSERT
        INTO    &#91;20090807_fallback&#93;.t_item
        SELECT  @cnt, RIGHT('Item ' + CAST(@cnt AS VARCHAR) + SPACE(100), 100)
        SET @cnt = @cnt + 1
END
INSERT
INTO    &#91;20090807_fallback&#93;.t_language
SELECT  id, CAST(lcode AS CHAR(3)), lname + ' ' + CAST(id AS VARCHAR)
FROM    (
        SELECT  'en' AS lcode, 'English' AS lname
        UNION ALL
        SELECT  'fr' AS lcode, 'French' AS lname
        UNION ALL
        SELECT  'la' AS lcode, 'Latin' AS lname
        ) l
CROSS JOIN
        &#91;20090807_fallback&#93;.t_item
WHERE   NOT (id % 1000 = 1 AND lcode = 'fr')
        AND NOT (id % 500 <> 1 AND lcode = 'la')
COMMIT

t_item contains 1,000,000 items.

t_language contains 1,000,000 English names, 999,000 French names and 2,000 Latin names.

French is an example of low fallback probability, Latin is the one of high fallback probability.

Two subqueries

Click on the link below to expand the queries

French

SELECT  SUM(LEN(cname))
FROM    (
        SELECT  COALESCE
                (
                (
                SELECT  name
                FROM    [20090807_fallback].t_language ln
                WHERE   item = id
                        AND language = 'fr'
                ),
                (
                SELECT  name
                FROM    [20090807_fallback].t_language ln
                WHERE   item = id
                        AND language = 'en'
                )
                ) AS cname
        FROM    [20090807_fallback].t_item
        ) q


12889896
1 row fetched in 0.0002s (9.0623s)

Table 't_language'. Scan count 1, logical reads 3005822, physical reads 457, read-ahead reads 3181, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_item'. Scan count 1, logical reads 16456, physical reads 0, read-ahead reads 11035, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 6672 ms,  elapsed time = 9065 ms.

SELECT
  Compute Scalar
    Stream Aggregate
      Nested Loops (Left Outer Join)
        Nested Loops (Left Outer Join)
          Merge Join (Left Outer Join)
            Clustered Index Scan [test].[20090807_fallback].[t_item]
            Clustered Index Seek [test].[20090807_fallback].[t_language]
        Clustered Index Seek [test].[20090807_fallback].[t_language]
      Clustered Index Seek [test].[20090807_fallback].[t_language]

Latin

SELECT  SUM(LEN(cname))
FROM    (
        SELECT  COALESCE
                (
                (
                SELECT  name
                FROM    [20090807_fallback].t_language ln
                WHERE   item = id
                        AND language = 'la'
                ),
                (
                SELECT  name
                FROM    [20090807_fallback].t_language ln
                WHERE   item = id
                        AND language = 'en'
                )
                ) AS cname
        FROM    [20090807_fallback].t_item
        ) q


13884896
1 row fetched in 0.0002s (2.7656s)

Table 't_language'. Scan count 3, logical reads 3000018, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_item'. Scan count 3, logical reads 18059, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 5390 ms,  elapsed time = 2767 ms.

SELECT
  Compute Scalar
    Stream Aggregate
      Parallelism (Gather Streams)
        Stream Aggregate
          Nested Loops (Left Outer)
            Nested Loops (Left Outer)
              Hash Match (Right Outer Join)
                Parallelism (Repartition Streams)
                  Clustered Index Seek [test].[20090807_fallback].[t_language]
                Parallelism (Repartition Streams)
                  Clustered Index Scan [test].[20090807_fallback].[t_item]
            Clustered Index Seek [test].[20090807_fallback].[t_language]
          Clustered Index Seek [test].[20090807_fallback].[t_language]

We see that the Latin and French queries yield completely different plans, and the Latin query is much more efficient (2.6 seconds for Latin, 9.0 seconds for French)

This is because SQL Server was able to use more efficient Hash Match for producing the resultset with the Latin subquery.

However, the English subquery is alwasy build on demand, that's why the Nested Loops are the only choise.

Two LEFT JOINs

Click on the link below to expand the queries

French

SELECT  SUM(LEN(cname))
FROM    (
        SELECT  COALESCE (ln.name, len.name) AS cname
        FROM    [20090807_fallback].t_item i
        LEFT JOIN
                [20090807_fallback].t_language ln
        ON      ln.item = i.id
                AND ln.language = 'fr'
        LEFT JOIN
                [20090807_fallback].t_language len
        ON      len.item = i.id
                AND len.language = 'en'
        ) q


12889896
1 row fetched in 0.0002s (1.7499s)

Table 't_language'. Scan count 2, logical reads 11789, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_item'. Scan count 1, logical reads 16456, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 1735 ms,  elapsed time = 1743 ms.

SELECT
  Compute Scalar
    Stream Aggregate
      Merge Join (Left Outer Join)
        Merge Join (Left Outer Join)
          Clustered Index Scan [test].[20090807_fallback].[t_item]
          Clustered Index Seek [test].[20090807_fallback].[t_language]
        Clustered Index Seek [test].[20090807_fallback].[t_language]

Latin

SELECT  SUM(LEN(cname))
FROM    (
        SELECT  COALESCE (ln.name, len.name) AS cname
        FROM    [20090807_fallback].t_item i
        LEFT JOIN
                [20090807_fallback].t_language ln
        ON      ln.item = i.id
                AND ln.language = 'la'
        LEFT JOIN
                [20090807_fallback].t_language len
        ON      len.item = i.id
                AND len.language = 'en'
        ) q


13884896
1 row fetched in 0.0002s (1.2031s)

Table 't_language'. Scan count 2, logical reads 5983, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_item'. Scan count 1, logical reads 16456, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 1219 ms,  elapsed time = 1216 ms.

SELECT
  Compute Scalar
    Stream Aggregate
      Merge Join (Left Outer Join)
        Merge Join (Left Outer Join)
          Clustered Index Scan [test].[20090807_fallback].[t_item]
          Clustered Index Seek [test].[20090807_fallback].[t_language]
        Clustered Index Seek [test].[20090807_fallback].[t_language]

Both these queries produce exactly same plan, namely using two Merge Joins for both tables.

The Latin one is more efficient since there are fewer values to join here.

One subquery and one join

Click on the link below to expand the queries

French

SELECT  SUM(LEN(cname))
FROM    (
        SELECT  COALESCE (
                ln.name,
                (
                SELECT  name
                FROM    [20090807_fallback].t_language len
                WHERE   len.item = i.id
                        AND len.language = 'en'

                )
                ) AS cname
        FROM    [20090807_fallback].t_item i
        LEFT JOIN
                [20090807_fallback].t_language ln
        ON      ln.item = i.id
                AND ln.language = 'fr'
        ) q


12889896
1 row fetched in 0.0002s (1.6718s)

Table 't_language'. Scan count 1, logical reads 8822, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_item'. Scan count 1, logical reads 16456, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 1671 ms,  elapsed time = 1672 ms.

SELECT
  Compute Scalar
    Stream Aggregate
      Nested Loops (Left Outer Join)
        Merge Join (Left Outer Join)
          Clustered Index Scan [test].[20090807_fallback].[t_item]
          Clustered Index Seek [test].[20090807_fallback].[t_language]
        Clustered Index Seek [test].[20090807_fallback].[t_language]

Latin

SELECT  SUM(LEN(cname))
FROM    (
        SELECT  COALESCE (
                ln.name,
                (
                SELECT  name
                FROM    [20090807_fallback].t_language len
                WHERE   len.item = i.id
                        AND len.language = 'en'

                )
                ) AS cname
        FROM    [20090807_fallback].t_item i
        LEFT JOIN
                [20090807_fallback].t_language ln
        ON      ln.item = i.id
                AND ln.language = 'la'
        ) q


13884896
1 row fetched in 0.0002s (2.6562s)

Table 't_language'. Scan count 3, logical reads 2994018, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_item'. Scan count 3, logical reads 18059, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 5281 ms,  elapsed time = 2659 ms.

SELECT
  Compute Scalar
    Stream Aggregate
      Parallelism (Gather Streams)
        Stream Aggregate
          Nested Loops (Left Outer)
            Hash Match (Right Outer Join)
              Parallelism (Repartition Streams)
                Clustered Index Seek [test].[20090807_fallback].[t_language]
              Parallelism (Repartition Streams)
                Clustered Index Scan [test].[20090807_fallback].[t_item]
            Clustered Index Seek [test].[20090807_fallback].[t_language]

The different languages generate different execution plans, but in this case the French is much more efficient than Latin.

The reason is the same as in Oracle: the fallback subquery is executed much more often for the missing Latin names (which are numerious) than for missing French names (that are rare).

Summary

Like in Oracle, in SQL Server there are two different efficient methods to select fallback values:

If the probability of fallback is high, i. e. there are few main values, it's more efficient to select both the main value and the fallback value in a LEFT JOIN, like this:

SELECT  SUM(LEN(cname))
FROM    (
        SELECT  COALESCE (ln.name, len.name) AS cname
        FROM    [20090807_fallback].t_item i
        LEFT JOIN
                [20090807_fallback].t_language ln
        ON      ln.item = i.id
                AND ln.language = 'la'
        LEFT JOIN
                [20090807_fallback].t_language len
        ON      len.item = i.id
                AND len.language = 'en'
        ) q

If the probability of fallback is low, i. e. there are many main values, it's more efficient to select the main value in a LEFT JOIN, and the fallback value in a subquery, which is a second argument to COALESCE, like this:

SELECT  SUM(LEN(cname))
FROM    (
        SELECT  COALESCE (
                ln.name,
                (
                SELECT  name
                FROM    [20090807_fallback].t_language len
                WHERE   len.item = i.id
                        AND len.language = 'en'

                )
                ) AS cname
        FROM    [20090807_fallback].t_item i
        LEFT JOIN
                [20090807_fallback].t_language ln
        ON      ln.item = i.id
                AND ln.language = 'fr'
        ) q

SQL Server always rewrites the join with the main rowset in most efficient way. However, the second argument to COALESCE is evaluated only if the first one is NULL. The subquery will be evaluated only in the rare cases when the main value will fail to retrieve.

To be continued.

Written by Quassnoi

August 7th, 2009 at 11:00 pm

Posted in SQL Server

EXPLAIN EXTENDED