Fallback language names: Oracle
Comments enabled. I *really* need your comment
This is a series of articles on efficient querying for a localized name, using a default (fallback) language if there is no localized name:
- Fallback language names: Oracle
- Fallback language names: SQL Server
- Fallback language names: PostgreSQL
- Fallback language names: MySQL
From Stack Overflow:
I have table
itemand another tablelanguagewhich contains names for theitemsin different languages:
item language data How do I select a French name for an item if it exists, or a fallback English name if there is no French one?
This is quite a common database design (most often used for localization), so I'd like to cover this question for different databases.
This design is not limited to languages and localization: it may be used for any data with default value that can be overriden. Prices, discounts, options — many kinds of things.
Today we will see how do it in Oracle.
We basically have three options here:
- Use
COALESCEon twoSELECTlist subqueries - Use
COALESCEon the results of twoLEFT JOINS - Use the combination of methods above: a
LEFT JOINfor French names and a subquery for English ones
Let's create sample tables and see what is more efficient.
CREATE TABLE t_item (id NOT NULL, stuffing NOT NULL)
AS
SELECT level AS id, RPAD('Item ' || level, 100, ' ') AS stuffing
FROM dual
CONNECT BY
level <= 1000000
/
ALTER TABLE t_item ADD CONSTRAINT pk_item_id PRIMARY KEY (id)
/
CREATE TABLE t_language (item NOT NULL, language NOT NULL, name NOT NULL)
AS
SELECT id, CAST(lcode AS CHAR(3)), CAST(lname || ' ' || id AS VARCHAR(50))
FROM (
SELECT 'en' AS lcode, 'English' AS lname
FROM dual
UNION ALL
SELECT 'fr' AS lcode, 'French' AS lname
FROM dual
UNION ALL
SELECT 'la' AS lcode, 'Latin' AS lname
FROM dual
) language
CROSS JOIN
t_item
WHERE NOT (MOD(id, 1000) = 1 AND lcode = 'fr')
AND NOT(MOD(id, 500) <> 1 AND lcode = 'la')
/
ALTER TABLE t_language ADD CONSTRAINT pk_language_language_item PRIMARY KEY (language, item)
/
COMMIT
/
t_item contains 1,000,000 items.
t_language contains 1,000,000 English names, 999,000 French names and 2,000 Latin names.
This is to emulate different scenarios: fallback probability is low for the French language name but high for the Latin ones.
To measure performance, we will select the SUM of names lengths.
Two subqueries
A French one:
SELECT SUM(LENGTH(cname))
FROM (
SELECT COALESCE
(
(
SELECT name
FROM t_language ln
WHERE item = id
AND language = 'fr'
),
(
SELECT name
FROM t_language ln
WHERE item = id
AND language = 'en'
)
) AS cname
FROM t_item
)
| SUM(LENGTH(CNAME)) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12889896 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0001s (7.4999s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SELECT STATEMENT TABLE ACCESS BY INDEX ROWID, 20090806_fallback.T_LANGUAGE INDEX UNIQUE SCAN, 20090806_fallback.PK_LANGUAGE_ITEM_LANGUAGE TABLE ACCESS BY INDEX ROWID, 20090806_fallback.T_LANGUAGE INDEX UNIQUE SCAN, 20090806_fallback.PK_LANGUAGE_ITEM_LANGUAGE SORT AGGREGATE INDEX FAST FULL SCAN, 20090806_fallback.PK_ITEM_ID
A Latin one:
SELECT SUM(LENGTH(cname))
FROM (
SELECT COALESCE
(
(
SELECT name
FROM t_language ln
WHERE item = id
AND language = 'la'
),
(
SELECT name
FROM t_language ln
WHERE item = id
AND language = 'en'
)
) AS cname
FROM t_item
)
| SUM(LENGTH(CNAME)) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13884896 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0001s (9.8279s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SELECT STATEMENT TABLE ACCESS BY INDEX ROWID, 20090806_fallback.T_LANGUAGE INDEX UNIQUE SCAN, 20090806_fallback.PK_LANGUAGE_ITEM_LANGUAGE TABLE ACCESS BY INDEX ROWID, 20090806_fallback.T_LANGUAGE INDEX UNIQUE SCAN, 20090806_fallback.PK_LANGUAGE_ITEM_LANGUAGE SORT AGGREGATE INDEX FAST FULL SCAN, 20090806_fallback.PK_ITEM_ID
We see that these queries use the same plan, however Latin query is noticeably slower than the French one.
This is because COALESCE does not evaluate the second subquery if the first one succeds. For the French language, the first subquery succeeds almost always; for Latin — almost never.
That's why both subqueries are more likely to be executed when querying for Latin and the Latin query takes 3 seconds more to complete.
Two LEFT JOINs
Here's one for the French:
SELECT SUM(LENGTH(cname))
FROM (
SELECT COALESCE (ln.name, len.name) AS cname
FROM t_item i
LEFT JOIN
t_language ln
ON ln.item = i.id
AND ln.language = 'fr'
LEFT JOIN
t_language len
ON len.item = i.id
AND len.language = 'en'
)
| SUM(LENGTH(CNAME)) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12889896 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0001s (6.2499s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SELECT STATEMENT
SORT AGGREGATE
HASH JOIN RIGHT OUTER
TABLE ACCESS FULL, 20090806_fallback.T_LANGUAGE
HASH JOIN OUTER
INDEX FAST FULL SCAN, 20090806_fallback.PK_ITEM_ID
TABLE ACCESS FULL, 20090806_fallback.T_LANGUAGE
, and the one for the Latin:
SELECT SUM(LENGTH(cname))
FROM (
SELECT COALESCE (ln.name, len.name) AS cname
FROM t_item i
LEFT JOIN
t_language ln
ON ln.item = i.id
AND ln.language = 'la'
LEFT JOIN
t_language len
ON len.item = i.id
AND len.language = 'en'
)
| SUM(LENGTH(CNAME)) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13884896 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0002s (4.6093s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SELECT STATEMENT
SORT AGGREGATE
HASH JOIN RIGHT OUTER
TABLE ACCESS FULL, 20090806_fallback.T_LANGUAGE
HASH JOIN OUTER
INDEX FAST FULL SCAN, 20090806_fallback.PK_ITEM_ID
TABLE ACCESS BY INDEX ROWID, 20090806_fallback.T_LANGUAGE
INDEX RANGE SCAN, 20090806_fallback.PK_LANGUAGE_LANGUAGE_ITEM
This is much faster, since more efficient HASH JOIN is used to jone the tables.
The Latin query is faster than the French one in this case (4.6 seconds vs 6.3 seconds), because the Latin hash table is much shorter than the French one (there are fewer values to select and to hash) and can be searched faster.
One subquery and one join
This solution treats the main language and the fallback language differently.
It joins the main language table and selects the fallback language in a subquery which is a second argument to COALESCE.
This is for the French language:
SELECT SUM(LENGTH(cname))
FROM (
SELECT COALESCE (
ln.name,
(
SELECT name
FROM t_language len
WHERE len.item = i.id
AND len.language = 'en'
)
) AS cname
FROM t_item i
LEFT JOIN
t_language ln
ON ln.item = i.id
AND ln.language = 'fr'
)
| SUM(LENGTH(CNAME)) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12889896 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0001s (2.1250s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SELECT STATEMENT TABLE ACCESS BY INDEX ROWID, 20090806_fallback.T_LANGUAGE INDEX UNIQUE SCAN, 20090806_fallback.PK_LANGUAGE_LANGUAGE_ITEM SORT AGGREGATE HASH JOIN OUTER INDEX FAST FULL SCAN, 20090806_fallback.PK_ITEM_ID TABLE ACCESS FULL, 20090806_fallback.T_LANGUAGE
And here is the one for the Latin:
SELECT SUM(LENGTH(cname))
FROM (
SELECT COALESCE (
ln.name,
(
SELECT name
FROM t_language len
WHERE len.item = i.id
AND len.language = 'en'
)
) AS cname
FROM t_item i
LEFT JOIN
t_language ln
ON ln.item = i.id
AND ln.language = 'la'
)
| SUM(LENGTH(CNAME)) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13884896 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0001s (12.2029s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SELECT STATEMENT
TABLE ACCESS BY INDEX ROWID, 20090806_fallback.T_LANGUAGE
INDEX UNIQUE SCAN, 20090806_fallback.PK_LANGUAGE_LANGUAGE_ITEM
SORT AGGREGATE
HASH JOIN OUTER
INDEX FAST FULL SCAN, 20090806_fallback.PK_ITEM_ID
TABLE ACCESS BY INDEX ROWID, 20090806_fallback.T_LANGUAGE
INDEX RANGE SCAN, 20090806_fallback.PK_LANGUAGE_LANGUAGE_ITEM
We can see that in this case the French query is extremely fast (2 seconds), while the Latin one is extremely slow (12 seconds).
This is because there probability to find a French name is very high.
The LEFT JOIN finds every French name it can find in an efficient way, and for those few cases when there is no French name, an English name is selected in a subquery.
Same is true for the Latin, except that the subquery will be run for almost every name. Since the SELECT clause subqueries (i. e. nested loops) are much less efficient than HASH JOINs when lots of rows are selected, this query is less efficient than a double LEFT JOIN.
Summary
In Oracle, there are two efficient methods to select fallback values:
- If the probability of fallback is high, i. e. there are few main values, it's more efficient to select both the main value and the fallback value in a
LEFT JOIN, like this:SELECT SUM(LENGTH(cname)) FROM ( SELECT COALESCE (ln.name, len.name) AS cname FROM t_item i LEFT JOIN t_language ln ON ln.item = i.id AND ln.language = 'la' LEFT JOIN t_language len ON len.item = i.id AND len.language = 'en' ) -
If the probability of fallback is low, i. e. there are many main values, it's more efficient to select the main value in a
LEFT JOIN, and the fallback value in a subquery, which is a second argument toCOALESCE, like this:SELECT SUM(LENGTH(cname)) FROM ( SELECT COALESCE ( ln.name, ( SELECT name FROM t_language len WHERE len.item = i.id AND len.language = 'en' ) ) AS cname FROM t_item i LEFT JOIN t_language ln ON ln.item = i.id AND ln.language = 'fr' )In Oracle, the second argument to
COALESCEis evaluated only if the first one isNULL. The subquery will be evaluated only in the rare cases when the main value will fail to retrieve.
To be continued.
Subscribe in a reader