Fallback language names: Oracle
Comments enabled. I *really* need your comment
This is a series of articles on efficient querying for a localized name, using a default (fallback) language if there is no localized name:
- Fallback language names: Oracle
- Fallback language names: SQL Server
- Fallback language names: PostgreSQL
- Fallback language names: MySQL
From Stack Overflow:
I have table
item
and another tablelanguage
which contains names for theitems
in different languages:
item language data How do I select a French name for an item if it exists, or a fallback English name if there is no French one?
This is quite a common database design (most often used for localization), so I'd like to cover this question for different databases.
This design is not limited to languages and localization: it may be used for any data with default value that can be overriden. Prices, discounts, options — many kinds of things.
Today we will see how do it in Oracle.
We basically have three options here:
- Use
COALESCE
on twoSELECT
list subqueries - Use
COALESCE
on the results of twoLEFT JOINS
- Use the combination of methods above: a
LEFT JOIN
for French names and a subquery for English ones
Let's create sample tables and see what is more efficient.
CREATE TABLE t_item (id NOT NULL, stuffing NOT NULL) AS SELECT level AS id, RPAD('Item ' || level, 100, ' ') AS stuffing FROM dual CONNECT BY level <= 1000000 / ALTER TABLE t_item ADD CONSTRAINT pk_item_id PRIMARY KEY (id) / CREATE TABLE t_language (item NOT NULL, language NOT NULL, name NOT NULL) AS SELECT id, CAST(lcode AS CHAR(3)), CAST(lname || ' ' || id AS VARCHAR(50)) FROM ( SELECT 'en' AS lcode, 'English' AS lname FROM dual UNION ALL SELECT 'fr' AS lcode, 'French' AS lname FROM dual UNION ALL SELECT 'la' AS lcode, 'Latin' AS lname FROM dual ) language CROSS JOIN t_item WHERE NOT (MOD(id, 1000) = 1 AND lcode = 'fr') AND NOT(MOD(id, 500) <> 1 AND lcode = 'la') / ALTER TABLE t_language ADD CONSTRAINT pk_language_language_item PRIMARY KEY (language, item) / COMMIT /
t_item
contains 1,000,000 items.
t_language
contains 1,000,000
English names, 999,000 French names and 2,000 Latin names.
This is to emulate different scenarios: fallback probability is low for the French language name but high for the Latin ones.
To measure performance, we will select the SUM
of names lengths.
Two subqueries
A French one:
SELECT SUM(LENGTH(cname)) FROM ( SELECT COALESCE ( ( SELECT name FROM t_language ln WHERE item = id AND language = 'fr' ), ( SELECT name FROM t_language ln WHERE item = id AND language = 'en' ) ) AS cname FROM t_item )
SUM(LENGTH(CNAME)) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
12889896 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0001s (7.4999s) |
SELECT STATEMENT TABLE ACCESS BY INDEX ROWID, 20090806_fallback.T_LANGUAGE INDEX UNIQUE SCAN, 20090806_fallback.PK_LANGUAGE_ITEM_LANGUAGE TABLE ACCESS BY INDEX ROWID, 20090806_fallback.T_LANGUAGE INDEX UNIQUE SCAN, 20090806_fallback.PK_LANGUAGE_ITEM_LANGUAGE SORT AGGREGATE INDEX FAST FULL SCAN, 20090806_fallback.PK_ITEM_ID
A Latin one:
SELECT SUM(LENGTH(cname)) FROM ( SELECT COALESCE ( ( SELECT name FROM t_language ln WHERE item = id AND language = 'la' ), ( SELECT name FROM t_language ln WHERE item = id AND language = 'en' ) ) AS cname FROM t_item )
SUM(LENGTH(CNAME)) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13884896 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0001s (9.8279s) |
SELECT STATEMENT TABLE ACCESS BY INDEX ROWID, 20090806_fallback.T_LANGUAGE INDEX UNIQUE SCAN, 20090806_fallback.PK_LANGUAGE_ITEM_LANGUAGE TABLE ACCESS BY INDEX ROWID, 20090806_fallback.T_LANGUAGE INDEX UNIQUE SCAN, 20090806_fallback.PK_LANGUAGE_ITEM_LANGUAGE SORT AGGREGATE INDEX FAST FULL SCAN, 20090806_fallback.PK_ITEM_ID
We see that these queries use the same plan, however Latin query is noticeably slower than the French one.
This is because COALESCE
does not evaluate the second subquery if the first one succeds. For the French language, the first subquery succeeds almost always; for Latin — almost never.
That's why both subqueries are more likely to be executed when querying for Latin and the Latin query takes 3 seconds more to complete.
Two LEFT JOIN
s
Here's one for the French:
SELECT SUM(LENGTH(cname)) FROM ( SELECT COALESCE (ln.name, len.name) AS cname FROM t_item i LEFT JOIN t_language ln ON ln.item = i.id AND ln.language = 'fr' LEFT JOIN t_language len ON len.item = i.id AND len.language = 'en' )
SUM(LENGTH(CNAME)) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
12889896 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0001s (6.2499s) |
SELECT STATEMENT SORT AGGREGATE HASH JOIN RIGHT OUTER TABLE ACCESS FULL, 20090806_fallback.T_LANGUAGE HASH JOIN OUTER INDEX FAST FULL SCAN, 20090806_fallback.PK_ITEM_ID TABLE ACCESS FULL, 20090806_fallback.T_LANGUAGE
, and the one for the Latin:
SELECT SUM(LENGTH(cname)) FROM ( SELECT COALESCE (ln.name, len.name) AS cname FROM t_item i LEFT JOIN t_language ln ON ln.item = i.id AND ln.language = 'la' LEFT JOIN t_language len ON len.item = i.id AND len.language = 'en' )
SUM(LENGTH(CNAME)) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13884896 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0002s (4.6093s) |
SELECT STATEMENT SORT AGGREGATE HASH JOIN RIGHT OUTER TABLE ACCESS FULL, 20090806_fallback.T_LANGUAGE HASH JOIN OUTER INDEX FAST FULL SCAN, 20090806_fallback.PK_ITEM_ID TABLE ACCESS BY INDEX ROWID, 20090806_fallback.T_LANGUAGE INDEX RANGE SCAN, 20090806_fallback.PK_LANGUAGE_LANGUAGE_ITEM
This is much faster, since more efficient HASH JOIN
is used to jone the tables.
The Latin query is faster than the French one in this case (4.6 seconds vs 6.3 seconds), because the Latin hash table is much shorter than the French one (there are fewer values to select and to hash) and can be searched faster.
One subquery and one join
This solution treats the main language and the fallback language differently.
It joins the main language table and selects the fallback language in a subquery which is a second argument to COALESCE
.
This is for the French language:
SELECT SUM(LENGTH(cname)) FROM ( SELECT COALESCE ( ln.name, ( SELECT name FROM t_language len WHERE len.item = i.id AND len.language = 'en' ) ) AS cname FROM t_item i LEFT JOIN t_language ln ON ln.item = i.id AND ln.language = 'fr' )
SUM(LENGTH(CNAME)) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
12889896 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0001s (2.1250s) |
SELECT STATEMENT TABLE ACCESS BY INDEX ROWID, 20090806_fallback.T_LANGUAGE INDEX UNIQUE SCAN, 20090806_fallback.PK_LANGUAGE_LANGUAGE_ITEM SORT AGGREGATE HASH JOIN OUTER INDEX FAST FULL SCAN, 20090806_fallback.PK_ITEM_ID TABLE ACCESS FULL, 20090806_fallback.T_LANGUAGE
And here is the one for the Latin:
SELECT SUM(LENGTH(cname)) FROM ( SELECT COALESCE ( ln.name, ( SELECT name FROM t_language len WHERE len.item = i.id AND len.language = 'en' ) ) AS cname FROM t_item i LEFT JOIN t_language ln ON ln.item = i.id AND ln.language = 'la' )
SUM(LENGTH(CNAME)) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13884896 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0001s (12.2029s) |
SELECT STATEMENT TABLE ACCESS BY INDEX ROWID, 20090806_fallback.T_LANGUAGE INDEX UNIQUE SCAN, 20090806_fallback.PK_LANGUAGE_LANGUAGE_ITEM SORT AGGREGATE HASH JOIN OUTER INDEX FAST FULL SCAN, 20090806_fallback.PK_ITEM_ID TABLE ACCESS BY INDEX ROWID, 20090806_fallback.T_LANGUAGE INDEX RANGE SCAN, 20090806_fallback.PK_LANGUAGE_LANGUAGE_ITEM
We can see that in this case the French query is extremely fast (2 seconds), while the Latin one is extremely slow (12 seconds).
This is because there probability to find a French name is very high.
The LEFT JOIN
finds every French name it can find in an efficient way, and for those few cases when there is no French name, an English name is selected in a subquery.
Same is true for the Latin, except that the subquery will be run for almost every name. Since the SELECT
clause subqueries (i. e. nested loops) are much less efficient than HASH JOIN
s when lots of rows are selected, this query is less efficient than a double LEFT JOIN
.
Summary
In Oracle, there are two efficient methods to select fallback values:
- If the probability of fallback is high, i. e. there are few main values, it's more efficient to select both the main value and the fallback value in a
LEFT JOIN
, like this:SELECT SUM(LENGTH(cname)) FROM ( SELECT COALESCE (ln.name, len.name) AS cname FROM t_item i LEFT JOIN t_language ln ON ln.item = i.id AND ln.language = 'la' LEFT JOIN t_language len ON len.item = i.id AND len.language = 'en' )
-
If the probability of fallback is low, i. e. there are many main values, it's more efficient to select the main value in a
LEFT JOIN
, and the fallback value in a subquery, which is a second argument toCOALESCE
, like this:SELECT SUM(LENGTH(cname)) FROM ( SELECT COALESCE ( ln.name, ( SELECT name FROM t_language len WHERE len.item = i.id AND len.language = 'en' ) ) AS cname FROM t_item i LEFT JOIN t_language ln ON ln.item = i.id AND ln.language = 'fr' )
In Oracle, the second argument to
COALESCE
is evaluated only if the first one isNULL
. The subquery will be evaluated only in the rare cases when the main value will fail to retrieve.
To be continued.