Things SQL needs: determining range cardinality
What is the problem with this query?
SELECT *
FROM orders
WHERE quantity <= 4
AND urgency <= 4
The problem is indexing strategy, of course. Which columns should we index?
If we index quantity, the optimizer will be able to use the index to filter on it. However, filtering on urgency will require scanning all records with quantity < 4 and applying the urgency filter to each record found.
Same with urgency. We can use range access on urgency using an index, but this will require filtering on quantity.
Why, create a composite index!
, some will say.
Unfortunately, that won’t help much.
A composite B-Tree index maintains what is called a lexicographical order of the records. This means that an index on (quantity, urgency) will sort on quantity, and only if the quantities are equal, it will take the urgency into account.
The picture below shows how would the records be ordered in such an index:

As we can see, with a single index range scan (i. e. just following the arrows) we cannot select only the records within the dashed rectangle. There is no single index range that could be used to filter on both columns.
Even if we changed the field order in the index, it would just change the direction of the arrows connecting the records:

, and still no single range that contains only the records we need.
Can we improve it?
If we take at closer look at the table contents we will see that despite the fact that there is no single range that contains the records we need (and only them), there are four ranges that do filter our records.
If we rewrote the query condition like that:
SELECT *
FROM orders
WHERE quantity IN (0, 1, 2, 3, 4)
AND urgency <= 4
with the first index, or like that:
SELECT *
FROM orders
WHERE quantity <= 4
AND urgency IN (0, 1, 2, 3, 4)
with the second index, then any decent SQL engine would build a decent and efficient plan (Index range in MySQL, INLIST ITERATOR in Oracle etc.)
The problem is that rewriting this query still requires an SQL developer. But this could be done automatically by the optimizer. There are several methods to do that.
Smallest superset
A range condition defines a set of the values that could possibly satisfy the condition (i. e. belong to the range). The number
of the values that satisfy this condition is called set cardinality. The word number is quoted here because it can be infinite, and not all infinities are created equal: some are more infinite than the others!
However, if we take the column definition into account, we can see that some ranges define finite (and quite constrained) sets of possible values. For instance, a condition like quantity < 4 on a column that is defined as UNSIGNED INT can possibly be satisfied by five values: 0, 1, 2, 3 and 4.
This set is the smallest superset of all sets of values that satisfy the range condition.
Loose index scan
Even with the conditions that theoretically define the infinite sets of values that could satisfy them, practically there is always a finite number of values in the table that do satisfy them (the table itself contains the finite number of records, to begin with).
And most engines keep track of that number in their statistics tables: this is what is called field cardinality, a measure of field uniqueness.
If the range cardinality is expected to be low (either from the set of values that can possibly belong to the range, or from the actual number of distinct values that do belong to the range, according to statistics), it would be a wise idea to rewrite the range condition as an IN condition containing all possible values that can belong or do belong to the range.
This will replace a single less than
or greater then
with a small number of equals to
. And an equals to
gives the optimizer much more space to, um, optimize. It can use a HASH JOIN, split the index into a number of continuous ranges or do some other interesting things that can only be done with an equijoin.
With an UNSIGNED INTEGER column, it is easy to generate a set of values that could satisfy the range. But what if we know the range cardinality to be low from the statistics, not from the column datatype?
In this case, we could build the set of possible values using what MySQL calls a loose index scan.
Basically, it takes the first record from the index and then recursively searches the next lowest record whose key value is greater than the previous one, using the index seek (as opposed to index scan). This means instead of mere scanning the index and applying the condition to each field, the engine would walk up and down the B-Tree to locate the first record the the greater key. It is much more efficient when the number of distinct keys is small. And in fact, MySQL does use this method for queries that involve SELECT DISTINCT on an indexed field.
MIN / MAX
This method would be useful for the open ranges (like > or <, as opposed to BETWEEN).
By default, the INT column means SIGNED INT. For a condition like quantity <= 4 this would mean all integers from -2,147,483,647 to 4 which is way too many.
In real tables, the quantities would be something greater than 0. But not all developers bother to pick a right datatype or add CHECK constraints for their columns (and some databases like PostgreSQL lack the unsigned datatypes anyway).
To work around this, we could find the minimum existing value in the index using a single index seek. It would serve as a lower bound for the range. Since in real table that would most probably be something like 0 or 1, that would make the range much more constrained.
All these three methods could be used at the same time. Since the methods require nothing but just a single lookup of the statistics table and a single index seek, the most efficient method to return the subset of values that could satisfy the range could be chosen at runtime.
Implementation
Now, let’s make a sample table in PostgreSQL and see how could we benefit from replacing the low-cardinality ranges with the lists of values:
This table contains 16,000,000 records with two integer fields and two double precision fields.
There are composite indexes on the pairs of fields. This indexes are intentionally created with the least selective column leading to demonstrate the benefits of range transformation.
First, let’s run a query similar to the original one:
SELECT SUM(LENGTH(stuffing))
FROM t_composite
WHERE 1 = 1
AND uint1 <= 20
AND uint2 <= 20
| sum | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 75600 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0001s (5.1249s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Aggregate (cost=171331.57..171331.58 rows=1 width=204)
-> Index Scan using ix_composite_uint on t_composite (cost=0.00..171329.57 rows=796 width=204)
Index Cond: ((uint1 <= 20) AND (uint2 <= 20))
The plan says that the index condition involves both fields. However, only the first field is used in the B-Tree search: the second is just being filtered on, though no actual table access is performed yet on this step.
The index keys are not too long, however, there are several millions of them that need to be scanned. That’s why the query takes more than 5 seconds to complete.
The I/O statistics show the following:
SELECT pg_stat_reset();
SELECT SUM(LENGTH(stuffing))
FROM t_composite
WHERE 1 = 1
AND uint1 <= 20
AND uint2 <= 20;
SELECT pg_stat_get_blocks_fetched('ix_composite_uint'::regclass);
| pg_stat_get_blocks_fetched |
|---|
| 21865 |
The query required more than twenty thousands of index blocks to be read and examined
Smallest superset
Now, let’s try to substitute the hard-coded list of possible values instead of the range condition:
SELECT SUM(LENGTH(stuffing))
FROM t_composite
WHERE 1 = 1
AND uint1 IN
(
SELECT generate_series(0, 20)
)
AND uint2 <= 20
| sum | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 75600 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0001s (0.0063s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Aggregate (cost=167.47..167.49 rows=1 width=204)
-> Nested Loop (cost=0.02..167.37 rows=40 width=204)
-> HashAggregate (cost=0.02..0.03 rows=1 width=4)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Index Scan using ix_composite_uint on t_composite (cost=0.00..166.84 rows=40 width=208)
Index Cond: ((t_composite.uint1 = (generate_series(0, 20))) AND (t_composite.uint2 <= 20))
Instead of a giant singe range, there are 21 short ranges examined in a nested loop. This is instant (6 ms).
Let’s look into the I/O statistics again:
SELECT pg_stat_reset();
SELECT SUM(LENGTH(stuffing))
FROM t_composite
WHERE 1 = 1
AND uint1 IN
(
SELECT generate_series(0, 20)
)
AND uint2 <= 20;
SELECT pg_stat_get_blocks_fetched('ix_composite_uint'::regclass);
| pg_stat_get_blocks_fetched |
|---|
| 64 |
Now, only 64 blocks need to be read.
MIN / MAX
We took 0 as the initial value, but since theoretically there can be negative numbers in the columns, this assumption is not safe.
We need to get the least value from the table instead of assuming it:
SELECT SUM(LENGTH(stuffing))
FROM t_composite
WHERE 1 = 1
AND uint1 IN
(
SELECT generate_series(
(
SELECT MIN(uint1)
FROM t_composite
), 20)
)
AND uint2 <= 20
| sum | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 75600 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0001s (0.0066s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Aggregate (cost=171.40..171.41 rows=1 width=204)
-> Nested Loop (cost=3.95..171.29 rows=40 width=204)
-> HashAggregate (cost=3.95..3.96 rows=1 width=4)
-> Result (cost=3.92..3.93 rows=1 width=0)
InitPlan 2 (returns $1)
-> Result (cost=3.91..3.92 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..3.91 rows=1 width=4)
-> Index Scan using ix_composite_uint on t_composite (cost=0.00..62571556.68 rows=15999664 width=4)
Filter: (uint1 IS NOT NULL)
-> Index Scan using ix_composite_uint on t_composite (cost=0.00..166.84 rows=40 width=208)
Index Cond: (("20100519_cardinality".t_composite.uint1 = (generate_series($1, 20))) AND ("20100519_cardinality".t_composite.uint2 <= 20))
This is instant again.
Now, what about statistics?
SELECT pg_stat_reset();
SELECT SUM(LENGTH(stuffing))
FROM t_composite
WHERE 1 = 1
AND uint1 IN
(
SELECT generate_series(
(
SELECT MIN(uint1)
FROM t_composite
), 20)
)
AND uint2 <= 20;
SELECT pg_stat_get_blocks_fetched('ix_composite_uint'::regclass);
| pg_stat_get_blocks_fetched |
|---|
| 67 |
The query is only 3 block reads heavier, but this time it is guaranteed to be correct.
Loose index scan
The real* columns hold the double precision data.
First, let’s run the original query:
SELECT SUM(LENGTH(stuffing))
FROM t_composite
WHERE 1 = 1
AND real1 <= 0.2
AND real2 <= 0.2
| sum | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 83600 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0001s (6.6561s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Aggregate (cost=206210.82..206210.84 rows=1 width=204)
-> Index Scan using ix_composite_real on t_composite (cost=0.00..206208.84 rows=793 width=204)
Index Cond: ((real1 <= 0.2::double precision) AND (real2 <= 0.2::double precision))
Again, there are way too many block reads:
SELECT pg_stat_reset();
SELECT SUM(LENGTH(stuffing))
FROM t_composite
WHERE 1 = 1
AND real1 <= 0.2
AND real2 <= 0.2;
SELECT pg_stat_get_blocks_fetched('ix_composite_real'::regclass);
| pg_stat_get_blocks_fetched |
|---|
| 30657 |
, and the query takes more than 6 seconds.
For a condition like real1 <= 0.2, the smallest superset of all possible values (that is all possible double-precision values between 0 and 0.2) would be too large (though still finite of course) to be generated and joined. That’s why we need to use server-collected statistics to decide whether a loose index scan would be efficient to get the list of all distinct values of real1 in the table:
SELECT n_distinct, most_common_vals, histogram_bounds
FROM pg_stats
WHERE schemaname = '20100519_cardinality'
AND tablename = 't_composite'
AND attname IN ('real1', 'real2')
| n_distinct | most_common_vals | histogram_bounds | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40 | {0.09,0.22,0.01,0.37,0.04,0.25,0.28,0.34,0.27,0.06,0.36,0.11,0.08,0.39,0.12,0.2,0.02,0.16,0.17,0.21,0.29,0.18,0.19,0.26,0.3,0.32,0.35,0.1,0.13,0.4,0.15,0.23,0.38,0.03,0.33,0.24,0.07,0.14,0.05,0.31} | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 362181 | {1781.85,128.6,142.73,257.88,332.62,618.61,705.35,829.91,845.82,874.08,1432.82,1469.16,1486.01,1569.16,1866.43,2111.48,2234.48,2282.78,2340.7,2382.54,2468.19,2491.35,2494.73,2508.51,2587.51,2750.98,2876.07,2956.11,3222.62,3463.2,3564.41,3872.25,3.43,13.22,14.3,15.82,16.12,16.36,19.98,21.79,23.03,25.41,28.24,28.96,31.95,35.1,36.31,38.46,39.72,39.87,42.42,49.32,50.84,56.59,60.31,81.8,84.48,84.74,86.62,88.58,91.82,103.48,112.88,114.99,117.55,119.55,122.52,123.04,127.98,128.26,130.56,131.15,134.57,134.91,135.8,139.05,139.41,141.39,146.13,151.92,158.16,158.58,161.71,169.92,170.43,173.3,173.74,178.06,189.43,192.99,199.69,210.98,218.45,225.44,230.16,233.25,233.67,240.9,246.5,249.77} | {0.07,44.3,86.34,129.18,172.01,212.46,254.83,294.83,336.18,374.44,414.75,458.07,498.72,540.88,582.92,621.95,660.28,700.75,738.46,778.95,822.09,857.34,896.27,937.63,977.05,1017.49,1062.93,1104.88,1147.17,1185.99,1225.41,1267.8,1307.45,1347.49,1386.1,1424.05,1465.05,1503.23,1542.03,1581.45,1619.53,1658.57,1697.74,1741.65,1782.59,1823.19,1867.54,1905.59,1945.39,1986.04,2022.32,2061.52,2102.77,2143.07,2180.29,2218.06,2262.62,2302.28,2342.05,2382.29,2416.18,2455.43,2489.25,2528.07,2572.15,2606.91,2648.59,2685.43,2724.9,2765.49,2805.78,2845.94,2886.33,2925.52,2967.07,3005.12,3046.34,3084.44,3122.81,3158.25,3199.74,3242.4,3279.34,3319.41,3359.39,3401.15,3436.32,3477.28,3517.05,3559.6,3599.6,3639.63,3678.82,3720.27,3760.32,3801.89,3844.98,3883.96,3922.63,3963.01,3999.98} | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2 rows fetched in 0.0003s (0.0072s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
From the server statistics we see that there are only 40 distinct values of real1. Hence, a loose index scan as such would be efficient.
Let’s look into the stats on real2. We see that there are 362181 distinct values in the table, and the range in question (<= 0.2) corresponds to the first entry in the histogram (BETWEEN 0.07 AND 44.3). Since the histogram splits the values into 100 percentiles, this means that there are about 3622 values from 0.07 to 44.3, and ((0.2 - 0.07) / (44.3 - 0.07) * 362181 / 100) ≈ 11 distinct values inside our range, and about 11 * 16000000 /362181 ≈ 486 records with these values.
Taking into account that the original query would need to scan 8,000,000 records, the loose index scan seems to be a good idea.
Unfortunately, PostgreSQL does not support it directly, but with minimal effort it can be emulated:
WITH RECURSIVE q (real1) AS
(
SELECT MIN(real1)
FROM t_composite
UNION ALL
SELECT (
SELECT c.real1
FROM t_composite c
WHERE c.real1 > q.real1
AND c.real1 <= 0.2
ORDER BY
c.real1
LIMIT 1
)
FROM q
WHERE q.real1 IS NOT NULL
)
SELECT SUM(LENGTH(stuffing))
FROM q
JOIN t_composite c
ON c.real1 = q.real1
AND c.real2 <= 0.2
| sum | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 83600 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0001s (0.0082s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Aggregate (cost=17319.78..17319.80 rows=1 width=204)
CTE q
-> Recursive Union (cost=3.92..401.33 rows=101 width=8)
-> Result (cost=3.92..3.93 rows=1 width=0)
InitPlan 1 (returns $1)
-> Limit (cost=0.00..3.92 rows=1 width=8)
-> Index Scan using ix_composite_real on t_composite (cost=0.00..62694798.76 rows=15999664 width=8)
Filter: (real1 IS NOT NULL)
-> WorkTable Scan on q (cost=0.00..39.54 rows=10 width=8)
Filter: (q.real1 IS NOT NULL)
SubPlan 2
-> Limit (cost=0.00..3.93 rows=1 width=8)
-> Index Scan using ix_composite_real on t_composite c (cost=0.00..314695.34 rows=79998 width=8)
Index Cond: ((real1 > $2) AND (real1 <= 0.2::double precision))
-> Nested Loop (cost=0.00..16914.48 rows=1588 width=204)
-> CTE Scan on q (cost=0.00..2.02 rows=101 width=8)
-> Index Scan using ix_composite_real on t_composite c (cost=0.00..166.95 rows=40 width=212)
Index Cond: ((c.real1 = q.real1) AND (c.real2 <= 0.2::double precision))
This is almost instant again. Here are the I/O statistics:
SELECT pg_stat_reset();
WITH RECURSIVE q (real1) AS
(
SELECT MIN(real1)
FROM t_composite
UNION ALL
SELECT (
SELECT c.real1
FROM t_composite c
WHERE c.real1 > q.real1
AND c.real1 <= 0.2
ORDER BY
c.real1
LIMIT 1
)
FROM q
WHERE q.real1 IS NOT NULL
)
SELECT SUM(LENGTH(stuffing))
FROM q
JOIN t_composite c
ON c.real1 = q.real1
AND c.real2 <= 0.2;
SELECT pg_stat_get_blocks_fetched('ix_composite_real'::regclass);
| pg_stat_get_blocks_fetched |
|---|
| 165 |
The number of index reads is reduced greatly again.
Summary
In some cases, a range predicate (like less than
, greater than
or between
) can be rewritten as an IN predicate against the list of values that could satisfy the range condition.
Depending on the column datatype, check constraints and statistics, that list could be comprised of all possible values defined by the column’s domain; all possible values defined by column’s minimal and maximal value, or all actual distinct values contained in the table. In the latter case, a loose index scan could be used to retrieve the list of such values.
Since an equality condition is applied to each value in the list, more access and join methods could be used to build the query plain, including range conditions on secondary index columns, hash lookups etc.
Whenever the optimizer builds a plan for a query that contains a range predicate, it should consider rewriting the range condition as an IN predicate and use the latter method if it proves more efficient.







Subscribe in a reader
great article :)
Thanks
marc-o
23 May 10 at 23:00
Great article, and I think it provides an answer to a stackoverflow question. I posted an answer with a link to this article and I included your summary in the answer. I hope this is ok. The link to the answer is:
http://stackoverflow.com/questions/2914082/how-do-you-design-a-database-to-allow-fast-multicolumn-searching/2915541#2915541
Trey Jackson
26 May 10 at 22:08
@Trey: the mechanism used for a multi-column search with ranged conditions would be similar indeed. The article actually discusses the mechanism’s integration into the query optimizers: an experienced developer can rewrite the query using the same approach I used in the article. The query still needs to be backed by a composite index (or a number of indexes) on the appropriate columns, though.
Quassnoi
27 May 10 at 02:05
Nice article, as always. Big fan of your site, thanks!
Frode
18 Jun 10 at 15:48
@Frode: thanks!
Quassnoi
22 Jun 10 at 11:15
Thanks for letting us understand how indexing works, I did understood well now.
Thanks for the good work.
Just curious to know how does these number works with “joins” (many-tables).
Rakesh
2 Jul 10 at 11:35
@author/admin:-
If you guys have a plugin which will give you the list of related articles in end of each blog sections will be great.
Rakesh
2 Jul 10 at 11:37