OLAP in MySQL: four ways to filter on higher level dimensions
Answering questions asked on the site.
Denis Kuzmenok asks:
I need some help with a query I'm fighting with.
I have these tables:
Product id parent name
Site id region name
Price id product site value For the products with a certain
product.parent
, I need to select minimal and maximal price within the givensite.region
, and return first 10 products ordered by the minimal price.Each
parent
has 100 to 10,000 products, and there are 10,000,000 records inprice
table.
We have a classical OLAP task here: a fact table (price
) and two dimension tables (product
and site
).
The task would be almost trivial if we were given the exact values of product
and site
. This way, we could build a composite index on these fields and value
, filter on the exact values and get the first 10 values from the index.
However, we are given not the values of the dimensions themselves but those of the higher levels (parent
for product
and region
for site
).
Since the values of the levels are not stored in the facts table, we cannot index them. We need to join the dimension tables and filter on them.
This article describes four ways to join the tables, their efficiency varying depending on the density of the dimension values.
Since the only algorithm to make the joins MySQL is capable of is nested loops, basically, we need to define the order in which the tables would be joined.
There are three tables and, hence, 3! = 6 permutations that define all possible join orders:
product | site | price |
product | price | site |
site | price | product |
site | product | price |
price | product | site |
price | site | product |
However, two dimension tables are completely independent on each other. This means that if they come one after another in the join their order does not actually matter: they both will be searched for independent values. This reduces the number of actual combinations:
product/site | price | |
product | price | site |
site | price | product |
price | product/site |
The joins must be designed so that the tables with most selective conditions go first. This means that the join order is determined by the density of the values satisfying the criteria. The more values from the table satisfy the search criteria, the later should the table come in the join, so that by the time the join occurs, most values would have been already sifted out.
Now, let's build the queries for all types of the join. To do this, we will create sample tables:
Table creation details
Members of the dimensions are distributed logarithmically, reducing from the lower to the higher values.
Say, 11,379 products of 250,000 (4.55% of all products) belong to the parent 1, while only 2,906 products (1.16%) belong to the parent 30:
parent | total | percent | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 11379 | 4.55 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 | 10724 | 4.29 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 | 10259 | 4.10 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4 | 9830 | 3.93 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5 | 9502 | 3.80 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6 | 8880 | 3.55 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7 | 8555 | 3.42 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8 | 8072 | 3.23 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9 | 7715 | 3.09 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 | 7522 | 3.01 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
252 | 1 | 0.00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
253 | 1 | 0.00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
263 | 1 | 0.00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
309 | 1 | 0.00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
250000 | 100.00 |
Same distribution applies to the sites:
region | total | percent | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 10446 | 20.89 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 | 8010 | 16.02 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 | 6602 | 13.20 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4 | 5119 | 10.24 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5 | 4096 | 8.19 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6 | 3296 | 6.59 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7 | 2604 | 5.21 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8 | 2006 | 4.01 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9 | 1614 | 3.23 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 | 1243 | 2.49 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
43 | 1 | 0.00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
45 | 1 | 0.00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
46 | 1 | 0.00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
50 | 1 | 0.00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
50000 | 100.00 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
47 rows fetched in 0.0014s (0.0709s) |
Product/site, price
This is the most straightforward query. Since the product and site table come first, it's best for sparse products and sparse sites.
They will be filtered first and the results of the filtering cross joined (still returning relatively few values to join against the prices).
Here's the query:
SELECT p.*, MIN(r.value) AS min_value FROM product p CROSS JOIN site s STRAIGHT_JOIN price r ON r.product = p.id AND r.site = s.id WHERE p.parent = 100 AND s.region = 30 GROUP BY p.id ORDER BY min_value LIMIT 10
id | parent | name | min_value | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
195818 | 100 | Product 195818 | 30.3642 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
54185 | 100 | Product 54185 | 58.377 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 rows fetched in 0.0002s (0.0131s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | p | ref | PRIMARY,ix_product_parent | ix_product_parent | 4 | const | 136 | 100.00 | Using where; Using temporary; Using filesort |
1 | SIMPLE | s | ref | PRIMARY,ix_site_region | ix_site_region | 4 | const | 12 | 100.00 | Using index |
1 | SIMPLE | r | ref | ix_price_site_product_value_id,ix_price_product_value_id | ix_price_site_product_value_id | 8 | 20100930_prices.s.id,20100930_prices.p.id | 1 | 100.00 | Using index |
select `20100930_prices`.`p`.`id` AS `id`,`20100930_prices`.`p`.`parent` AS `parent`,`20100930_prices`.`p`.`name` AS `name`,min(`20100930_prices`.`r`.`value`) AS `min_value` from `20100930_prices`.`product` `p` join `20100930_prices`.`site` `s` straight_join `20100930_prices`.`price` `r` where ((`20100930_prices`.`r`.`site` = `20100930_prices`.`s`.`id`) and (`20100930_prices`.`r`.`product` = `20100930_prices`.`p`.`id`) and (`20100930_prices`.`s`.`region` = 30) and (`20100930_prices`.`p`.`parent` = 100)) group by `20100930_prices`.`p`.`id` order by min(`20100930_prices`.`r`.`value`) limit 10
We see that there is a filesort
operation in the plan, but since there are relatively few values to sort, this operation is quite fast.
This query completes in 13 ms.
Site, product, price
This query is quite similar to the previous one, differing only in the JOIN
order. It is best for sparse sites and dense products:
SELECT p.*, MIN(r.value) AS min_value FROM site s STRAIGHT_JOIN price r ON r.site = s.id STRAIGHT_JOIN product p ON p.parent = 1 AND p.id = r.product WHERE s.region = 30 GROUP BY p.id ORDER BY min_value LIMIT 10
id | parent | name | min_value | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
82678 | 1 | Product 82678 | 22.7273 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
35757 | 1 | Product 35757 | 23.1834 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
244754 | 1 | Product 244754 | 23.5035 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
217731 | 1 | Product 217731 | 24.3615 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
99861 | 1 | Product 99861 | 25.5681 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
127003 | 1 | Product 127003 | 26.8058 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
69848 | 1 | Product 69848 | 26.9732 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
128191 | 1 | Product 128191 | 32.3508 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
99234 | 1 | Product 99234 | 32.9937 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
242035 | 1 | Product 242035 | 33.9945 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 rows fetched in 0.0004s (0.0300s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | s | ref | PRIMARY,ix_site_region | ix_site_region | 4 | const | 12 | 100.00 | Using index; Using temporary; Using filesort |
1 | SIMPLE | r | ref | ix_price_site_product_value_id,ix_price_product_value_id | ix_price_site_product_value_id | 4 | 20100930_prices.s.id | 54 | 100.00 | Using index |
1 | SIMPLE | p | eq_ref | PRIMARY,ix_product_parent | PRIMARY | 4 | 20100930_prices.r.product | 1 | 100.00 | Using where |
select `20100930_prices`.`p`.`id` AS `id`,`20100930_prices`.`p`.`parent` AS `parent`,`20100930_prices`.`p`.`name` AS `name`,min(`20100930_prices`.`r`.`value`) AS `min_value` from `20100930_prices`.`site` `s` straight_join `20100930_prices`.`price` `r` straight_join `20100930_prices`.`product` `p` where ((`20100930_prices`.`r`.`site` = `20100930_prices`.`s`.`id`) and (`20100930_prices`.`p`.`id` = `20100930_prices`.`r`.`product`) and (`20100930_prices`.`p`.`parent` = 1) and (`20100930_prices`.`s`.`region` = 30)) group by `20100930_prices`.`p`.`id` order by min(`20100930_prices`.`r`.`value`) limit 10
This query completes in 30 ms, since the conditions (in their totality) are less selective.
Product, price, site
This query is best for sparse products and dense sites:
In this query, we will replace the JOIN
against the site table with an IN
predicate, since we don't actually need any information from that table:
SELECT p.*, MIN(r.value) AS min_value FROM product p STRAIGHT_JOIN price r ON r.product = p.id WHERE p.parent = 100 AND r.site IN ( SELECT id FROM site WHERE region = 1 ) GROUP BY p.id ORDER BY min_value LIMIT 10
id | parent | name | min_value | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
44246 | 100 | Product 44246 | 20.4741 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
54185 | 100 | Product 54185 | 20.5249 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
241517 | 100 | Product 241517 | 20.7065 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
131185 | 100 | Product 131185 | 21.1123 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
61725 | 100 | Product 61725 | 21.1389 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
80285 | 100 | Product 80285 | 21.3354 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
22533 | 100 | Product 22533 | 21.4976 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
226588 | 100 | Product 226588 | 21.5061 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
114991 | 100 | Product 114991 | 21.5185 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5571 | 100 | Product 5571 | 21.7684 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 rows fetched in 0.0004s (0.0302s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | p | ref | PRIMARY,ix_product_parent | ix_product_parent | 4 | const | 136 | 100.00 | Using where; Using temporary; Using filesort |
1 | PRIMARY | r | ref | ix_price_product_value_id | ix_price_product_value_id | 4 | 20100930_prices.p.id | 12 | 100.00 | Using where |
2 | DEPENDENT SUBQUERY | site | unique_subquery | PRIMARY,ix_site_region | PRIMARY | 4 | func | 1 | 100.00 | Using where |
select `20100930_prices`.`p`.`id` AS `id`,`20100930_prices`.`p`.`parent` AS `parent`,`20100930_prices`.`p`.`name` AS `name`,min(`20100930_prices`.`r`.`value`) AS `min_value` from `20100930_prices`.`product` `p` straight_join `20100930_prices`.`price` `r` where ((`20100930_prices`.`r`.`product` = `20100930_prices`.`p`.`id`) and (`20100930_prices`.`p`.`parent` = 100) and <in_optimizer>(`20100930_prices`.`r`.`site`,<exists>(<primary_index_lookup>(<cache>(`20100930_prices`.`r`.`site`) in site on PRIMARY where ((`20100930_prices`.`site`.`region` = 1) and (<cache>(`20100930_prices`.`r`.`site`) = `20100930_prices`.`site`.`id`)))))) group by `20100930_prices`.`p`.`id` order by min(`20100930_prices`.`r`.`value`) limit 10
Again, this query is 30 ms.
Price, product/site
This query is best for dense products and dense sites. This is the most complex query which would require some explanation.
At the first sight, it may seem that price
cannot be filtered on. However, it's not true.
What we need is 10 lowest values from price
satisfying certain conditions (right products and sites, and, additionally, products should be unique).
This means that we could make use of an index on price.value
, scanning it until 10 records satisfying these conditions are returned. In this case, ORDER BY
and LIMIT 10
would serve as filtering conditions: the scanning would cease as soon as the limit is reached.
And of course the more dense are the products and sites, the more is the probability for the conditions to be satisfied, the sooner the query completes.
Filtering on products and sites is easy, but to ensure that the products are unique, we will use the trick described in my previous article:
Basically, when filtering a price, we should not only make sure that it holds the correct product, but that it's the first (lowest) price for a given product.
This can be easily done with an additional ORDER BY / LIMIT
in a subquery.
Here's the query:
SELECT p.*, value AS min_value FROM price r STRAIGHT_JOIN product p ON p.id = r.product WHERE p.parent = 1 AND r.id = ( SELECT ri.id FROM price ri WHERE ri.product = p.id AND ri.site IN ( SELECT id FROM site WHERE region = 1 ) ORDER BY ri.product, ri.value, ri.id LIMIT 1 ) ORDER BY r.value, r.id LIMIT 10
id | parent | name | min_value | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
80226 | 1 | Product 80226 | 20.0021 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
168980 | 1 | Product 168980 | 20.0028 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
185520 | 1 | Product 185520 | 20.0041 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
131416 | 1 | Product 131416 | 20.0054 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
72154 | 1 | Product 72154 | 20.0055 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
76637 | 1 | Product 76637 | 20.0067 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
95946 | 1 | Product 95946 | 20.0091 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
28136 | 1 | Product 28136 | 20.0102 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
21426 | 1 | Product 21426 | 20.0142 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
24976 | 1 | Product 24976 | 20.0145 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 rows fetched in 0.0004s (0.0144s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | r | index | PRIMARY,ix_price_product_value_id | ix_price_value_id | 8 | 10 | 62505352.00 | ||
1 | PRIMARY | p | eq_ref | PRIMARY,ix_product_parent | PRIMARY | 4 | 20100930_prices.r.product | 1 | 100.00 | Using where |
2 | DEPENDENT SUBQUERY | ri | ref | ix_price_product_value_id | ix_price_product_value_id | 4 | 20100930_prices.p.id | 12 | 100.00 | Using where |
3 | DEPENDENT SUBQUERY | site | unique_subquery | PRIMARY,ix_site_region | PRIMARY | 4 | func | 1 | 100.00 | Using where |
Field or reference '20100930_prices.p.id' of SELECT #2 was resolved in SELECT #1 select `20100930_prices`.`p`.`id` AS `id`,`20100930_prices`.`p`.`parent` AS `parent`,`20100930_prices`.`p`.`name` AS `name`,`20100930_prices`.`r`.`value` AS `min_value` from `20100930_prices`.`price` `r` straight_join `20100930_prices`.`product` `p` where ((`20100930_prices`.`p`.`id` = `20100930_prices`.`r`.`product`) and (`20100930_prices`.`p`.`parent` = 1) and (`20100930_prices`.`r`.`id` = (select `20100930_prices`.`ri`.`id` from `20100930_prices`.`price` `ri` where ((`20100930_prices`.`ri`.`product` = `20100930_prices`.`p`.`id`) and <in_optimizer>(`20100930_prices`.`ri`.`site`,<exists>(<primary_index_lookup>(<cache>(`20100930_prices`.`ri`.`site`) in site on PRIMARY where ((`20100930_prices`.`site`.`region` = 1) and (<cache>(`20100930_prices`.`ri`.`site`) = `20100930_prices`.`site`.`id`)))))) order by `20100930_prices`.`ri`.`product`,`20100930_prices`.`ri`.`value`,`20100930_prices`.`ri`.`id` limit 1))) order by `20100930_prices`.`r`.`value`,`20100930_prices`.`r`.`id` limit 10
We see that this query completes in only 14 ms despite the fact that we used the most populated parent
and region
. As I said before, the LIMIT 10
served as another filtering condition.
Summary
Let's run all possible queries for all possible distributions and record the query times in a table
P, S and R stand for product, site and price in the join order.
+Inf means that the query did not complete in a reasonable time and had to be terminated.
Product | |||||||||
Sparse (parent 100) | Dense (parent 1) | ||||||||
P/S,R | S,R,P | P,R,S | R,P/S | P/S,R | S,R,P | P,R,S | R,P/S | ||
---|---|---|---|---|---|---|---|---|---|
Site | Sparse (region 30) | 13 | 27 | 30 | +Inf | 621 | 30 | 6,703 | 51,468 |
Dense (region 1) | 7,679 | 4,427 | 30 | 1,051 | +Inf | 5,914 | 6,736 | 14 |
Conclusion
If limiting the results, MySQL allows filtering fact tables efficiently on higher level dimensions despite the fact these dimensions cannot be indexed. However, in each case the selectivity of the dimension level should be taken into account and the appropriate query should be used.
It should be noted that the queries involving a CROSS JOIN
and index scan on the fact table perform intolerably poorly in the edge cases (too dense and too sparse dimensions, accordingly). On the other hand, the queries involving a join only differ in the join order which can be predicted by MySQL.
This means that when the dimension selectivity is unknown, a query using a plain join of all three tables (without forcing the join order) should be the query of choice:
SELECT p.*, MIN(r.value) AS min_value FROM product p CROSS JOIN site s JOIN price r ON r.product = p.id AND r.site = s.id WHERE p.parent = 1 AND s.region = 1 GROUP BY p.id ORDER BY min_value LIMIT 10
, since MySQL can select a join order quite close to optimal and the query would complete in reasonable time.
However, when the distribution on both dimensions is known, the appropriate query from the table above can be chosen.
Hope that helps.
I'm always glad to answer the questions regarding database queries.
I have a related question to this post on SO, please have a look: http://stackoverflow.com/questions/3845181/does-mysql-use-existing-indexes-on-creating-new-indexes
Pentium10
2 Oct 10 at 13:55
The first two queries run smoothly, under 0.030 seconds, but the 3rd one takes 47 seconds. Tried to run multiple times, time no more improves. Why happens this so differently for 3rd one? I’ve run the explain extended on it, and returns the same as in the post above.
Pentium10
2 Oct 10 at 20:16
@Pentium10: well, the only thing I can say: “it works on my machine” :)
Check that all your indexes are created and used (though they must be if you are saying the plans are identical). Especially check the UNIQUE_SUBQUERY on site.
Quassnoi
4 Oct 10 at 00:22
It works on my system too (mysql version 5.1.48-2 on Fedora 13)
Sanjay
4 Oct 10 at 02:56
links for 2010-10-22…
OLAP in MySQL: four ways to filter on higher level dimensions | EXPLAIN … OLAP Work in MYSQL, Explain EXTENDED ist sowieso ein generell zu empfehlendes Blog (tags: sql mysql)…
Die wunderbare Welt von Isotopp
22 Oct 10 at 20:02
can i use data cubes in mysql like ssas in the mssql
javad
2 Sep 19 at 12:49