EXPLAIN EXTENDED

How to create fast database queries

Archive for September, 2010

OLAP in MySQL: four ways to filter on higher level dimensions

with 6 comments

Answering questions asked on the site.

Denis Kuzmenok asks:

I need some help with a query I'm fighting with.

I have these tables:

Product
id parent name
Site
id region name
Price
id product site value

For the products with a certain product.parent, I need to select minimal and maximal price within the given site.region, and return first 10 products ordered by the minimal price.

Each parent has 100 to 10,000 products, and there are 10,000,000 records in price table.

We have a classical OLAP task here: a fact table (price) and two dimension tables (product and site).

The task would be almost trivial if we were given the exact values of product and site. This way, we could build a composite index on these fields and value, filter on the exact values and get the first 10 values from the index.

However, we are given not the values of the dimensions themselves but those of the higher levels (parent for product and region for site).

Since the values of the levels are not stored in the facts table, we cannot index them. We need to join the dimension tables and filter on them.

This article describes four ways to join the tables, their efficiency varying depending on the density of the dimension values.

Since the only algorithm to make the joins MySQL is capable of is nested loops, basically, we need to define the order in which the tables would be joined.

There are three tables and, hence, 3! = 6 permutations that define all possible join orders:

product site price
product price site
site price product
site product price
price product site
price site product

However, two dimension tables are completely independent on each other. This means that if they come one after another in the join their order does not actually matter: they both will be searched for independent values. This reduces the number of actual combinations:

product/site price
product price site
site price product
price product/site

The joins must be designed so that the tables with most selective conditions go first. This means that the join order is determined by the density of the values satisfying the criteria. The more values from the table satisfy the search criteria, the later should the table come in the join, so that by the time the join occurs, most values would have been already sifted out.

Now, let's build the queries for all types of the join. To do this, we will create sample tables:
Read the rest of this entry »

Written by Quassnoi

September 30th, 2010 at 11:00 pm

Posted in MySQL