Archive for April 2nd, 2010
Multiple attributes in a EAV table: GROUP BY vs. NOT EXISTS
Answering questions asked on the site.
Andrew Stillard asks:
I have a store which will hold around 50,000 products in a products table. Each product will have 14 options, giving 700,000 options in total. These are held in an options table which is joined via the product id.
Users search for products based on the options via an Advanced Search menu.
The users need to be able to select multiple options upon which to query. I would normally use a
JOIN
if it was just the one option to select upon, but because its a variable number i thought it would be best to loop through theWHERE EXISTS
statement.The issue i have currently is that the query is taking a minimum of 18 seconds (And that was a query when the tables only had a fraction of the total products in).
If you can help us speed this up, or suggest an alternative idea that would be greatly appreciated.
The option table mentioned here is in fact an implementation of the EAV model in a relational database.
Each record basically contains 3 things: id
of the product it describes; id
of the option and the value of the given option for the given product. These fields represent entity, attribute and value, respectively.
This model is very easy to maintain and expand should the need arise: all we have to do to define an extra attribute is to add a record to the EAV table with the name and the value of the attribute. This is a DML operation rather than a DDL one.
However, this model has a serious drawback: we cannot efficiently search for two or more options at once. An index can only be defined on several fields from a single record, so we can only search for a single option using an index.
There are two approaches to writing a query which would search for the entities with the certain conditions on several attributes at once:
- For each attribute, find all entities for which the conditions on the given attribute hold, then aggregate the resulting entities, using
COUNT(*)
as a filter. The number of entity entries should be equal to the number of the attributes. - Takes each entity and for each attribute, check if the condition holds.
The first approach uses a GROUP BY
, the second one uses EXISTS
.
Let's create a sample table and see which one is more efficient:
Read the rest of this entry »