EXPLAIN EXTENDED

How to create fast database queries

Archive for November 24th, 2009

MySQL: Selecting records holding group-wise maximum (on a unique column)

with 2 comments

Answering questions asked on the site.

Juan asks:

Regarding this question:

I would like to know what database is the most efficient when performing this type of query and if the solutions proposed are already the most efficient.

I am using MySQL but I would like to know if PostgreSQL, Oracle, Microsoft SQL Server or DB2 are much more efficient.

A nice question, this is a common problem for most developers.

For those too lazy to follow the link, the problem in one sentence:

How do I select the whole records, grouped on grouper and holding a group-wise maximum (or minimum) on other column?

I've already covered this in passing in various previous articles in my blog, and now I'll try to make a little summary.

According to the SQL standards, every expression used in the SELECT list of a query that deals with GROUP BY should be either grouped by (that is, used in the GROUP BY) or aggregated.

The reason behind this is quite logical: a grouping query takes the records that have something in common and shrinks them, making a single record out of the whole set constituting the group.

The grouping expressions, of course, share their values across all the records constituting the group (since they are what that makes the group a group), and hence can be used as is.

Other expressions can vary and therefore some algorithm should be applied for shrinking all existing values into a single value. This is what aggregating function does.

The problem is that the the aggregating functions are not interdependent: you cannot use the result of one function as an input to another one. Hence, you cannot select a record in an aggregate query: you can only select a tuple of aggregated values. It's possible that each of these aggregated values in fact can never even found among the values of the original records. Say, with all values of column being strictly positive, you can never find a record holding a SUM(column): it will be greater than any of the values of column.

However, with MIN or MAX, it is guaranteed that at least one record holds the non-NULL value returned by these aggregate functions. The problem is to return the whole record without aggregating the other columns.

As SQL evolved, different solutions had been proposed to solve this problem, and I'll try to cover these solutions in this series of articles.

First, I'll cover MySQL (as per request)

MySQL's dialect of SQL is much more poor than those used by the other systems. That's why most of the solutions used by the other systems, like analytic functions, DISTINCT ON etc. cannot be used in MySQL.

However, MySQL still provides some ways to make the queries that solve this task more efficient.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

November 24th, 2009 at 11:00 pm

Posted in MySQL