Archive for November 24th, 2009
MySQL: Selecting records holding group-wise maximum (on a unique column)
Answering questions asked on the site.
Juan asks:
Regarding this question:
I would like to know what database is the most efficient when performing this type of query and if the solutions proposed are already the most efficient.
I am using MySQL but I would like to know if PostgreSQL, Oracle, Microsoft SQL Server or DB2 are much more efficient.
A nice question, this is a common problem for most developers.
For those too lazy to follow the link, the problem in one sentence:
How do I select the whole records, grouped on
grouper
and holding a group-wise maximum (or minimum) on other column?
I've already covered this in passing in various previous articles in my blog, and now I'll try to make a little summary.
According to the SQL standards, every expression used in the SELECT
list of a query that deals with GROUP BY
should be either grouped by (that is, used in the GROUP BY
) or aggregated.
The reason behind this is quite logical: a grouping query takes the records that have something in common and shrinks them, making a single record out of the whole set constituting the group.
The grouping expressions, of course, share their values across all the records constituting the group (since they are what that makes the group a group), and hence can be used as is.
Other expressions can vary and therefore some algorithm should be applied for shrinking all existing values into a single value. This is what aggregating function does.
The problem is that the the aggregating functions are not interdependent: you cannot use the result of one function as an input to another one. Hence, you cannot select a record in an aggregate query: you can only select a tuple of aggregated values. It's possible that each of these aggregated values in fact can never even found among the values of the original records. Say, with all values of column
being strictly positive, you can never find a record holding a SUM(column)
: it will be greater than any of the values of column
.
However, with MIN
or MAX
, it is guaranteed that at least one record holds the non-NULL
value returned by these aggregate functions. The problem is to return the whole record without aggregating the other columns.
As SQL evolved, different solutions had been proposed to solve this problem, and I'll try to cover these solutions in this series of articles.
First, I'll cover MySQL (as per request)
MySQL's dialect of SQL is much more poor than those used by the other systems. That's why most of the solutions used by the other systems, like analytic functions, DISTINCT ON
etc. cannot be used in MySQL.
However, MySQL still provides some ways to make the queries that solve this task more efficient.
Let's create a sample table:
Read the rest of this entry »