Archive for January 30th, 2010
Aggregates: subqueries vs. GROUP BY
From Stack Overflow:
I have a table
users
and there is a fieldinvited_by_id
showing user id of the person who invited this user.I need to make a MySQL query returning rows with all the fields from users plus a
invites_count
field showing how many people were invited by each user.
The task seems very simple (and it is in fact), but there are at least two approaches to do it. It this article, I will discuss the benefits and the drawbacks of each approach.
GROUP BY
The first approach is using GROUP BY
:
SELECT u.*, COUNT(ui.id) FROM users u LEFT JOIN users ui ON ui.invited_by = u.id GROUP BY u.id
This is a GROUP BY
with a self join, very simple. There are only two little things I'd like to pay some attention to.
First, we need to select all users, even those who invited no other members. An inner join would leave them out, so we use a LEFT JOIN
, and use COUNT(ui.id)
instead of COUNT(*)
, because, due to the very nature of aggregation, COUNT(*)
returns at least 1 in a query with GROUP BY
, and COUNT(ui.id)
skips NULL
s (which can only result from a LEFT JOIN
miss).
Second, we group by u.id
but use u.*
in the SELECT
clause. Every other engine would fail in this case, but MySQL allows selecting fields that are neighter grouped by nor aggregated. These fields will return an arbitrary value from any of the aggregated records (in practice, that is the record first read in its group).
This behavior is often abused, since MySQL does not guarantee what exactly it will return, but it's perfectly valid for the queries like ours. We don't need to clutter the GROUP BY
clause with all fields from users
if we have already grouped by the PRIMARY KEY
which is already unique. All other values from users
are uniquely defined by the PRIMARY KEY
so there is no matter which arbitrary record will the query use to return ungrouped values: they are all same within the group.
Subquery
This solution involves correlated subqueries:
SELECT u.*, ( SELECT COUNT(*) FROM users ui WHERE ui.invited_by = u.id ) FROM users u
Here, we calculate the COUNT(*)
in a correlated subquery. This query returns exactly same records as the previous one.
Comparison
Let's create two sample tables (a MyISAM one and an InnoDB one) and see which solution is more efficient for different scenarios:
Read the rest of this entry »