Archive for January 30th, 2010
From Stack Overflow:
I have a table
usersand there is a field
invited_by_idshowing user id of the person who invited this user.
I need to make a MySQL query returning rows with all the fields from users plus a
invites_countfield showing how many people were invited by each user.
The task seems very simple (and it is in fact), but there are at least two approaches to do it. It this article, I will discuss the benefits and the drawbacks of each approach.
The first approach is using
SELECT u.*, COUNT(ui.id) FROM users u LEFT JOIN users ui ON ui.invited_by = u.id GROUP BY u.id
This is a
GROUP BY with a self join, very simple. There are only two little things I'd like to pay some attention to.
First, we need to select all users, even those who invited no other members. An inner join would leave them out, so we use a
LEFT JOIN, and use
COUNT(ui.id) instead of
COUNT(*), because, due to the very nature of aggregation,
COUNT(*) returns at least 1 in a query with
GROUP BY, and
NULLs (which can only result from a
LEFT JOIN miss).
Second, we group by
u.id but use
u.* in the
SELECT clause. Every other engine would fail in this case, but MySQL allows selecting fields that are neighter grouped by nor aggregated. These fields will return an arbitrary value from any of the aggregated records (in practice, that is the record first read in its group).
This behavior is often abused, since MySQL does not guarantee what exactly it will return, but it's perfectly valid for the queries like ours. We don't need to clutter the
GROUP BY clause with all fields from
users if we have already grouped by the
PRIMARY KEY which is already unique. All other values from
users are uniquely defined by the
PRIMARY KEY so there is no matter which arbitrary record will the query use to return ungrouped values: they are all same within the group.
This solution involves correlated subqueries:
SELECT u.*, ( SELECT COUNT(*) FROM users ui WHERE ui.invited_by = u.id ) FROM users u
Here, we calculate the
COUNT(*) in a correlated subquery. This query returns exactly same records as the previous one.
Let's create two sample tables (a MyISAM one and an InnoDB one) and see which solution is more efficient for different scenarios:
Read the rest of this entry »