Archive for July 29th, 2009
PostgreSQL 8.4: efficient MIN(COUNT(*))
Comments enabled. I *really* need your comment
Answering questions asked on the site.
Frances asks:
I have a table
user_views
which contains pages viewed by users.How do I select the user that had the most and the fewest number of page views?
If you need them at the same time, you'll just have to group by myuser
, then order by COUNT(*)
.
With new PostgreSQL 8.4, it's possible to do this in one CTE, to avoid double evaluation:
WITH q AS ( SELECT myuser, COUNT(*) FROM user_pages GROUP BY myuser ) SELECT * FROM ( SELECT 'max'::TEXT AS which, myuser, count FROM q ORDER BY count DESC LIMIT 1 ) qmax UNION ALL SELECT * FROM ( SELECT 'min'::TEXT AS which, myuser, count FROM q ORDER BY count LIMIT 1 ) qmin
However, it's takes quite a long time if you table is big.
This is probably an answer to the question you asked.
But as a bonus I'd like to tell how to optimize the query for the cases when you need just the user with minimal count of page views.
The main idea here that we should accumulate the least value of COUNT(*)
calculated so far, and stop counting when we reach this threshold. This will save us some row lookups and improve the query.
Let's create a sample table: