Archive for July 29th, 2009
PostgreSQL 8.4: efficient MIN(COUNT(*))
Answering questions asked on the site.
Frances asks:
I have a table
user_viewswhich contains pages viewed by users.How do I select the user that had the most and the fewest number of page views?
If you need them at the same time, you’ll just have to group by myuser, then order by COUNT(*).
With new PostgreSQL 8.4, it’s possible to do this in one CTE, to avoid double evaluation:
WITH q AS
(
SELECT myuser, COUNT(*)
FROM user_pages
GROUP BY
myuser
)
SELECT *
FROM (
SELECT 'max'::TEXT AS which, myuser, count
FROM q
ORDER BY
count DESC
LIMIT 1
) qmax
UNION ALL
SELECT *
FROM (
SELECT 'min'::TEXT AS which, myuser, count
FROM q
ORDER BY
count
LIMIT 1
) qmin
However, it’s takes quite a long time if you table is big.
This is probably an answer to the question you asked.
But as a bonus I’d like to tell how to optimize the query for the cases when you need just the user with minimal count of page views.
The main idea here that we should accumulate the least value of COUNT(*) calculated so far, and stop counting when we reach this threshold. This will save us some row lookups and improve the query.
Let’s create a sample table:
Subscribe in a reader