PostgreSQL 8.4: sampling random rows
On Jul 1, 2009, PostgreSQL 8.4 was released.
In this series of articles, I'd like to show how to reimplement some tasks from my previous blog posts using new PostgreSQL features.
Today, I'll show a way to sample random rows from a table with a preserved PRIMARY KEY.
Usually, if we need, say, 10 random rows from a table, we issue this query:
SELECT * FROM t_random ORDER BY RANDOM() LIMIT 10
PostgreSQL heavily optimizes this query: since it sees a LIMIT condition, it does not sort all rows. Instead, it keeps a running buffer containing at most 10 rows with the least values of RANDOM() calculated so far; whenever a row with a small enough value is encountered, it goes into this buffer, and in the end only this buffer is sorted, not the whole set.
This is quite efficient, but still requires a full table scan.
This can be a problem, since queries like this are often run frequently on heavily loaded sites (for instance, to show 10 random pages on a social bookmarking system), and full table scans will hamper performance significantly.
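Both effects are easy to see by prefixing the query with EXPLAIN ANALYZE (run here against the sample t_random table created below; the exact figures will differ): the plan contains a Seq Scan over the whole table, and the Sort node typically reports its Sort Method as top-N heapsort, confirming that only the 10 least values of RANDOM() were kept in memory:

EXPLAIN ANALYZE SELECT * FROM t_random ORDER BY RANDOM() LIMIT 10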
With PostgreSQL 8.4's new ability to run recursive queries, this can be improved.
We can sample random values of the row ids and use an array to record previously selected values (a sketch of this approach follows the table setup below).
Let's create a sample table and see how we can improve this query:
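The exact table definition is part of the full walk-through; for this excerpt, a minimal version could look like the one below (the value column and the row count of 1,000,000 are just assumptions; what matters is the dense, gapless INTEGER PRIMARY KEY starting at 1):

CREATE TABLE t_random (
        id INTEGER NOT NULL PRIMARY KEY,
        value VARCHAR(100) NOT NULL
);

-- Fill the table with 1,000,000 rows, so that the ids run from 1 to 1000000 without gaps
INSERT INTO t_random (id, value)
SELECT  s, 'Value ' || s
FROM    generate_series(1, 1000000) s;

And here is a rough sketch of the shape such a recursive query can take. It is only an illustration of the idea, not necessarily the query developed in the rest of the entry: keep drawing random ids in the key range, collect the distinct ones in an array, and touch the base table only to fetch the final rows:

WITH RECURSIVE sampled (ids, candidate) AS
        (
        -- Seed: one random id already picked, plus the next candidate to try
        SELECT  ARRAY[FLOOR(RANDOM() * 1000000)::INTEGER + 1],
                FLOOR(RANDOM() * 1000000)::INTEGER + 1
        UNION ALL
        -- Append the candidate unless it is already in the array, then draw a new one.
        -- A duplicate draw merely repeats the step, so the array ends up with 10 distinct ids.
        SELECT  CASE WHEN candidate = ANY (ids) THEN ids ELSE ids || candidate END,
                FLOOR(RANDOM() * 1000000)::INTEGER + 1
        FROM    sampled
        WHERE   ARRAY_UPPER(ids, 1) < 10
        )
SELECT  t.*
FROM    (
        SELECT  ids
        FROM    sampled
        WHERE   ARRAY_UPPER(ids, 1) = 10
        ) picked
JOIN    t_random t
ON      t.id = ANY (picked.ids)

The recursion itself never reads t_random at all; only the final join does, and it fetches just the 10 sampled ids. Note that the sketch relies on the id range being dense: a random id that falls into a gap (a deleted row, for example) would silently shrink the result set.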