EXPLAIN EXTENDED

How to create fast database queries

Archive for July 18th, 2009

PostgreSQL 8.4: sampling random rows

with 2 comments

On Jul 1, 2009, PostgreSQL 8.4 was released.

In this series of articles, I'd like to show how to reimplement some tasks I wrote about in the previous blog posts using new PostgreSQL features.

Other articles on new features of PostgreSQL 8.4:

Today, I'll show a way to sample random rows from a PRIMARY KEY preserved table.

Usually, if we need, say, 10 random rows from a table, we issue this query:

SELECT	*
FROM	t_random
ORDER BY
RANDOM()
LIMIT 10

PostgreSQL heavily optimizes this query, since it sees a LIMIT condition and does not sort all rows. Instead, it just keeps a running buffer which contains at most 10 rows with the least values or RANDOM calculated so far, and when a row small enough is met, it sorts only this buffer, not the whole set.

This is quite efficient, but still requires a full table scan.

This can be a problem, since the queries like that are often run frequently on heavily loaded sites (like for showing 10 random pages on social bookmarking systems), and full table scans will hamper performance significantly.

With new PosgreSQL 8.4 abilities to run recursive queries, this can be improved.

We can sample random values of the row ids and use an array to record previously selected values.

Let's create a sample table and see how can we imrove this query:
Read the rest of this entry »

Written by Quassnoi

July 18th, 2009 at 11:00 pm

Posted in PostgreSQL