EXPLAIN EXTENDED

How to create fast database queries

PostgreSQL 8.4: sampling random rows

with 2 comments

On Jul 1, 2009, PostgreSQL 8.4 was released.

In this series of articles, I'd like to show how to reimplement some tasks I wrote about in the previous blog posts using new PostgreSQL features.

Other articles on new features of PostgreSQL 8.4:

Today, I'll show a way to sample random rows from a PRIMARY KEY preserved table.

Usually, if we need, say, 10 random rows from a table, we issue this query:

SELECT	*
FROM	t_random
ORDER BY
RANDOM()
LIMIT 10

PostgreSQL heavily optimizes this query: since it sees a LIMIT condition, it does not sort all rows. Instead, it just keeps a running buffer which contains at most 10 rows with the least values of RANDOM() calculated so far, and when a row with a small enough value comes along, it sorts only this buffer, not the whole set.

This is quite efficient, but still requires a full table scan.
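
The optimization is easy to observe: starting with PostgreSQL 8.3, EXPLAIN ANALYZE reports this buffered sort as a top-N heapsort. A quick check (the memory figure will vary):

EXPLAIN ANALYZE
SELECT  *
FROM    t_random
ORDER BY
        RANDOM()
LIMIT 10;

-- Among the output, look for a line like:
--   Sort Method:  top-N heapsort  Memory: 25kB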

This can be a problem, since queries like this are often run on heavily loaded sites (for instance, to show 10 random pages on social bookmarking systems), and full table scans hamper performance significantly.

With PostgreSQL 8.4's new ability to run recursive queries, this can be improved.

We can sample random values of the row ids and use an array to record previously selected values.

Let's create a sample table and see how we can improve this query:

CREATE TABLE t_random (
id INT NOT NULL PRIMARY KEY,
value VARCHAR(20) NOT NULL,
stuffing VARCHAR(400) NOT NULL DEFAULT (RPAD('', 400, '*'))
);

INSERT
INTO    t_random (id, value)
SELECT  s * 13 + s % 5, 'Value ' || s
FROM    generate_series (1, 1000000) s;

This table has 1,000,000 rows, with id as PRIMARY KEY.

Values of id are not contiguous.
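
The gaps are easy to see by evaluating the id formula for the first few values of s:

SELECT  s, s * 13 + s % 5 AS id
FROM    generate_series(1, 5) s;

-- s | id
-- 1 | 14
-- 2 | 28
-- 3 | 42
-- 4 | 56
-- 5 | 65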

Field stuffing is a VARCHAR(400) filled with asterisks, and makes the table rows very large. This is to emulate real tables holding lots of values.
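
To check how much space the stuffing actually takes, the standard size functions can be used (the exact figure will vary):

SELECT  pg_size_pretty(pg_total_relation_size('t_random'));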

Let's see how the commonly used query performs:

SELECT  id, value
FROM    t_random
ORDER BY
RANDOM()
LIMIT 10
id        value
10117747  Value 778288
625263    Value 48097
613902    Value 47223
4446274   Value 342021
7942392   Value 610953
10371985  Value 797845
7885749   Value 606596
2525668   Value 194282
10247394  Value 788261
2164760   Value 166520
10 rows fetched in 0.0003s (2.8438s)
Limit  (cost=89665.40..89665.43 rows=10 width=16)
  ->  Sort  (cost=89665.40..92165.38 rows=999993 width=16)
        Sort Key: (random())
        ->  Seq Scan on t_random  (cost=0.00..68055.91 rows=999993 width=16)

It takes almost 3 seconds, which is too much for a loaded server.

Instead of scanning all table rows, we can assume that the ids are distributed evenly and select random ids.

If we have a set of 10 random ids, we can get 10 rows in but 10 index scans, which will be very fast.
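
Here is the building block in isolation: a single random probe against the PRIMARY KEY index. This is a minimal sketch, assuming a roughly even id distribution; the full query below adds the backtrace array and the wraparound:

SELECT  id, value
FROM    t_random
WHERE   id >= (
        SELECT  (MIN(id) + FLOOR((MAX(id) - MIN(id)) * RANDOM()))::INTEGER
        FROM    t_random
        )
ORDER BY
        id
LIMIT 1;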

However, there are some complications. To overcome them, we should build our query in the following steps:

  1. Determine the minimum and the maximum id
  2. On each recursion step, do the following:
    1. Create a backtrace array of previously selected ids. On the first step, this array will be empty
    2. Select a random seed between minimum and maximum
    3. Select the first id greater than this seed which is not contained in the backtrace array
    4. If there is no such id (i.e. all ids greater than the seed have already been selected), then select the first id not contained in the backtrace array, starting from the beginning of the table
  3. Repeat the recursion until the required number of random rows is selected, or until steps 2.3 and 2.4 both fail (that is, all records have been selected)

We use two additional tricks here: the backtrace array and the table wraparound.

Since the ids are not contiguous, we cannot use an equality condition. We need to use the random seed value only as a starting point and select the first id that is greater than the seed. An additional check against the backtrace array makes sure that the id we are selecting on the current step was not selected earlier.
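
The backtrace check relies on the array containment operator @>: an array "contains" another one if it includes all of its elements. Two self-contained probes, using ids from the sample formula above:

SELECT  ARRAY[14, 28, 42] @> ARRAY[28];  -- true: 28 was already selected, skip it
SELECT  ARRAY[14, 28, 42] @> ARRAY[65];  -- false: 65 is still available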

If the seed used on a recursion step happens to be large enough, a situation can arise where there is no id above the seed that has not already been selected.

That's why we use a UNION ALL of two queries with a LIMIT 1 applied to the result of the UNION ALL: the first query selects an unused id greater than the seed, the second one selects the first unused id starting from the minimal one. The result is exactly as if the table were cycled. This keeps the result distribution fair.
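
Stripped of the recursion and the backtrace check, the wraparound pattern looks like this. This is a sketch with a hypothetical seed of 500000; it relies on the first branch's row being produced first, so the outer LIMIT 1 prefers an id above the seed and falls back to the table beginning only when there is none:

SELECT  id
FROM    (
        SELECT  id
        FROM    t_random
        WHERE   id >= 500000  -- hypothetical seed
        ORDER BY
                id
        LIMIT 1
        ) above_seed
UNION ALL
SELECT  id
FROM    (
        SELECT  id
        FROM    t_random
        ORDER BY
                id
        LIMIT 1
        ) wrapped
LIMIT 1;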

If neither of the queries above returns an id, it means that all ids have already been selected and added to the backtrace array. We should stop the recursion at this point.

With all of the above in mind, let's create the query:

WITH RECURSIVE q AS
        (
        SELECT  r, minid, maxid, ARRAY[(r).id] AS bt, 1 AS cnt
        FROM    (
                SELECT  rm.*,
                        (
                        SELECT  r
                        FROM    t_random r
                        WHERE   id >= (minid + FLOOR((maxid - minid) * RANDOM()))::INTEGER
                        ORDER BY
                                id
                        LIMIT 1
                        ) r
                FROM    (
                        SELECT  MIN(id) AS minid, MAX(id) AS maxid
                        FROM    t_random
                        ) rm
                ) ro
        UNION ALL
        SELECT  ri, minid, maxid, bt || (ri).id, cnt + 1
        FROM    (
                SELECT  q.*,
                        (
                        SELECT  ri
                        FROM    (
                                SELECT  ri
                                FROM    t_random ri
                                WHERE   id >= (minid + FLOOR((maxid - minid) * rnd))::INTEGER
                                        AND NOT bt @> ARRAY[id]
                                ORDER BY
                                        id
                                LIMIT 1
                                ) r
                        UNION ALL
                        SELECT  ri
                        FROM    (
                                SELECT  ri
                                FROM    t_random ri
                                WHERE   NOT bt @> ARRAY[id]
                                LIMIT 1
                                ) r
                        LIMIT 1
                        ) ri
                FROM    (
                        SELECT  q.*, RANDOM() AS rnd
                        FROM    q
                        ) q
                ) qo
        WHERE   cnt < 10
                AND ri IS NOT NULL
        )
SELECT  (r).id, (r).value
FROM    q

id        value
35221     Value 2709
1718809   Value 132216
7990752   Value 614673
7362304   Value 566331
2432226   Value 187094
617890    Value 47530
8521361   Value 655489
5632510   Value 433270
12426649  Value 955896
10251424  Value 788571
10 rows fetched in 0.0004s (0.0159s)
CTE Scan on q  (cost=14.54..15.16 rows=31 width=32)
  CTE q
    ->  Recursive Union  (cost=0.16..14.54 rows=31 width=52)
          ->  Subquery Scan rm  (cost=0.16..0.81 rows=1 width=8)
                ->  Result  (cost=0.16..0.17 rows=1 width=0)
                      InitPlan 3 (returns $3)
                        ->  Limit  (cost=0.00..0.08 rows=1 width=4)
                              ->  Index Scan using t_random_pkey on t_random  (cost=0.00..81539.25 rows=999993 width=4)
                                    Filter: (id IS NOT NULL)
                      InitPlan 4 (returns $4)
                        ->  Limit  (cost=0.00..0.08 rows=1 width=4)
                              ->  Index Scan Backward using t_random_pkey on t_random  (cost=0.00..81539.25 rows=999993 width=4)
                                    Filter: (id IS NOT NULL)
                SubPlan 1
                  ->  Limit  (cost=0.00..0.31 rows=1 width=36)
                        ->  Index Scan using t_random_pkey on t_random r  (cost=0.00..104039.10 rows=333331 width=36)
                              Filter: (id >= ((($1)::double precision + floor(((($2 - $1))::double precision * random()))))::integer)
                SubPlan 2
                  ->  Limit  (cost=0.00..0.31 rows=1 width=36)
                        ->  Index Scan using t_random_pkey on t_random r  (cost=0.00..104039.10 rows=333331 width=36)
                              Filter: (id >= ((($1)::double precision + floor(((($2 - $1))::double precision * random()))))::integer)
          ->  Subquery Scan q  (cost=0.00..1.31 rows=3 width=52)
                Filter: ((SubPlan 7) IS NOT NULL)
                ->  WorkTable Scan on q  (cost=0.00..0.23 rows=3 width=76)
                      Filter: (cnt < 10)
                SubPlan 5
                  ->  Limit  (cost=0.02..0.11 rows=1 width=32)
                        ->  Append  (cost=0.02..0.21 rows=2 width=32)
                              ->  Subquery Scan r  (cost=0.02..0.11 rows=1 width=32)
                                    ->  Limit  (cost=0.02..0.10 rows=1 width=36)
                                          ->  Index Scan using t_random_pkey on t_random ri  (cost=0.02..28849.00 rows=332998 width=36)
                                                Index Cond: (id >= ((($2)::double precision + floor(((($6 - $2))::double precision * $7))))::integer)
                                                Filter: (NOT ($5 @> ARRAY[id]))
                              ->  Limit  (cost=0.00..0.07 rows=1 width=32)
                                    ->  Seq Scan on t_random ri  (cost=0.00..68055.91 rows=998993 width=32)
                                          Filter: (NOT ($5 @> ARRAY[id]))
                SubPlan 6
                  ->  Limit  (cost=0.02..0.11 rows=1 width=32)
                        ->  Append  (cost=0.02..0.21 rows=2 width=32)
                              ->  Subquery Scan r  (cost=0.02..0.11 rows=1 width=32)
                                    ->  Limit  (cost=0.02..0.10 rows=1 width=36)
                                          ->  Index Scan using t_random_pkey on t_random ri  (cost=0.02..28849.00 rows=332998 width=36)
                                                Index Cond: (id >= ((($2)::double precision + floor(((($6 - $2))::double precision * $7))))::integer)
                                                Filter: (NOT ($5 @> ARRAY[id]))
                              ->  Limit  (cost=0.00..0.07 rows=1 width=32)
                                    ->  Seq Scan on t_random ri  (cost=0.00..68055.91 rows=998993 width=32)
                                          Filter: (NOT ($5 @> ARRAY[id]))
                SubPlan 7
                  ->  Limit  (cost=0.02..0.11 rows=1 width=32)
                        ->  Append  (cost=0.02..0.21 rows=2 width=32)
                              ->  Subquery Scan r  (cost=0.02..0.11 rows=1 width=32)
                                    ->  Limit  (cost=0.02..0.10 rows=1 width=36)
                                          ->  Index Scan using t_random_pkey on t_random ri  (cost=0.02..28849.00 rows=332998 width=36)
                                                Index Cond: (id >= ((($2)::double precision + floor(((($6 - $2))::double precision * $7))))::integer)
                                                Filter: (NOT ($5 @> ARRAY[id]))
                              ->  Limit  (cost=0.00..0.07 rows=1 width=32)
                                    ->  Seq Scan on t_random ri  (cost=0.00..68055.91 rows=998993 width=32)
                                          Filter: (NOT ($5 @> ARRAY[id]))

As you can see in the query plan, all subqueries use Index Scans with Limit, which most probably takes but one index lookup per subquery (possibly more if the backtrace array is hit, but this is very improbable on a large table).

This query completes in only 15 ms, which is almost instant.

This solution is guaranteed to return all requested rows (or all table rows) and not to fail if there are fewer rows in the table than requested.

However, the fairness of the result is affected by the id distribution.

Since the algorithm returns the first id after a randomly chosen seed, the larger the gap between two consecutive ids, the more likely the second id is to be returned. If, say, ids 10 and 1000 were adjacent, any seed falling into the gap would resolve to 1000, so 1000 would be sampled far more often than an id in a densely packed region.

Written by Quassnoi

July 18th, 2009 at 11:00 pm

Posted in PostgreSQL

2 Responses to 'PostgreSQL 8.4: sampling random rows'


  1. Hello,
    I am developing a Django website and have the need to get random results, using PostgreSQL 9.1.
    I found your blog and it seems very interesting, but I am not enough of an expert to understand it. I am dubious about implementing something I do not understand :)
    Would you mind explaining a bit (what block is doing what)?
    In my case I have a companies table and I let the client filter by name and location, and I would like to return 5 random results. Also, a company could have more locations, so in my current SQL I have DISTINCT.
    Could this example of yours be adapted to this case?
    Anyway, even if not, I would like to understand the query.
    Thanks
    Nenad

    Čikić Nenad

    26 Oct 13 at 08:16

  2. This post actually got me on the fast track for some random generation I needed.

    It was very easy to adapt the code.

    One comment, though: instead of doing the cnt < 10 inside the WITH block, try this:

    WHERE ri IS NOT NULL
    )

    SELECT (r).id, (r).value
    FROM q
    LIMIT 10;

    This way, the limit is more easily controllable and has the same effect, as far as I know.

    Shahar

    31 Mar 14 at 10:12
