EXPLAIN EXTENDED

How to create fast database queries

Shuffling rows: PostgreSQL

Comments enabled. I *really* need your comment

Answering questions asked on the site.

Josh asks:

I am building a music application and need to create a playlist of arbitrary length from the tracks stored in the database.

This playlist should be shuffled and a track can repeat only after at least 10 other tracks had been played.

Is it possible to do this with a single SQL query or I need to create a cursor?

This is in PostgreSQL 8.4

PostgreSQL 8.4 is a wise choice, since it introduces some new features that ease this task.

To do this we just need to keep a running set that would hold the previous 10 tracks so that we could filter on them.

PostgreSQL 8.4 supports recursive CTE's that allow iterating the resultsets, and arrays that can be easily used to keep the set of 10 latest tracks.

Here's what we should do to build the playlist:

  1. We make a recursive CTE that would generate as many records as we need and just use LIMIT to limit the number
  2. The base part of the CTE is just a random record (fetched with ORDER BY RANDOM() LIMIT 1)
  3. The base part also defines the queue. This is an array which holds 10 latest records selected. It is initialized in the base part with the id of the random track just selected
  4. The recursive part of the CTE joins the previous record with the table, making sure that no record from the latest 10 will be selected on this step. To do this, we just use the array operator <@ (contained by)
  5. The recursive part adds newly selected record to the queue. The queue should be no more than 10 records long, that's why we apply array slicing operator to it ([1:10])

Let's create a sample table:

Table creation details

This table is quite simple: it just contains 1,000 tracks with generated names.

And here's the query:

WITH    RECURSIVE
shuffle AS
(
SELECT  *
FROM    (
SELECT  id, name, ARRAY[id] AS queue
FROM    t_track
ORDER BY
RANDOM()
LIMIT 1
) q
UNION ALL
SELECT  *
FROM    (
SELECT  t.id, t.name, (t.id || s.queue)[1:10]
FROM    shuffle s
JOIN    t_track t
ON      NOT ARRAY[t.id] &lt;@ s.queue
                ORDER BY
                        RANDOM()
                LIMIT 1
                ) q
        )
SELECT  id, name, queue::VARCHAR
FROM    shuffle
LIMIT 30

View query details

This query selects first 30 records but the LIMIT clause can be changed to select an arbitrary number of records (including that exceeding 1,000), since we don't apply any limits into the recursive part of the query.

Normally, the queue would be hidden but I left it just to illustrate what's going on. As you can see, the queue holds the id's of last 10 records.

The query runs for 120 ms which is quite fast but could be yet improved using approaches described in PostgreSQL 8.4: sampling random rows. However, this will make the query too hard to read and ORDER BY RANDOM() is just fine to demonstrates the principle.

Hope that helps.


I'm always glad to answer the questions regarding database queries.

Ask me a question

Written by Quassnoi

October 6th, 2009 at 11:00 pm

Posted in PostgreSQL

Leave a Reply