Shuffling rows: PostgreSQL
Comments enabled. I *really* need your comment
Answering questions asked on the site.
Josh asks:
I am building a music application and need to create a playlist of arbitrary length from the tracks stored in the database.
This playlist should be shuffled and a track can repeat only after at least 10 other tracks had been played.
Is it possible to do this with a single SQL query or I need to create a cursor?
This is in PostgreSQL 8.4
PostgreSQL 8.4 is a wise choice, since it introduces some new features that ease this task.
To do this we just need to keep a running set that would hold the previous 10 tracks so that we could filter on them.
PostgreSQL 8.4 supports recursive CTE's that allow iterating the resultsets, and arrays that can be easily used to keep the set of 10 latest tracks.
Here's what we should do to build the playlist:
- We make a recursive CTE that would generate as many records as we need and just use LIMIT to limit the number
- The base part of the CTE is just a random record (fetched with
ORDER BY RANDOM() LIMIT 1
) - The base part also defines the queue. This is an array which holds 10 latest records selected. It is initialized in the base part with the
id
of the random track just selected - The recursive part of the CTE joins the previous record with the table, making sure that no record from the latest 10 will be selected on this step. To do this, we just use the array operator
<@
(contained by) - The recursive part adds newly selected record to the queue. The queue should be no more than 10 records long, that's why we apply array slicing operator to it (
[1:10]
)
Let's create a sample table:
Table creation details
This table is quite simple: it just contains 1,000 tracks with generated names.
And here's the query:
WITH RECURSIVE shuffle AS ( SELECT * FROM ( SELECT id, name, ARRAY[id] AS queue FROM t_track ORDER BY RANDOM() LIMIT 1 ) q UNION ALL SELECT * FROM ( SELECT t.id, t.name, (t.id || s.queue)[1:10] FROM shuffle s JOIN t_track t ON NOT ARRAY[t.id] <@ s.queue ORDER BY RANDOM() LIMIT 1 ) q ) SELECT id, name, queue::VARCHAR FROM shuffle LIMIT 30
This query selects first 30 records but the LIMIT
clause can be changed to select an arbitrary number of records (including that exceeding 1,000), since we don't apply any limits into the recursive part of the query.
Normally, the queue
would be hidden but I left it just to illustrate what's going on. As you can see, the queue
holds the id
's of last 10 records.
The query runs for 120 ms which is quite fast but could be yet improved using approaches described in PostgreSQL 8.4: sampling random rows. However, this will make the query too hard to read and ORDER BY RANDOM()
is just fine to demonstrates the principle.
Hope that helps.
I'm always glad to answer the questions regarding database queries.