EXPLAIN EXTENDED

How to create fast database queries

Archive for July 21st, 2009

PostgreSQL: round-robin order

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a bunch of URLs stored in a table waiting to be scraped by a script.

However, many of those URLs are from the same site. I would like to return those URLs in a site-friendly order (that is, try to avoid two URLs from the same site in a row) so I won't be accidentally blocked by making too many HTTP requests in a short time.

The database layout is something like this:

create table urls (
site varchar, -- holds e.g. www.example.com or stockoverflow.com
url varchar unique
);

Example result:

SELECT  url
FROM    urls
ORDER BY
        mysterious_round_robin_function(site)
url
http://www.example.com/some/file
http://stackoverflow.com/questions/ask
http://use.perl.org/
http://www.example.com/some/other/file
http://stackoverflow.com/tags

To solve this task, we just need to assign a number to an URL within each site, and order by this number first, then by site, like this:

number site url
1 example.com http://www.example.com/some/file
1 stackoverflow.com http://stackoverflow.com/questions/ask
1 use.perl.org http://use.perl.org/
2 example.com http://www.example.com/some/other/file
2 stackoverflow.com http://stackoverflow.com/tags

Good news: in new PostgreSQL 8.4, we just can use a window function to do this:

SELECT  url
FROM    urls
ORDER BY
ROW_NUMBER() OVER (PARTITION BY site ORDER BY url), site
url
a.example.com/url104
b.example.com/url1
c.example.com/url106
y.example.com/url102
z.example.com/url103
a.example.com/url130
b.example.com/url105
c.example.com/url132
l.example.com/url999
m.example.com/url974
1000 rows fetched in 0.0169s (0.0178s)

Yes, that simple.

Bad news: in PostgreSQL 8.3 and below, we still need to hack.

Let's create a sample table and see how:

Read the rest of this entry »

Written by Quassnoi

July 21st, 2009 at 11:00 pm

Posted in PostgreSQL