EXPLAIN EXTENDED

How to create fast database queries

My latest article on SQL in general: Things SQL needs: sargability of monotonic functions. You're welcome to read and comment on it.

Archive for July 23rd, 2009

PostgreSQL: last 12 weeks of a year

From Stack Overflow:

I have a query like this:

SELECT  EXTRACT(WEEK FROM j.updated_at) as "week",  count(j.id)
FROM    jobs
WHERE   EXTRACT(YEAR FROM j.updated_at)=2009
GROUP BY
        EXTRACT(WEEK FROM j.updated_at)
ORDER BY
        week

, which works fine, but I only want to show the last 12 weeks.

LIMIT 12 works, but only gives me the first twelve and I need the order to be in sequential week order (i. e. not reversed) for charting purposes.

To select last 12 rows in ascending order it’s enough to select first 12 rows in descending order and resort them in a subquery:

SELECT  *
FROM    (
        SELECT  EXTRACT(week FROM updated_at) AS week, COUNT(*) AS cnt
        FROM    jobs
        WHERE   EXTRACT(year FROM updated_at) = 2008
        GROUP BY
                week
        ORDER BY
                week DESC
        LIMIT 12
        ) q
ORDER BY
        week

However, this is rather inefficient. This query selects and aggregate all the year data just to fetch 12 last weeks.

More than that, EXTRACT(year FROM updated) is not a sargable predicate, and all table rows (or index rows) need to be examined.

On a sample table of 1,000,000 rows, this query runs for more that 3.5 seconds:

SELECT  *
FROM    (
        SELECT  EXTRACT(week FROM updated) AS week, COUNT(*) AS cnt
        FROM    t_week
        WHERE   EXTRACT(year FROM updated) = 2008
        GROUP BY
                week
        ORDER BY
                week DESC
        LIMIT 12
        ) q
ORDER BY
        week
week cnt
41 3584
42 3467
43 3499
44 3535
45 3511
46 3621
47 3529
48 3500
49 3415
50 3536
51 3504
52 3479
12 rows fetched in 0.0004s (3.5843s)
Sort  (cost=58517.44..58517.47 rows=11 width=16)
  Sort Key: (date_part('week'::text, t_week.updated))
  ->  Limit  (cost=58517.11..58517.14 rows=11 width=8)
        ->  Sort  (cost=58517.11..58517.14 rows=11 width=8)
              Sort Key: (date_part('week'::text, t_week.updated))
              ->  HashAggregate  (cost=58516.75..58516.92 rows=11 width=8)
                    ->  Seq Scan on t_week  (cost=0.00..58491.75 rows=5000 width=8)
                          Filter: (date_part('year'::text, updated) = 2008::double precision)

However, this query can be easily improved. For each year, we can easily calculate the beginning and the end of each of last 12 weeks, and use these values in a more index-friendly query.

Let’s create a sample table and see how to do this:

Read the rest of this entry »

Written by Quassnoi

July 23rd, 2009 at 11:00 pm

Posted in PostgreSQL