PostgreSQL: Selecting records holding group-wise maximum at EXPLAIN EXTENDED

PostgreSQL: Selecting records holding group-wise maximum

Comments enabled. I *really* need your comment

Continuing the series on selecting records holding group-wise maximums:

How do I select the whole records, grouped on grouper and holding a group-wise maximum (or minimum) on other column?

In this article, I'll describe several ways to do this in PostgreSQL 8.4.

PostgreSQL 8.4 syntax is much richer than that of MySQL. The former can use the analytic functions, recursive CTE's and proprietary syntax extensions, all of which can be used for this task.

Let's create a sample table:

Table creation details

CREATE TABLE t_distinct (
id INT NOT NULL PRIMARY KEY,
orderer INT NOT NULL,
glow INT NOT NULL,
ghigh INT NOT NULL,
stuffing VARCHAR(200) NOT NULL
);

CREATE INDEX ix_distinct_glow_id ON t_distinct (glow, id);
CREATE INDEX ix_distinct_ghigh_id ON t_distinct (ghigh, id);
CREATE INDEX ix_distinct_glow_orderer_id ON t_distinct (glow, orderer, id);
CREATE INDEX ix_distinct_ghigh_orderer_id ON t_distinct (ghigh, orderer, id);

SELECT  SETSEED(0.20091126);

INSERT
INTO    t_distinct (id, orderer, glow, ghigh, stuffing)
SELECT  id, FLOOR(RANDOM() * 9) + 1,
(id - 1) % 10 + 1,
(id - 1) % 10000 + 1,
LPAD('', 200, '*')
FROM    generate_series(1, 1000000) id;

This table has 1,000,000 records:

id is the PRIMARY KEY
orderer is filled with random values from 1 to 10
glow is a low cardinality grouping field (10 distinct values)
ghigh is a high cardinality grouping field (10,000 distinct values)
stuffing is an asterisk-filled VARCHAR(200) column added to emulate payload of the actual tables

Analytic functions

PostgreSQL 8.4 supports analytic functions. These functions extend the aggregate abilities: they work on the groups rather than on the individual records, but return their values to each individual record instead of shrinking the set. For instance, ROW_NUMBER enumerates the records within the group according to the ordering condition, and DENSE_RANK enumerates distinct values of the ordering column (it assigns same number to the records with the same value of the ordering column).

Let's make a query to select the records holding the group-wise maximums of id. Since id is a PRIMARY KEY we don't have to worry about the ties.

Here's the query:

SELECT  id, orderer, glow, ghigh
FROM    (
SELECT  *, ROW_NUMBER() OVER (PARTITION BY glow ORDER BY id) AS rn
FROM    t_distinct
) q
WHERE   rn = 1

View query details

id	orderer	glow	ghigh
1	8	1	1
2	2	2	2
3	2	3	3
4	9	4	4
5	1	5	5
6	4	6	6
7	6	7	7
8	6	8	8
9	7	9	9
10	2	10	10
10 rows fetched in 0.0507s (11.7804s)

Subquery Scan q  (cost=246866.84..279366.84 rows=5000 width=16)
  Filter: (q.rn = 1)
  ->  WindowAgg  (cost=246866.84..266866.84 rows=1000000 width=220)
        ->  Sort  (cost=246866.84..249366.84 rows=1000000 width=220)
              Sort Key: t_distinct.glow, t_distinct.id
              ->  Seq Scan on t_distinct  (cost=0.00..41250.00 rows=1000000 width=220)

This works, but is very inefficient (more than 12 seconds). PostgreSQL chooses the sorting in this case but it is not very good in sorting the tables with large rows.

This can be improved by making PostgreSQL to use the index which covers both columns and then join the id (which is also covered by the index):

SELECT  di.id, di.orderer, di.glow, di.ghigh
FROM    (
SELECT  id, ROW_NUMBER() OVER (PARTITION BY glow ORDER BY id) AS rn
FROM    t_distinct d
) dd
JOIN    t_distinct di
ON      di.id = dd.id
WHERE   rn = 1

View query details

id	orderer	glow	ghigh
1	8	1	1
2	2	2	2
3	2	3	3
4	9	4	4
5	1	5	5
6	4	6	6
7	6	7	7
8	6	8	8
9	7	9	9
10	2	10	10
10 rows fetched in 0.0006s (3.9997s)

Nested Loop  (cost=140907.84..205920.84 rows=5000 width=16)
  ->  Subquery Scan dd  (cost=140907.84..173407.84 rows=5000 width=4)
        Filter: (dd.rn = 1)
        ->  WindowAgg  (cost=140907.84..160907.84 rows=1000000 width=8)
              ->  Sort  (cost=140907.84..143407.84 rows=1000000 width=8)
                    Sort Key: d.glow, d.id
                    ->  Seq Scan on t_distinct d  (cost=0.00..41250.00 rows=1000000 width=8)
  ->  Index Scan using t_distinct_pkey on t_distinct di  (cost=0.00..6.49 rows=1 width=16)
        Index Cond: (di.id = dd.id)

This is much faster (4 s) but there is still much space for improvement.

If we wish to order by orderer we need to define a method to resolve ties.

Using the same approach we can return all records with ties:

SELECT  COUNT(*), SUM(id)
FROM    (
SELECT  *, DENSE_RANK() OVER (PARTITION BY glow ORDER BY orderer) AS dr
FROM    t_distinct d
) dd
WHERE   dr = 1

View query details

count	sum
111058	55543096995
1 row fetched in 0.0078s (16.9997s)

Aggregate  (cost=279391.85..279391.86 rows=1 width=4)
  ->  Subquery Scan dd  (cost=246866.84..279366.84 rows=5000 width=4)
        Filter: (dd.dr = 1)
        ->  WindowAgg  (cost=246866.84..266866.84 rows=1000000 width=220)
              ->  Sort  (cost=246866.84..249366.84 rows=1000000 width=220)
                    Sort Key: d.glow, d.orderer
                    ->  Seq Scan on t_distinct d  (cost=0.00..41250.00 rows=1000000 width=220)

, or resolve ties by return the record with the maximum id among those holding the minimum value of the orderer:

SELECT  di.id, di.orderer, di.glow, di.ghigh
FROM    (
SELECT  id, ROW_NUMBER() OVER (PARTITION BY glow ORDER BY orderer, id DESC) AS rn
FROM    t_distinct d
) dd
JOIN    t_distinct di
ON      di.id = dd.id
WHERE   rn = 1

View query details

id	orderer	glow	ghigh
999881	1	1	9881
999892	1	2	9892
999923	1	3	9923
999984	1	4	9984
999955	1	5	9955
999936	1	6	9936
999827	1	7	9827
999848	1	8	9848
999829	1	9	9829
999930	1	10	9930
10 rows fetched in 0.0006s (4.8593s)

Nested Loop  (cost=140907.84..208420.84 rows=5000 width=16)
  ->  Subquery Scan dd  (cost=140907.84..175907.84 rows=5000 width=4)
        Filter: (dd.rn = 1)
        ->  WindowAgg  (cost=140907.84..163407.84 rows=1000000 width=12)
              ->  Sort  (cost=140907.84..143407.84 rows=1000000 width=12)
                    Sort Key: d.glow, d.orderer, d.id
                    ->  Seq Scan on t_distinct d  (cost=0.00..41250.00 rows=1000000 width=12)
  ->  Index Scan using t_distinct_pkey on t_distinct di  (cost=0.00..6.49 rows=1 width=16)
        Index Cond: (di.id = dd.id)

As you can see, all these queries are elegant but rather inefficient.

Using DISTINCT ON

PostgreSQL implements another way to return the whole records holding group-wise maximums or minimums.

By using a special clause, DISTINCT ON, we can return records holding only the distinct values of the certain columns. For this to work correctly, one needs to define an ORDER BY condition in addition to DISTINCT ON, with the leading expressions being the same as those using in DISTINCT ON. This guarantees that all the records belonging to each group would be consecutive if not for the DISTINCT ON clause.

DISTINCT ON is applied after the ORDER BY condition. It just returns the first record from each group, skipping the others. This is very easy to do: return a record if the grouping expression changed from the previous row; don't return if it didn't.

This is quite similar to MySQL's extension for GROUP BY, but, unlike MySQL, this solution guarantees correct order and the fact that all values returned will be taken from a single record.

This query cannot be used to return all records with ties (since the values of the grouping column won't be distinct), but it will work if the ties are impossible (as in selecting a maximum id), or if a correct condition for resolving ties is provided.

Here's the query to return records holding MAX(id):

SELECT  DISTINCT ON (glow) id, orderer, glow, ghigh
FROM    t_distinct
ORDER BY
glow, id DESC

View query details

id	orderer	glow	ghigh
999991	5	1	9991
999992	3	2	9992
999993	6	3	9993
999994	5	4	9994
999995	4	5	9995
999996	8	6	9996
999997	3	7	9997
999998	2	8	9998
999999	8	9	9999
1000000	6	10	10000
10 rows fetched in 0.0006s (3.3593s)

Unique  (cost=140907.84..145907.84 rows=10 width=16)
  ->  Sort  (cost=140907.84..143407.84 rows=1000000 width=16)
        Sort Key: glow, id
        ->  Seq Scan on t_distinct  (cost=0.00..41250.00 rows=1000000 width=16)

And here's the one to return the MAX(id) within the MIN(orderer):

SELECT  DISTINCT ON (glow) id, orderer, glow, ghigh
FROM    t_distinct
ORDER BY
glow, orderer, id DESC

View query details

id	orderer	glow	ghigh
999881	1	1	9881
999892	1	2	9892
999923	1	3	9923
999984	1	4	9984
999955	1	5	9955
999936	1	6	9936
999827	1	7	9827
999848	1	8	9848
999829	1	9	9829
999930	1	10	9930
10 rows fetched in 0.0006s (3.9530s)

Unique  (cost=140907.84..145907.84 rows=10 width=16)
  ->  Sort  (cost=140907.84..143407.84 rows=1000000 width=16)
        Sort Key: glow, orderer, id
        ->  Seq Scan on t_distinct  (cost=0.00..41250.00 rows=1000000 width=16)

This is more efficient than the window function. However, both queries still take 4 seconds. This is almost 40 times as much as the same queries in MySQL , even without any improvements.

Unlike MySQL, PostgreSQL does not implement loose index scan which would allow to jump over the distinct index records. However, it can be emulated using recursive CTE's.

Recursive CTE's to emulate loose index scan

The main idea here is simple:

In the anchor part of the CTE take the lowest value of the key
In the recursive part of the CTE take the next value of the key by using > or < operators along with the ORDER BY and LIMIT 1

PostgreSQL's syntax allows compacting a whole table record into a single field (which can be exploded later). This will allow us to avoid joins by placing the whole recursive part into a subquery which will use the index efficiently.

Here's the query to return records holding group-wise MAX(id):

WITH    RECURSIVE rows AS
(
SELECT  d
FROM    (
SELECT  d
FROM    t_distinct d
ORDER BY
glow DESC, id DESC
LIMIT 1
) q
UNION ALL
SELECT  (
SELECT  di
FROM    t_distinct di
WHERE   di.glow &lt; (r.d).glow
                ORDER BY
                        di.glow DESC, di.id DESC
                LIMIT 1
                )
        FROM    rows r
        WHERE   d IS NOT NULL
        )
SELECT  (d).id, (d).orderer, (d).glow, (d).ghigh
FROM    rows
WHERE   d IS NOT NULL

id	orderer	glow	ghigh
1000000	6	10	10000
999999	8	9	9999
999998	2	8	9998
999997	3	7	9997
999996	8	6	9996
999995	4	5	9995
999994	5	4	9994
999993	6	3	9993
999992	3	2	9992
999991	5	1	9991
10 rows fetched in 0.0005s (0.0049s)

CTE Scan on rows  (cost=232.94..234.96 rows=100 width=32)
  Filter: (d IS NOT NULL)
  CTE rows
    ->  Recursive Union  (cost=0.00..232.94 rows=101 width=32)
          ->  Subquery Scan q  (cost=0.00..2.24 rows=1 width=32)
                ->  Limit  (cost=0.00..2.23 rows=1 width=40)
                      ->  Index Scan Backward using ix_distinct_glow_id on t_distinct d  (cost=0.00..2231423.39 rows=1000000 width=40)
          ->  WorkTable Scan on rows r  (cost=0.00..22.87 rows=10 width=32)
                Filter: (r.d IS NOT NULL)
                SubPlan 1
                  ->  Limit  (cost=0.00..2.27 rows=1 width=40)
                        ->  Index Scan Backward using ix_distinct_glow_id on t_distinct di  (cost=0.00..755578.45 rows=333333 width=40)
                              Index Cond: (glow < ($1).glow)

As you can see, this query takes only 5 ms, next to instant. This is because on each iteration step, the whole record can be returned in a single index seek for the first value of the key which is greater than the previous value.

If we wanted to resolve the ties with more complex conditions, the query would become a little more complex too.

Let's consider the query to resolve ties by selecting MAX(id) within the MIN(orderer), just like in the previous example.

The indexes we created order all columns in the same directions: (glow ASC, orderer ASC, id ASC). Of course, the whole index could be used as well if all directions were reversed: (glow DESC, orderer DESC, id DESC).

However, if only some of the directions are reversed, like in (orderer DESC, id ASC) (which is what we need here), the index cannot be used for ordering anymore.

The same problem was mentioned in one of the previous articles on selecting records holding group-wise maximums in MySQL. And this is the reason for the MAX(id) being less efficient than MIN(id) with a loose index scan (which is described in more details in the article aforementioned). However, MySQL deals with it automatically, while we need to implement this with our own hands.

To do this, we should need to use the same trick as we did in MySQL: select the MIN(orderer) and MAX(id) within this orderer in two different queries which would use two different index seeks, each in the appropriate direction.

Here's the query:

WITH    RECURSIVE groups AS
(
SELECT  d
FROM    (
SELECT  d
FROM    t_distinct d
ORDER BY
glow, orderer
LIMIT 1
) q
UNION ALL
SELECT  (
SELECT  di
FROM    t_distinct di
WHERE   di.glow &gt; (g.d).glow
ORDER BY
di.glow, di.orderer
LIMIT 1
)
FROM    groups g
WHERE   d IS NOT NULL
),
rows AS
(
SELECT  (
SELECT  di
FROM    t_distinct di
WHERE   di.glow = (g.d).glow
AND di.orderer = (g.d).orderer
ORDER BY
id DESC
LIMIT 1
) di
FROM    groups g
WHERE   d IS NOT NULL
)
SELECT  (di).id, (di).orderer, (di).glow, (di).ghigh
FROM    rows

id	orderer	glow	ghigh
999881	1	1	9881
999892	1	2	9892
999923	1	3	9923
999984	1	4	9984
999955	1	5	9955
999936	1	6	9936
999827	1	7	9827
999848	1	8	9848
999829	1	9	9829
999930	1	10	9930
10 rows fetched in 0.0005s (0.0058s)

CTE Scan on rows  (cost=588.47..590.47 rows=100 width=32)
  CTE groups
    ->  Recursive Union  (cost=0.00..243.79 rows=101 width=32)
          ->  Subquery Scan q  (cost=0.00..2.35 rows=1 width=32)
                ->  Limit  (cost=0.00..2.34 rows=1 width=40)
                      ->  Index Scan using ix_distinct_glow_orderer_id on t_distinct d  (cost=0.00..2343052.63 rows=1000000 width=40)
          ->  WorkTable Scan on groups g  (cost=0.00..23.94 rows=10 width=32)
                Filter: (g.d IS NOT NULL)
                SubPlan 1
                  ->  Limit  (cost=0.00..2.37 rows=1 width=40)
                        ->  Index Scan using ix_distinct_glow_orderer_id on t_distinct di  (cost=0.00..791389.47 rows=333333 width=40)
                              Index Cond: (glow > ($1).glow)
  CTE rows
    ->  CTE Scan on groups g  (cost=0.00..344.68 rows=100 width=32)
          Filter: (d IS NOT NULL)
          SubPlan 3
            ->  Limit  (cost=0.00..3.43 rows=1 width=36)
                  ->  Index Scan Backward using ix_distinct_glow_orderer_id on t_distinct di  (cost=0.00..38073.13 rows=11111 width=36)
                        Index Cond: ((glow = ($3).glow) AND (orderer = ($3).orderer))

We see both Index Scan and Index Scan Backward in the plan above. The first one finds the MIN(orderer), the second one finds the MAX(id) within the previously found value of the orderer.

Note that unlike MySQL, in PostgreSQL it's enough to use just a single ORDER BY id DESC condition in the subquery which selects the top id within the records with the lowest orderer. PostgreSQL's optimizer is smart enough to pick the correct index (that is the index on (glow, orderer, id)) to serve this query.

This query also takes only 5 ms.

Summary

Unlike MySQL, PostgreSQL implements several clean and documented ways to select the records holding group-wise maximums, including window functions and DISTINCT ON.

However to the lack of the loose index scan support by the PostgreSQL's optimizer and the less efficient usage of indexes in PostgreSQL, the queries using these function take too long.

To word around these problems and improve the queries against the low cardinality grouping conditions, a certain solution described in the article should be used.

This solution uses recursive CTE's to emulate loose index scan and is very efficient if the grouping columns have low cardinality.

To be continued.

Written by Quassnoi

November 26th, 2009 at 11:00 pm

Posted in PostgreSQL

EXPLAIN EXTENDED