PostgreSQL: Selecting records holding group-wise maximum
Comments enabled. I *really* need your comment
Continuing the series on selecting records holding group-wise maximums:
How do I select the whole records, grouped on
grouper
and holding a group-wise maximum (or minimum) on other column?
In this article, I'll describe several ways to do this in PostgreSQL 8.4.
PostgreSQL 8.4 syntax is much richer than that of MySQL. The former can use the analytic functions, recursive CTE's and proprietary syntax extensions, all of which can be used for this task.
Let's create a sample table:
Table creation details
This table has 1,000,000 records:
id
is thePRIMARY KEY
orderer
is filled with random values from 1 to 10glow
is a low cardinality grouping field (10 distinct values)ghigh
is a high cardinality grouping field (10,000 distinct values)stuffing
is an asterisk-filledVARCHAR(200)
column added to emulate payload of the actual tables
Analytic functions
PostgreSQL 8.4 supports analytic functions. These functions extend the aggregate abilities: they work on the groups rather than on the individual records, but return their values to each individual record instead of shrinking the set. For instance, ROW_NUMBER
enumerates the records within the group according to the ordering condition, and DENSE_RANK
enumerates distinct values of the ordering column (it assigns same number to the records with the same value of the ordering column).
Let's make a query to select the records holding the group-wise maximums of id
. Since id
is a PRIMARY KEY
we don't have to worry about the ties.
Here's the query:
SELECT id, orderer, glow, ghigh FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY glow ORDER BY id) AS rn FROM t_distinct ) q WHERE rn = 1
This works, but is very inefficient (more than 12 seconds). PostgreSQL chooses the sorting in this case but it is not very good in sorting the tables with large rows.
This can be improved by making PostgreSQL to use the index which covers both columns and then join the id
(which is also covered by the index):
SELECT di.id, di.orderer, di.glow, di.ghigh FROM ( SELECT id, ROW_NUMBER() OVER (PARTITION BY glow ORDER BY id) AS rn FROM t_distinct d ) dd JOIN t_distinct di ON di.id = dd.id WHERE rn = 1
This is much faster (4 s) but there is still much space for improvement.
If we wish to order by orderer
we need to define a method to resolve ties.
Using the same approach we can return all records with ties:
SELECT COUNT(*), SUM(id) FROM ( SELECT *, DENSE_RANK() OVER (PARTITION BY glow ORDER BY orderer) AS dr FROM t_distinct d ) dd WHERE dr = 1
, or resolve ties by return the record with the maximum id
among those holding the minimum value of the orderer
:
SELECT di.id, di.orderer, di.glow, di.ghigh FROM ( SELECT id, ROW_NUMBER() OVER (PARTITION BY glow ORDER BY orderer, id DESC) AS rn FROM t_distinct d ) dd JOIN t_distinct di ON di.id = dd.id WHERE rn = 1
As you can see, all these queries are elegant but rather inefficient.
Using DISTINCT ON
PostgreSQL implements another way to return the whole records holding group-wise maximums or minimums.
By using a special clause, DISTINCT ON
, we can return records holding only the distinct values of the certain columns. For this to work correctly, one needs to define an ORDER BY
condition in addition to DISTINCT ON
, with the leading expressions being the same as those using in DISTINCT ON
. This guarantees that all the records belonging to each group would be consecutive if not for the DISTINCT ON
clause.
DISTINCT ON
is applied after the ORDER BY
condition. It just returns the first record from each group, skipping the others. This is very easy to do: return a record if the grouping expression changed from the previous row; don't return if it didn't.
This is quite similar to MySQL's extension for GROUP BY
, but, unlike MySQL, this solution guarantees correct order and the fact that all values returned will be taken from a single record.
This query cannot be used to return all records with ties (since the values of the grouping column won't be distinct), but it will work if the ties are impossible (as in selecting a maximum id
), or if a correct condition for resolving ties is provided.
Here's the query to return records holding MAX(id)
:
SELECT DISTINCT ON (glow) id, orderer, glow, ghigh FROM t_distinct ORDER BY glow, id DESC
And here's the one to return the MAX(id)
within the MIN(orderer)
:
SELECT DISTINCT ON (glow) id, orderer, glow, ghigh FROM t_distinct ORDER BY glow, orderer, id DESC
This is more efficient than the window function. However, both queries still take 4 seconds. This is almost 40 times as much as the same queries in MySQL , even without any improvements.
Unlike MySQL, PostgreSQL does not implement loose index scan which would allow to jump over the distinct index records. However, it can be emulated using recursive CTE's.
Recursive CTE's to emulate loose index scan
The main idea here is simple:
- In the anchor part of the CTE take the lowest value of the key
- In the recursive part of the CTE take the next value of the key by using
>
or<
operators along with theORDER BY
andLIMIT 1
PostgreSQL's syntax allows compacting a whole table record into a single field (which can be exploded later). This will allow us to avoid joins by placing the whole recursive part into a subquery which will use the index efficiently.
Here's the query to return records holding group-wise MAX(id)
:
WITH RECURSIVE rows AS ( SELECT d FROM ( SELECT d FROM t_distinct d ORDER BY glow DESC, id DESC LIMIT 1 ) q UNION ALL SELECT ( SELECT di FROM t_distinct di WHERE di.glow < (r.d).glow ORDER BY di.glow DESC, di.id DESC LIMIT 1 ) FROM rows r WHERE d IS NOT NULL ) SELECT (d).id, (d).orderer, (d).glow, (d).ghigh FROM rows WHERE d IS NOT NULL
id | orderer | glow | ghigh | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1000000 | 6 | 10 | 10000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999999 | 8 | 9 | 9999 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999998 | 2 | 8 | 9998 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999997 | 3 | 7 | 9997 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999996 | 8 | 6 | 9996 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999995 | 4 | 5 | 9995 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999994 | 5 | 4 | 9994 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999993 | 6 | 3 | 9993 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999992 | 3 | 2 | 9992 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999991 | 5 | 1 | 9991 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 rows fetched in 0.0005s (0.0049s) |
CTE Scan on rows (cost=232.94..234.96 rows=100 width=32) Filter: (d IS NOT NULL) CTE rows -> Recursive Union (cost=0.00..232.94 rows=101 width=32) -> Subquery Scan q (cost=0.00..2.24 rows=1 width=32) -> Limit (cost=0.00..2.23 rows=1 width=40) -> Index Scan Backward using ix_distinct_glow_id on t_distinct d (cost=0.00..2231423.39 rows=1000000 width=40) -> WorkTable Scan on rows r (cost=0.00..22.87 rows=10 width=32) Filter: (r.d IS NOT NULL) SubPlan 1 -> Limit (cost=0.00..2.27 rows=1 width=40) -> Index Scan Backward using ix_distinct_glow_id on t_distinct di (cost=0.00..755578.45 rows=333333 width=40) Index Cond: (glow < ($1).glow)
As you can see, this query takes only 5 ms, next to instant. This is because on each iteration step, the whole record can be returned in a single index seek for the first value of the key which is greater than the previous value.
If we wanted to resolve the ties with more complex conditions, the query would become a little more complex too.
Let's consider the query to resolve ties by selecting MAX(id)
within the MIN(orderer)
, just like in the previous example.
The indexes we created order all columns in the same directions: (glow ASC, orderer ASC, id ASC)
. Of course, the whole index could be used as well if all directions were reversed: (glow DESC, orderer DESC, id DESC)
.
However, if only some of the directions are reversed, like in (orderer DESC, id ASC)
(which is what we need here), the index cannot be used for ordering anymore.
The same problem was mentioned in one of the previous articles on selecting records holding group-wise maximums in MySQL. And this is the reason for the MAX(id)
being less efficient than MIN(id)
with a loose index scan (which is described in more details in the article aforementioned). However, MySQL deals with it automatically, while we need to implement this with our own hands.
To do this, we should need to use the same trick as we did in MySQL: select the MIN(orderer)
and MAX(id)
within this orderer
in two different queries which would use two different index seeks, each in the appropriate direction.
Here's the query:
WITH RECURSIVE groups AS ( SELECT d FROM ( SELECT d FROM t_distinct d ORDER BY glow, orderer LIMIT 1 ) q UNION ALL SELECT ( SELECT di FROM t_distinct di WHERE di.glow > (g.d).glow ORDER BY di.glow, di.orderer LIMIT 1 ) FROM groups g WHERE d IS NOT NULL ), rows AS ( SELECT ( SELECT di FROM t_distinct di WHERE di.glow = (g.d).glow AND di.orderer = (g.d).orderer ORDER BY id DESC LIMIT 1 ) di FROM groups g WHERE d IS NOT NULL ) SELECT (di).id, (di).orderer, (di).glow, (di).ghigh FROM rows
id | orderer | glow | ghigh | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
999881 | 1 | 1 | 9881 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999892 | 1 | 2 | 9892 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999923 | 1 | 3 | 9923 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999984 | 1 | 4 | 9984 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999955 | 1 | 5 | 9955 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999936 | 1 | 6 | 9936 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999827 | 1 | 7 | 9827 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999848 | 1 | 8 | 9848 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999829 | 1 | 9 | 9829 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
999930 | 1 | 10 | 9930 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 rows fetched in 0.0005s (0.0058s) |
CTE Scan on rows (cost=588.47..590.47 rows=100 width=32) CTE groups -> Recursive Union (cost=0.00..243.79 rows=101 width=32) -> Subquery Scan q (cost=0.00..2.35 rows=1 width=32) -> Limit (cost=0.00..2.34 rows=1 width=40) -> Index Scan using ix_distinct_glow_orderer_id on t_distinct d (cost=0.00..2343052.63 rows=1000000 width=40) -> WorkTable Scan on groups g (cost=0.00..23.94 rows=10 width=32) Filter: (g.d IS NOT NULL) SubPlan 1 -> Limit (cost=0.00..2.37 rows=1 width=40) -> Index Scan using ix_distinct_glow_orderer_id on t_distinct di (cost=0.00..791389.47 rows=333333 width=40) Index Cond: (glow > ($1).glow) CTE rows -> CTE Scan on groups g (cost=0.00..344.68 rows=100 width=32) Filter: (d IS NOT NULL) SubPlan 3 -> Limit (cost=0.00..3.43 rows=1 width=36) -> Index Scan Backward using ix_distinct_glow_orderer_id on t_distinct di (cost=0.00..38073.13 rows=11111 width=36) Index Cond: ((glow = ($3).glow) AND (orderer = ($3).orderer))
We see both Index Scan
and Index Scan Backward
in the plan above. The first one finds the MIN(orderer)
, the second one finds the MAX(id)
within the previously found value of the orderer
.
Note that unlike MySQL, in PostgreSQL it's enough to use just a single ORDER BY id DESC
condition in the subquery which selects the top id
within the records with the lowest orderer
. PostgreSQL's optimizer is smart enough to pick the correct index (that is the index on (glow, orderer, id)
) to serve this query.
This query also takes only 5 ms.
Summary
Unlike MySQL, PostgreSQL implements several clean and documented ways to select the records holding group-wise maximums, including window functions and DISTINCT ON
.
However to the lack of the loose index scan support by the PostgreSQL's optimizer and the less efficient usage of indexes in PostgreSQL, the queries using these function take too long.
To word around these problems and improve the queries against the low cardinality grouping conditions, a certain solution described in the article should be used.
This solution uses recursive CTE's to emulate loose index scan and is very efficient if the grouping columns have low cardinality.
To be continued.