Oracle: Selecting records holding group-wise maximum
It was quite a time since I posted my last article, and I have a bunch of unanswered questions that people asked me.
Sorry for not posting and answering for so long, had a few urgent things to do, and hope these breaks won't be so long in the future :)
Now, continuing the series on selecting records holding group-wise maximums:
How do I select the whole records, grouped on
grouperand holding a group-wise maximum (or minimum) on other column?
Finally, Oracle.
We will create the very same table as we did for the previous systems:
Table creation details
There are 1,000,000 records with the following fields:
idis thePRIMARY KEYordereris filled with random values from 1 to 10glowis a low cardinality grouping field (10 distinct values)ghighis a high cardinality grouping field (10,000 distinct values)stuffingis an asterisk-filledVARCHAR(200)column added to emulate payload of the actual tables
Analytic functions
Of course Oracle support analytic functions just like SQL Server and PostgreSQL (and actually, that's SQL Server and PostgreSQL that support analytics functions like Oracle).
To select records holding the group-wise maximums, we can apply the very same ROW_NUMBER and DENSE_RANK solutions we used for SQL Server and PostgreSQL before:
SELECT id, orderer, glow, ghigh FROM ( SELECT d.*, ROW_NUMBER() OVER (PARTITION BY glow ORDER BY id) AS rn FROM t_distinct d ) q WHERE rn = 1
SELECT COUNT(*) AS cnt, SUM(LENGTH(stuffing)) AS psum FROM ( SELECT d.*, DENSE_RANK() OVER (PARTITION BY glow ORDER BY orderer) AS dr FROM t_distinct d ) dd WHERE dr = 1
This takes a little more than 6 seconds.
CONNECT BY and subqueries
Unfortunately in Oracle we cannot use the decisions we used to improve the query time in the other systems.
Oracle does not support loose index scan on composite indexes like MySQL.
It does support a similar solution, INDEX SKIP SCAN, but can use it only for the queries that search for the exact value of the orderer, not for MIN or MAX.
Oracle does not support CROSS APPLY so we cannot apply a multirecord or a multicolumn query to each result of the rowset.
More than that: Oracle does not event support deep correlation of the subqueries. You cannot nest a reference to the outer table in the subquery, and using ROW_NUMBER or ROWNUM to filter the first result requires nesting. So making an analog of MySQL's LIMIT in a subquery won't work either.
Oracle 10g (which is what I'm writing about here) also does not support recursive CTE's, so it's not possible to use this method too.
However, there still is a way to improve the query.
Instead of jumping over the index values to build the list of distinct groupers, we can build a whole list of the possible groupers and join it with the original table.
The possible
groupers are or course those between MIN and MAX.
To build such a list we can use a CONNECT BY query over the dual table, (which is a standard way to generate random rowsets in Oracle), and just take MIN(glow) and MAX(glow) as the start and stop conditions.
We then can find the MIN(orderer) or MIN(id) for each grouper from the generated list and join the table back on this value. This will use an efficient INDEX RANGE SCAN (MIN / MAX). There can be gaps: not all groupers will be present in the list, but this is not a problem, since the subquery will return a NULL and the join will just yield no rows.
Here's the query to find the records holding the MIN(id):
WITH q AS
(
SELECT level - 1 +
(
SELECT MIN(glow)
FROM t_distinct
) AS d
FROM dual
CONNECT BY
level <=
(
SELECT MAX(glow)
FROM t_distinct
) -
(
SELECT MIN(glow)
FROM t_distinct
) + 1
)
SELECT d.id, d.orderer, d.glow, d.glow
FROM q
JOIN t_distinct d
ON d.id =
(
SELECT MAX(id)
FROM t_distinct di
WHERE di.glow = q.d
)
| ID | ORDERER | GLOW | GLOW | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 999991 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999992 | 6 | 2 | 2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999993 | 8 | 3 | 3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999994 | 5 | 4 | 4 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999995 | 10 | 5 | 5 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999996 | 9 | 6 | 6 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999997 | 4 | 7 | 7 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999998 | 3 | 8 | 8 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999999 | 1 | 9 | 9 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1000000 | 4 | 10 | 10 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 10 rows fetched in 0.0005s (0.0009s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SELECT STATEMENT
NESTED LOOPS
VIEW
CONNECT BY WITHOUT FILTERING
FAST DUAL
TABLE ACCESS BY INDEX ROWID, 20091216_groupwise.T_DISTINCT
INDEX UNIQUE SCAN, 20091216_groupwise.PK_DISTINCT_ID
SORT AGGREGATE
FIRST ROW
INDEX RANGE SCAN (MIN/MAX), 20091216_groupwise.IX_DISTINCT_GLOW_ID
, and this is a query to find the records holding MAX(id) within the MIN(orderer):
WITH q AS
(
SELECT level - 1 +
(
SELECT MIN(glow)
FROM t_distinct
) AS d
FROM dual
CONNECT BY
level <=
(
SELECT MAX(glow)
FROM t_distinct
) -
(
SELECT MIN(glow)
FROM t_distinct
) + 1
)
SELECT d.id, d.orderer, d.glow, d.glow
FROM (
SELECT d,
(
SELECT MIN(orderer)
FROM t_distinct d1
WHERE d1.glow = q.d
) AS mo
FROM q
) q2
JOIN t_distinct d
ON d.id =
(
SELECT MAX(id)
FROM t_distinct d2
WHERE d2.glow = q2.d
AND d2.orderer = q2.mo
)
| ID | ORDERER | GLOW | GLOW | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 999991 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999852 | 1 | 2 | 2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999543 | 1 | 3 | 3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999974 | 1 | 4 | 4 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999825 | 1 | 5 | 5 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999816 | 1 | 6 | 6 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999707 | 1 | 7 | 7 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999818 | 1 | 8 | 8 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999999 | 1 | 9 | 9 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 999960 | 1 | 10 | 10 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 10 rows fetched in 0.0005s (0.0010s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SELECT STATEMENT
NESTED LOOPS
VIEW
VIEW
CONNECT BY WITHOUT FILTERING
FAST DUAL
TABLE ACCESS BY INDEX ROWID, 20091216_groupwise.T_DISTINCT
INDEX UNIQUE SCAN, 20091216_groupwise.PK_DISTINCT_ID
SORT AGGREGATE
FIRST ROW
INDEX RANGE SCAN (MIN/MAX), 20091216_groupwise.IX_DISTINCT_GLOW_ORDERER_ID
The latter query is a little bit more complex since we need two subqueries here: one to find the MIN(orderer), another one to find the MAX(id). The principle is the same, however.
Both subqueries are very fast and complete within 1 millisecond (read instantly).
This query depends not only on the selectivity of the grouper but also on the actual values of the bounds (the higher is the difference between MIN and MAX on the groupers, the less efficient the query is).
For this solution to work, the groupers should have a well-ordered datatype. This means you can build a list of all possible values between MIN and MAX. Integers and dates (without time portion) are well-ordered; real numbers, strings and timestamps are not.
For this solution to be efficient, the groupers should be well-ordered, dense and have low cardinality. Though these limitations can be quite strong, the solutions is still extremely useful for very many many real-world scenarios which groups by integers (like categories in the blogging system or department codes in an accounting application); or dates without time portion (like sales reports).
Subscribe in a reader
Excellent article!
There is a way to overcome the need to generate the range for grouping column.
Let’s say we add column glow_txt to t_distinct_table. Column is generated as following: ‘A’ || to_char(glow);
And let’s build another index:
CREATE INDEX ix_distinct_glow_txt_id ON t_distinct (glow_txt, id)
/
Now we can determine distinct grouping values with following query:
with T (glow_txt) as
(select /*+ distinct_glow_txt_minmax */
min (glow_txt) glow_txt from T_distinct
union all
select (select min (T1.glow_txt)
from T_distinct T1
where T1.glow_txt > T.glow_txt)
from T
where T.glow_txt is not null)
select *
from T
where t.glow_txt is not null;
First time this technique mentioned in http://orasql.org/2012/09/21/distinct-values-by-index-topn/
And we can use it in your example:
SELECT /*+ rows_with_min_id_for_every_glow_txt_minmax */
d.id, d.orderer, d.glow, d.ghigh
FROM (with T (glow_txt) as
(select
min (glow_txt) glow_txt from T_distinct
union all
select (select min (T1.glow_txt)
from T_distinct T1
where T1.glow_txt > T.glow_txt)
from T
where T.glow_txt is not null)
select *
from T
where t.glow_txt is not null) q
JOIN t_distinct d
ON d.id =
(
SELECT MIN(id)
FROM t_distinct di
WHERE di.glow_txt = q.glow_txt
)
;
Please note that all mentioned above works in 11gR2.
Yakov Vasylchenko
31 Oct 13 at 19:15