From Stack Overflow:
I have this query:SELECT DISTINCT id, name FROM table1
For a given
namewill always be the same.
Both fields are indexed. There's no separate table that maps the id to the name.
The table is very large (10,000,000 rows), so the query could take some time.
This query is very fast, since it's indexed:SELECT DISTINCT id FROM table1
Likewise for this query:SELECT DISTINCT id FROM table1
Assuming I can't get the database structure changed (a very safe assumption), what's a better way to structure the first query for performance?
The fastest way would of course be building an index on
But since we (for whatever reason) cannot change the database structure, it's not an option.
Author, nevertheless, mentioned that for any given
name will always be the same. This can help us.
Let's create a sample table:
CREATE TABLE t_distinct ( id NUMBER(10) NOT NULL, name VARCHAR2(50) NOT NULL, stuffing VARCHAR2(400) NOT NULL ) / INSERT INTO t_distinct (id, name, stuffing) SELECT FLOOR((level - 1) / 30000) + 1, 'Name ' || (FLOOR((level - 1) / 30000) + 1), LPAD(' ', 400, ' ') FROM dual CONNECT BY level <= 1000000 / COMMIT / CREATE INDEX ix_distinct_id ON t_distinct (id) / CREATE INDEX ix_distinct_name ON t_distinct (name) / BEGIN DBMS_STATS.gather_schema_stats('"20090602_distinct"'); END; /
This table has 1,000,000 rows and 34
DISTINCT id, name pairs.
The table also has a stuffing column filled with spaces.
Author mentioned that there are lots of other columns in his table, and this column makes our table to occupy about as much space as his table does.
The original query:
SELECT DISTINCT id, name FROM t_distinct
|34 rows fetched in 0.0009s (2.5312s)|
SELECT STATEMENT HASH UNIQUE VIEW , 20090602_distinct.index$_join$_001 HASH JOIN INDEX FAST FULL SCAN, 20090602_distinct.IX_DISTINCT_ID INDEX FAST FULL SCAN, 20090602_distinct.IX_DISTINCT_NAME
runs for more than
This is because two indexes are being
ROWID to obtain both values. This is quite a long operation.
What can we do to improve it?
Author mentioned that for any given
id, there is exactly one
This means that given an
id we can select any
name we want, and there is no need to browse all table records.
name for an
id (in no particular order) can be selected with a subquery:
SELECT name FROM t_distinct WHERE id = :id AND rownum = 1
This will use an index on
Now everything we need to do is to select
DISTINCT id's (which is efficient) and use this subquery for each of these
SELECT did, ( SELECT name FROM t_distinct di WHERE id = did AND rownum = 1 ) AS name FROM ( SELECT /*+ NO_MERGE */ DISTINCT id AS did FROM t_distinct d )
|34 rows fetched in 0.0009s (0.2204s)|
SELECT STATEMENT COUNT STOPKEY TABLE ACCESS BY INDEX ROWID, 20090602_distinct.T_DISTINCT INDEX RANGE SCAN, 20090602_distinct.IX_DISTINCT_ID VIEW HASH UNIQUE INDEX FAST FULL SCAN, 20090602_distinct.IX_DISTINCT_ID
This is much faster, since everything can be done using the index on
The subquery is executed once per
DISTINCT id. Since the subquery requires only one index scan and there are few
DISTINCT id's, the subqueries conribute very little into overall query time.