EXPLAIN EXTENDED

How to create fast database queries

Archive for May 3rd, 2009

PostgreSQL: optimizing DISTINCT

with 8 comments

In PostgreSQL (as of 8.3, at least), performance of DISTINCT clause in SELECT list is quite poor.

Probably because DISTINCT code in PostgreSQL is very, very old, it always acts in same dumb way: sorts the resultset and filters out the duplicate records.

GROUP BY that can be used for the same purpose is more smart, as it employs more efficient HashAggregate, but its performance is still poor for large dataset.

All major RDBMS's, including MySQL, are able to jump over index keys to select DISTINCT values from an indexed table. This is extremely fast if there are lots of records in a table but not so many DISTINCT values.

This behavior can be emulated in PostgreSQL too.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

May 3rd, 2009 at 11:00 pm

Posted in PostgreSQL