May 3, 2009 - EXPLAIN EXTENDED

How to create fast database queries

Archive for May 3rd, 2009

PostgreSQL: optimizing DISTINCT

In PostgreSQL (as of 8.3, at least), the performance of the DISTINCT clause in the SELECT list is quite poor.

Probably because the DISTINCT code in PostgreSQL is very, very old, it always works in the same naive way: it sorts the resultset and then filters out the duplicate records.
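To make the sort-then-filter strategy concrete, here is a minimal Python sketch of the idea. It is not PostgreSQL's actual executor code, and the function name `distinct_by_sort` is hypothetical. Every input row is sorted, and only then are duplicates removed, which makes them adjacent to each other.

```python
from typing import Iterable, List


def distinct_by_sort(rows: Iterable[int]) -> List[int]:
    """Sort-based DISTINCT: sort every row, then drop rows equal to their predecessor."""
    result: List[int] = []
    # The whole input is sorted first: O(n log n) even if there are only a few distinct values.
    for row in sorted(rows):
        # After sorting, duplicates sit next to each other, so comparing
        # with the last emitted value is enough to filter them out.
        if not result or row != result[-1]:
            result.append(row)
    return result


if __name__ == "__main__":
    print(distinct_by_sort([3, 1, 3, 2, 1]))  # [1, 2, 3]
```

The cost is dominated by sorting the full input, regardless of how few distinct values there are.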

GROUP BY, which can be used for the same purpose, is smarter, as it can employ the more efficient HashAggregate. However, its performance is still poor on large datasets, since it still has to read every record of the table.
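A hash-based approach can be sketched in the same way. The snippet below is only an analogy for what HashAggregate does, and the name `distinct_by_hash` is hypothetical. It avoids the sort, but it still makes one full pass over every row, which is why it does not help much when the table is large.

```python
from typing import Iterable, List, Set


def distinct_by_hash(rows: Iterable[int]) -> List[int]:
    """Hash-based DISTINCT: remember each value in a hash set the first time it is seen."""
    seen: Set[int] = set()
    result: List[int] = []
    # No sort is needed, but every single input row is still visited once.
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


if __name__ == "__main__":
    print(distinct_by_hash([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

The output here is in first-seen order. A real HashAggregate gives no ordering guarantee, so a query that needs sorted output must still say ORDER BY.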

All major RDBMSs, including MySQL, are able to jump over index keys to select DISTINCT values from an indexed table. This is extremely fast when a table has lots of records but not many DISTINCT values.
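The "jump over index keys" idea (often called a loose index scan or skip scan) can be modelled with a sorted Python list standing in for the leaf level of a B-tree index. This sketch is an illustration under that assumption, not any particular engine's implementation, and `distinct_by_skip_scan` is a hypothetical name. After each distinct value is found, a binary search lands directly on the first key greater than it, so the work grows with the number of distinct values rather than the number of rows.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def distinct_by_skip_scan(index_keys: Sequence[int]) -> Tuple[List[int], int]:
    """Return (distinct values, number of index seeks).

    `index_keys` must already be sorted; it models the leaf level of a
    B-tree index on the column.
    """
    result: List[int] = []
    seeks = 0
    pos = 0
    while pos < len(index_keys):
        key = index_keys[pos]  # first entry of the next group of equal keys
        result.append(key)
        # Binary-search for the first entry strictly greater than `key`,
        # skipping all its duplicates in O(log n) instead of reading them.
        pos = bisect_right(index_keys, key, pos)
        seeks += 1
    return result, seeks


if __name__ == "__main__":
    keys = [5] * 1000 + [7] * 500 + [9] * 10
    print(distinct_by_skip_scan(keys))  # ([5, 7, 9], 3)
```

With 1,510 index entries but only 3 distinct values, the loop performs just 3 seeks. This is the situation the paragraph above describes as extremely fast.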

This behavior can be emulated in PostgreSQL too.
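One common way to emulate the skip at the query level is to avoid reading the table and instead repeatedly ask for the smallest value greater than the last one found, using `SELECT MIN(col) FROM t WHERE col > ?`. On an indexed column, a database can usually answer each such query with a short index descent, so the total number of queries is the number of distinct values plus one. The sketch below uses Python's built-in `sqlite3` module purely as a convenient, runnable stand-in for PostgreSQL. The table `t_sample`, the column `grp` and the function `distinct_via_min_probes` are all hypothetical, and this is not necessarily the technique the full article goes on to use.

```python
import sqlite3
from typing import Any, List


def distinct_via_min_probes(conn: sqlite3.Connection, table: str, column: str) -> List[Any]:
    """Emulate a loose index scan with repeated 'next greater value' queries."""
    # Identifiers are interpolated with f-strings: pass only trusted table/column names.
    first_sql = f"SELECT MIN({column}) FROM {table}"
    next_sql = f"SELECT MIN({column}) FROM {table} WHERE {column} > ?"
    result: List[Any] = []
    current = conn.execute(first_sql).fetchone()[0]  # None (NULL) if the table is empty
    while current is not None:
        result.append(current)
        # Smallest indexed value strictly greater than the one just found.
        current = conn.execute(next_sql, (current,)).fetchone()[0]
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample table: 100,000 rows but only 10 distinct values of grp.
    conn.execute("CREATE TABLE t_sample (id INTEGER PRIMARY KEY, grp INTEGER NOT NULL)")
    conn.execute("CREATE INDEX ix_sample_grp ON t_sample (grp)")
    conn.executemany(
        "INSERT INTO t_sample (grp) VALUES (?)",
        [(i % 10,) for i in range(100000)],
    )
    print(distinct_via_min_probes(conn, "t_sample", "grp"))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The loop runs in the client here for clarity. Whether each probe actually uses the index depends on the engine and its planner, so it is worth checking with EXPLAIN on the real database.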

Let's create a sample table:

Written by Quassnoi

May 3rd, 2009 at 11:00 pm

Posted in PostgreSQL
