EXPLAIN EXTENDED

How to create fast database queries

Archive for the ‘PostgreSQL’ Category

PostgreSQL: emulating ROW_NUMBER

Note: this article concerns PostgreSQL 8.3 and below.

PostgreSQL 8.4 introduces window functions.

Window function ROW_NUMBER() implements the functionality in question more efficiently.

In one of the previous articles, I described emulating Oracle's pseudocolumn ROWNUM in PostgreSQL.

Now, we'll extend this query to emulate ROW_NUMBER.

A quick reminder: ROW_NUMBER is an analytical function in ANSI SQL 2003 supported by Oracle and MS SQL Server.

It enumerates each row in a resultset, but, unlike ROWNUM, may take two additional parameters: PARTITION BY and ORDER BY.

PARTITION BY splits a rowset into several partitions, each of them being numbered with its own sequence starting from 1.

ORDER BY defines the order in which the rows are numbered within each partition. This order may differ from the order the rows are returned in.

This function helps build queries that select N rows for each partition.
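
To make the semantics concrete, here is a minimal Python sketch of what ROW_NUMBER with PARTITION BY and ORDER BY computes. It is not PostgreSQL code and not the query from the article; the function row_number and the sample columns id, grouper and ts are hypothetical names chosen for illustration.

```python
from itertools import groupby
from operator import itemgetter


def row_number(rows, partition_by, order_by):
    """Return (row, rn) pairs, numbering rows from 1 within each partition.

    Hypothetical helper: models ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...).
    """
    # Sort by the partition key first, then by the ordering key inside it.
    ordered = sorted(rows, key=lambda r: (r[partition_by], r[order_by]))
    numbered = []
    for _, partition in groupby(ordered, key=itemgetter(partition_by)):
        # Each partition gets its own sequence starting from 1.
        for rn, row in enumerate(partition, start=1):
            numbered.append((row, rn))
    return numbered


# Sample data (made up for this sketch).
rows = [
    {"id": 1, "grouper": "a", "ts": 30},
    {"id": 2, "grouper": "b", "ts": 10},
    {"id": 3, "grouper": "a", "ts": 20},
    {"id": 4, "grouper": "a", "ts": 10},
]

numbered = row_number(rows, "grouper", "ts")

# Selecting N rows per partition is then a plain filter on the row number.
top2 = [row["id"] for row, rn in numbered if rn <= 2]
```

With this sample data, partition "a" has three rows and partition "b" has one, so filtering on rn <= 2 keeps two rows from "a" and the single row from "b".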

Let's create a sample table and see how we do it in PostgreSQL:
Read the rest of this entry »

Written by Quassnoi

May 11th, 2009 at 11:00 pm

Posted in PostgreSQL

PostgreSQL: row numbers

Note: this article concerns PostgreSQL 8.3 and below.

PostgreSQL 8.4 introduces window functions.

Window function ROW_NUMBER() implements the functionality in question more efficiently.

ROWNUM is a very useful pseudocolumn in Oracle that returns the position of each row in a final dataset.

The upcoming PostgreSQL 8.4 will offer the same functionality through the window function ROW_NUMBER(), but for now we need a hack to emulate it.

The main idea is simple:

  1. Wrap the query results into an array
  2. Join this array with a generate_series() so that numbers from 1 to array_upper() are returned
  3. For each row returned, return this number (as ROWNUM) along with the corresponding array member (which is the row from the original query)
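
The three steps above can be sketched in Python as an analogy. This is not the PostgreSQL query itself: with_rownum is a hypothetical helper, the list stands in for the SQL array, and range() stands in for generate_series().

```python
def with_rownum(query_results):
    """Pair each result row with its 1-based position (a ROWNUM analogue)."""
    # Step 1: wrap the query results into an array.
    arr = list(query_results)
    # Step 2: generate the numbers from 1 to the upper bound of the array.
    series = range(1, len(arr) + 1)
    # Step 3: return each number along with the corresponding array member.
    # PostgreSQL arrays are 1-based by default, Python lists are 0-based.
    return [(n, arr[n - 1]) for n in series]


numbered = with_rownum(iter(["x", "y", "z"]))
```

The row order is fixed at the moment the results are wrapped into the array, so the numbers follow the order of the original query.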

Let's create a table with multiple columns of different datatypes, write a complex query and try to assign the ROWNUM to the query results:
Read the rest of this entry »

Written by Quassnoi

May 5th, 2009 at 11:00 pm

Posted in PostgreSQL

PostgreSQL: optimizing DISTINCT

In PostgreSQL (as of 8.3, at least), the performance of the DISTINCT clause in a SELECT list is quite poor.

Probably because the DISTINCT code in PostgreSQL is very, very old, it always acts in the same dumb way: it sorts the resultset and filters out the duplicate records.

GROUP BY, which can be used for the same purpose, is smarter, as it employs the more efficient HashAggregate, but its performance is still poor on large datasets.

All major RDBMSs, including MySQL, are able to jump over index keys to select DISTINCT values from an indexed table. This is extremely fast if there are lots of records in a table but not so many DISTINCT values.
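
The idea of jumping over index keys can be illustrated with a small Python sketch. It assumes a sorted list as a stand-in for an index and uses binary search to skip each run of duplicates; distinct_by_skipping is a hypothetical name, and this is not how any particular database implements the technique.

```python
from bisect import bisect_right


def distinct_by_skipping(sorted_keys):
    """Collect distinct values from a sorted list, skipping runs of duplicates."""
    distinct = []
    pos = 0
    while pos < len(sorted_keys):
        value = sorted_keys[pos]
        distinct.append(value)
        # Binary-search for the first key greater than `value`
        # instead of walking over every duplicate one by one.
        pos = bisect_right(sorted_keys, value, lo=pos)
    return distinct


# Many records, few distinct values: the case where skipping pays off.
keys = [1] * 1000 + [2] * 500 + [7] * 3
result = distinct_by_skipping(keys)
```

Each distinct value costs one binary search, so the work grows with the number of distinct values (times a logarithmic factor) rather than with the total number of records.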

This behavior can be emulated in PostgreSQL too.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

May 3rd, 2009 at 11:00 pm

Posted in PostgreSQL

GROUP_CONCAT in PostgreSQL without aggregate functions

In one of the previous articles, Aggregate concatenation, I described an aggregate function to concatenate strings in PostgreSQL, similar to GROUP_CONCAT in MySQL.

It's very useful if you have a complex GROUP BY query with multiple conditions.

But for some simple queries it's possible to emulate GROUP_CONCAT with pure SQL, avoiding custom functions altogether.

Let's create a table to demonstrate our task:
Read the rest of this entry »

Written by Quassnoi

May 2nd, 2009 at 11:00 pm

Posted in PostgreSQL

Selecting N records for each group: PostgreSQL

In one of the previous articles, Advanced row sampling, I described how to select a certain number of records for each group in a MySQL table.

This is trivial in SQL Server and Oracle, since the analytic function ROW_NUMBER() can be used to do this.

Now, I'll describe how to do it in PostgreSQL.

We assume that the records from the table should be grouped by a field called grouper, and that for each grouper, the first N records should be selected, ordered first by ts, then by id.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

April 29th, 2009 at 11:00 pm

Posted in PostgreSQL

Counting missing rows: PostgreSQL

This is the fifth of five articles covering the implementation of the NOT IN predicate in several RDBMSs.

Finally, let's look at how PostgreSQL copes with this predicate.

Let's create sample tables:
Read the rest of this entry »

Written by Quassnoi

April 22nd, 2009 at 11:00 pm

Posted in PostgreSQL

Aggregate concatenation

Aggregate concatenation functions help create a concatenated list out of a recordset. They are useful for reports, hierarchical trees, etc.

MySQL supplies GROUP_CONCAT for this purpose. SYS_CONNECT_BY_PATH and FOR XML can be used in Oracle and MS SQL Server, respectively.

In PostgreSQL, we cannot use these tricks, but we can create our own aggregate function. And this function will also accept two more extremely useful parameters: DELIMITER and IS_DISTINCT.
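
As a rough illustration of what such a function computes, here is a Python sketch. It is not the PostgreSQL aggregate from the article: group_concat is a hypothetical helper, and skipping NULL (None) values and keeping first-occurrence order for distinct values are assumptions made for this example.

```python
def group_concat(values, delimiter=",", is_distinct=False):
    """Concatenate values into one string, optionally dropping duplicates."""
    if is_distinct:
        # dict.fromkeys keeps only the first occurrence of each value,
        # preserving the original order (an assumption of this sketch).
        values = dict.fromkeys(values)
    # Skip None, mirroring how SQL aggregates usually ignore NULLs (assumed).
    return delimiter.join(str(v) for v in values if v is not None)


names = ["a", "b", "a", None]
plain = group_concat(names, ", ")
unique = group_concat(names, ", ", is_distinct=True)
```

Here plain keeps the repeated value, while unique lists each value only once.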
Read the rest of this entry »

Written by Quassnoi

March 4th, 2009 at 9:00 pm

Posted in PostgreSQL