EXPLAIN EXTENDED

How to create fast database queries

Archive for the ‘PostgreSQL’ Category

Longest common prefix: PostgreSQL

Comments enabled. I *really* need your comment

This is a series of articles on how to strip all strings in a set of their longest common prefix and concatenate the results:

Today, I'll show how to do it in PostgreSQL.

A quick reminder of the problem (taken from Stack Overflow):

I have some data:

id ref
1 3536757616
1 3536757617
1 3536757618
2 3536757628
2 3536757629
2 3536757630

and want to get the result like this:

id result
1 3536757616/7/8
2 3536757629/28/30

Essentially, the data should be aggregated on id, and the ref's should be concatenated together and separated by a / (slash), but with longest common prefix removed.

Like with SQL Server, it is possible to do this in PostgreSQL using a single SQL query.

But since PostgreSQL offers a nice ability to create custom aggregates, I'll better demonstrate how to solve this task using ones.

In my opinion, custom aggregates fit here just perfectly.

Since PostgreSQL lacks native support for aggregate concatenation, we will need to create two custom aggregates here:
Read the rest of this entry »

Written by Quassnoi

June 5th, 2009 at 11:00 pm

Posted in PostgreSQL

Aggregate AND

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a table with a foreign key and a boolean value (and a bunch of other columns that aren't relevant here), as such:

CREATE TABLE myTable
(        someKey integer,
        someBool boolean
);
INSERT
INTO    myTable
VALUES
        (1, 't'),
        (1, 't'),
        (2, 'f'),
        (2, 't');

Each someKey could have 0 or more entries.

For any given someKey, I need to know if

  1. All the entries are true, or
  2. Any of the entries are false

Basically, it's an AND.

This solution is often used to represent polls that should be unanimous for the decision to be made (i. e. anyone can put a veto on the decision).

PostgreSQL offers a special aggregate BOOL_AND to do this.

However, an aggregate may be less efficient here.

The return value of an AND function is constrained by finding certain values:

  1. Whenever a FALSE is found, the return value cannot be TRUE anymore. It's either FALSE or NULL.
  2. Whenever a NULL is found, the return value is NULL

Aggregates in PostgreSQL, however, won't take this into account.

What we need here is a method to stop and return whenever first NULL or FALSE value is found.

Let's create a sample table and see how can it may be done:
Read the rest of this entry »

Written by Quassnoi

May 30th, 2009 at 11:00 pm

Posted in PostgreSQL

Hierarchical queries in PostgreSQL

Comments enabled. I *really* need your comment

Note: this article concerns PostgreSQL 8.3 and earlier.

For hierarchical queries in PostgreSQL 8.4 and higher, see this article:

In one of the previous articles I wrote about using hierarchical queries in MySQL:

PostgreSQL has a contrib module to implement the same functionality.

However, it's not always possible to install and use contribs. Same is true for procedural languages.

Fortunately, this functionality can be implemented using a plain SQL function.

Let's create a sample table and see how it works:
Read the rest of this entry »

Written by Quassnoi

May 29th, 2009 at 11:00 pm

Posted in PostgreSQL

PostgreSQL: selecting a function

Comments enabled. I *really* need your comment

From Stack Overflow:

Hello,

I want to write a SELECT statement as follows:

SELECT  field_a
FROM    my_table
WHERE   field_b IN (my_function(field_c))

Is that possible?

Would my_function have to return an array?

It is of course possible in PostgreSQL using a set returning function.

Return type of such a function should be declared as SETOF, so the function will return a rowset of given type.

There are at least two ways to call this function, and in this article I will consider the benefits and drawbacks of each method.

Let's create sample tables:
Read the rest of this entry »

Written by Quassnoi

May 14th, 2009 at 11:00 pm

Posted in PostgreSQL

PostgreSQL: emulating ROW_NUMBER

Comments enabled. I *really* need your comment

Note: this article concerns PostgreSQL 8.3 and below.

PostgreSQL 8.4 introduces window functions.

Window function ROW_NUMBER() implements the functionality in question more efficiently.

In one of the previous articles:

, I described emulating Oracle's pseudocolumn ROWNUM in PostgreSQL.

Now, we'll extend this query to emulate ROW_NUMBER.

A quick reminder: ROW_NUMBER is an analytical function in ANSI SQL 2003 supported by Oracle and MS SQL Server.

It enumerates each row in a resultset, but, unlike ROWNUM, may take two additional parameters: PARTITION BY and ORDER BY.

PARTITION BY splits a rowset into several partitions, each of them being numbered with its own sequence starting from 1.

ORDER BY defines the order the rows are numbered within each partition. This order may differ from the order the rows are returned in.

This function helps building queries which allow to select N rows for each partition.

Let's create a sample table and see how we do it in PostgreSQL:
Read the rest of this entry »

Written by Quassnoi

May 11th, 2009 at 11:00 pm

Posted in PostgreSQL

PostgreSQL: row numbers

with 2 comments

Note: this article concerns PostgreSQL 8.3 and below.

PostgreSQL 8.4 introduces window functions.

Window function ROW_NUMBER() implements the functionality in question more efficiently.

ROWNUM is a very useful pseudocolumn in Oracle that returns the position of each row in a final dataset.

Upcoming PostgreSQL 8.4 will have this pseudocolumn, but as for now will we need a hack to access it.

The main idea is simple:

  1. Wrap the query results into an array
  2. Join this array with a generate_series() so that numbers from 1 to array_upper() are returned
  3. For each row returned, return this number (as ROWNUM) along the corresponding array member (which is the row from the original query)

Let's create a table with multiple columns of different datatypes, write a complex query and try to assign the ROWNUM to the query results:
Read the rest of this entry »

Written by Quassnoi

May 5th, 2009 at 11:00 pm

Posted in PostgreSQL

PostgreSQL: optimizing DISTINCT

with 8 comments

In PostgreSQL (as of 8.3, at least), performance of DISTINCT clause in SELECT list is quite poor.

Probably because DISTINCT code in PostgreSQL is very, very old, it always acts in same dumb way: sorts the resultset and filters out the duplicate records.

GROUP BY that can be used for the same purpose is more smart, as it employs more efficient HashAggregate, but its performance is still poor for large dataset.

All major RDBMS's, including MySQL, are able to jump over index keys to select DISTINCT values from an indexed table. This is extremely fast if there are lots of records in a table but not so many DISTINCT values.

This behavior can be emulated in PostgreSQL too.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

May 3rd, 2009 at 11:00 pm

Posted in PostgreSQL

GROUP_CONCAT in PostgreSQL without aggregate functions

with one comment

In one of the previous articles:

Aggregate concatenation

, I described an aggregate function to concatenate strings in PostgreSQL, similar to GROUP_CONCAT in MySQL.

It's very useful if you have a complex GROUP BY query with multiple conditions.

But for some simple queries it's possible to emulate GROUP_CONCAT with pure SQL, avoiding custom functions at all.

Let's create a table to demonstrate our task:
Read the rest of this entry »

Written by Quassnoi

May 2nd, 2009 at 11:00 pm

Posted in PostgreSQL

Selecting N records for each group: PostgreSQL

Comments enabled. I *really* need your comment

In one of the previous articles:

Advanced row sampling

, I described how to select a certain number of records for each group in a MySQL table.

This is trivial in SQL Server and Oracle, since an analytic function ROW_NUMBER() can be used to do this.

Now, I'll describe how to do it in PostgreSQL.

We are assuming that the records from the table should be grouped by a field called grouper, and for each grouper, N first records should be selected, ordered first by ts, then by id.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

April 29th, 2009 at 11:00 pm

Posted in PostgreSQL

Counting missing rows: PostgreSQL

Comments enabled. I *really* need your comment

This is the 5th of 5 articles covering implementation of NOT IN predicate in several RDBMS'es:

Finally let's look how PostgreSQL copes with this predicate.

Let's create sample tables:
Read the rest of this entry »

Written by Quassnoi

April 22nd, 2009 at 11:00 pm

Posted in PostgreSQL