Archive for the ‘PostgreSQL’ Category
PostgreSQL: emulating ROW_NUMBER
Comments enabled. I *really* need your comment
Note: this article concerns PostgreSQL 8.3 and below.
PostgreSQL 8.4 introduces window functions.
Window function ROW_NUMBER()
implements the functionality in question more efficiently.
In one of the previous articles:
, I described emulating Oracle's pseudocolumn ROWNUM
in PostgreSQL.
Now, we'll extend this query to emulate ROW_NUMBER
.
A quick reminder: ROW_NUMBER
is an analytical function in ANSI SQL 2003
supported by Oracle and MS SQL Server.
It enumerates each row in a resultset, but, unlike ROWNUM
, may take two additional parameters: PARTITION BY
and ORDER BY
.
PARTITION BY
splits a rowset into several partitions, each of them being numbered with its own sequence starting from 1.
ORDER BY
defines the order the rows are numbered within each partition. This order may differ from the order the rows are returned in.
This function helps building queries which allow to select N
rows for each partition.
Let's create a sample table and see how we do it in PostgreSQL:
Read the rest of this entry »
PostgreSQL: row numbers
Note: this article concerns PostgreSQL 8.3 and below.
PostgreSQL 8.4 introduces window functions.
Window function ROW_NUMBER()
implements the functionality in question more efficiently.
ROWNUM
is a very useful pseudocolumn in Oracle that returns the position of each row in a final dataset.
Upcoming PostgreSQL 8.4 will have this pseudocolumn, but as for now will we need a hack to access it.
The main idea is simple:
- Wrap the query results into an array
- Join this array with a
generate_series()
so that numbers from 1 toarray_upper()
are returned - For each row returned, return this number (as
ROWNUM
) along the corresponding array member (which is the row from the original query)
Let's create a table with multiple columns of different datatypes, write a complex query and try to assign the ROWNUM
to the query results:
Read the rest of this entry »
PostgreSQL: optimizing DISTINCT
In PostgreSQL (as of 8.3, at least), performance of DISTINCT
clause in SELECT
list is quite poor.
Probably because DISTINCT
code in PostgreSQL is very, very old, it always acts in same dumb way: sorts the resultset and filters out the duplicate records.
GROUP BY
that can be used for the same purpose is more smart, as it employs more efficient HashAggregate
, but its performance is still poor for large dataset.
All major RDBMS
's, including MySQL, are able to jump over index keys to select DISTINCT
values from an indexed table. This is extremely fast if there are lots of records in a table but not so many DISTINCT
values.
This behavior can be emulated in PostgreSQL too.
Let's create a sample table:
Read the rest of this entry »
GROUP_CONCAT in PostgreSQL without aggregate functions
In one of the previous articles:
, I described an aggregate function to concatenate strings in PostgreSQL, similar to GROUP_CONCAT
in MySQL
.
It's very useful if you have a complex GROUP BY
query with multiple conditions.
But for some simple queries it's possible to emulate GROUP_CONCAT
with pure SQL
, avoiding custom functions at all.
Let's create a table to demonstrate our task:
Read the rest of this entry »
Selecting N records for each group: PostgreSQL
Comments enabled. I *really* need your comment
In one of the previous articles:
, I described how to select a certain number of records for each group in a MySQL table.
This is trivial in SQL Server and Oracle, since an analytic function ROW_NUMBER()
can be used to do this.
Now, I'll describe how to do it in PostgreSQL.
We are assuming that the records from the table should be grouped by a field called grouper
, and for each grouper
, N
first records should be selected, ordered first by ts
, then by id
.
Let's create a sample table:
Read the rest of this entry »
Counting missing rows: PostgreSQL
Comments enabled. I *really* need your comment
This is the 5th of 5 articles covering implementation of NOT IN
predicate in several RDBMS'es:
NOT IN
in MySQLNOT IN
in MySQL (usingDISTINCT
)NOT IN
in SQL ServerNOT IN
in OracleNOT IN
in PostgreSQL
Finally let's look how PostgreSQL
copes with this predicate.
Let's create sample tables:
Read the rest of this entry »
Aggregate concatenation
Comments enabled. I *really* need your comment
Aggregate concatenation functions help creating a concatenated list out of a recordset. Useful for reports, hierarchical trees, etc.
MySQL supplies GROUP_CONCAT
for this purpose. SYS_CONNECT_BY PATH
and FOR XML
can be used in Oracle and MS SQL.
In PostgreSQL, we cannot use these tricks, but we can create our own aggregate function. And this function will also accept two more extremely useful parameters: DELIMITER
and IS_DISTINCT
.
Read the rest of this entry »