EXPLAIN EXTENDED

How to create fast database queries

Archive for the ‘PostgreSQL’ Category

Happy New Year: composing music in SQL

with 5 comments

My previous New Year posts were about drawing pictures in SQL: snowflakes, clocks, fractals and even our planet Earth as seen from above.

But contrary to the popular belief, SQL is not just about graphics processing. You can use it for all kinds of things.

Today we'll use SQL to create music.

Remember those polyphonic ringtones in early 2000's cell phones? Instead of downloading MP3's (which were a pain to download and even more pain to upload them to the phone), you could type in the notes and their values, and the phone would play them for you.

Most phones played tunes in a very straightforward way: for each note, they calculated its frequency and generated a pure sine waveform of this frequency. Polyphonic phones could play several notes at once: this would require two or more superimposed waveforms. Math behind that are very simple: channel amplitude is a sine function of time and frequency, and all the channel amplitudes are added together.

Let's do this too and play some New Year music with PostgreSQL.

Read the rest of this entry »

Written by Quassnoi

December 31st, 2014 at 11:00 pm

Posted in PostgreSQL

Tagged with , , ,

Happy New Year: drawing fractals in SQL

with 5 comments

In one of my previous New Year's posts we drew snowflakes in PostgreSQL.

The algorithm we used to create the snowflakes is an implementation of an L-system, which is an example of a fractal. There are many more beautiful objects we can see in the winter: frozen trees, frost patterns on windows, cracks on ice etc., all of them being fractals.

Frost patterns

Today we will be constructing escape-time fractals. To build such a fractal, one would need to run a certain function for each point on the plane iteratively and see how many iterations does it take for the function to overflow: the more it takes, the brighter is the point's color.

I won't go deep into fractal theory now, just show that they can be constructed with SQL relatively easily. For instance, Mandelbrot set, one of the best known escape-time fractals, is almost a one-liner in PostgreSQL:

Read the rest of this entry »

Written by Quassnoi

December 31st, 2013 at 11:00 pm

PostgreSQL: parametrizing a recursive CTE

with 2 comments

Answering questions asked on the site.

Jan Suchal asks:

We've started playing with PostgreSQL and recursive queries. Looking at example that does basic graph traversal from http://www.postgresql.org/docs/9.0/static/queries-with.html.

We would like to have a view called paths defined like this:

WITH RECURSIVE
search_graph(id, path) AS
(
SELECT  id, ARRAY[id]
FROM    node
UNION ALL
SELECT  g.dest, sg.path || g.dest
FROM    search_graph sg
JOIN    graph g
ON      g.source = sg.id
AND NOT g.dest = ANY(sg.path)
)
SELECT  path
FROM    search_graph

By calling

SELECT  *
FROM    paths
WHERE   path[1] = :node_id

we would get all paths from a certain node.

The problem here is with performance. When you want this to be quick you need to add a condition for the anchor part of the UNION like this:

WITH RECURSIVE
search_graph(id, path) AS
(
SELECT  id, ARRAY[id]
FROM    node
WHERE   id = :node_id
UNION ALL
SELECT  g.dest, sg.path || g.dest
FROM    search_graph sg
JOIN    graph g
ON      g.source = sg.id
AND NOT g.dest = ANY(sg.path)
)
SELECT  path
FROM    search_graph

Now it's perfectly fast, but we cannot create a view because that would only contain paths from one specific node.

Any ideas?

An often overlooked feature of PostgreSQL is its ability to create set-returning functions and use them in SELECT list.

The record will be cross-joined with the set returned by the function and the result of the join will be added to the resultset.

This is best demonstrated with generate_series, probably a most used PostgreSQL set-returning function.
Read the rest of this entry »

Written by Quassnoi

December 24th, 2010 at 11:00 pm

Posted in PostgreSQL

Date ranges: overlapping with priority

Comments enabled. I *really* need your comment

Answering questions asked on the site.

Jason Foster asks:

We have a table of student registrations:

Students
student_code course_code course_section session_cd
987654321 ESC102H1 Y 20085
998766543 ELEE203H F 20085

course_code and course_section identify a course, session_cd is an academic session, e. g. 20085, 20091, 20079.

The courses (stored in another table) have associated values for engineering design, complementary studies, etc., like that:

Courses
course_code course_section start_session end_session design science studies
ESC102H1 F 20071 20099 10 0 0
AER201Y1 Y 20059 NULL 0 0 30

, or like that:

In-house courses
course_code course_section student_code design science studies
ESC102H1 F 998766543 10 0 0

We are required by an external accreditation body to add up all of the units of engineering design, complementary studies, etc., taken by an individual student.

Where it gets really messy is that we have multiple data feeds for the associated values of courses. For example we have a set from the Registrar's Office, the Civil Department, our In-House version, etc.

The rule is that In-House beats Civil beats the Registrar's Office in the case of any duplication within the overlapping intervals.

The session_cd is of the form YYYY{1,5,9}.

Basically, we have three sets here.

To get the course hours for a given student we should find a record for him in the in-house set, or, failing that, find if the session is within the ranges of one of the external sets (Civil or Registrar). If both ranges contain the academic session the student took the course, Civil should be taken.

The first part is quite simple: we just LEFT JOIN students with the in-house courses and get the hours for the courses which are filled. The real problem is the next part: searching for the ranges containing a given value.

As I already mentioned in the previous posts, relational databases are not in general that efficient for the queries like that. It's easy to use an index to find a value of a column within a given range, but B-Tree indexes are of little help in searching for a range of two columns containing a given value.

However, in this case, the data domain of session_cd is quite a limited set. For each pair of session_start and session_end it is easy to create a set of all possible values between session_start and session_end.

The overlapping parts of the session ranges from the two sets will yields two records for each of the sessions belonging to the range. Of these two records we will need to take the relevant one (that is Civil) by using DISTINCT ON with the additional sorting on the source (Civil goes first).

Then we just join the relevant records to the subset of the students which does not have corresponding records in the in-house version.

Finally, we need to union this with the in-house recordset.
Read the rest of this entry »

Written by Quassnoi

April 7th, 2010 at 11:00 pm

Posted in PostgreSQL

PostgreSQL: using recursive functions in nested sets

with one comment

In the previous article, I discussed a way to improve nested sets model in PostgreSQL.

The approach shown in the article used an analytical function to filter all immediate children of a node in a recursive CTE.

This allowed us to filter a node's children on the level more efficiently than R-Tree or B-Tree approaches do (since they rely on COUNT(*)).

That solution was pure SQL and it was quite fast, but not optimal.

The drawback of that solution is that it still needs to fetch all children of a node to apply the analytic function to them. This can take much time for the top of the hierarchy. And since the top of the hierarchy is what is what usually shown at the start page, it would be very nice to improve this query yet a little more.

We can do it by creating and using a simple recursive SQL function. This function does not even require PL/pgSQL to be enabled.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

March 2nd, 2010 at 11:00 pm

Posted in PostgreSQL

PostgreSQL: nested sets and R-Tree

Comments enabled. I *really* need your comment

A feedback on one of my previous articles comparing adjacency list and nested sets models for PostgreSQL.

Jay writes:

In your series on adjacency lists vs nested sets, you discuss geometric types and R-Tree indexes in MySQL, but you don't discuss them when discussing the same subject with PostgreSQL, which also has geometric types and R-Tree indexing (mostly available through GiST indexes).

To make it simple I added the following line after the data insertion part of the script at Nested Sets - Postgresql:

ALTER TABLE t_hierarchy ADD COLUMN sets POLYGON;
UPDATE t_hierarchy SET sets = POLYGON(BOX(POINT(-1,lft), POINT(1, rgt)));

It needed to be a POLYGON instead of a BOX since there is a @>(POLYGON,POLYGON) function but no @>(BOX,BOX) function, and the polygon was cast from the box to create the rectangle shape required.

It outperforms the adjacency list on all descendants; outperforms it on all ancestors (not by much); performs reasonably well on all descendants up to a certain level on items with few descendants (e. g. 31415) and badly on items with many descendants (e. g. 42).

It still completes in less than 20 seconds though, which is an improvement over 1 minute.

PostgreSQL does support R-Tree indexes indeed (through GiST interface), and they can be used to improve the efficiency of the nested sets model.

Let's create a sample table and try some of the queries that Jay proposed:
Read the rest of this entry »

Written by Quassnoi

March 1st, 2010 at 11:00 pm

Posted in PostgreSQL

Six degrees of separation

with one comment

Answering questions asked on the site.

Kathy asks:

I am developing a social network site in PostgreSQL and want to find out if two people are no more than 6 friends apart.

If your site grows popular, most probably, they are not. But we better check.

On most social networks, friendship is a symmetric relationship (however, LiveJournal is a notable exception). This means that if Alice is a friend to Bob, then Bob is a friend to Alice as well.

The friendship relationship is best stored in a many-to-many link table with a PRIMARY KEY on both link fields and an additional check condition: the friend with the least id should be stored in the first column. This is to avoid storing a relationship twice: a PRIMARY KEY won't be violated if the same record with the columns swapped will be inserted, but the check constraint will. The check constraint will also forbid storing a friend relationship to itself.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

February 27th, 2010 at 11:00 pm

Posted in PostgreSQL

Sargability of monotonic functions: example

with 2 comments

In my previous article I presented a proposal to add sargability of monotonic functions into the SQL engines.

In a nutshell: a monotonic function is a function that preserves the order of the argument so that it gives the larger results for the larger values of the argument. It is easy to prove that a B-tree with each key replaced by the result of the function will remain the valid B-Tree and hence can be used to search for ranges of function results just like it is used to search for ranges of values.

With a little effort, a B-Tree can also be used to search for the ranges of piecewise monotonic functions: those whose domain can be split into a number of continuous pieces with the function being monotonic within each piece (but it may be not monotonic and even not continuous across the pieces).

In this article, I'll demonstrate the algorithm to do that (implemented in pure SQL on PostgreSQL).

I will show how the performance of simple query

SELECT  *
FROM    t_sine
WHERE   SIN(value) BETWEEN 0.1234 AND 0.1235

could be improved if the sargability of monotonic functions had been implemented in the optimizer.
Read the rest of this entry »

Written by Quassnoi

February 23rd, 2010 at 11:00 pm

Posted in PostgreSQL

Searching for arbitrary portions of a date

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a Ruby on Rails application with a PostgreSQL database; several tables have created_at and updated_at timestamp attributes.

When displayed, those dates are formatted in the user's locale; for example, the timestamp 2009-10-15 16:30:00.435 becomes the string 15.10.2009 - 16:30 (the date format for this example being dd.mm.yyyy - hh.mm).

The requirement is that the user must be able to search for records by date, as if they were strings formatted in the current locale.

For example, searching for 15.10.2009 would return records with dates on October 15th 2009; searching for 15.10 would return records with dates on October 15th of any year, searching for 15 would return all dates that match 15 (be it day, month or year).

The simplest solution would be just retrieve the locale string from the client, format the dates according to that string and search them using LIKE or ~ operators (the latter, as we all know, searches for POSIX regular expressions).

However, this would be not very efficient.

Let's create a sample table and see:
Read the rest of this entry »

Written by Quassnoi

February 2nd, 2010 at 11:00 pm

Posted in PostgreSQL

PostgreSQL: Selecting records holding group-wise maximum

Comments enabled. I *really* need your comment

Continuing the series on selecting records holding group-wise maximums:

How do I select the whole records, grouped on grouper and holding a group-wise maximum (or minimum) on other column?

In this article, I'll describe several ways to do this in PostgreSQL 8.4.

PostgreSQL 8.4 syntax is much richer than that of MySQL. The former can use the analytic functions, recursive CTE's and proprietary syntax extensions, all of which can be used for this task.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

November 26th, 2009 at 11:00 pm

Posted in PostgreSQL