# EXPLAIN EXTENDED

How to create fast database queries

## Happy New Year: composing music in SQL

My previous New Year posts were about drawing pictures in SQL: snowflakes, clocks, fractals and even our planet Earth as seen from above.

But contrary to the popular belief, SQL is not just about graphics processing. You can use it for all kinds of things.

Today we'll use SQL to create music.

Remember those polyphonic ringtones in early 2000's cell phones? Instead of downloading MP3's (which were a pain to download and even more pain to upload them to the phone), you could type in the notes and their values, and the phone would play them for you.

Most phones played tunes in a very straightforward way: for each note, they calculated its frequency and generated a pure sine waveform of this frequency. Polyphonic phones could play several notes at once: this would require two or more superimposed waveforms. Math behind that are very simple: channel amplitude is a sine function of time and frequency, and all the channel amplitudes are added together.

Let's do this too and play some New Year music with PostgreSQL.

Written by Quassnoi

December 31st, 2014 at 11:00 pm

Posted in PostgreSQL

Tagged with , , ,

## Happy New Year: drawing fractals in SQL

In one of my previous New Year's posts we drew snowflakes in PostgreSQL.

The algorithm we used to create the snowflakes is an implementation of an L-system, which is an example of a fractal. There are many more beautiful objects we can see in the winter: frozen trees, frost patterns on windows, cracks on ice etc., all of them being fractals.

Today we will be constructing escape-time fractals. To build such a fractal, one would need to run a certain function for each point on the plane iteratively and see how many iterations does it take for the function to overflow: the more it takes, the brighter is the point's color.

I won't go deep into fractal theory now, just show that they can be constructed with SQL relatively easily. For instance, Mandelbrot set, one of the best known escape-time fractals, is almost a one-liner in PostgreSQL:

Written by Quassnoi

December 31st, 2013 at 11:00 pm

## PostgreSQL: parametrizing a recursive CTE

We've started playing with PostgreSQL and recursive queries. Looking at example that does basic graph traversal from http://www.postgresql.org/docs/9.0/static/queries-with.html.

We would like to have a view called `paths` defined like this:

```WITH RECURSIVE
search_graph(id, path) AS
(
SELECT  id, ARRAY[id]
FROM    node
UNION ALL
SELECT  g.dest, sg.path || g.dest
FROM    search_graph sg
JOIN    graph g
ON      g.source = sg.id
AND NOT g.dest = ANY(sg.path)
)
SELECT  path
FROM    search_graph
```

By calling

```SELECT  *
FROM    paths
WHERE   path[1] = :node_id
```

we would get all paths from a certain node.

The problem here is with performance. When you want this to be quick you need to add a condition for the anchor part of the `UNION` like this:

```WITH RECURSIVE
search_graph(id, path) AS
(
SELECT  id, ARRAY[id]
FROM    node
WHERE   id = :node_id
UNION ALL
SELECT  g.dest, sg.path || g.dest
FROM    search_graph sg
JOIN    graph g
ON      g.source = sg.id
AND NOT g.dest = ANY(sg.path)
)
SELECT  path
FROM    search_graph
```

Now it's perfectly fast, but we cannot create a view because that would only contain paths from one specific node.

Any ideas?

An often overlooked feature of PostgreSQL is its ability to create set-returning functions and use them in `SELECT` list.

The record will be cross-joined with the set returned by the function and the result of the join will be added to the resultset.

This is best demonstrated with `generate_series`, probably a most used PostgreSQL set-returning function.
Read the rest of this entry »

Written by Quassnoi

December 24th, 2010 at 11:00 pm

Posted in PostgreSQL

## Date ranges: overlapping with priority

We have a table of student registrations:

Students
student_code course_code course_section session_cd
987654321 ESC102H1 Y 20085
998766543 ELEE203H F 20085

`course_code` and `course_section` identify a course, `session_cd` is an academic session, e. g. 20085, 20091, 20079.

The courses (stored in another table) have associated values for engineering design, complementary studies, etc., like that:

Courses
course_code course_section start_session end_session design science studies
ESC102H1 F 20071 20099 10 0 0
AER201Y1 Y 20059 NULL 0 0 30

, or like that:

In-house courses
course_code course_section student_code design science studies
ESC102H1 F 998766543 10 0 0

We are required by an external accreditation body to add up all of the units of engineering design, complementary studies, etc., taken by an individual student.

Where it gets really messy is that we have multiple data feeds for the associated values of courses. For example we have a set from the Registrar's Office, the Civil Department, our In-House version, etc.

The rule is that In-House beats Civil beats the Registrar's Office in the case of any duplication within the overlapping intervals.

The `session_cd` is of the form `YYYY{1,5,9}`.

Basically, we have three sets here.

To get the course hours for a given student we should find a record for him in the in-house set, or, failing that, find if the session is within the ranges of one of the external sets (Civil or Registrar). If both ranges contain the academic session the student took the course, Civil should be taken.

The first part is quite simple: we just `LEFT JOIN` students with the in-house courses and get the hours for the courses which are filled. The real problem is the next part: searching for the ranges containing a given value.

As I already mentioned in the previous posts, relational databases are not in general that efficient for the queries like that. It's easy to use an index to find a value of a column within a given range, but B-Tree indexes are of little help in searching for a range of two columns containing a given value.

However, in this case, the data domain of `session_cd` is quite a limited set. For each pair of `session_start` and `session_end` it is easy to create a set of all possible values between `session_start` and `session_end`.

The overlapping parts of the session ranges from the two sets will yields two records for each of the sessions belonging to the range. Of these two records we will need to take the relevant one (that is Civil) by using `DISTINCT ON` with the additional sorting on the source (Civil goes first).

Then we just join the relevant records to the subset of the `students` which does not have corresponding records in the in-house version.

Finally, we need to union this with the in-house recordset.
Read the rest of this entry »

Written by Quassnoi

April 7th, 2010 at 11:00 pm

Posted in PostgreSQL

## PostgreSQL: using recursive functions in nested sets

with one comment

In the previous article, I discussed a way to improve nested sets model in PostgreSQL.

The approach shown in the article used an analytical function to filter all immediate children of a node in a recursive CTE.

This allowed us to filter a node's children on the level more efficiently than R-Tree or B-Tree approaches do (since they rely on `COUNT(*)`).

That solution was pure SQL and it was quite fast, but not optimal.

The drawback of that solution is that it still needs to fetch all children of a node to apply the analytic function to them. This can take much time for the top of the hierarchy. And since the top of the hierarchy is what is what usually shown at the start page, it would be very nice to improve this query yet a little more.

We can do it by creating and using a simple recursive SQL function. This function does not even require PL/pgSQL to be enabled.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

March 2nd, 2010 at 11:00 pm

Posted in PostgreSQL

## PostgreSQL: nested sets and R-Tree

A feedback on one of my previous articles comparing adjacency list and nested sets models for PostgreSQL.

Jay writes:

In your series on adjacency lists vs nested sets, you discuss geometric types and R-Tree indexes in MySQL, but you don't discuss them when discussing the same subject with PostgreSQL, which also has geometric types and R-Tree indexing (mostly available through GiST indexes).

To make it simple I added the following line after the data insertion part of the script at Nested Sets - Postgresql:

```ALTER TABLE t_hierarchy ADD COLUMN sets POLYGON;
UPDATE t_hierarchy SET sets = POLYGON(BOX(POINT(-1,lft), POINT(1, rgt)));
```

It needed to be a `POLYGON` instead of a `BOX` since there is a `@>(POLYGON,POLYGON)` function but no `@>(BOX,BOX)` function, and the polygon was cast from the box to create the rectangle shape required.

It outperforms the adjacency list on all descendants; outperforms it on all ancestors (not by much); performs reasonably well on all descendants up to a certain level on items with few descendants (e. g. 31415) and badly on items with many descendants (e. g. 42).

It still completes in less than 20 seconds though, which is an improvement over 1 minute.

PostgreSQL does support R-Tree indexes indeed (through GiST interface), and they can be used to improve the efficiency of the nested sets model.

Let's create a sample table and try some of the queries that Jay proposed:
Read the rest of this entry »

Written by Quassnoi

March 1st, 2010 at 11:00 pm

Posted in PostgreSQL

## Six degrees of separation

with one comment

I am developing a social network site in PostgreSQL and want to find out if two people are no more than 6 friends apart.

If your site grows popular, most probably, they are not. But we better check.

On most social networks, friendship is a symmetric relationship (however, LiveJournal is a notable exception). This means that if Alice is a friend to Bob, then Bob is a friend to Alice as well.

The friendship relationship is best stored in a many-to-many link table with a `PRIMARY KEY` on both link fields and an additional check condition: the friend with the least id should be stored in the first column. This is to avoid storing a relationship twice: a `PRIMARY KEY` won't be violated if the same record with the columns swapped will be inserted, but the check constraint will. The check constraint will also forbid storing a friend relationship to itself.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

February 27th, 2010 at 11:00 pm

Posted in PostgreSQL

## Sargability of monotonic functions: example

In my previous article I presented a proposal to add sargability of monotonic functions into the SQL engines.

In a nutshell: a monotonic function is a function that preserves the order of the argument so that it gives the larger results for the larger values of the argument. It is easy to prove that a B-tree with each key replaced by the result of the function will remain the valid B-Tree and hence can be used to search for ranges of function results just like it is used to search for ranges of values.

With a little effort, a B-Tree can also be used to search for the ranges of piecewise monotonic functions: those whose domain can be split into a number of continuous pieces with the function being monotonic within each piece (but it may be not monotonic and even not continuous across the pieces).

In this article, I'll demonstrate the algorithm to do that (implemented in pure SQL on PostgreSQL).

I will show how the performance of simple query

```SELECT  *
FROM    t_sine
WHERE   SIN(value) BETWEEN 0.1234 AND 0.1235
```

could be improved if the sargability of monotonic functions had been implemented in the optimizer.
Read the rest of this entry »

Written by Quassnoi

February 23rd, 2010 at 11:00 pm

Posted in PostgreSQL

## Searching for arbitrary portions of a date

From Stack Overflow:

I have a Ruby on Rails application with a PostgreSQL database; several tables have `created_at` and `updated_at` timestamp attributes.

When displayed, those dates are formatted in the user's locale; for example, the timestamp 2009-10-15 16:30:00.435 becomes the string 15.10.2009 - 16:30 (the date format for this example being `dd.mm.yyyy - hh.mm`).

The requirement is that the user must be able to search for records by date, as if they were strings formatted in the current locale.

For example, searching for 15.10.2009 would return records with dates on October 15th 2009; searching for 15.10 would return records with dates on October 15th of any year, searching for 15 would return all dates that match 15 (be it day, month or year).

The simplest solution would be just retrieve the locale string from the client, format the dates according to that string and search them using `LIKE` or `~` operators (the latter, as we all know, searches for POSIX regular expressions).

However, this would be not very efficient.

Let's create a sample table and see:
Read the rest of this entry »

Written by Quassnoi

February 2nd, 2010 at 11:00 pm

Posted in PostgreSQL

## PostgreSQL: Selecting records holding group-wise maximum

Continuing the series on selecting records holding group-wise maximums:

How do I select the whole records, grouped on `grouper` and holding a group-wise maximum (or minimum) on other column?

In this article, I'll describe several ways to do this in PostgreSQL 8.4.

PostgreSQL 8.4 syntax is much richer than that of MySQL. The former can use the analytic functions, recursive CTE's and proprietary syntax extensions, all of which can be used for this task.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

November 26th, 2009 at 11:00 pm

Posted in PostgreSQL