Archive for the ‘PostgreSQL’ Category
Happy New Year: composing music in SQL
My previous New Year posts were about drawing pictures in SQL: snowflakes, clocks, fractals and even our planet Earth as seen from above.
But contrary to the popular belief, SQL is not just about graphics processing. You can use it for all kinds of things.
Today we'll use SQL to create music.
Remember those polyphonic ringtones in early 2000's cell phones? Instead of downloading MP3's (which were a pain to download and even more pain to upload them to the phone), you could type in the notes and their values, and the phone would play them for you.
Most phones played tunes in a very straightforward way: for each note, they calculated its frequency and generated a pure sine waveform of this frequency. Polyphonic phones could play several notes at once: this would require two or more superimposed waveforms. Math behind that are very simple: channel amplitude is a sine function of time and frequency, and all the channel amplitudes are added together.
Let's do this too and play some New Year music with PostgreSQL.
Happy New Year: drawing fractals in SQL
In one of my previous New Year's posts we drew snowflakes in PostgreSQL.
The algorithm we used to create the snowflakes is an implementation of an L-system, which is an example of a fractal. There are many more beautiful objects we can see in the winter: frozen trees, frost patterns on windows, cracks on ice etc., all of them being fractals.
Today we will be constructing escape-time fractals. To build such a fractal, one would need to run a certain function for each point on the plane iteratively and see how many iterations does it take for the function to overflow: the more it takes, the brighter is the point's color.
I won't go deep into fractal theory now, just show that they can be constructed with SQL relatively easily. For instance, Mandelbrot set, one of the best known escape-time fractals, is almost a one-liner in PostgreSQL:
PostgreSQL: parametrizing a recursive CTE
Answering questions asked on the site.
Jan Suchal asks:
We've started playing with PostgreSQL and recursive queries. Looking at example that does basic graph traversal from http://www.postgresql.org/docs/9.0/static/queries-with.html.
We would like to have a view called
paths
defined like this:WITH RECURSIVE search_graph(id, path) AS ( SELECT id, ARRAY[id] FROM node UNION ALL SELECT g.dest, sg.path || g.dest FROM search_graph sg JOIN graph g ON g.source = sg.id AND NOT g.dest = ANY(sg.path) ) SELECT path FROM search_graphBy calling
SELECT * FROM paths WHERE path[1] = :node_idwe would get all paths from a certain node.
The problem here is with performance. When you want this to be quick you need to add a condition for the anchor part of the
UNION
like this:WITH RECURSIVE search_graph(id, path) AS ( SELECT id, ARRAY[id] FROM node WHERE id = :node_id UNION ALL SELECT g.dest, sg.path || g.dest FROM search_graph sg JOIN graph g ON g.source = sg.id AND NOT g.dest = ANY(sg.path) ) SELECT path FROM search_graphNow it's perfectly fast, but we cannot create a view because that would only contain paths from one specific node.
Any ideas?
An often overlooked feature of PostgreSQL is its ability to create set-returning functions and use them in SELECT
list.
The record will be cross-joined with the set returned by the function and the result of the join will be added to the resultset.
This is best demonstrated with generate_series
, probably a most used PostgreSQL set-returning function.
Read the rest of this entry »
Date ranges: overlapping with priority
Comments enabled. I *really* need your comment
Answering questions asked on the site.
Jason Foster asks:
We have a table of student registrations:
Students student_code course_code course_section session_cd 987654321 ESC102H1 Y 20085 998766543 ELEE203H F 20085
course_code
andcourse_section
identify a course,session_cd
is an academic session, e. g. 20085, 20091, 20079.The courses (stored in another table) have associated values for
engineering design,complementary studies, etc., like that:
Courses course_code course_section start_session end_session design science studies ESC102H1 F 20071 20099 10 0 0 AER201Y1 Y 20059 NULL 0 0 30 , or like that:
In-house courses course_code course_section student_code design science studies ESC102H1 F 998766543 10 0 0 We are required by an external accreditation body to add up all of the units of
engineering design,complementary studies, etc., taken by an individual student.Where it gets really messy is that we have multiple data feeds for the associated values of courses. For example we have a set from the Registrar's Office, the Civil Department, our In-House version, etc.
The rule is that In-House beats Civil beats the Registrar's Office in the case of any duplication within the overlapping intervals.
The
session_cd
is of the formYYYY{1,5,9}
.
Basically, we have three sets here.
To get the course hours for a given student we should find a record for him in the in-house set, or, failing that, find if the session is within the ranges of one of the external sets (Civil or Registrar). If both ranges contain the academic session the student took the course, Civil should be taken.
The first part is quite simple: we just LEFT JOIN
students with the in-house courses and get the hours for the courses which are filled. The real problem is the next part: searching for the ranges containing a given value.
As I already mentioned in the previous posts, relational databases are not in general that efficient for the queries like that. It's easy to use an index to find a value of a column within a given range, but B-Tree indexes are of little help in searching for a range of two columns containing a given value.
However, in this case, the data domain of session_cd
is quite a limited set. For each pair of session_start
and session_end
it is easy to create a set of all possible values between session_start
and session_end
.
The overlapping parts of the session ranges from the two sets will yields two records for each of the sessions belonging to the range. Of these two records we will need to take the relevant one (that is Civil) by using DISTINCT ON
with the additional sorting on the source (Civil goes first).
Then we just join the relevant records to the subset of the students
which does not have corresponding records in the in-house version.
Finally, we need to union this with the in-house recordset.
Read the rest of this entry »
PostgreSQL: using recursive functions in nested sets
In the previous article, I discussed a way to improve nested sets model in PostgreSQL.
The approach shown in the article used an analytical function to filter all immediate children of a node in a recursive CTE.
This allowed us to filter a node's children on the level more efficiently than R-Tree or B-Tree approaches do (since they rely on COUNT(*)
).
That solution was pure SQL and it was quite fast, but not optimal.
The drawback of that solution is that it still needs to fetch all children of a node to apply the analytic function to them. This can take much time for the top of the hierarchy. And since the top of the hierarchy is what is what usually shown at the start page, it would be very nice to improve this query yet a little more.
We can do it by creating and using a simple recursive SQL function. This function does not even require PL/pgSQL to be enabled.
Let's create a sample table:
Read the rest of this entry »
PostgreSQL: nested sets and R-Tree
Comments enabled. I *really* need your comment
A feedback on one of my previous articles comparing adjacency list and nested sets models for PostgreSQL.
Jay writes:
In your series on adjacency lists vs nested sets, you discuss geometric types and R-Tree indexes in MySQL, but you don't discuss them when discussing the same subject with PostgreSQL, which also has geometric types and R-Tree indexing (mostly available through GiST indexes).
To make it simple I added the following line after the data insertion part of the script at Nested Sets - Postgresql:
ALTER TABLE t_hierarchy ADD COLUMN sets POLYGON; UPDATE t_hierarchy SET sets = POLYGON(BOX(POINT(-1,lft), POINT(1, rgt)));It needed to be a
POLYGON
instead of aBOX
since there is a@>(POLYGON,POLYGON)
function but no@>(BOX,BOX)
function, and the polygon was cast from the box to create the rectangle shape required.It outperforms the adjacency list on
all descendants; outperforms it onall ancestors(not by much); performs reasonably well onall descendants up to a certain levelon items with few descendants (e. g. 31415) and badly on items with many descendants (e. g. 42).It still completes in less than 20 seconds though, which is an improvement over 1 minute.
PostgreSQL does support R-Tree indexes indeed (through GiST interface), and they can be used to improve the efficiency of the nested sets model.
Let's create a sample table and try some of the queries that Jay proposed:
Read the rest of this entry »
Six degrees of separation
Answering questions asked on the site.
Kathy asks:
I am developing a social network site in PostgreSQL and want to find out if two people are no more than 6 friends apart.
If your site grows popular, most probably, they are not. But we better check.
On most social networks, friendship is a symmetric relationship (however, LiveJournal is a notable exception). This means that if Alice is a friend to Bob, then Bob is a friend to Alice as well.
The friendship relationship is best stored in a many-to-many link table with a PRIMARY KEY
on both link fields and an additional check condition: the friend with the least id should be stored in the first column. This is to avoid storing a relationship twice: a PRIMARY KEY
won't be violated if the same record with the columns swapped will be inserted, but the check constraint will. The check constraint will also forbid storing a friend relationship to itself.
Let's create a sample table:
Read the rest of this entry »
Sargability of monotonic functions: example
In my previous article I presented a proposal to add sargability of monotonic functions into the SQL engines.
In a nutshell: a monotonic function is a function that preserves the order of the argument so that it gives the larger results for the larger values of the argument. It is easy to prove that a B-tree with each key replaced by the result of the function will remain the valid B-Tree and hence can be used to search for ranges of function results just like it is used to search for ranges of values.
With a little effort, a B-Tree can also be used to search for the ranges of piecewise monotonic functions: those whose domain can be split into a number of continuous pieces with the function being monotonic within each piece (but it may be not monotonic and even not continuous across the pieces).
In this article, I'll demonstrate the algorithm to do that (implemented in pure SQL on PostgreSQL).
I will show how the performance of simple query
SELECT * FROM t_sine WHERE SIN(value) BETWEEN 0.1234 AND 0.1235
could be improved if the sargability of monotonic functions had been implemented in the optimizer.
Read the rest of this entry »
Searching for arbitrary portions of a date
Comments enabled. I *really* need your comment
From Stack Overflow:
I have a Ruby on Rails application with a PostgreSQL database; several tables have
created_at
andupdated_at
timestamp attributes.When displayed, those dates are formatted in the user's locale; for example, the timestamp 2009-10-15 16:30:00.435 becomes the string 15.10.2009 - 16:30 (the date format for this example being
dd.mm.yyyy - hh.mm
).The requirement is that the user must be able to search for records by date, as if they were strings formatted in the current locale.
For example, searching for 15.10.2009 would return records with dates on October 15th 2009; searching for 15.10 would return records with dates on October 15th of any year, searching for 15 would return all dates that match 15 (be it day, month or year).
The simplest solution would be just retrieve the locale string from the client, format the dates according to that string and search them using LIKE
or ~
operators (the latter, as we all know, searches for POSIX regular expressions).
However, this would be not very efficient.
Let's create a sample table and see:
Read the rest of this entry »
PostgreSQL: Selecting records holding group-wise maximum
Comments enabled. I *really* need your comment
Continuing the series on selecting records holding group-wise maximums:
How do I select the whole records, grouped on
grouper
and holding a group-wise maximum (or minimum) on other column?
In this article, I'll describe several ways to do this in PostgreSQL 8.4.
PostgreSQL 8.4 syntax is much richer than that of MySQL. The former can use the analytic functions, recursive CTE's and proprietary syntax extensions, all of which can be used for this task.
Let's create a sample table:
Read the rest of this entry »