EXPLAIN EXTENDED

How to create fast database queries

Happy New Year!

with 16 comments

This winter is anomalously warm in Europe, there is no snow and no New Year mood. So today we will be drawing a snowflake in PostgreSQL.

#1. A little theory

Core of a snowflake is six large symmetrical ice crystals growing from the common center. Out of these larger crystals other, smaller, crystals grow.

The overall shape of the snowflake is defined by how large do crystals grow and where exactly are they attached to each other.

These things are defined by fluctuations in air and temperature conditions around the snowflake. Because the flake itself is very small, in any given moment the conditions are nearly identical around each crystal, that's why the offspring crystals pop up in almost same places and grow to almost same lengths. Different flakes, though, constantly move to and from each other and are subject to very different fluctuations, and that's why they grow so unique.

Except for the root crystals (of which there are six), the child icicles grow in symmetrical pairs. More than that, each branch grows their own children (also in pairs), so on each step there are twice as many crystals, but they all share almost same length and angle. This gives the snowflake its symmetrical look.

So we can easily see that, despite the fact there may be many child crystals, the shape of a snowflake is defined by a relatively small number of parameters: how many children each crystal produces, where are they attached to it, at which angle they grow and to which length.

Now, let's try to model it.

#2. Defining parameters

To begin with, we will assume the length of the initial larger crystal as 1, and its angle as 0. Later we may easily scale and rotate the crystal.

There can be any number of child crystals (or, rather, pairs of child crystals), each pair attaching to its parent in a random place and growing at random angle. However, this places and angles, though random, are shared across the snowflake: if one pair grows here and at this angle, the similar pairs will grow on the same place and at the same angle on their parents.

This means that there are 6 * (2 ^ level) twins of each crystal, where level is the number of its ancestors. There are 6 root crystals, 12 first-level crystals in the first first-level pairs (each growing in the same place at the same angle to the same length on root crystals), 12 first-level crystals in the second first-level pairs (again, sharing all parameters with themselves but not with the first pairs), 24 first-first grandchildren, 24 first-second grandchildren etc.

So, to define the snowflake shape, we need to define the number of pairs on each step, and for each pair define its place on the parent crystal, angle and length.

These parameters are random but have some constraints. Child crystals are shorter than their parents; they grow at sharp angles, and the number of pairs is limited. Since the parameters of each crystal depend on its parent, we would need a recursive query for that:

SELECT  SETSEED(0.20111231);

WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level <= 3
        )
SELECT  id::TEXT, cut, len, alpha, level, spikes
FROM    params;

id cut len alpha level spikes
{0} 0 1 0 1 2
{0,0} 0.356672627385706 0.285511903231964 0,895624824686616 2 2
{0,1} 0.561545127537102 0.170363144949079 0.618425840960865 2 3
{0,2} 0.475528724957258 0.394589512841776 0.81481895943341 2 0
{0,0,0} 0.667989769019186 0.0429738523301687 0.818223701208857 3 1
{0,0,1} 0.325173982884735 0.00710072719083308 0.517746497950111 3 0
{0,0,2} 0.181089712772518 0.0441518332901016 0.789388660278232 3 0
{0,1,0} 0.272290788125247 0.0784957594457395 0.522800933603073 3 0
{0,1,1} 0.559544350020587 0.0571536974970293 0.713263419194096 3 3
{0,1,2} 0.642731287050992 0.0842548574598808 0.782757860467376 3 1
{0,1,3} 0.0450216801837087 0.0163872951631622 0.590039109739568 3 0
{0,2,0} 0.289853980299085 0.173285180190047 0.896723437835692 3 1
12 rows fetched in 0.0005s (0.0047s)

Here, id is an array defining the path to the crystal pairs: {0} is the root branch, {0, 0} is the first child pair, {0, 1, 0} is the first child of the second child etc.

cut defines the position where the crystals are attached to the parent: 0.356672627385706 in the second row means the first child will grow at 35.6% of the root branches' length.

len is the child length (in respect to the root branch, not immediate parent), angle is the angle at which they grow (in respect to the immediate parent), and spikes is the number of children pairs for the given crystal. level, I hope, is self-explanatory.

#3. Defining crystal coordinates

Now that we have the shape of our flake defined we need to build the actual coordinates for each crystal. To do this we again would need a recursive query.

First, we should make a set of branches on each level. We would start from a single root branch (it will be easy to clone it later) and generate a set of branches on each level. Each record in this set would correspond to an actual snowflake crystal:

WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level <= 3
        ),
        tree AS
        (
        SELECT  *, generate_series(0, (1 << (level - 1)) - 1) AS branch
        FROM    params
        )
SELECT  id::TEXT, cut, len, alpha, level, spikes, branch
FROM    tree;

View query output

This is just a copy of the parameters recordset with each parameter duplicated as many times as the crystal level requires. There is one root branch, two copies of level 1 branches, four copies of level 2 branches etc. Each instance is defined by the parameter branch.

To calculate coordinates of each crystal we need to traverse to it from the top.

The coordinates of the root branch are known: they are (0, 0), (1, 0) (by definition).

To build the first pair (two crystals, each 28.5% long, growing at 35.6% from the beginning of the root branch at angles 51° 18' and −51° 18', respectively) we would need to take the coordinates of the parent, find the start point (it would be (Xs + (Xe - Xs) * cut, Ys + (Ye - Ys) * cut) and then the coordinates of the end point (by adding len * COS(α) and len * SIN(α) to X and Y of the start point, respectively). α in this formula is in respect to the coordinate grid, not to the parent, and to find it we just need to sum all angles from the ancestors.

This is best done with another recursive query. On each step we should find immediate children of the current parent.

This can be easily achieved using the following join condition: child.id > parent.id AND child.id <= parend.id || spikes AND ARRAY_LENGTH(child.level, 1) = ARRAY_LENGTH(parent.level, 1) + 1. This condition employs PostgreSQL array arithmetics: if we have parent.id = {0, 2, 1} with 3 children (spikes = 2), then the condition would return {0, 2, 1, 0}, {0, 2, 1, 1} and {0, 2, 1, 2}. This hierarchy model is called materialized path, and some day I will write a post about it.

Since the id alone defines the parameters, not the instance, we need to add once more condition to find the instances. Crystals 0 and 1 would have child pairs 0, 1 and 2, 3, respectively, so we'll include child.branch BETWEEN p.branch * 2 AND child.branch * 2 + 1 into the join condition.

One more thing to do is to find whether we should add positive or negative angle. It's simple: even branches are negative, odd ones are positive.

And here's the query:

SELECT  SETSEED(0.20111231);

WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level <= 3
        ),
        tree AS
        (
        SELECT  *, generate_series(0, (1 << (level - 1)) - 1) AS branch
        FROM    params
        ),
        points AS
        (
        SELECT  id,
                0::double precision AS xs, 
                0::double precision AS ys,
                len AS xe,
                0::double precision AS ye,
                0::double precision AS alpha,
                branch,
                level,
                spikes
        FROM    tree
        WHERE   id = ARRAY[0]
        UNION ALL
        SELECT  id,
                xs + (xe - xs) * cut AS xs,
                ys + (ye - ys) * cut AS ys,
                xs + (xe - xs) * cut + len * COS(alpha) AS xe,
                ys + (ye - ys) * cut + len * SIN(alpha) AS ye,
                alpha,
                branch,
                level,
                spikes
        FROM    (
                SELECT  t.id, p.id || t.id AS uid, xs, xe, ys, ye, cut, len,
                        p.alpha + t.alpha * ((t.branch % 2) * 2 - 1) AS alpha,
                        t.branch, t.level, t.spikes
                FROM    points p
                JOIN    tree t
                ON      t.id > p.id
                        AND t.id <= (p.id || p.spikes)
                        AND ARRAY_LENGTH(t.id, 1) = p.level + 1
                        AND t.branch BETWEEN p.branch * 2 AND p.branch * 2 + 1
                ) q
        )
SELECT  id::TEXT, xs, ys, xe, ye, alpha, branch, level
FROM    points

View query output

Now, we have coordinates for each crystal.

#4. Visualizing

The most tedious part about SQL graphics is visualizing them. To do this, we'll employ PostgreSQL's geometrical functions.

Each crystal can be represented as a path between its start and end points. This can be used by constructing a line segment (LSEG(POINT, POINT)) using two point constructors (POINT(DOUBLE PRECISION, DOUBLE PRECISION)) and converting it to a path. Unfortunately, PostgreSQL does not allow direct lseg to path conversion but the latter can be easily constructed from the TEXT representation of an lseg.

We have six root branches so each crystal should be cloned to make six copies. It's easiest to make it by using PATH * POINT operator: it rotates and scales the PATH around (0, 0) so that (1, 0) becomes POINT. To construct the points, we will generate six rotation angles with step of 60° and will multiply the path by POINT(COS(α), SIN(α)). These multiplications will preserve lengths.

Finally we need to actually display the snowflake on the screen. To do this, we will generate a set of 80 × 80 records(x and y), defining the grid from (-1, -1) to (1, 1) with step of 1/40 units. Then we'll see if there is at least one crystal within distance of 1/40 units from each cell on the grid (using POINT <-> PATH distance operator and EXISTS). If there is, we will return a number sign (#) for this cell, otherwise a space.

Then we'll group the cells by lines (y), concatenate columns (using ARRAY_AGG and ARRAY_TO_STRING) and output the lines.

#5. The snowflake

And here's our snowflake:

SELECT  SETSEED(0.20111231);

WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level <= 3
        ),
        tree AS
        (
        SELECT  *, generate_series(0, (1 << (level - 1)) - 1) AS branch
        FROM    params
        ),
        points AS
        (
        SELECT  id,
                0::double precision AS xs, 
                0::double precision AS ys,
                len AS xe,
                0::double precision AS ye,
                0::double precision AS alpha,
                branch,
                level,
                spikes
        FROM    tree
        WHERE   id = ARRAY[0]
        UNION ALL
        SELECT  id,
                xs + (xe - xs) * cut AS xs,
                ys + (ye - ys) * cut AS ys,
                xs + (xe - xs) * cut + len * COS(alpha) AS xe,
                ys + (ye - ys) * cut + len * SIN(alpha) AS ye,
                alpha,
                branch,
                level,
                spikes
        FROM    (
                SELECT  t.id, p.id || t.id AS uid, xs, xe, ys, ye, cut, len,
                        p.alpha + t.alpha * ((t.branch % 2) * 2 - 1) AS alpha,
                        t.branch, t.level, t.spikes
                FROM    points p
                JOIN    tree t
                ON      t.id > p.id
                        AND t.id <= (p.id || p.spikes)
                        AND ARRAY_LENGTH(t.id, 1) = p.level + 1
                        AND t.branch BETWEEN p.branch * 2 AND p.branch * 2 + 1
                ) q
        ),
        lines AS
        (
        SELECT  PATH(LSEG(POINT(xs, ys), POINT(xe, ye))::TEXT) * POINT(COS(RADIANS(turn)), SIN(RADIANS(turn))) AS line,
                *
        FROM    points
        CROSS JOIN
                generate_series(0, 300, 60) AS turn
        )
SELECT  ARRAY_TO_STRING
                (
                ARRAY_AGG
                        (
                        CASE
                                EXISTS
                                (
                                SELECT  NULL
                                FROM    lines
                                WHERE   POINT(x::DOUBLE PRECISION / scale, y::DOUBLE PRECISION / scale) <-> line <= (0.7 / scale)
                                )
                        WHEN    TRUE THEN
                                '#'
                        ELSE    ' '
                        END
                        ORDER BY
                                x
                        ),
                ''
                )
FROM    (
        SELECT  *,
                generate_series(-scale, scale - 1) AS y
        FROM    (
                SELECT  scale, generate_series(-scale, scale - 1) AS x
                FROM    (
                        VALUES
                        (40)
                        ) q (scale)
                ) q
        ) q
GROUP BY
        y
ORDER BY
        y

array_to_string
# #
## ##
# #
## # # ##
# # # #
## ## ## ##
## # # ##
# # # #
## ## ## ## ## ##
# ######## ######## #
### #### # # #### ###
## ###### # # # # ###### ##
########### ### ### ###########
################ ################
####### ###### ###### #######
#### ######### ## ### ### ## ######### ####
###### ## ## # ## ## # ## ## ######
####### ### # # ### #######
####### ## ## #######
### ## ### ### ## ###
#### ##### ##### ####
###### # ### ### # ######
## ########## ########## ##
# # ##### ##### # #
## ## ## ##
## # ## ## # ##
## ## ## ## ## ##
## #### ## ## #### ##
## #### # # #### ##
#### #### ## ## #### ####
## ### ## # # ## ### ##
#### ## ## ## ## ## ## ####
######### ### ## ## ### #########
####### ## ### # # ### ## #######
##### # ## ### ## # #####
################################################################################
##### # ## ### ## # #####
####### ## ### # # ### ## #######
######### ### ## ## ### #########
#### ## ## ## ## ## ## ####
## ### ## # # ## ### ##
#### #### ## ## #### ####
## #### # # #### ##
## #### ## ## #### ##
## ## ## ## ## ##
## # ## ## # ##
## ## ## ##
# # ##### ##### # #
## ########## ########## ##
###### # ### ### # ######
#### ##### ##### ####
### ## ### ### ## ###
####### ## ## #######
####### ### # # ### #######
###### ## ## # ## ## # ## ## ######
#### ######### ## ### ### ## ######### ####
####### ###### ###### #######
################ ################
########### ### ### ###########
## ###### # # # # ###### ##
### #### # # #### ###
# ######## ######## #
## ## ## ## ## ##
# # # #
## # # ##
## ## ## ##
# # # #
## # # ##
# #
## ##
# #
80 rows fetched in 0.0010s (2.7344s)

#6. Some more unique snowflakes

View query

array_to_string
#### ####
###### ######
#### ####
## ##
# #
# # # #
## ## ## ##
# ## ## #
# ## # # ## #
## # ## ## # ##
## ## ## ## ## ## ## ##
###### ## # # ## ######
######## # # # # ########
## ###### ### ### ###### ##
# ###### ### ###### #
## ### ##
# # #
## ### ##
## # ### # ##
# ## ##### ## #
## ### ### ### ##
# ##### ##### #
## #### #### ##
# ##### ##### #
## ########### ##
#### ## ########### ## ####
# #### # #### #### # #### #
# ### #### ## #### #### ## #### ### #
# ####### #### # ## # # ## # #### ####### #
# # ######## ## ## # ### # ## ## ######## #
### ## ####### ### ### ### ####### ## ##
### ### ############# # ############# ### ##
## ## ###### ############### ###### ## ##
## #### ##### ## ## # ## ## # ## ## ##### #### ##
## ######### # ############# # ######### ##
## #### ######### ########### ######### #### ##
## ####### ######## ######## ####### ##
## #### ####### ####### #### ##
## ######## # # ######## ##
## ## ### ### ### ## ##
################################################################################
## ## ### ### ### ## ##
## ######## # # ######## ##
## #### ####### ####### #### ##
## ####### ######## ######## ####### ##
## #### ######### ########### ######### #### ##
## ######### # ############# # ######### ##
## #### ##### ## ## # ## ## # ## ## ##### #### ##
## ## ###### ############### ###### ## ##
### ### ############# # ############# ### ##
### ## ####### ### ### ### ####### ## ##
# # ######## ## ## # ### # ## ## ######## #
# ####### #### # ## # # ## # #### ####### #
# ### #### ## #### #### ## #### ### #
# #### # #### #### # #### #
#### ## ########### ## ####
## ########### ##
# ##### ##### #
## #### #### ##
# ##### ##### #
## ### ### ### ##
# ## ##### ## #
## # ### # ##
## ### ##
# # #
## ### ##
# ###### ### ###### #
## ###### ### ### ###### ##
######## # # # # ########
###### ## # # ## ######
## ## ## ## ## ## ## ##
## # ## ## # ##
# ## # # ## #
# ## ## #
## ## ## ##
# # # #
# #
## ##
#### ####
###### ######
80 rows fetched in 0.0010s (2.0781s)

View query

array_to_string
## ## ## ## ## ##
##### ## ## #####
#### # # ####
######## # # ########
##### ## ## # # ## ## #####
# ### # # # # ### #
## ### ### ##
##### #####
### ###
# #
## ##
## ##
# #
## # # # # ##
# # ### # # # ### # #
## ###### # ### # ###### ##
# ### # #### #### # ### #
## ## #### #### ## ##
## ## ### ## ## ### ## ##
### # # ## ## # # ###
##### ## # ## ## # ## #####
###### # ## # # ## # ######
# ### #### # # #### ### #
## # ####### # # ####### # ##
##### ##### #### #### ##### #####
#### ######## ######## ####
####### ## # ## ## # ## #######
### ##### #### #### ##### ###
# ### ### ### ### #
# ######## ######## #
## # ## ### ### ## # ##
#### ##### ##### ####
### ##### ##### ###
######## ##### ##### ########
### # ## ## #### #### ## ## # ###
### ## # # # # ## ###
##### #### ## ## #### #####
###### ## ## ## ## ######
## ######### ## # # ## ######### #
### ### ### #### ### #### ### ### ###
################################################################################
### ### ### #### ### #### ### ### ###
## ######### ## # # ## ######### #
###### ## ## ## ## ######
##### #### ## ## #### #####
### ## # # # # ## ###
### # ## ## #### #### ## ## # ###
######## ##### ##### ########
### ##### ##### ###
#### ##### ##### ####
## # ## ### ### ## # ##
# ######## ######## #
# ### ### ### ### #
### ##### #### #### ##### ###
####### ## # ## ## # ## #######
#### ######## ######## ####
##### ##### #### #### ##### #####
## # ####### # # ####### # ##
# ### #### # # #### ### #
###### # ## # # ## # ######
##### ## # ## ## # ## #####
### # # ## ## # # ###
## ## ### ## ## ### ## ##
## ## #### #### ## ##
# ### # #### #### # ### #
## ###### # ### # ###### ##
# # ### # # # ### # #
## # # # # ##
# #
## ##
## ##
# #
### ###
##### #####
## ### ### ##
# ### # # # # ### #
##### ## ## # # ## ## #####
######## # # ########
#### # # ####
##### ## ## #####
80 rows fetched in 0.0009s (4.4531s)

View query

array_to_string
# #
## ##
# #
## ##
# #
## # # ##
### ###
###### ######
#### ####
## ##
## ##
# #
## ##
## ##
# #
## ##
# #
## ##
# #
## ##
## ##
# #
## ##
# #
## ##
## ##
## ##
## ##
# #
## ##
# #
## ##
## ##
### # # ###
### ### ###
################################################################################
### ### ###
### # # ###
## ##
## ##
# #
## ##
# #
## ##
## ##
## ##
## ##
# #
## ##
# #
## ##
## ##
# #
## ##
# #
## ##
# #
## ##
## ##
# #
## ##
## ##
#### ####
###### ######
### ###
## # # ##
# #
## ##
# #
## ##
# #
80 rows fetched in 0.0009s (1.2188s)
Happy New Year!

Previous New Year posts:

Written by Quassnoi

December 31st, 2011 at 11:00 pm

Posted in Miscellaneous

16 Responses to 'Happy New Year!'

Subscribe to comments with RSS

  1. Quassnoi, your SQL skills truly are on another level. Well done mate, and hapy 2012!

    IamIC

    17 Jan 12 at 22:33

  2. @IamIC: thanks, and happy (late) New Year to you too!

    Quassnoi

    17 Jan 12 at 22:35

  3. Holy shit

    Michael

    29 Feb 12 at 02:05

  4. Um, we’re in March now! The Sun is shining and thoughts of winter are far removed from our northern-hemisphere conciousness.

    Next instalment please!

    laptop alias

    27 Mar 12 at 14:54

  5. *sighs* If I only had time for that!

    Quassnoi

    27 Mar 12 at 14:55

  6. Wow.
    I found your website through the stackoverflow website, I’ve never seen sql skills like it !

    Tom Taylor

    15 Apr 12 at 16:14

  7. I recently started working with PostgreSQL and randomly stumbled onto your site. I recognized your name from when I was active on StackOverflow. Just wanted to say hi. Also, the “snowflake test” will be added to my interviewing repertoire. You are a wizard.

    MeatballYang

    18 Jul 12 at 07:42

  8. @MeatballYang: hi! Are you going to make people actually draw the snowflakes on the interview? This one took 3 hours for me, with debugging.

    Quassnoi

    18 Jul 12 at 12:55

  9. Why?

    Roger

    11 Jan 13 at 02:19

  10. @Roger: because I can

    Quassnoi

    11 Jan 13 at 15:36

  11. Simply impressive!

    k0nsl

    1 Feb 13 at 05:50

  12. I thought that my calculation of a convex hull of inter-connected tables in a schema was some really geeky SQL:
    http://blog.jooq.org/2012/08/20/serious-sql-a-convex-hull-of-correlated-tables/

    But this beats everything :-) Happy New Year!

    Lukas Eder

    26 Dec 13 at 13:54

  13. I’ve never seen something so masochistic…

    Lo

    27 Dec 13 at 13:24

  14. @Lukas: happy new year to you too!

    Quassnoi

    27 Dec 13 at 14:52

  15. @Lo: [blushes] actually I enjoy doing this!

    Quassnoi

    27 Dec 13 at 14:52

  16. brilliant.

    Merlin Moncure

    16 Jan 14 at 23:03

Leave a Reply