<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>EXPLAIN EXTENDED &#187; Miscellaneous</title>
	<atom:link href="http://explainextended.com/category/miscellaneous/feed/" rel="self" type="application/rss+xml" />
	<link>http://explainextended.com</link>
	<description>How to create fast database queries</description>
	<lastBuildDate>Mon, 02 Jan 2012 00:31:26 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Happy New Year!</title>
		<link>http://explainextended.com/2011/12/31/happy-new-year-3/</link>
		<comments>http://explainextended.com/2011/12/31/happy-new-year-3/#comments</comments>
		<pubDate>Sat, 31 Dec 2011 19:00:56 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=5408</guid>
		<description><![CDATA[A New Year snowflake in PostgreSQL]]></description>
			<content:encoded><![CDATA[<p>This winter is anomalously warm in Europe: there is no snow and no New Year mood. So today we will be drawing a snowflake in <strong>PostgreSQL</strong>.</p>
<h3>#1. A little theory</h3>
<p>The core of a snowflake is six large symmetrical ice crystals growing from a common center. Other, smaller crystals grow out of these larger ones.</p>
<p>The overall shape of the snowflake is defined by how large the crystals grow and where exactly they are attached to each other.</p>
<p>These things are defined by fluctuations in air and temperature conditions around the snowflake. Because the flake itself is very small, at any given moment the conditions are nearly identical around each crystal. That&#8217;s why the offspring crystals pop up in almost the same places and grow to almost the same lengths. Different flakes, though, constantly move towards and away from each other and are subject to very different fluctuations, and that&#8217;s why each one grows to be unique.</p>
<p>Except for the root crystals (of which there are six), the child icicles grow in symmetrical pairs. Moreover, each branch grows its own children (also in pairs), so on each step there are twice as many crystals, but they all share almost the same length and angle. This gives the snowflake its symmetrical look.</p>
<p>So we can easily see that, despite the fact there may be many child crystals, the shape of a snowflake is defined by a relatively small number of parameters: how many children each crystal produces, where they are attached to it, at which angle they grow and to which length.</p>
<p>Now, let&#8217;s try to model it.<br />
<span id="more-5408"></span></p>
<h3>#2. Defining parameters</h3>
<p>To begin with, we will take the length of the initial large crystal to be <strong>1</strong> and its angle to be <strong>0</strong>. Later we can easily scale and rotate the crystal.</p>
<p>There can be any number of child crystals (or, rather, pairs of child crystals), each pair attaching to its parent in a random place and growing at a random angle. However, these places and angles, though random, are shared across the snowflake: if one pair grows here and at this angle, the similar pairs will grow in the same place and at the same angle on their parents.</p>
<p>This means that there are <code>6 * (2 ^ level)</code> twins of each crystal, where <code>level</code> is the number of its ancestors. There are 6 root crystals, 12 first-level crystals in the first first-level pairs (each growing in the same place at the same angle to the same length on root crystals), 12 first-level crystals in the second first-level pairs (again, sharing all parameters with each other but not with the first pairs), 24 first-first grandchildren, 24 first-second grandchildren etc.</p>
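<p>As a quick sanity check of these counts, here is a minimal sketch that just evaluates the formula for the first few levels. It assumes <code>level</code> counts ancestors, as in the paragraph above, so the root crystals are level <strong>0</strong> (the queries below number the root as level 1 instead):</p>
<pre class="brush: sql">
-- Illustration only: number of identical twins of a crystal at each depth.
-- Here level = number of ancestors (0 for the six root crystals).
SELECT  level,
        6 * (2 ^ level)::INT AS twins
FROM    generate_series(0, 3) AS level;
</pre>
<p>This should return 6, 12, 24 and 48 twins for levels 0 through 3.</p>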
<p>So, to define the snowflake shape, we need to define the number of pairs on each step, and for each pair define its place on the parent crystal, angle and length.</p>
<p>These parameters are random but have some constraints. Child crystals are shorter than their parents, they grow at acute angles, and the number of pairs is limited. Since the parameters of each crystal depend on its parent, we need a recursive query for that:</p>
<pre class="brush: sql">
SELECT  SETSEED(0.20111231);

WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level &lt;= 3
        )
SELECT  id::TEXT, cut, len, alpha, level, spikes
FROM    params;
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>cut</th>
<th>len</th>
<th>alpha</th>
<th>level</th>
<th>spikes</th>
</tr>
<tr>
<td class="text">{0}</td>
<td class="float8">0</td>
<td class="float8">1</td>
<td class="float8">0</td>
<td class="int4">1</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,0}</td>
<td class="float8">0.356672627385706</td>
<td class="float8">0.285511903231964</td>
<td class="float8">0.895624824686616</td>
<td class="int4">2</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,1}</td>
<td class="float8">0.561545127537102</td>
<td class="float8">0.170363144949079</td>
<td class="float8">0.618425840960865</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,2}</td>
<td class="float8">0.475528724957258</td>
<td class="float8">0.394589512841776</td>
<td class="float8">0.81481895943341</td>
<td class="int4">2</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.667989769019186</td>
<td class="float8">0.0429738523301687</td>
<td class="float8">0.818223701208857</td>
<td class="int4">3</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.325173982884735</td>
<td class="float8">0.00710072719083308</td>
<td class="float8">0.517746497950111</td>
<td class="int4">3</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.181089712772518</td>
<td class="float8">0.0441518332901016</td>
<td class="float8">0.789388660278232</td>
<td class="int4">3</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.272290788125247</td>
<td class="float8">0.0784957594457395</td>
<td class="float8">0.522800933603073</td>
<td class="int4">3</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.559544350020587</td>
<td class="float8">0.0571536974970293</td>
<td class="float8">0.713263419194096</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.642731287050992</td>
<td class="float8">0.0842548574598808</td>
<td class="float8">0.782757860467376</td>
<td class="int4">3</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.0450216801837087</td>
<td class="float8">0.0163872951631622</td>
<td class="float8">0.590039109739568</td>
<td class="int4">3</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.289853980299085</td>
<td class="float8">0.173285180190047</td>
<td class="float8">0.896723437835692</td>
<td class="int4">3</td>
<td class="int4">1</td>
</tr>
<tr class="statusbar">
<td colspan="100">12 rows fetched in 0.0005s (0.0047s)</td>
</tr>
</table>
</div>
<p>Here, <code>id</code> is an array defining the <q>path</q> to the crystal pairs: <code>{0}</code> is the root branch, <code>{0, 0}</code> is the first child pair, <code>{0, 1, 0}</code> is the first child of the second child etc.</p>
<p><code>cut</code> defines the position where the crystals are attached to the parent: <code>0.356672627385706</code> in the second row means the first child will grow at <strong>35.6%</strong> of the root branches&#8217; length. </p>
<p><code>len</code> is the child length (relative to the root branch, not the immediate parent), <code>alpha</code> is the angle at which they grow (relative to the immediate parent), and <code>spikes</code> is the zero-based number of the last child pair of the given crystal (so a crystal has <code>spikes + 1</code> child pairs, unless it is on the last level). <code>level</code>, I hope, is self-explanatory.</p>
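<p>To make the path notation a bit more tangible, the sketch below takes a few hand-written paths (sample values, not the output of the query above) and derives the level, the parent path and the pair number from each of them using standard array functions. The column aliases are made up for this illustration:</p>
<pre class="brush: sql">
-- Illustration only: unpacking paths stored as INT arrays.
SELECT  id::TEXT,
        ARRAY_LENGTH(id, 1) AS level,                       -- path length = level
        id[1:ARRAY_LENGTH(id, 1) - 1]::TEXT AS parent_id,   -- path without its last element
        id[ARRAY_LENGTH(id, 1)] AS pair_number              -- last element = position among siblings
FROM    (
        VALUES
        (ARRAY[0]),
        (ARRAY[0, 1]),
        (ARRAY[0, 1, 0])
        ) AS q (id);
</pre>
<p>For the root path <code>{0}</code> the parent comes out as an empty array, since the root has no parent.</p>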
<h3>#3. Defining crystal coordinates</h3>
<p>Now that we have the shape of our flake defined, we need to build the actual coordinates for each crystal. To do this, we again need a recursive query.</p>
<p>First, we should make a set of branches on each level, starting from a single root branch (it will be easy to clone it later). Each record in this set corresponds to an actual snowflake crystal:</p>
<pre class="brush: sql">
WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level &lt;= 3
        ),
        tree AS
        (
        SELECT  *, generate_series(0, (1 &lt;&lt; (level - 1)) - 1) AS branch
        FROM    params
        )
SELECT  id::TEXT, cut, len, alpha, level, spikes, branch
FROM    tree;
</pre>
<p><a href="#" onclick="xcollapse('X0001');return false;">View query output</a><br />
</p>
<div id="X0001" style="display: none; background: transparent;">
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>cut</th>
<th>len</th>
<th>alpha</th>
<th>level</th>
<th>spikes</th>
<th>branch</th>
</tr>
<tr>
<td class="text">{0}</td>
<td class="float8">0</td>
<td class="float8">1</td>
<td class="float8">0</td>
<td class="int4">1</td>
<td class="int4">2</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,0}</td>
<td class="float8">0.356672627385706</td>
<td class="float8">0.285511903231964</td>
<td class="float8">0.895624824686616</td>
<td class="int4">2</td>
<td class="int4">2</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,0}</td>
<td class="float8">0.356672627385706</td>
<td class="float8">0.285511903231964</td>
<td class="float8">0.895624824686616</td>
<td class="int4">2</td>
<td class="int4">2</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,1}</td>
<td class="float8">0.561545127537102</td>
<td class="float8">0.170363144949079</td>
<td class="float8">0.618425840960865</td>
<td class="int4">2</td>
<td class="int4">3</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,1}</td>
<td class="float8">0.561545127537102</td>
<td class="float8">0.170363144949079</td>
<td class="float8">0.618425840960865</td>
<td class="int4">2</td>
<td class="int4">3</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,2}</td>
<td class="float8">0.475528724957258</td>
<td class="float8">0.394589512841776</td>
<td class="float8">0.81481895943341</td>
<td class="int4">2</td>
<td class="int4">0</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,2}</td>
<td class="float8">0.475528724957258</td>
<td class="float8">0.394589512841776</td>
<td class="float8">0.81481895943341</td>
<td class="int4">2</td>
<td class="int4">0</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.667989769019186</td>
<td class="float8">0.0429738523301687</td>
<td class="float8">0.818223701208857</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.667989769019186</td>
<td class="float8">0.0429738523301687</td>
<td class="float8">0.818223701208857</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.667989769019186</td>
<td class="float8">0.0429738523301687</td>
<td class="float8">0.818223701208857</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.667989769019186</td>
<td class="float8">0.0429738523301687</td>
<td class="float8">0.818223701208857</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.325173982884735</td>
<td class="float8">0.00710072719083308</td>
<td class="float8">0.517746497950111</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.325173982884735</td>
<td class="float8">0.00710072719083308</td>
<td class="float8">0.517746497950111</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.325173982884735</td>
<td class="float8">0.00710072719083308</td>
<td class="float8">0.517746497950111</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.325173982884735</td>
<td class="float8">0.00710072719083308</td>
<td class="float8">0.517746497950111</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.181089712772518</td>
<td class="float8">0.0441518332901016</td>
<td class="float8">0.789388660278232</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.181089712772518</td>
<td class="float8">0.0441518332901016</td>
<td class="float8">0.789388660278232</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.181089712772518</td>
<td class="float8">0.0441518332901016</td>
<td class="float8">0.789388660278232</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.181089712772518</td>
<td class="float8">0.0441518332901016</td>
<td class="float8">0.789388660278232</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.272290788125247</td>
<td class="float8">0.0784957594457395</td>
<td class="float8">0.522800933603073</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.272290788125247</td>
<td class="float8">0.0784957594457395</td>
<td class="float8">0.522800933603073</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.272290788125247</td>
<td class="float8">0.0784957594457395</td>
<td class="float8">0.522800933603073</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.272290788125247</td>
<td class="float8">0.0784957594457395</td>
<td class="float8">0.522800933603073</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.559544350020587</td>
<td class="float8">0.0571536974970293</td>
<td class="float8">0.713263419194096</td>
<td class="int4">3</td>
<td class="int4">3</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.559544350020587</td>
<td class="float8">0.0571536974970293</td>
<td class="float8">0.713263419194096</td>
<td class="int4">3</td>
<td class="int4">3</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.559544350020587</td>
<td class="float8">0.0571536974970293</td>
<td class="float8">0.713263419194096</td>
<td class="int4">3</td>
<td class="int4">3</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.559544350020587</td>
<td class="float8">0.0571536974970293</td>
<td class="float8">0.713263419194096</td>
<td class="int4">3</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.642731287050992</td>
<td class="float8">0.0842548574598808</td>
<td class="float8">0.782757860467376</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.642731287050992</td>
<td class="float8">0.0842548574598808</td>
<td class="float8">0.782757860467376</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.642731287050992</td>
<td class="float8">0.0842548574598808</td>
<td class="float8">0.782757860467376</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.642731287050992</td>
<td class="float8">0.0842548574598808</td>
<td class="float8">0.782757860467376</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.0450216801837087</td>
<td class="float8">0.0163872951631622</td>
<td class="float8">0.590039109739568</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.0450216801837087</td>
<td class="float8">0.0163872951631622</td>
<td class="float8">0.590039109739568</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.0450216801837087</td>
<td class="float8">0.0163872951631622</td>
<td class="float8">0.590039109739568</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.0450216801837087</td>
<td class="float8">0.0163872951631622</td>
<td class="float8">0.590039109739568</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.289853980299085</td>
<td class="float8">0.173285180190047</td>
<td class="float8">0.896723437835692</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.289853980299085</td>
<td class="float8">0.173285180190047</td>
<td class="float8">0.896723437835692</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.289853980299085</td>
<td class="float8">0.173285180190047</td>
<td class="float8">0.896723437835692</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.289853980299085</td>
<td class="float8">0.173285180190047</td>
<td class="float8">0.896723437835692</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr class="statusbar">
<td colspan="100">39 rows fetched in 0.0015s (0.0060s)</td>
</tr>
</table>
</div>
</div>
<p>This is just a copy of the parameters recordset with each parameter row duplicated as many times as the crystal level requires. There is one root branch (level 1), two copies of each level 2 branch, four copies of each level 3 branch etc. Each instance is identified by the column <code>branch</code>.</p>
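<p>The duplication rule can be looked at in isolation. The following sketch applies the same <code>generate_series</code> expression to bare level numbers 1 to 3 (no crystal parameters involved) and counts the instances it produces per level:</p>
<pre class="brush: sql">
-- Illustration only: how many branch instances each level gets.
SELECT  level,
        COUNT(*) AS instances,
        MIN(branch) AS first_branch,
        MAX(branch) AS last_branch
FROM    (
        SELECT  level,
                generate_series(0, (1 &lt;&lt; (level - 1)) - 1) AS branch
        FROM    generate_series(1, 3) AS level
        ) q
GROUP BY
        level
ORDER BY
        level;
</pre>
<p>Level 1 gets a single instance (branch 0), level 2 gets two (branches 0 to 1) and level 3 gets four (branches 0 to 3).</p>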
<p>To calculate coordinates of each crystal we need to traverse to it from the top.</p>
<p>The coordinates of the root branch are known: they are <code>(0, 0), (1, 0)</code> (by definition).</p>
<p>To build the first pair (two crystals, each <strong>28.5%</strong> long, growing at <strong>35.6%</strong> from the beginning of the root branch at angles <strong>51&deg; 18&#8242;</strong> and <strong>&minus;51&deg; 18&#8242;</strong>, respectively) we would need to take the coordinates of the parent, find the start point, which is <code>(X<sub>s</sub> + (X<sub>e</sub> - X<sub>s</sub>) * cut, Y<sub>s</sub> + (Y<sub>e</sub> - Y<sub>s</sub>) * cut)</code>, and then the coordinates of the end point (by adding <code>len * COS(&alpha;)</code> and <code>len * SIN(&alpha;)</code> to <code>X</code> and <code>Y</code> of the start point, respectively). <code>&alpha;</code> in this formula is measured relative to the coordinate grid, not to the parent, and to find it we just need to sum the signed angles of the crystal and all its ancestors.</p>
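<p>Here is a small worked example of these formulas for the first pair only. It hard-codes the root coordinates and the <code>cut</code>, <code>len</code> and <code>alpha</code> values of row <code>{0,0}</code> from the parameters table above; the <code>side</code> column (&minus;1 or 1) is a helper invented for this sketch to produce both crystals of the pair. Since the root angle is <strong>0</strong>, the angle relative to the grid equals the crystal's own angle here:</p>
<pre class="brush: sql">
-- Illustration only: start and end points of the first child pair.
SELECT  side,
        xs + (xe - xs) * cut AS child_xs,
        ys + (ye - ys) * cut AS child_ys,
        xs + (xe - xs) * cut + len * COS(side * alpha) AS child_xe,
        ys + (ye - ys) * cut + len * SIN(side * alpha) AS child_ye
FROM    (
        -- parent (root) coordinates and the parameters of pair {0,0}
        SELECT  0::DOUBLE PRECISION AS xs, 0::DOUBLE PRECISION AS ys,
                1::DOUBLE PRECISION AS xe, 0::DOUBLE PRECISION AS ye,
                0.356672627385706::DOUBLE PRECISION AS cut,
                0.285511903231964::DOUBLE PRECISION AS len,
                0.895624824686616::DOUBLE PRECISION AS alpha
        ) parent_and_params
CROSS JOIN
        (
        VALUES (-1), (1)
        ) AS s (side);
</pre>
<p>Both crystals should start at about <code>(0.3567, 0)</code> and end at about <code>(0.5351, -0.2229)</code> and <code>(0.5351, 0.2229)</code>, respectively.</p>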
<p>This is best done with another recursive query. On each step we should find immediate children of the current parent.</p>
<p>This can be easily achieved using the following join condition: <code>child.id &gt; parent.id AND child.id &lt;= (parent.id || parent.spikes) AND ARRAY_LENGTH(child.id, 1) = ARRAY_LENGTH(parent.id, 1) + 1</code>. This condition relies on <strong>PostgreSQL</strong> array comparison, which works element by element: if we have <code>parent.id = {0, 2, 1}</code> with <strong>3</strong> children (<code>spikes = 2</code>), then the condition would return <code>{0, 2, 1, 0}</code>, <code>{0, 2, 1, 1}</code> and <code>{0, 2, 1, 2}</code>. This hierarchy model is called <em>materialized path</em>, and some day I will write a post about it.</p>
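<p>To see how each part of such a condition behaves, the sketch below evaluates the three checks separately for the parent <code>{0, 2, 1}</code> with <code>spikes = 2</code> against a handful of hand-picked candidate paths. The candidates and the column aliases are made up for this illustration:</p>
<pre class="brush: sql">
-- Illustration only: the three parts of the parent/child join condition.
SELECT  c.id::TEXT,
        c.id &gt; p.id AS after_parent,
        c.id &lt;= (p.id || p.spikes) AS within_spikes,
        ARRAY_LENGTH(c.id, 1) = ARRAY_LENGTH(p.id, 1) + 1 AS one_level_down
FROM    (
        SELECT  ARRAY[0, 2, 1] AS id, 2 AS spikes
        ) p
CROSS JOIN
        (
        VALUES
        (ARRAY[0, 2, 1, 0]),      -- first child
        (ARRAY[0, 2, 1, 2]),      -- last child
        (ARRAY[0, 2, 1, 3]),      -- beyond the last child
        (ARRAY[0, 2, 1, 0, 0]),   -- grandchild
        (ARRAY[0, 2, 2])          -- sibling of the parent
        ) AS c (id);
</pre>
<p>Only the first two candidates pass all three checks. The grandchild passes both comparisons and is rejected only by the length check, which is why that check is needed.</p>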
<p>Since the <code>id</code> alone defines the parameters, not the instance, we need to add one more condition to find the instances. Crystals <strong>0</strong> and <strong>1</strong> would have child pairs <strong>0, 1</strong> and <strong>2, 3</strong>, respectively, so we'll include <code>child.branch BETWEEN parent.branch * 2 AND parent.branch * 2 + 1</code> into the join condition.</p>
<p>One more thing to do is to find out whether we should add a positive or a negative angle. It's simple: even branches are negative, odd ones are positive.</p>
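<p>Both rules are easy to check on bare branch numbers. This sketch pairs two parent instances with four child instances using the range condition and derives the sign of the angle from the parity of the child branch; the column aliases are made up for this illustration:</p>
<pre class="brush: sql">
-- Illustration only: which child instances belong to which parent instance,
-- and which sign their angle gets.
SELECT  p.branch AS parent_branch,
        c.branch AS child_branch,
        (c.branch % 2) * 2 - 1 AS angle_sign   -- even: -1, odd: 1
FROM    generate_series(0, 1) AS p (branch)
JOIN    generate_series(0, 3) AS c (branch)
ON      c.branch BETWEEN p.branch * 2 AND p.branch * 2 + 1
ORDER BY
        c.branch;
</pre>
<p>Parent 0 gets children 0 and 1, parent 1 gets children 2 and 3, and the signs alternate between &minus;1 and 1.</p>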
<p>And here's the query:</p>
<pre class="brush: sql">
SELECT  SETSEED(0.20111231);

WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level &lt;= 3
        ),
        tree AS
        (
        SELECT  *, generate_series(0, (1 &lt;&lt; (level - 1)) - 1) AS branch
        FROM    params
        ),
        points AS
        (
        SELECT  id,
                0::double precision AS xs,
                0::double precision AS ys,
                len AS xe,
                0::double precision AS ye,
                0::double precision AS alpha,
                branch,
                level,
                spikes
        FROM    tree
        WHERE   id = ARRAY[0]
        UNION ALL
        SELECT  id,
                xs + (xe - xs) * cut AS xs,
                ys + (ye - ys) * cut AS ys,
                xs + (xe - xs) * cut + len * COS(alpha) AS xe,
                ys + (ye - ys) * cut + len * SIN(alpha) AS ye,
                alpha,
                branch,
                level,
                spikes
        FROM    (
                SELECT  t.id, p.id || t.id AS uid, xs, xe, ys, ye, cut, len,
                        p.alpha + t.alpha * ((t.branch % 2) * 2 - 1) AS alpha,
                        t.branch, t.level, t.spikes
                FROM    points p
                JOIN    tree t
                ON      t.id &gt; p.id
                        AND t.id &lt;= (p.id || p.spikes)
                        AND ARRAY_LENGTH(t.id, 1) = p.level + 1
                        AND t.branch BETWEEN p.branch * 2 AND p.branch * 2 + 1
                ) q
        )
SELECT  id::TEXT, xs, ys, xe, ye, alpha, branch, level
FROM    points
</pre>
<p><a href="#" onclick="xcollapse('X0002');return false;">View query output</a><br />
</p>
<div id="X0002" style="display: none; background: transparent;">
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>xs</th>
<th>ys</th>
<th>xe</th>
<th>ye</th>
<th>alpha</th>
<th>branch</th>
<th>level</th>
</tr>
<tr>
<td class="text">{0}</td>
<td class="float8">0</td>
<td class="float8">0</td>
<td class="float8">1</td>
<td class="float8">0</td>
<td class="float8">0</td>
<td class="int4">0</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,0}</td>
<td class="float8">0.356672627385706</td>
<td class="float8">0</td>
<td class="float8">0.535126474998426</td>
<td class="float8">-0.222870525550944</td>
<td class="float8">-0.895624824686616</td>
<td class="int4">0</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,0}</td>
<td class="float8">0.356672627385706</td>
<td class="float8">0</td>
<td class="float8">0.535126474998426</td>
<td class="float8">0.222870525550944</td>
<td class="float8">0.895624824686616</td>
<td class="int4">1</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,1}</td>
<td class="float8">0.561545127537102</td>
<td class="float8">0</td>
<td class="float8">0.700355670409353</td>
<td class="float8">-0.0987685898676881</td>
<td class="float8">-0.618425840960865</td>
<td class="int4">0</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,1}</td>
<td class="float8">0.561545127537102</td>
<td class="float8">0</td>
<td class="float8">0.700355670409353</td>
<td class="float8">0.0987685898676881</td>
<td class="float8">0.618425840960865</td>
<td class="int4">1</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,2}</td>
<td class="float8">0.475528724957258</td>
<td class="float8">0</td>
<td class="float8">0.746217182091292</td>
<td class="float8">-0.287103888547519</td>
<td class="float8">-0.81481895943341</td>
<td class="int4">0</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,2}</td>
<td class="float8">0.475528724957258</td>
<td class="float8">0</td>
<td class="float8">0.746217182091292</td>
<td class="float8">0.287103888547519</td>
<td class="float8">0.81481895943341</td>
<td class="int4">1</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.475877971833112</td>
<td class="float8">-0.14887523088396</td>
<td class="float8">0.469751413327762</td>
<td class="float8">-0.191410125558517</td>
<td class="float8">-1.71384852589547</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.475877971833112</td>
<td class="float8">-0.14887523088396</td>
<td class="float8">0.518723161661866</td>
<td class="float8">-0.152198135130716</td>
<td class="float8">-0.0774011234777593</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.475877971833112</td>
<td class="float8">0.14887523088396</td>
<td class="float8">0.518723161661866</td>
<td class="float8">0.152198135130716</td>
<td class="float8">0.0774011234777593</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.475877971833112</td>
<td class="float8">0.14887523088396</td>
<td class="float8">0.469751413327762</td>
<td class="float8">0.191410125558517</td>
<td class="float8">1.71384852589547</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.414701175775039</td>
<td class="float8">-0.0724716964610147</td>
<td class="float8">0.415814396363913</td>
<td class="float8">-0.0794846178607699</td>
<td class="float8">-1.41337132263673</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.414701175775039</td>
<td class="float8">-0.0724716964610147</td>
<td class="float8">0.421300943231765</td>
<td class="float8">-0.0750915048806874</td>
<td class="float8">-0.377878326736505</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.414701175775039</td>
<td class="float8">0.0724716964610147</td>
<td class="float8">0.421300943231765</td>
<td class="float8">0.0750915048806874</td>
<td class="float8">0.377878326736505</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.414701175775039</td>
<td class="float8">0.0724716964610147</td>
<td class="float8">0.415814396363913</td>
<td class="float8">0.0794846178607699</td>
<td class="float8">1.41337132263673</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.388988783393044</td>
<td class="float8">-0.0403595594574808</td>
<td class="float8">0.383956843885346</td>
<td class="float8">-0.0842237130189914</td>
<td class="float8">-1.68501348496485</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.388988783393044</td>
<td class="float8">-0.0403595594574808</td>
<td class="float8">0.432891699422156</td>
<td class="float8">-0.0450412628886793</td>
<td class="float8">-0.106236164408384</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.388988783393044</td>
<td class="float8">0.0403595594574808</td>
<td class="float8">0.432891699422156</td>
<td class="float8">0.0450412628886793</td>
<td class="float8">0.106236164408384</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.388988783393044</td>
<td class="float8">0.0403595594574808</td>
<td class="float8">0.383956843885346</td>
<td class="float8">0.0842237130189914</td>
<td class="float8">1.68501348496485</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.59934195965588</td>
<td class="float8">-0.0268937771770921</td>
<td class="float8">0.632033834423018</td>
<td class="float8">-0.0982578127634654</td>
<td class="float8">-1.14122677456394</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.59934195965588</td>
<td class="float8">-0.0268937771770921</td>
<td class="float8">0.677479105058146</td>
<td class="float8">-0.0343884926052126</td>
<td class="float8">-0.0956249073577922</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.59934195965588</td>
<td class="float8">0.0268937771770921</td>
<td class="float8">0.677479105058146</td>
<td class="float8">0.0343884926052126</td>
<td class="float8">0.0956249073577922</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.59934195965588</td>
<td class="float8">0.0268937771770921</td>
<td class="float8">0.632033834423018</td>
<td class="float8">0.0982578127634654</td>
<td class="float8">1.14122677456394</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.63921578252456</td>
<td class="float8">-0.0552654064199655</td>
<td class="float8">0.652751789427464</td>
<td class="float8">-0.11079307208949</td>
<td class="float8">-1.33168926015496</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.63921578252456</td>
<td class="float8">-0.0552654064199655</td>
<td class="float8">0.696112647679181</td>
<td class="float8">-0.0498532097163336</td>
<td class="float8">0.0948375782332311</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.63921578252456</td>
<td class="float8">0.0552654064199655</td>
<td class="float8">0.696112647679181</td>
<td class="float8">0.0498532097163336</td>
<td class="float8">-0.0948375782332311</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.63921578252456</td>
<td class="float8">0.0552654064199655</td>
<td class="float8">0.652751789427464</td>
<td class="float8">0.11079307208949</td>
<td class="float8">1.33168926015496</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.650763006413631</td>
<td class="float8">-0.0634816628858708</td>
<td class="float8">0.664985272342964</td>
<td class="float8">-0.146527482512268</td>
<td class="float8">-1.40118370142824</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.650763006413631</td>
<td class="float8">-0.0634816628858708</td>
<td class="float8">0.733882770016532</td>
<td class="float8">-0.0496981254523029</td>
<td class="float8">0.164332019506511</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.650763006413631</td>
<td class="float8">0.0634816628858708</td>
<td class="float8">0.733882770016532</td>
<td class="float8">0.0496981254523029</td>
<td class="float8">-0.164332019506511</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.650763006413631</td>
<td class="float8">0.0634816628858708</td>
<td class="float8">0.664985272342964</td>
<td class="float8">0.146527482512268</td>
<td class="float8">1.40118370142824</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.567794611404423</td>
<td class="float8">-0.00444672786521894</td>
<td class="float8">0.57360317341221</td>
<td class="float8">-0.0197700450702597</td>
<td class="float8">-1.20846495070043</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.567794611404423</td>
<td class="float8">-0.00444672786521894</td>
<td class="float8">0.584175304516377</td>
<td class="float8">-0.00491184713656396</td>
<td class="float8">-0.028386731221297</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.567794611404423</td>
<td class="float8">0.00444672786521894</td>
<td class="float8">0.584175304516377</td>
<td class="float8">0.00491184713656396</td>
<td class="float8">0.028386731221297</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.567794611404423</td>
<td class="float8">0.00444672786521894</td>
<td class="float8">0.57360317341221</td>
<td class="float8">0.0197700450702597</td>
<td class="float8">1.20846495070043</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.553988851678576</td>
<td class="float8">-0.0832182048548433</td>
<td class="float8">0.529680086603184</td>
<td class="float8">-0.25478987388562</td>
<td class="float8">-1.7115423972691</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.553988851678576</td>
<td class="float8">-0.0832182048548433</td>
<td class="float8">0.726693128455993</td>
<td class="float8">-0.0690412356340923</td>
<td class="float8">0.0819044784022824</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.553988851678576</td>
<td class="float8">0.0832182048548433</td>
<td class="float8">0.726693128455993</td>
<td class="float8">0.0690412356340923</td>
<td class="float8">-0.0819044784022824</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.553988851678576</td>
<td class="float8">0.0832182048548433</td>
<td class="float8">0.529680086603184</td>
<td class="float8">0.25478987388562</td>
<td class="float8">1.7115423972691</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr class="statusbar">
<td colspan="100">39 rows fetched in 0.0017s (0.0100s)</td>
</tr>
</table>
</div>
</div>
<p>Now, we have coordinates for each crystal.</p>
<h3>#4. Visualizing</h3>
<p>The most tedious part about <strong>SQL</strong> graphics is visualizing them. To do this, we'll employ <strong>PostgreSQL</strong>'s geometrical functions.</p>
<p>Each crystal can be represented as a path between its start and end points. This can be done by constructing a line segment (<code>LSEG(POINT, POINT)</code>) from two point constructors (<code>POINT(DOUBLE PRECISION, DOUBLE PRECISION)</code>) and converting it to a path. Unfortunately, <strong>PostgreSQL</strong> does not allow a direct <code>lseg</code> to <code>path</code> conversion, but the latter can be easily constructed from the <code>TEXT</code> representation of an <code>lseg</code>.</p>
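<p>As a minimal sketch of this conversion on its own (the endpoints are made-up values, not taken from the table above), the segment is built from two points, cast to <code>TEXT</code> and then converted to a path:</p>
<pre class="brush: sql">
-- Illustrative endpoints only; any pair of points works the same way
SELECT  PATH(LSEG(POINT(0, 0), POINT(0.5, 0.25))::TEXT) AS crystal
</pre>
<p>The text form of an <code>lseg</code> is enclosed in square brackets, which the <code>path</code> input syntax reads as an open path, so the result is an open path with two points.</p>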
<p>We have six root branches, so each crystal should be cloned to make six copies. The easiest way to do this is the <code>PATH * POINT</code> operator: it rotates and scales the <code>PATH</code> around <code>(0, 0)</code> so that <code>(1, 0)</code> becomes <code>POINT</code>. To construct the points, we will generate six rotation angles with a step of <strong>60&deg;</strong> and multiply the path by <code>POINT(COS(&alpha;), SIN(&alpha;))</code>. Because these points lie on the unit circle, the multiplications rotate the path without scaling it, so lengths are preserved.</p>
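<p>Here is a small sketch of the cloning step in isolation, assuming a single horizontal crystal of unit length (an illustrative value rather than one of the generated crystals):</p>
<pre class="brush: sql">
-- One illustrative crystal, multiplied by six points on the unit circle
SELECT  turn,
        PATH &#039;[(0,0),(1,0)]&#039; * POINT(COS(RADIANS(turn)), SIN(RADIANS(turn))) AS line
FROM    generate_series(0, 300, 60) AS turn
</pre>
<p>This should return six rows, one per <strong>60&deg;</strong> turn. For a turn of <code>0</code> the multiplier is <code>(1, 0)</code> and the path comes back unchanged; the other rows may show small floating point rounding in their coordinates.</p>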
<p>Finally, we need to actually display the snowflake on the screen. To do this, we will generate a set of <strong>80 &times; 80</strong> records (<code>x</code> and <code>y</code>), defining a grid that starts at <code>(-1, -1)</code> and runs towards <code>(1, 1)</code> with a step of <code>1/40</code> units. Then we'll check whether there is at least one crystal within a distance of <code>0.7/40</code> units (a little less than one grid step) from each cell on the grid, using the <code>POINT &lt;-&gt; PATH</code> distance operator and <code>EXISTS</code>. If there is, we will return a number sign (<code>#</code>) for this cell, otherwise a space.</p>
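<p>To make the cell test easier to follow, here is a sketch on a much coarser grid. The step of <code>1/5</code>, the three rows and the single horizontal crystal are assumptions chosen for illustration; the threshold of <code>0.7</code> grid steps is the one the full query uses:</p>
<pre class="brush: sql">
-- A 10 x 3 piece of a coarse grid tested against one illustrative crystal
WITH    lines (line) AS
        (
        SELECT  PATH &#039;[(0,0),(1,0)]&#039;
        )
SELECT  x, y,
        CASE
        WHEN    EXISTS
                (
                SELECT  NULL
                FROM    lines
                WHERE   POINT(x::DOUBLE PRECISION / 5, y::DOUBLE PRECISION / 5) &lt;-&gt; line &lt;= (0.7 / 5)
                )
        THEN    &#039;#&#039;
        ELSE    &#039; &#039;
        END AS mark
FROM    generate_series(-5, 4) AS x
CROSS JOIN
        generate_series(-1, 1) AS y
ORDER BY
        y, x
</pre>
<p>With these values only the cells lying on the crystal itself (<code>y = 0</code> and <code>x</code> from <code>0</code> to <code>4</code>) should be marked: the neighbouring cells are a whole grid step away, which is more than the threshold.</p>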
<p>Then we'll group the cells by line (<code>y</code>), concatenate the columns in order of <code>x</code> (using <code>ARRAY_AGG</code> and <code>ARRAY_TO_STRING</code>) and output the lines.</p>
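<p>The aggregation step can be sketched separately on a handful of hand-written cells (the <code>cells</code> rowset and its values are hypothetical and stand in for the grid):</p>
<pre class="brush: sql">
-- Six made-up cells forming two lines of three columns each
SELECT  ARRAY_TO_STRING(ARRAY_AGG(mark ORDER BY x), &#039;&#039;) AS line
FROM    (
        VALUES
        (0, 0, &#039;#&#039;), (1, 0, &#039; &#039;), (2, 0, &#039;#&#039;),
        (0, 1, &#039; &#039;), (1, 1, &#039;#&#039;), (2, 1, &#039; &#039;)
        ) cells (x, y, mark)
GROUP BY
        y
ORDER BY
        y
</pre>
<p>The <code>ORDER BY</code> inside <code>ARRAY_AGG</code> is what keeps the columns in place within each line, while the outer <code>ORDER BY</code> arranges the lines themselves.</p>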
<h3>#5. The snowflake</h3>
<p>And here's our snowflake:</p>
<pre class="brush: sql">
SELECT  SETSEED(0.20111231);

WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level &lt;= 3
        ),
        tree AS
        (
        SELECT  *, generate_series(0, (1 &lt;&lt; (level - 1)) - 1) AS branch
        FROM    params
        ),
        points AS
        (
        SELECT  id,
                0::double precision AS xs,
                0::double precision AS ys,
                len AS xe,
                0::double precision AS ye,
                0::double precision AS alpha,
                branch,
                level,
                spikes
        FROM    tree
        WHERE   id = ARRAY[0]
        UNION ALL
        SELECT  id,
                xs + (xe - xs) * cut AS xs,
                ys + (ye - ys) * cut AS ys,
                xs + (xe - xs) * cut + len * COS(alpha) AS xe,
                ys + (ye - ys) * cut + len * SIN(alpha) AS ye,
                alpha,
                branch,
                level,
                spikes
        FROM    (
                SELECT  t.id, p.id || t.id AS uid, xs, xe, ys, ye, cut, len,
                        p.alpha + t.alpha * ((t.branch % 2) * 2 - 1) AS alpha,
                        t.branch, t.level, t.spikes
                FROM    points p
                JOIN    tree t
                ON      t.id &gt; p.id
                        AND t.id &lt;= (p.id || p.spikes)
                        AND ARRAY_LENGTH(t.id, 1) = p.level + 1
                        AND t.branch BETWEEN p.branch * 2 AND p.branch * 2 + 1
                ) q
        ),
        lines AS
        (
        SELECT  PATH(LSEG(POINT(xs, ys), POINT(xe, ye))::TEXT) * POINT(COS(RADIANS(turn)), SIN(RADIANS(turn))) AS line,
                *
        FROM    points
        CROSS JOIN
                generate_series(0, 300, 60) AS turn
        )
SELECT  ARRAY_TO_STRING
                (
                ARRAY_AGG
                        (
                        CASE
                                EXISTS
                                (
                                SELECT  NULL
                                FROM    lines
                                WHERE   POINT(x::DOUBLE PRECISION / scale, y::DOUBLE PRECISION / scale) &lt;-&gt; line &lt;= (0.7 / scale)
                                )
                        WHEN    TRUE THEN
                                &#039;#&#039;
                        ELSE    &#039; &#039;
                        END
                        ORDER BY
                                x
                        ),
                &#039;&#039;
                )
FROM    (
        SELECT  *,
                generate_series(-scale, scale - 1) AS y
        FROM    (
                SELECT  scale, generate_series(-scale, scale - 1) AS x
                FROM    (
                        VALUES
                        (40)
                        ) q (scale)
                ) q
        ) q
GROUP BY
        y
ORDER BY
        y
</pre>
<div class="terminal widefont smallfont">
<table class="terminal">
<tr>
<th>array_to_string</th>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                    #                                       #                   </td>
</tr>
<tr>
<td class="text">                    ##                                     ##                   </td>
</tr>
<tr>
<td class="text">                     #                                     #                    </td>
</tr>
<tr>
<td class="text">                     ##            #         #            ##                    </td>
</tr>
<tr>
<td class="text">                      #            #         #            #                     </td>
</tr>
<tr>
<td class="text">                      ##          ##         ##          ##                     </td>
</tr>
<tr>
<td class="text">                       ##         #           #         ##                      </td>
</tr>
<tr>
<td class="text">                        #         #           #         #                       </td>
</tr>
<tr>
<td class="text">                        ## ##    ##           ##    ## ##                       </td>
</tr>
<tr>
<td class="text">                         # ########           ######## #                        </td>
</tr>
<tr>
<td class="text">                        ### #### #             # #### ###                       </td>
</tr>
<tr>
<td class="text">                       ## ###### #    #   #    # ###### ##                      </td>
</tr>
<tr>
<td class="text">                       ###########  ###   ###  ###########                      </td>
</tr>
<tr>
<td class="text">                      ################     ################                     </td>
</tr>
<tr>
<td class="text">                       ####### ######       ###### #######                      </td>
</tr>
<tr>
<td class="text">               ####  ######### ##  ###     ###  ## #########  ####              </td>
</tr>
<tr>
<td class="text">                 ######  ## ## #   ##       ##   # ## ##  ######                </td>
</tr>
<tr>
<td class="text">                     ####### ###   #         #   ### #######                    </td>
</tr>
<tr>
<td class="text">                         #######  ##         ##  #######                        </td>
</tr>
<tr>
<td class="text">                       ###    ## ###         ### ##    ###                      </td>
</tr>
<tr>
<td class="text">                      ####     #####         #####     ####                     </td>
</tr>
<tr>
<td class="text">                     ######   # ###           ### #   ######                    </td>
</tr>
<tr>
<td class="text">                    ##  ##########             ##########  ##                   </td>
</tr>
<tr>
<td class="text">          #             #    #####             #####    #             #         </td>
</tr>
<tr>
<td class="text">          ##                     ##           ##                     ##         </td>
</tr>
<tr>
<td class="text">           ##      #              ##         ##              #      ##          </td>
</tr>
<tr>
<td class="text">            ##    ##              ##         ##              ##    ##           </td>
</tr>
<tr>
<td class="text">             ##   ####             ##       ##             ####   ##            </td>
</tr>
<tr>
<td class="text">              ##  ####              #       #              ####  ##             </td>
</tr>
<tr>
<td class="text">             #### ####              ##     ##              #### ####            </td>
</tr>
<tr>
<td class="text">             ## ###  ##              #     #              ##  ### ##            </td>
</tr>
<tr>
<td class="text">            #### ##   ##             ##   ##             ##   ## ####           </td>
</tr>
<tr>
<td class="text">           #########   ###            ## ##            ###   #########          </td>
</tr>
<tr>
<td class="text">          #######  ##  ###             # #             ###  ##  #######         </td>
</tr>
<tr>
<td class="text">             #####  #   ##             ###             ##   #  #####            </td>
</tr>
<tr>
<td class="text">################################################################################</td>
</tr>
<tr>
<td class="text">             #####  #   ##             ###             ##   #  #####            </td>
</tr>
<tr>
<td class="text">          #######  ##  ###             # #             ###  ##  #######         </td>
</tr>
<tr>
<td class="text">           #########   ###            ## ##            ###   #########          </td>
</tr>
<tr>
<td class="text">            #### ##   ##             ##   ##             ##   ## ####           </td>
</tr>
<tr>
<td class="text">             ## ###  ##              #     #              ##  ### ##            </td>
</tr>
<tr>
<td class="text">             #### ####              ##     ##              #### ####            </td>
</tr>
<tr>
<td class="text">              ##  ####              #       #              ####  ##             </td>
</tr>
<tr>
<td class="text">             ##   ####             ##       ##             ####   ##            </td>
</tr>
<tr>
<td class="text">            ##    ##              ##         ##              ##    ##           </td>
</tr>
<tr>
<td class="text">           ##      #              ##         ##              #      ##          </td>
</tr>
<tr>
<td class="text">          ##                     ##           ##                     ##         </td>
</tr>
<tr>
<td class="text">          #             #    #####             #####    #             #         </td>
</tr>
<tr>
<td class="text">                    ##  ##########             ##########  ##                   </td>
</tr>
<tr>
<td class="text">                     ######   # ###           ### #   ######                    </td>
</tr>
<tr>
<td class="text">                      ####     #####         #####     ####                     </td>
</tr>
<tr>
<td class="text">                       ###    ## ###         ### ##    ###                      </td>
</tr>
<tr>
<td class="text">                         #######  ##         ##  #######                        </td>
</tr>
<tr>
<td class="text">                     ####### ###   #         #   ### #######                    </td>
</tr>
<tr>
<td class="text">                 ######  ## ## #   ##       ##   # ## ##  ######                </td>
</tr>
<tr>
<td class="text">               ####  ######### ##  ###     ###  ## #########  ####              </td>
</tr>
<tr>
<td class="text">                       ####### ######       ###### #######                      </td>
</tr>
<tr>
<td class="text">                      ################     ################                     </td>
</tr>
<tr>
<td class="text">                       ###########  ###   ###  ###########                      </td>
</tr>
<tr>
<td class="text">                       ## ###### #    #   #    # ###### ##                      </td>
</tr>
<tr>
<td class="text">                        ### #### #             # #### ###                       </td>
</tr>
<tr>
<td class="text">                         # ########           ######## #                        </td>
</tr>
<tr>
<td class="text">                        ## ##    ##           ##    ## ##                       </td>
</tr>
<tr>
<td class="text">                        #         #           #         #                       </td>
</tr>
<tr>
<td class="text">                       ##         #           #         ##                      </td>
</tr>
<tr>
<td class="text">                      ##          ##         ##          ##                     </td>
</tr>
<tr>
<td class="text">                      #            #         #            #                     </td>
</tr>
<tr>
<td class="text">                     ##            #         #            ##                    </td>
</tr>
<tr>
<td class="text">                     #                                     #                    </td>
</tr>
<tr>
<td class="text">                    ##                                     ##                   </td>
</tr>
<tr>
<td class="text">                    #                                       #                   </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr class="statusbar">
<td colspan="100">80 rows fetched in 0.0010s (2.7344s)</td>
</tr>
</table>
</div>
<h3>#6. Some more unique snowflakes</h3>
<p><a href="#" onclick="xcollapse('X0003');return false;">View query</a><br />
</p>
<div id="X0003" style="display: none; background: transparent;">
<pre class="brush: sql">
SELECT  SETSEED(0.201112311);

WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level &lt;= 3
        ),
        tree AS
        (
        SELECT  *, generate_series(0, (1 &lt;&lt; (level - 1)) - 1) AS branch
        FROM    params
        ),
        points AS
        (
        SELECT  id,
                0::double precision AS xs,
                0::double precision AS ys,
                len AS xe,
                0::double precision AS ye,
                0::double precision AS alpha,
                branch,
                level,
                spikes
        FROM    tree
        WHERE   id = ARRAY[0]
        UNION ALL
        SELECT  id,
                xs + (xe - xs) * cut AS xs,
                ys + (ye - ys) * cut AS ys,
                xs + (xe - xs) * cut + len * COS(alpha) AS xe,
                ys + (ye - ys) * cut + len * SIN(alpha) AS ye,
                alpha,
                branch,
                level,
                spikes
        FROM    (
                SELECT  t.id, p.id || t.id AS uid, xs, xe, ys, ye, cut, len,
                        p.alpha + t.alpha * ((t.branch % 2) * 2 - 1) AS alpha,
                        t.branch, t.level, t.spikes
                FROM    points p
                JOIN    tree t
                ON      t.id &gt; p.id
                        AND t.id &lt;= (p.id || p.spikes)
                        AND ARRAY_LENGTH(t.id, 1) = p.level + 1
                        AND t.branch BETWEEN p.branch * 2 AND p.branch * 2 + 1
                ) q
        ),
        lines AS
        (
        SELECT  PATH(LSEG(POINT(xs, ys), POINT(xe, ye))::TEXT) * POINT(COS(RADIANS(turn)), SIN(RADIANS(turn))) AS line,
                *
        FROM    points
        CROSS JOIN
                generate_series(0, 300, 60) AS turn
        )
SELECT  ARRAY_TO_STRING
                (
                ARRAY_AGG
                        (
                        CASE
                                EXISTS
                                (
                                SELECT  NULL
                                FROM    lines
                                WHERE   POINT(x::DOUBLE PRECISION / scale, y::DOUBLE PRECISION / scale) &lt;-&gt; line &lt;= (0.7 / scale)
                                )
                        WHEN    TRUE THEN
                                &#039;#&#039;
                        ELSE    &#039; &#039;
                        END
                        ORDER BY
                                x
                        ),
                &#039;&#039;
                )
FROM    (
        SELECT  *,
                generate_series(-scale, scale - 1) AS y
        FROM    (
                SELECT  scale, generate_series(-scale, scale - 1) AS x
                FROM    (
                        VALUES
                        (40)
                        ) q (scale)
                ) q
        ) q
GROUP BY
        y
ORDER BY
        y
</pre>
</div>
<div class="terminal widefont smallfont">
<table class="terminal">
<tr>
<th>array_to_string</th>
</tr>
<tr>
<td class="text">                           ####                   ####                          </td>
</tr>
<tr>
<td class="text">                            ######             ######                           </td>
</tr>
<tr>
<td class="text">                            ####                 ####                           </td>
</tr>
<tr>
<td class="text">                            ##                     ##                           </td>
</tr>
<tr>
<td class="text">                            #                       #                           </td>
</tr>
<tr>
<td class="text">                    #       #                       #       #                   </td>
</tr>
<tr>
<td class="text">                    ##     ##                       ##     ##                   </td>
</tr>
<tr>
<td class="text">                     #     ##                       ##     #                    </td>
</tr>
<tr>
<td class="text">           #         ##    #                         #    ##         #          </td>
</tr>
<tr>
<td class="text">           ##         #   ##                         ##   #         ##          </td>
</tr>
<tr>
<td class="text">         ## ##        ##  ##                         ##  ##        ## ##        </td>
</tr>
<tr>
<td class="text">         ######        ## #                           # ##        ######        </td>
</tr>
<tr>
<td class="text">           ########     # #                           # #     ########          </td>
</tr>
<tr>
<td class="text">          ##     ###### ###                           ### ######     ##         </td>
</tr>
<tr>
<td class="text">          #          ######            ###            ######          #         </td>
</tr>
<tr>
<td class="text">                         ##            ###            ##                        </td>
</tr>
<tr>
<td class="text">                          #             #             #                         </td>
</tr>
<tr>
<td class="text">                          ##           ###           ##                         </td>
</tr>
<tr>
<td class="text">                           ##      #   ###   #      ##                          </td>
</tr>
<tr>
<td class="text">                            #      ## ##### ##      #                           </td>
</tr>
<tr>
<td class="text">                            ##    ###  ###  ###    ##                           </td>
</tr>
<tr>
<td class="text">                             #     ##### #####     #                            </td>
</tr>
<tr>
<td class="text">                             ##    ####   ####    ##                            </td>
</tr>
<tr>
<td class="text">                              #    ##### #####    #                             </td>
</tr>
<tr>
<td class="text">                              ##   ###########   ##                             </td>
</tr>
<tr>
<td class="text">                       ####    ##  ###########  ##    ####                      </td>
</tr>
<tr>
<td class="text">                  #     ####    #   #### ####   #    ####     #                 </td>
</tr>
<tr>
<td class="text">  #              ###     ####   ##  #### ####  ##   ####     ###              # </td>
</tr>
<tr>
<td class="text">  #              #######  ####   #  ## # # ##  #   ####  #######              # </td>
</tr>
<tr>
<td class="text"># #                 ######## ##  ## #  ###  # ##  ## ########                 # </td>
</tr>
<tr>
<td class="text">###                  ##  #######  ###  ###  ###  #######  ##                  ##</td>
</tr>
<tr>
<td class="text">###                  ###  ############# # #############  ###                  ##</td>
</tr>
<tr>
<td class="text">  ##                   ## ###### ############### ###### ##                   ## </td>
</tr>
<tr>
<td class="text">   ##             #### #####  ## ## # ## ## # ## ##  ##### ####             ##  </td>
</tr>
<tr>
<td class="text">    ##              #########  #  #############  #  #########              ##   </td>
</tr>
<tr>
<td class="text">     ##             #### ######### ########### ######### ####             ##    </td>
</tr>
<tr>
<td class="text">      ##              #######  ########   ########  #######              ##     </td>
</tr>
<tr>
<td class="text">       ##                ####    ####### #######    ####                ##      </td>
</tr>
<tr>
<td class="text">        ##                  ########   # #   ########                  ##       </td>
</tr>
<tr>
<td class="text">         ##                  ##  ###   ###   ###  ##                  ##        </td>
</tr>
<tr>
<td class="text">################################################################################</td>
</tr>
<tr>
<td class="text">         ##                  ##  ###   ###   ###  ##                  ##        </td>
</tr>
<tr>
<td class="text">        ##                  ########   # #   ########                  ##       </td>
</tr>
<tr>
<td class="text">       ##                ####    ####### #######    ####                ##      </td>
</tr>
<tr>
<td class="text">      ##              #######  ########   ########  #######              ##     </td>
</tr>
<tr>
<td class="text">     ##             #### ######### ########### ######### ####             ##    </td>
</tr>
<tr>
<td class="text">    ##              #########  #  #############  #  #########              ##   </td>
</tr>
<tr>
<td class="text">   ##             #### #####  ## ## # ## ## # ## ##  ##### ####             ##  </td>
</tr>
<tr>
<td class="text">  ##                   ## ###### ############### ###### ##                   ## </td>
</tr>
<tr>
<td class="text">###                  ###  ############# # #############  ###                  ##</td>
</tr>
<tr>
<td class="text">###                  ##  #######  ###  ###  ###  #######  ##                  ##</td>
</tr>
<tr>
<td class="text"># #                 ######## ##  ## #  ###  # ##  ## ########                 # </td>
</tr>
<tr>
<td class="text">  #              #######  ####   #  ## # # ##  #   ####  #######              # </td>
</tr>
<tr>
<td class="text">  #              ###     ####   ##  #### ####  ##   ####     ###              # </td>
</tr>
<tr>
<td class="text">                  #     ####    #   #### ####   #    ####     #                 </td>
</tr>
<tr>
<td class="text">                       ####    ##  ###########  ##    ####                      </td>
</tr>
<tr>
<td class="text">                              ##   ###########   ##                             </td>
</tr>
<tr>
<td class="text">                              #    ##### #####    #                             </td>
</tr>
<tr>
<td class="text">                             ##    ####   ####    ##                            </td>
</tr>
<tr>
<td class="text">                             #     ##### #####     #                            </td>
</tr>
<tr>
<td class="text">                            ##    ###  ###  ###    ##                           </td>
</tr>
<tr>
<td class="text">                            #      ## ##### ##      #                           </td>
</tr>
<tr>
<td class="text">                           ##      #   ###   #      ##                          </td>
</tr>
<tr>
<td class="text">                          ##           ###           ##                         </td>
</tr>
<tr>
<td class="text">                          #             #             #                         </td>
</tr>
<tr>
<td class="text">                         ##            ###            ##                        </td>
</tr>
<tr>
<td class="text">          #          ######            ###            ######          #         </td>
</tr>
<tr>
<td class="text">          ##     ###### ###                           ### ######     ##         </td>
</tr>
<tr>
<td class="text">           ########     # #                           # #     ########          </td>
</tr>
<tr>
<td class="text">         ######        ## #                           # ##        ######        </td>
</tr>
<tr>
<td class="text">         ## ##        ##  ##                         ##  ##        ## ##        </td>
</tr>
<tr>
<td class="text">           ##         #   ##                         ##   #         ##          </td>
</tr>
<tr>
<td class="text">           #         ##    #                         #    ##         #          </td>
</tr>
<tr>
<td class="text">                     #     ##                       ##     #                    </td>
</tr>
<tr>
<td class="text">                    ##     ##                       ##     ##                   </td>
</tr>
<tr>
<td class="text">                    #       #                       #       #                   </td>
</tr>
<tr>
<td class="text">                            #                       #                           </td>
</tr>
<tr>
<td class="text">                            ##                     ##                           </td>
</tr>
<tr>
<td class="text">                            ####                 ####                           </td>
</tr>
<tr>
<td class="text">                            ######             ######                           </td>
</tr>
<tr class="statusbar">
<td colspan="100">80 rows fetched in 0.0010s (2.0781s)</td>
</tr>
</table>
</div>
<p><a href="#" onclick="xcollapse('X0004');return false;">View query</a><br />
</p>
<div id="X0004" style="display: none; background: transparent;">
<pre class="brush: sql">
SELECT  SETSEED(0.201112312);

WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level &lt;= 3
        ),
        tree AS
        (
        SELECT  *, generate_series(0, (1 &lt;&lt; (level - 1)) - 1) AS branch
        FROM    params
        ),
        points AS
        (
        SELECT  id,
                0::double precision AS xs,
                0::double precision AS ys,
                len AS xe,
                0::double precision AS ye,
                0::double precision AS alpha,
                branch,
                level,
                spikes
        FROM    tree
        WHERE   id = ARRAY[0]
        UNION ALL
        SELECT  id,
                xs + (xe - xs) * cut AS xs,
                ys + (ye - ys) * cut AS ys,
                xs + (xe - xs) * cut + len * COS(alpha) AS xe,
                ys + (ye - ys) * cut + len * SIN(alpha) AS ye,
                alpha,
                branch,
                level,
                spikes
        FROM    (
                SELECT  t.id, p.id || t.id AS uid, xs, xe, ys, ye, cut, len,
                        p.alpha + t.alpha * ((t.branch % 2) * 2 - 1) AS alpha,
                        t.branch, t.level, t.spikes
                FROM    points p
                JOIN    tree t
                ON      t.id &gt; p.id
                        AND t.id &lt;= (p.id || p.spikes)
                        AND ARRAY_LENGTH(t.id, 1) = p.level + 1
                        AND t.branch BETWEEN p.branch * 2 AND p.branch * 2 + 1
                ) q
        ),
        lines AS
        (
        SELECT  PATH(LSEG(POINT(xs, ys), POINT(xe, ye))::TEXT) * POINT(COS(RADIANS(turn)), SIN(RADIANS(turn))) AS line,
                *
        FROM    points
        CROSS JOIN
                generate_series(0, 300, 60) AS turn
        )
SELECT  ARRAY_TO_STRING
                (
                ARRAY_AGG
                        (
                        CASE
                                EXISTS
                                (
                                SELECT  NULL
                                FROM    lines
                                WHERE   POINT(x::DOUBLE PRECISION / scale, y::DOUBLE PRECISION / scale) &lt;-&gt; line &lt;= (0.7 / scale)
                                )
                        WHEN    TRUE THEN
                                &#039;#&#039;
                        ELSE    &#039; &#039;
                        END
                        ORDER BY
                                x
                        ),
                &#039;&#039;
                )
FROM    (
        SELECT  *,
                generate_series(-scale, scale - 1) AS y
        FROM    (
                SELECT  scale, generate_series(-scale, scale - 1) AS x
                FROM    (
                        VALUES
                        (40)
                        ) q (scale)
                ) q
        ) q
GROUP BY
        y
ORDER BY
        y
</pre>
</div>
<div class="terminal widefont smallfont">
<table class="terminal">
<tr>
<th>array_to_string</th>
</tr>
<tr>
<td class="text">        ## ##        ##                                   ##        ## ##       </td>
</tr>
<tr>
<td class="text">         #####       ##                                   ##       #####        </td>
</tr>
<tr>
<td class="text">          ####        #                                   #        ####         </td>
</tr>
<tr>
<td class="text">       ########       #                                   #       ########      </td>
</tr>
<tr>
<td class="text">    ##### ##  ##      #                                   #      ##  ## #####   </td>
</tr>
<tr>
<td class="text">    #          ###  # #                                   # #  ###          #   </td>
</tr>
<tr>
<td class="text">                 ## ###                                   ### ##                </td>
</tr>
<tr>
<td class="text">                  #####                                   #####                 </td>
</tr>
<tr>
<td class="text">                    ###                                   ###                   </td>
</tr>
<tr>
<td class="text">                      #                                   #                     </td>
</tr>
<tr>
<td class="text">                      ##                                 ##                     </td>
</tr>
<tr>
<td class="text">                       ##                               ##                      </td>
</tr>
<tr>
<td class="text">                        #                               #                       </td>
</tr>
<tr>
<td class="text">                        ##     #    #       #    #     ##                       </td>
</tr>
<tr>
<td class="text">                         #   # ###  #   #   #  ### #   #                        </td>
</tr>
<tr>
<td class="text">                         ##  ###### #  ###  # ######  ##                        </td>
</tr>
<tr>
<td class="text">                          #   ### # #### #### # ###   #                         </td>
</tr>
<tr>
<td class="text">                          ##   ## ####     #### ##   ##                         </td>
</tr>
<tr>
<td class="text">                       ##  ## ###  ##       ##  ### ##  ##                      </td>
</tr>
<tr>
<td class="text">                      ###   #  #   ##       ##   #  #   ###                     </td>
</tr>
<tr>
<td class="text">                     #####  ## #   ##       ##   # ##  #####                    </td>
</tr>
<tr>
<td class="text">                     ######  # ##  #         #  ## #  ######                    </td>
</tr>
<tr>
<td class="text">                      #  ### ####  #         #  #### ###  #                     </td>
</tr>
<tr>
<td class="text">                  ##  #    ####### #         # #######    #  ##                 </td>
</tr>
<tr>
<td class="text">                   #####     ##### ####   #### #####     #####                  </td>
</tr>
<tr>
<td class="text">                     ####    ########       ########    ####                    </td>
</tr>
<tr>
<td class="text">                    #######  ## # ##         ## # ##  #######                   </td>
</tr>
<tr>
<td class="text">                  ###     ##### ####         #### #####     ###                 </td>
</tr>
<tr>
<td class="text">                  #          ### ###         ### ###          #                 </td>
</tr>
<tr>
<td class="text">                  #        ########           ########        #                 </td>
</tr>
<tr>
<td class="text">               ## #        ##    ###         ###    ##        # ##              </td>
</tr>
<tr>
<td class="text">                ####              #####   #####              ####               </td>
</tr>
<tr>
<td class="text">                 ###              #####   #####              ###                </td>
</tr>
<tr>
<td class="text">             ########             #####   #####             ########            </td>
</tr>
<tr>
<td class="text">             ###  # ##  ##        ####     ####        ##  ## #  ###            </td>
</tr>
<tr>
<td class="text">             ###     ##  #           #     #           #  ##     ###            </td>
</tr>
<tr>
<td class="text">             #####    ####           ##   ##           ####    #####            </td>
</tr>
<tr>
<td class="text">             ######     ##            ## ##            ##     ######            </td>
</tr>
<tr>
<td class="text">##                #########    ##      # #      ##    #########                #</td>
</tr>
<tr>
<td class="text"> ###                ###   ### ####     ###     #### ###   ###                ###</td>
</tr>
<tr>
<td class="text">################################################################################</td>
</tr>
<tr>
<td class="text"> ###                ###   ### ####     ###     #### ###   ###                ###</td>
</tr>
<tr>
<td class="text">##                #########    ##      # #      ##    #########                #</td>
</tr>
<tr>
<td class="text">             ######     ##            ## ##            ##     ######            </td>
</tr>
<tr>
<td class="text">             #####    ####           ##   ##           ####    #####            </td>
</tr>
<tr>
<td class="text">             ###     ##  #           #     #           #  ##     ###            </td>
</tr>
<tr>
<td class="text">             ###  # ##  ##        ####     ####        ##  ## #  ###            </td>
</tr>
<tr>
<td class="text">             ########             #####   #####             ########            </td>
</tr>
<tr>
<td class="text">                 ###              #####   #####              ###                </td>
</tr>
<tr>
<td class="text">                ####              #####   #####              ####               </td>
</tr>
<tr>
<td class="text">               ## #        ##    ###         ###    ##        # ##              </td>
</tr>
<tr>
<td class="text">                  #        ########           ########        #                 </td>
</tr>
<tr>
<td class="text">                  #          ### ###         ### ###          #                 </td>
</tr>
<tr>
<td class="text">                  ###     ##### ####         #### #####     ###                 </td>
</tr>
<tr>
<td class="text">                    #######  ## # ##         ## # ##  #######                   </td>
</tr>
<tr>
<td class="text">                     ####    ########       ########    ####                    </td>
</tr>
<tr>
<td class="text">                   #####     ##### ####   #### #####     #####                  </td>
</tr>
<tr>
<td class="text">                  ##  #    ####### #         # #######    #  ##                 </td>
</tr>
<tr>
<td class="text">                      #  ### ####  #         #  #### ###  #                     </td>
</tr>
<tr>
<td class="text">                     ######  # ##  #         #  ## #  ######                    </td>
</tr>
<tr>
<td class="text">                     #####  ## #   ##       ##   # ##  #####                    </td>
</tr>
<tr>
<td class="text">                      ###   #  #   ##       ##   #  #   ###                     </td>
</tr>
<tr>
<td class="text">                       ##  ## ###  ##       ##  ### ##  ##                      </td>
</tr>
<tr>
<td class="text">                          ##   ## ####     #### ##   ##                         </td>
</tr>
<tr>
<td class="text">                          #   ### # #### #### # ###   #                         </td>
</tr>
<tr>
<td class="text">                         ##  ###### #  ###  # ######  ##                        </td>
</tr>
<tr>
<td class="text">                         #   # ###  #   #   #  ### #   #                        </td>
</tr>
<tr>
<td class="text">                        ##     #    #       #    #     ##                       </td>
</tr>
<tr>
<td class="text">                        #                               #                       </td>
</tr>
<tr>
<td class="text">                       ##                               ##                      </td>
</tr>
<tr>
<td class="text">                      ##                                 ##                     </td>
</tr>
<tr>
<td class="text">                      #                                   #                     </td>
</tr>
<tr>
<td class="text">                    ###                                   ###                   </td>
</tr>
<tr>
<td class="text">                  #####                                   #####                 </td>
</tr>
<tr>
<td class="text">                 ## ###                                   ### ##                </td>
</tr>
<tr>
<td class="text">    #          ###  # #                                   # #  ###          #   </td>
</tr>
<tr>
<td class="text">    ##### ##  ##      #                                   #      ##  ## #####   </td>
</tr>
<tr>
<td class="text">       ########       #                                   #       ########      </td>
</tr>
<tr>
<td class="text">          ####        #                                   #        ####         </td>
</tr>
<tr>
<td class="text">         #####       ##                                   ##       #####        </td>
</tr>
<tr class="statusbar">
<td colspan="100">80 rows fetched in 0.0009s (4.4531s)</td>
</tr>
</table>
</div>
<p><a href="#" onclick="xcollapse('X0005');return false;">View query</a><br />
</p>
<div id="X0005" style="display: none; background: transparent;">
<pre class="brush: sql">
SELECT  SETSEED(0.201112313);

WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level &lt;= 3
        ),
        tree AS
        (
        SELECT  *, generate_series(0, (1 &lt;&lt; (level - 1)) - 1) AS branch
        FROM    params
        ),
        points AS
        (
        SELECT  id,
                0::double precision AS xs,
                0::double precision AS ys,
                len AS xe,
                0::double precision AS ye,
                0::double precision AS alpha,
                branch,
                level,
                spikes
        FROM    tree
        WHERE   id = ARRAY[0]
        UNION ALL
        SELECT  id,
                xs + (xe - xs) * cut AS xs,
                ys + (ye - ys) * cut AS ys,
                xs + (xe - xs) * cut + len * COS(alpha) AS xe,
                ys + (ye - ys) * cut + len * SIN(alpha) AS ye,
                alpha,
                branch,
                level,
                spikes
        FROM    (
                SELECT  t.id, p.id || t.id AS uid, xs, xe, ys, ye, cut, len,
                        p.alpha + t.alpha * ((t.branch % 2) * 2 - 1) AS alpha,
                        t.branch, t.level, t.spikes
                FROM    points p
                JOIN    tree t
                ON      t.id &gt; p.id
                        AND t.id &lt;= (p.id || p.spikes)
                        AND ARRAY_LENGTH(t.id, 1) = p.level + 1
                        AND t.branch BETWEEN p.branch * 2 AND p.branch * 2 + 1
                ) q
        ),
        lines AS
        (
        SELECT  PATH(LSEG(POINT(xs, ys), POINT(xe, ye))::TEXT) * POINT(COS(RADIANS(turn)), SIN(RADIANS(turn))) AS line,
                *
        FROM    points
        CROSS JOIN
                generate_series(0, 300, 60) AS turn
        )
SELECT  ARRAY_TO_STRING
                (
                ARRAY_AGG
                        (
                        CASE
                                EXISTS
                                (
                                SELECT  NULL
                                FROM    lines
                                WHERE   POINT(x::DOUBLE PRECISION / scale, y::DOUBLE PRECISION / scale) &lt;-&gt; line &lt;= (0.7 / scale)
                                )
                        WHEN    TRUE THEN
                                &#039;#&#039;
                        ELSE    &#039; &#039;
                        END
                        ORDER BY
                                x
                        ),
                &#039;&#039;
                )
FROM    (
        SELECT  *,
                generate_series(-scale, scale - 1) AS y
        FROM    (
                SELECT  scale, generate_series(-scale, scale - 1) AS x
                FROM    (
                        VALUES
                        (40)
                        ) q (scale)
                ) q
        ) q
GROUP BY
        y
ORDER BY
        y
</pre>
</div>
<div class="terminal widefont smallfont">
<table class="terminal">
<tr>
<th>array_to_string</th>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                    #                                       #                   </td>
</tr>
<tr>
<td class="text">                    ##                                     ##                   </td>
</tr>
<tr>
<td class="text">                     #                                     #                    </td>
</tr>
<tr>
<td class="text">                     ##                                   ##                    </td>
</tr>
<tr>
<td class="text">                      #                                   #                     </td>
</tr>
<tr>
<td class="text">                      ## #                             # ##                     </td>
</tr>
<tr>
<td class="text">                       ###                             ###                      </td>
</tr>
<tr>
<td class="text">                     ######                           ######                    </td>
</tr>
<tr>
<td class="text">                      ####                             ####                     </td>
</tr>
<tr>
<td class="text">                        ##                             ##                       </td>
</tr>
<tr>
<td class="text">                         ##                           ##                        </td>
</tr>
<tr>
<td class="text">                          #                           #                         </td>
</tr>
<tr>
<td class="text">                          ##                         ##                         </td>
</tr>
<tr>
<td class="text">                           ##                       ##                          </td>
</tr>
<tr>
<td class="text">                            #                       #                           </td>
</tr>
<tr>
<td class="text">                            ##                     ##                           </td>
</tr>
<tr>
<td class="text">                             #                     #                            </td>
</tr>
<tr>
<td class="text">                             ##                   ##                            </td>
</tr>
<tr>
<td class="text">                              #                   #                             </td>
</tr>
<tr>
<td class="text">                              ##                 ##                             </td>
</tr>
<tr>
<td class="text">                               ##               ##                              </td>
</tr>
<tr>
<td class="text">                                #               #                               </td>
</tr>
<tr>
<td class="text">                                ##             ##                               </td>
</tr>
<tr>
<td class="text">                                 #             #                                </td>
</tr>
<tr>
<td class="text">                                 ##           ##                                </td>
</tr>
<tr>
<td class="text">                                  ##         ##                                 </td>
</tr>
<tr>
<td class="text">                                  ##         ##                                 </td>
</tr>
<tr>
<td class="text">                                   ##       ##                                  </td>
</tr>
<tr>
<td class="text">                                    #       #                                   </td>
</tr>
<tr>
<td class="text">                                    ##     ##                                   </td>
</tr>
<tr>
<td class="text">                                     #     #                                    </td>
</tr>
<tr>
<td class="text">                                     ##   ##                                    </td>
</tr>
<tr>
<td class="text">                                      ## ##                                     </td>
</tr>
<tr>
<td class="text">      ###                              # #                              ###     </td>
</tr>
<tr>
<td class="text">       ###                             ###                             ###      </td>
</tr>
<tr>
<td class="text">################################################################################</td>
</tr>
<tr>
<td class="text">       ###                             ###                             ###      </td>
</tr>
<tr>
<td class="text">      ###                              # #                              ###     </td>
</tr>
<tr>
<td class="text">                                      ## ##                                     </td>
</tr>
<tr>
<td class="text">                                     ##   ##                                    </td>
</tr>
<tr>
<td class="text">                                     #     #                                    </td>
</tr>
<tr>
<td class="text">                                    ##     ##                                   </td>
</tr>
<tr>
<td class="text">                                    #       #                                   </td>
</tr>
<tr>
<td class="text">                                   ##       ##                                  </td>
</tr>
<tr>
<td class="text">                                  ##         ##                                 </td>
</tr>
<tr>
<td class="text">                                  ##         ##                                 </td>
</tr>
<tr>
<td class="text">                                 ##           ##                                </td>
</tr>
<tr>
<td class="text">                                 #             #                                </td>
</tr>
<tr>
<td class="text">                                ##             ##                               </td>
</tr>
<tr>
<td class="text">                                #               #                               </td>
</tr>
<tr>
<td class="text">                               ##               ##                              </td>
</tr>
<tr>
<td class="text">                              ##                 ##                             </td>
</tr>
<tr>
<td class="text">                              #                   #                             </td>
</tr>
<tr>
<td class="text">                             ##                   ##                            </td>
</tr>
<tr>
<td class="text">                             #                     #                            </td>
</tr>
<tr>
<td class="text">                            ##                     ##                           </td>
</tr>
<tr>
<td class="text">                            #                       #                           </td>
</tr>
<tr>
<td class="text">                           ##                       ##                          </td>
</tr>
<tr>
<td class="text">                          ##                         ##                         </td>
</tr>
<tr>
<td class="text">                          #                           #                         </td>
</tr>
<tr>
<td class="text">                         ##                           ##                        </td>
</tr>
<tr>
<td class="text">                        ##                             ##                       </td>
</tr>
<tr>
<td class="text">                      ####                             ####                     </td>
</tr>
<tr>
<td class="text">                     ######                           ######                    </td>
</tr>
<tr>
<td class="text">                       ###                             ###                      </td>
</tr>
<tr>
<td class="text">                      ## #                             # ##                     </td>
</tr>
<tr>
<td class="text">                      #                                   #                     </td>
</tr>
<tr>
<td class="text">                     ##                                   ##                    </td>
</tr>
<tr>
<td class="text">                     #                                     #                    </td>
</tr>
<tr>
<td class="text">                    ##                                     ##                   </td>
</tr>
<tr>
<td class="text">                    #                                       #                   </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr class="statusbar">
<td colspan="100">80 rows fetched in 0.0009s (1.2188s)</td>
</tr>
</table>
</div>
<div class="plainnote" style="text-align: center">
<big><strong>Happy New Year!</strong></big>
</div>
<p>Previous New Year posts:</p>
<ul>
<li><a href="/2009/12/31/happy-new-year/">Happy New 2010 Year!</a></li>
<li><a href="/2010/12/31/happy-new-year/">Happy New 2011 Year!</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2011/12/31/happy-new-year-3/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Shared Plan and Algorithm Network Cache (SPANC)</title>
		<link>http://explainextended.com/2011/04/01/shared-plan-and-algorithm-network-cache-spanc/</link>
		<comments>http://explainextended.com/2011/04/01/shared-plan-and-algorithm-network-cache-spanc/#comments</comments>
		<pubDate>Fri, 01 Apr 2011 19:00:49 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=5314</guid>
		<description><![CDATA[Analysis of various RDBMS shows that they have recently started exchanging query plans through a distributed storage engine called Shared Plan and Algorithm Network Cache (SPANC), and that they use Stack Overflow as an external optimization engine as well.]]></description>
			<content:encoded><![CDATA[<p>Due to the nature of my work I have to deal with various database systems.</p>
<p>While <strong>SQL</strong> is more or less standardized, optimizers are implemented differently in different systems. Some systems cannot join tables with anything other than nested loops, others can only <code>GROUP BY</code> using a sort, etc.</p>
<p>So when you write a join in, say, <strong>MySQL</strong>, you cannot expect it to be a sort merge join (and you should consider this fact when designing the query). Or, when you write a <code>DISTINCT</code> in <strong>SQL Server</strong>, you can&#8217;t expect a loose index scan. These are limitations imposed by their optimizers.</p>
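<p>To illustrate the second limitation: a loose index scan jumps from one distinct key to the next instead of reading every index entry. Where the optimizer cannot do this, it is usually emulated by hand with a recursive query. Below is a minimal sketch in <strong>PostgreSQL</strong> syntax, assuming a hypothetical index on <code>orderItem ("order")</code>. The column name is quoted because <code>ORDER</code> is a reserved word, and other systems would need their own syntax for the same idea.</p>
<pre class="brush: sql">
WITH    RECURSIVE
        q ("order") AS
        (
        -- Anchor part: the smallest value, found with a single index seek
        (
        SELECT  "order"
        FROM    orderItem
        ORDER BY
                "order"
        LIMIT 1
        )
        UNION ALL
        -- Recursive part: the next value greater than the previous one.
        -- The scalar subquery yields NULL when there is no such value,
        -- and that NULL stops the recursion on the next step.
        SELECT  (
                SELECT  "order"
                FROM    orderItem
                WHERE   "order" &gt; q."order"
                ORDER BY
                        "order"
                LIMIT 1
                )
        FROM    q
        WHERE   q."order" IS NOT NULL
        )
SELECT  "order"
FROM    q
WHERE   "order" IS NOT NULL
</pre>
<p>Unlike a plain <code>DISTINCT</code>, this sketch skips <code>NULL</code> values, so it is only equivalent when the column holds no <code>NULL</code>s or when they are handled separately.</p>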
<p>However, over the last three months I have noticed a great improvement in queries where I could not expect any.</p>
<p>It started when I tried to debug this in <strong>SQL Server</strong>:</p>
<pre class="brush: sql">
SELECT  DISTINCT [order]
FROM    orderItem
</pre>
<p>which yielded this plan:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2011/04/sql-server-query-plan.png" alt="" title="SQL Server SPANC query plan" class="aligncenter size-full wp-image-5346 noborder" /></p>
<p>Similar results were obtained on <strong>Oracle</strong>:</p>
<pre>
Plan hash value: 1345318323

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                                         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                                  |             |       |   200 |     2  (50)| 00:00:01 |
|   1 |  REMOTE SPANC QUERY (SQLSERVER, MYSQL, POSTGRESQL, STACKOVERFLOW) |             |       |   200 |     2  (50)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------------------------
</pre>
<p><strong>MySQL</strong>:</p>
<pre>
+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-----------------------------------------------------+
| id | select_type | table     | type  | possible_keys | key     | key_len | ref  | rows    | Extra                                               |
+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-----------------------------------------------------+
|  1 | SIMPLE      | orderItem | spanc | NULL          | ALL     | NULL    | NULL |         | Using Oracle, PostgreSQL, SQL Server, StackOverflow |
+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-----------------------------------------------------+
</pre>
<p>and <strong>PostgreSQL</strong>:</p>
<pre>
Seq Scan on OrderItem  (cost=0.00..6.44 width=4)
 -> Remote Scan on SPANC (Oracle, MySQL, SQL Server, StackOverflow)   (cost=0.00..100.00 width=4)
</pre>
<p>Network analysis showed weird encrypted activity between the machines in my internal network that host the <strong>SQL Server</strong>, <strong>Oracle</strong>, <strong>PostgreSQL</strong> and <strong>MySQL</strong> servers.</p>
<p>Ultimately, there was unencrypted activity outside of the internal network which turned out to be an HTTP <code>POST</code> request followed by several <code>GET</code> polls to <a href="http://stackoverflow.com/questions/5518080/distinct-optimization">http://stackoverflow.com/questions/5518080/distinct-optimization</a>.</p>
<p>It seems that the developers of the major database systems have agreed to share knowledge about the most efficient query plans in some kind of distributed storage (probably called <strong>SPANC</strong>, as we can see in the query plans) and to provide an interface to access each other&#8217;s systems.</p>
<p>It also seems that these systems treat <a href="http://stackoverflow.com"><strong>Stack Overflow</strong></a> as an external optimization engine where the most experienced developers can build their plans for them in the most efficient way.</p>
<p>I would be glad to have further clarification from the companies&#8217; staff.</p>
<p>This also raises a question: how many of the regular <strong>Stack Overflow</strong> participants are in fact query engines disguised as curious fellow developers?</p>
<p>It would definitely be nice to know.</p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2011/04/01/shared-plan-and-algorithm-network-cache-spanc/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Things SQL needs: SERIES()</title>
		<link>http://explainextended.com/2011/02/18/things-sql-needs-series/</link>
		<comments>http://explainextended.com/2011/02/18/things-sql-needs-series/#comments</comments>
		<pubDate>Fri, 18 Feb 2011 20:00:47 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=5239</guid>
		<description><![CDATA[A window function (or an extension for <code>GROUP BY</code> clause) that would allow grouping continuous series of an expression in an ordered dataset would ease writing queries that need to aggregate such series, make them more efficient and even improve certain kinds of grouping queries where the grouping expression shares its order with one of the indexes.]]></description>
			<content:encoded><![CDATA[<p>Recently I had to deal with several scenarios which required processing and aggregating continuous series of data.</p>
<p><img src="http://explainextended.com/wp-content/uploads/2011/02/135779270_de2e30d0b1_z.jpg" alt="" title="Pawns" width="640" height="480" class="aligncenter size-full wp-image-5271 noborder" /></p>
<p>I believe this could be best illustrated with an example:</p>
<table class="excel">
<tr>
<th>id</th>
<th>source</th>
<th>value</th>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">1</td>
<td class="int4">10</td>
</tr>
<tr>
<td class="int4">2</td>
<td class="int4">1</td>
<td class="int4">20</td>
</tr>
<tr class="rowbreak">
<td class="int4">3</td>
<td class="int4">2</td>
<td class="int4">15</td>
</tr>
<tr>
<td class="int4">4</td>
<td class="int4">2</td>
<td class="int4">25</td>
</tr>
<tr class="rowbreak">
<td class="int4">5</td>
<td class="int4">1</td>
<td class="int4">45</td>
</tr>
<tr class="rowbreak">
<td class="int4">6</td>
<td class="int4">3</td>
<td class="int4">50</td>
</tr>
<tr>
<td class="int4">7</td>
<td class="int4">3</td>
<td class="int4">35</td>
</tr>
<tr class="rowbreak">
<td class="int4">8</td>
<td class="int4">1</td>
<td class="int4">40</td>
</tr>
<tr>
<td class="int4">9</td>
<td class="int4">1</td>
<td class="int4">10</td>
</tr>
</table>
<p>The records are ordered by <code>id</code>, and within this order there are continuous series of records which share the same value of <code>source</code>. In the table above, the series are separated by thick lines.</p>
<p>We want to calculate some aggregates across each of the series: <code>MIN</code>, <code>MAX</code>, <code>SUM</code>, <code>AVG</code>, whatever:</p>
<table class="excel">
<tr>
<th>source</th>
<th>min</th>
<th>max</th>
<th>sum</th>
<th>avg</th>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">10</td>
<td class="int4">20</td>
<td class="int8">30</td>
<td class="numeric">15.00</td>
</tr>
<tr>
<td class="int4">2</td>
<td class="int4">15</td>
<td class="int4">25</td>
<td class="int8">40</td>
<td class="numeric">20.00</td>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">45</td>
<td class="int4">45</td>
<td class="int8">45</td>
<td class="numeric">45.00</td>
</tr>
<tr>
<td class="int4">3</td>
<td class="int4">35</td>
<td class="int4">50</td>
<td class="int8">85</td>
<td class="numeric">42.50</td>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">10</td>
<td class="int4">40</td>
<td class="int8">50</td>
<td class="numeric">25.00</td>
</tr>
</table>
<p>This can be used for different things. I used that for:</p>
<ul>
<li>Reading sensors from a moving elevator (thus tracking its position)</li>
<li>Recording user&#8217;s activity on a site</li>
<li>Tracking the primary node in a server cluster</li>
</ul>
<p>, but almost any seasoned database developer can recall a need for such a query.</p>
<p>As you can see, the values of <code>source</code> are repeating so a mere <code>GROUP BY</code> won&#8217;t work here.</p>
<p>In the systems supporting window functions there is a workaround for that:</p>
<p><span id="more-5239"></span></p>
<pre class="brush: sql">
SELECT  source, MIN(value), MAX(value), SUM(value), AVG(value)::NUMERIC(20, 2)
FROM    (
        SELECT  *,
                ROW_NUMBER() OVER (PARTITION BY source ORDER BY id) AS rno,
                ROW_NUMBER() OVER (ORDER BY id) AS rne
        FROM    (
                VALUES
                (1, 1, 10),
                (2, 1, 20),
                (3, 2, 15),
                (4, 2, 25),
                (5, 1, 45),
                (6, 3, 50),
                (7, 3, 35),
                (8, 1, 40),
                (9, 1, 10)
                ) series (id, source, value)
        ) q
GROUP BY
        source, rne - rno
ORDER BY
        MIN(id)
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>source</th>
<th>min</th>
<th>max</th>
<th>sum</th>
<th>avg</th>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">10</td>
<td class="int4">20</td>
<td class="int8">30</td>
<td class="numeric">15.00</td>
</tr>
<tr>
<td class="int4">2</td>
<td class="int4">15</td>
<td class="int4">25</td>
<td class="int8">40</td>
<td class="numeric">20.00</td>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">45</td>
<td class="int4">45</td>
<td class="int8">45</td>
<td class="numeric">45.00</td>
</tr>
<tr>
<td class="int4">3</td>
<td class="int4">35</td>
<td class="int4">50</td>
<td class="int8">85</td>
<td class="numeric">42.50</td>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">10</td>
<td class="int4">40</td>
<td class="int8">50</td>
<td class="numeric">25.00</td>
</tr>
<tr class="statusbar">
<td colspan="100">5 rows fetched in 0.0004s (0.0033s)</td>
</tr>
</table>
</div>
<pre>
Sort  (cost=1.40..1.42 rows=9 width=28)
  Sort Key: (min(q.id))
  -&gt;  HashAggregate  (cost=1.01..1.25 rows=9 width=28)
        -&gt;  Subquery Scan q  (cost=0.58..0.85 rows=9 width=28)
              -&gt;  WindowAgg  (cost=0.58..0.74 rows=9 width=12)
                    -&gt;  Sort  (cost=0.58..0.60 rows=9 width=12)
                          Sort Key: &quot;*VALUES*&quot;.column1
                          -&gt;  WindowAgg  (cost=0.26..0.44 rows=9 width=12)
                                -&gt;  Sort  (cost=0.26..0.28 rows=9 width=12)
                                      Sort Key: &quot;*VALUES*&quot;.column2, &quot;*VALUES*&quot;.column1
                                      -&gt;  Values Scan on &quot;*VALUES*&quot;  (cost=0.00..0.11 rows=9 width=12)
</pre>
<p>(this is <strong>PostgreSQL</strong>).</p>
<p>The idea behind this solution is that the overall order of <code>id</code> is retained within each of the series but is broken whenever the series break. Thus, a difference between the overall row number and the <code>source</code>-wise row number is an invariant within each series and is guaranteed to change as the series break. Hence, we can <code>GROUP BY</code> on it (along with the <code>source</code>).</p>
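<p>To see why the difference works as a grouping key, it may help to look at the two row numbers side by side. The query below is only an illustrative sketch over the same sample data (the <code>diff</code> alias is introduced here and is not part of the original query):</p>
<pre class="brush: sql">
-- Show both row numbers and their difference for every record
SELECT  id, source, value,
        rne,                    -- overall row number
        rno,                    -- row number within the same source
        rne - rno AS diff       -- constant within each continuous series
FROM    (
        SELECT  *,
                ROW_NUMBER() OVER (PARTITION BY source ORDER BY id) AS rno,
                ROW_NUMBER() OVER (ORDER BY id) AS rne
        FROM    (
                VALUES
                (1, 1, 10),
                (2, 1, 20),
                (3, 2, 15),
                (4, 2, 25),
                (5, 1, 45),
                (6, 3, 50),
                (7, 3, 35),
                (8, 1, 40),
                (9, 1, 10)
                ) series (id, source, value)
        ) q
ORDER BY
        id
</pre>
<p>For the ids from 1 to 9 the difference comes out as 0, 0, 2, 2, 2, 5, 5, 4, 4. Note that the record with <code>id = 5</code> (source 1) gets the same difference as the records with ids 3 and 4 (source 2), which is why <code>source</code> has to be a part of the <code>GROUP BY</code> as well.</p>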
<p>This workaround is nice and elegant, however it requires three sorts and one hash aggregate (apart from the support for window functions of course).</p>
<h3>SERIES function</h3>
<p>My proposal is to implement a certain extension to the <code>GROUP BY</code> clause, namely:</p>
<p><code>GROUP BY SERIES (series_expression) OVER ([PARTITION BY partitioning_expression] ORDER BY ordering_expression)</code></p>
<p>The logic is quite simple: it would find continuous blocks of <code>series_expression</code> as if the records were ordered by <code>ordering_expression</code> and assign each block to its own group. Within this group, all standard aggregation functions could be supported.</p>
<p>The expression:</p>
<p><code>SERIES (series_expression) OVER ([PARTITION BY partitioning_expression] ORDER BY ordering_expression)</code></p>
<p>could also serve as a normal window function in the systems that support it. In this case, it would return the ordinal number of the series within each partition.</p>
<h3>Implementation</h3>
<p>Its implementation would be quite simple: the records sharing the same value of <code>partitioning_expression</code> should be ordered on <code>ordering_expression</code>, and a zero-based (or one-based) counter and a variable holding the previous value of the <code>series_expression</code> should be set up. Whenever the value of the <code>series_expression</code> changes, the counter is incremented (and returned as an output of <code>SERIES</code>) and the variable gets the new value of <code>series_expression</code>.</p>
<p>Using the session variables, this can be easily emulated in <strong>MySQL</strong>:</p>
<pre class="brush: sql">
SELECT  source,
        MIN(value) AS min,
        MAX(value) AS max,
        SUM(value) AS sum,
        CAST(AVG(value) AS DECIMAL(20, 2)) AS avg
FROM    (
        SELECT  @series := @series + (COALESCE(@source &lt;&gt; source, 0)) AS series,
                @source := source AS newsource,
                q.*
        FROM    (
                SELECT  @series := 1, @source := NULL
                ) vars
        STRAIGHT_JOIN
                (
                SELECT  1 AS id, 1 AS source, 10 AS value
                UNION ALL
                SELECT  2 AS id, 1 AS source, 20 AS value
                UNION ALL
                SELECT  3 AS id, 2 AS source, 15 AS value
                UNION ALL
                SELECT  4 AS id, 2 AS source, 25 AS value
                UNION ALL
                SELECT  5 AS id, 1 AS source, 45 AS value
                UNION ALL
                SELECT  6 AS id, 3 AS source, 50 AS value
                UNION ALL
                SELECT  7 AS id, 3 AS source, 35 AS value
                UNION ALL
                SELECT  8 AS id, 1 AS source, 40 AS value
                UNION ALL
                SELECT  9 AS id, 1 AS source, 10 AS value
                ) q
        ORDER BY
                id
        ) q
GROUP BY
        series
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>source</th>
<th>min</th>
<th>max</th>
<th>sum</th>
<th>avg</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="bigint">10</td>
<td class="bigint">20</td>
<td class="decimal">30</td>
<td class="decimal">15.00</td>
</tr>
<tr>
<td class="bigint">2</td>
<td class="bigint">15</td>
<td class="bigint">25</td>
<td class="decimal">40</td>
<td class="decimal">20.00</td>
</tr>
<tr>
<td class="bigint">1</td>
<td class="bigint">45</td>
<td class="bigint">45</td>
<td class="decimal">45</td>
<td class="decimal">45.00</td>
</tr>
<tr>
<td class="bigint">3</td>
<td class="bigint">35</td>
<td class="bigint">50</td>
<td class="decimal">85</td>
<td class="decimal">42.50</td>
</tr>
<tr>
<td class="bigint">1</td>
<td class="bigint">10</td>
<td class="bigint">40</td>
<td class="decimal">50</td>
<td class="decimal">25.00</td>
</tr>
<tr class="statusbar">
<td colspan="100">5 rows fetched in 0.0004s (0.0026s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">&lt;derived2&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">9</td>
<td class="double">100.00</td>
<td class="varchar">Using temporary; Using filesort</td>
</tr>
<tr>
<td class="bigint">2</td>
<td class="varchar">DERIVED</td>
<td class="varchar">&lt;derived3&gt;</td>
<td class="varchar">system</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">1</td>
<td class="double">100.00</td>
<td class="varchar">Using filesort</td>
</tr>
<tr>
<td class="bigint">2</td>
<td class="varchar">DERIVED</td>
<td class="varchar">&lt;derived4&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">9</td>
<td class="double">100.00</td>
<td class="varchar"></td>
</tr>
<tr>
<td class="bigint">4</td>
<td class="varchar">DERIVED</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint">5</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint">6</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint">7</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint">8</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint">9</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint">10</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint">11</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint">12</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint"></td>
<td class="varchar">UNION RESULT</td>
<td class="varchar">&lt;union4,5,6,7,8,9,10,11,12&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar"></td>
</tr>
<tr>
<td class="bigint">3</td>
<td class="varchar">DERIVED</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
</table>
</div>
<pre>
select `q`.`source` AS `source`,min(`q`.`value`) AS `min`,max(`q`.`value`) AS `max`,sum(`q`.`value`) AS `sum`,cast(avg(`q`.`value`) as decimal(20,2)) AS `avg` from (select (@series:=((@series) + coalesce(((@source) &lt;&gt; `q`.`source`),0))) AS `series`,(@source:=`q`.`source`) AS `newsource`,`q`.`id` AS `id`,`q`.`source` AS `source`,`q`.`value` AS `value` from (select (@series:=1) AS `@series := 1`,(@source:=NULL) AS `@source := NULL`) `vars` straight_join (select 1 AS `id`,1 AS `source`,10 AS `value` union all select 2 AS `id`,1 AS `source`,20 AS `value` union all select 3 AS `id`,2 AS `source`,15 AS `value` union all select 4 AS `id`,2 AS `source`,25 AS `value` union all select 5 AS `id`,1 AS `source`,45 AS `value` union all select 6 AS `id`,3 AS `source`,50 AS `value` union all select 7 AS `id`,3 AS `source`,35 AS `value` union all select 8 AS `id`,1 AS `source`,40 AS `value` union all select 9 AS `id`,1 AS `source`,10 AS `value`) `q` order by `q`.`id`) `q` group by `q`.`series`
</pre>
<p>, with only one filesort (which could be avoided if the values were ordered with an index).</p>
<h3>Grouping on expressions sharing the order</h3>
<p>This clause would also allow running certain grouping queries more efficiently.</p>
<p>Say, we need to build a yearly report for sales on a web site. Let&#8217;s create a sample table:</p>
<p><a href="#" onclick="xcollapse('X4287');return false;"><strong>Table creation details</strong></a><br />
</p>
<div id="X4287" style="display: none; background: transparent;">
<pre class="brush: sql">
CREATE TABLE filler (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

CREATE TABLE sales (
        id INT NOT NULL,
        dt DATETIME NOT NULL,
        amount DECIMAL(20, 2) NOT NULL
) ENGINE=InnoDB;

CREATE INDEX
        ix_sales_dt_amount
ON      sales (dt, amount);

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt &lt;= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

START TRANSACTION;
CALL prc_filler(500000);
COMMIT;

INSERT
INTO    sales
SELECT  id,
        CAST(&#039;2011-02-18&#039; AS DATE) - INTERVAL id * 10 MINUTE,
        (10000 + CEILING(RAND(20110218) * 10000)) / 100
FROM    filler;
</pre>
</div>
<p>Normally, we do it with this query:</p>
<pre class="brush: sql">
SELECT  YEAR(dt), SUM(amount)
FROM    sales
WHERE   dt &gt;= &#039;2005-01-01&#039;
GROUP BY
        YEAR(dt)
ORDER BY
        NULL
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>YEAR(dt)</th>
<th>SUM(amount)</th>
</tr>
<tr>
<td class="integer">2005</td>
<td class="decimal">7885399.58</td>
</tr>
<tr>
<td class="integer">2006</td>
<td class="decimal">7888231.80</td>
</tr>
<tr>
<td class="integer">2007</td>
<td class="decimal">7881056.65</td>
</tr>
<tr>
<td class="integer">2008</td>
<td class="decimal">7894880.23</td>
</tr>
<tr>
<td class="integer">2009</td>
<td class="decimal">7879728.51</td>
</tr>
<tr>
<td class="integer">2010</td>
<td class="decimal">7884989.40</td>
</tr>
<tr>
<td class="integer">2011</td>
<td class="decimal">1034590.31</td>
</tr>
<tr class="statusbar">
<td colspan="100">7 rows fetched in 0.0002s (0.7834s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">SIMPLE</td>
<td class="varchar">sales</td>
<td class="varchar">range</td>
<td class="varchar">ix_sales_dt_amount</td>
<td class="varchar">ix_sales_dt_amount</td>
<td class="varchar">8</td>
<td class="varchar"></td>
<td class="bigint">250224</td>
<td class="double">100.00</td>
<td class="varchar">Using where; Using index; Using temporary</td>
</tr>
</table>
</div>
<pre>
select year(`20110218_series`.`sales`.`dt`) AS `YEAR(dt)`,sum(`20110218_series`.`sales`.`amount`) AS `SUM(amount)` from `20110218_series`.`sales` where (`20110218_series`.`sales`.`dt` &gt;= &#39;2005-01-01&#39;) group by year(`20110218_series`.`sales`.`dt`) order by NULL
</pre>
<p>The query is pretty fast, but we see a <code>using temporary</code> in the plan. Since the order of <code>dt</code> always agrees with that of <code>YEAR(dt)</code>, and the records are selected from the index on <code>dt</code> (so they naturally come ordered), the temporary table is redundant. A sort grouping could be used instead.</p>
<p>We have no means to tell <strong>MySQL</strong> (or any other database system) that the orders match, but with the extension proposed, we could do just that:</p>
<pre class="brush: sql">
SELECT  YEAR(MIN(dt)), SUM(amount)
FROM    sales
WHERE   dt &gt;= &#039;2005-01-01&#039;
GROUP BY
        SERIES(YEAR(dt)) OVER (ORDER BY dt)
</pre>
<p>Since the values of <code>YEAR(dt)</code> always form continuous series when the records are ordered by <code>dt</code>, this returns the same result as the query above, but the temporary table could be avoided.</p>
<p>Let&#8217;s emulate this using session variables:</p>
<pre class="brush: sql">
SELECT  CAST(year AS UNSIGNED), CAST(sum_amount AS DECIMAL(20, 2))
FROM    (
        SELECT  @sum := 0,
                @year := YEAR(MIN(dt))
        FROM    sales
        WHERE   dt &gt;= &#039;2005-01-01&#039;
        ) vars
STRAIGHT_JOIN
        (
        SELECT  @sum AS sum_amount,
                @sum := CASE WHEN @year = YEAR(dt) THEN @sum + amount ELSE amount END,
                @year = YEAR(dt) AS series,
                @year AS year,
                @year := YEAR(dt)
        FROM    sales
        WHERE   dt &gt;= &#039;2005-01-01&#039;
        ORDER BY
                dt
        ) q
WHERE   series = 0
UNION ALL
SELECT  CAST(@year AS UNSIGNED), CAST(@sum AS DECIMAL(20, 2))
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>CAST(year AS UNSIGNED)</th>
<th>CAST(sum_amount AS DECIMAL(20, 2))</th>
</tr>
<tr>
<td class="bigint">2005</td>
<td class="decimal">7885399.58</td>
</tr>
<tr>
<td class="bigint">2006</td>
<td class="decimal">7888231.80</td>
</tr>
<tr>
<td class="bigint">2007</td>
<td class="decimal">7881056.65</td>
</tr>
<tr>
<td class="bigint">2008</td>
<td class="decimal">7894880.23</td>
</tr>
<tr>
<td class="bigint">2009</td>
<td class="decimal">7879728.51</td>
</tr>
<tr>
<td class="bigint">2010</td>
<td class="decimal">7884989.40</td>
</tr>
<tr>
<td class="bigint">2011</td>
<td class="decimal">1034590.31</td>
</tr>
<tr class="statusbar">
<td colspan="100">7 rows fetched in 0.0002s (0.5375s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">&lt;derived2&gt;</td>
<td class="varchar">system</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">1</td>
<td class="double">100.00</td>
<td class="varchar"></td>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">&lt;derived3&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">322416</td>
<td class="double">100.00</td>
<td class="varchar">Using where</td>
</tr>
<tr>
<td class="bigint">3</td>
<td class="varchar">DERIVED</td>
<td class="varchar">sales</td>
<td class="varchar">range</td>
<td class="varchar">ix_sales_dt_amount</td>
<td class="varchar">ix_sales_dt_amount</td>
<td class="varchar">8</td>
<td class="varchar"></td>
<td class="bigint">250224</td>
<td class="double">100.00</td>
<td class="varchar">Using where; Using index</td>
</tr>
<tr>
<td class="bigint">2</td>
<td class="varchar">DERIVED</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">Select tables optimized away</td>
</tr>
<tr>
<td class="bigint">4</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint"></td>
<td class="varchar">UNION RESULT</td>
<td class="varchar">&lt;union1,4&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar"></td>
</tr>
</table>
</div>
<pre>
select cast(`q`.`year` as unsigned) AS `CAST(year AS UNSIGNED)`,cast(`q`.`sum_amount` as decimal(20,2)) AS `CAST(sum_amount AS DECIMAL(20, 2))` from (select (@sum:=0) AS `@sum := 0`,(@year:=year(min(`20110218_series`.`sales`.`dt`))) AS `@year := YEAR(MIN(dt))` from `20110218_series`.`sales` where (`20110218_series`.`sales`.`dt` &gt;= &#39;2005-01-01&#39;)) `vars` straight_join (select (@sum) AS `sum_amount`,(@sum:=(case when ((@year) = year(`20110218_series`.`sales`.`dt`)) then ((@sum) + `20110218_series`.`sales`.`amount`) else `20110218_series`.`sales`.`amount` end)) AS `@sum := CASE WHEN @year = YEAR(dt) THEN @sum + amount ELSE amount END`,((@year) = year(`20110218_series`.`sales`.`dt`)) AS `series`,(@year) AS `year`,(@year:=year(`20110218_series`.`sales`.`dt`)) AS `@year := YEAR(dt)` from `20110218_series`.`sales` where (`20110218_series`.`sales`.`dt` &gt;= &#39;2005-01-01&#39;) order by `20110218_series`.`sales`.`dt`) `q` where (`q`.`series` = 0) union all select cast((@year) as unsigned) AS `CAST(@year AS UNSIGNED)`,cast((@sum) as decimal(20,2)) AS `CAST(@sum AS DECIMAL(20, 2))`
</pre>
<p>We see that although inline views are handled rather inefficiently in <strong>MySQL</strong> and the resulting query is not as fast as it could be, there is no <code>using temporary</code> in the plan and the query is still faster than the original one (0.54 s against 0.78 s). With native support for <code>SERIES()</code>, it would be faster still.</p>
<h3>Conclusion</h3>
<p>A window function (or an extension for <code>GROUP BY</code> clause) that would allow grouping continuous series of an expression in an ordered dataset would ease writing queries that need to aggregate such series, make them more efficient and even improve certain kinds of grouping queries where the grouping expression shares its order with one of the indexes.</p>
<h3>Update of Feb 21 2011</h3>
<p>Another nice solution proposed by <strong>@delostilos</strong>:</p>
<pre class="brush: sql">
SELECT  MIN(source), series, SUM(value)
FROM    (
        SELECT  *,
                SUM(COALESCE((source &lt;&gt; ns)::INTEGER, 0)) OVER (ORDER BY id) AS series
        FROM    (
                SELECT  series.*,
                        LAG(source) OVER (ORDER BY id) AS ns
                FROM    (
                        VALUES
                        (1, 1, 10),
                        (2, 1, 20),
                        (3, 2, 15),
                        (4, 2, 25),
                        (5, 1, 45),
                        (6, 3, 50),
                        (7, 3, 35),
                        (8, 1, 40),
                        (9, 1, 10)
                        ) series (id, source, value)
                ) q
        ) q
GROUP BY
        series
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>min</th>
<th>series</th>
<th>sum</th>
</tr>
<tr>
<td class="int4">2</td>
<td class="int8">1</td>
<td class="int8">40</td>
</tr>
<tr>
<td class="int4">1</td>
<td class="int8">2</td>
<td class="int8">45</td>
</tr>
<tr>
<td class="int4">3</td>
<td class="int8">3</td>
<td class="int8">85</td>
</tr>
<tr>
<td class="int4">1</td>
<td class="int8">4</td>
<td class="int8">50</td>
</tr>
<tr>
<td class="int4">1</td>
<td class="int8">0</td>
<td class="int8">30</td>
</tr>
<tr class="statusbar">
<td colspan="100">5 rows fetched in 0.0005s (0.0030s)</td>
</tr>
</table>
</div>
<pre>
HashAggregate  (cost=0.84..0.98 rows=9 width=16)
  -&gt;  WindowAgg  (cost=0.26..0.68 rows=9 width=16)
        -&gt;  WindowAgg  (cost=0.26..0.41 rows=9 width=12)
              -&gt;  Sort  (cost=0.26..0.28 rows=9 width=12)
                    Sort Key: &quot;*VALUES*&quot;.column1
                    -&gt;  Values Scan on &quot;*VALUES*&quot;  (cost=0.00..0.11 rows=9 width=12)
</pre>
<p>This is more efficient than double <code>ROW_NUMBER</code>, since the engine only needs to sort on the <code>id</code> expression at most once (or does not need to sort at all if it comes sorted from the previous recordset or from the index).</p>
<p>However, the final <code>GROUP BY</code> still cannot use sort aggregation without additional sorting, since the output of the analytic <code>SUM()</code> cannot be assumed to have the same order as <code>id</code>. That&#8217;s why the native support from the system would still be an improvement.</p>
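<p>To make the mechanics of this solution easier to follow, here is a sketch of the same query with the final aggregation removed, so that the intermediate columns are visible for every record (same sample data, <strong>PostgreSQL</strong> syntax as above):</p>
<pre class="brush: sql">
-- ns is the source of the previous record (NULL for the first one);
-- series is a running count of the points where source changes
SELECT  id, source, ns,
        SUM(COALESCE((source &lt;&gt; ns)::INTEGER, 0)) OVER (ORDER BY id) AS series
FROM    (
        SELECT  series.*,
                LAG(source) OVER (ORDER BY id) AS ns
        FROM    (
                VALUES
                (1, 1, 10),
                (2, 1, 20),
                (3, 2, 15),
                (4, 2, 25),
                (5, 1, 45),
                (6, 3, 50),
                (7, 3, 35),
                (8, 1, 40),
                (9, 1, 10)
                ) series (id, source, value)
        ) q
ORDER BY
        id
</pre>
<p>For the ids from 1 to 9 the <code>series</code> column comes out as 0, 0, 1, 1, 2, 3, 3, 4, 4, which matches the five groups in the result above. Unlike the difference of two row numbers, this value is unique per series, so <code>source</code> does not need to be in the <code>GROUP BY</code>.</p>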
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2011/02/18/things-sql-needs-series/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Happy New Year!</title>
		<link>http://explainextended.com/2010/12/31/happy-new-year-2/</link>
		<comments>http://explainextended.com/2010/12/31/happy-new-year-2/#comments</comments>
		<pubDate>Fri, 31 Dec 2010 20:00:17 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=5177</guid>
		<description><![CDATA[A New Year clock in Oracle. Happy New Year!]]></description>
			<content:encoded><![CDATA[<p>Some say <strong>SQL</strong> is not good at graphics.</p>
<p>Well, they have a point. Database engines lack scanner drivers, there is no easy way to do sepia, and the magic wand, let&#8217;s be honest, is just poor.</p>
<p>However, you can make some New Year paintings with <strong>SQL</strong>.</p>
<p>Let&#8217;s make a New Year clock showing 12 o&#8217;clock in <strong>Oracle</strong>.</p>
<p><span id="more-5177"></span></p>
<h3>#1. Circle</h3>
<p>First, we need a circle shape. In <strong>Photoshop</strong>, you could just select it from the toolbar, but in <strong>SQL</strong>, we need to use some math.</p>
<p>In math, a circle is defined by this formula: <code>x<sup>2</sup> + y<sup>2</sup> = R<sup>2</sup></code>.</p>
<p><code>x<sup>2</sup> + y<sup>2</sup></code> is, as we know, the square of the distance from the center of the coordinate grid, and <code>R</code> is the radius. The formula describes <q>all points at distance <code>R</code> from the center</q>, which of course is a circle.</p>
<p><strong>SQL</strong> outputs data in tabular format. We will draw our art line by line, and for each line we need to know where to put the <q>paint</q> (just some <strong>ASCII</strong> symbols). Since it&#8217;s a circle, every line intersects it at most twice, and we shall put the symbols at the places where the circle and the line intersect. To find the half-width of the circle on each line, that is, how many characters lie between the vertical axis and each intersection, this formula is used (the queries below call this value <code>angle</code>):</p>
<p><code>ROUND(SQRT(1 - POWER((level - 21) / 20, 2)) * 20) AS angle</code></p>
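<p>As a quick check of the formula, here is a minimal sketch that evaluates it for three sample lines only. The alias <code>half_width</code> is ours and is not used in the final query. The top line should give <strong>0</strong>, line 11 should give <strong>17</strong>, and the middle line should give the full radius, <strong>20</strong>:</p>
<pre class="brush: sql">
-- Half-width of the circle (in characters) on a few sample lines
SELECT  line, half_width
FROM    (
        SELECT  level AS line,
                ROUND(SQRT(1 - POWER((level - 21) / 20, 2)) * 20) AS half_width
        FROM    dual
        CONNECT BY
                level &lt;= 41
        ) q
WHERE   line IN (1, 11, 21)
ORDER BY
        line
</pre>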
<p>To make our circle more circle-shaped, we need to choose the characters that correspond to the angle of the circle line in each given place. These characters are the most similar to the lines: <code>=/|\</code></p>
<p>Different characters correspond to different angles. To figure out the angle from the line number, we should use <code>ACOS</code>:</p>
<p><code>ROUND(ACOS(angle / 20) / 3.1415926 * 4)  * SIGN((line - 21) / 20) + 3 AS sign</code></p>
<p>This would split the circle into several sectors, and for each of these sectors a corresponding character will be chosen.</p>
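<p>To see how the sectors map to characters, the sketch below applies the same expression to five sample lines. The aliases <code>sector</code> and <code>left_char</code> are ours, chosen for illustration. For lines 1, 11, 21, 31 and 41 the left-hand character should come out as <code>=</code>, <code>/</code>, <code>|</code>, <code>\</code> and <code>=</code> respectively:</p>
<pre class="brush: sql">
-- Sector index (1 to 5) and the matching left-hand character
SELECT  line, angle, sector,
        SUBSTR(&#039;=/|\=&#039;, sector, 1) AS left_char
FROM    (
        SELECT  line, angle,
                ROUND(ACOS(angle / 20) / 3.1415926 * 4) * SIGN((line - 21) / 20) + 3 AS sector
        FROM    (
                SELECT  level AS line,
                        ROUND(SQRT(1 - POWER((level - 21) / 20, 2)) * 20) AS angle
                FROM    dual
                CONNECT BY
                        level &lt;= 41
                ) q
        ) q
WHERE   line IN (1, 11, 21, 31, 41)
ORDER BY
        line
</pre>
<p>The right-hand side of the circle uses the mirrored string, <code>=\|/=</code>, with the same index.</p>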
<p>Finally, we need to avoid the gaps in the lines. To do this, we will fill not only the intersections, but all the space from the previous (or next) intersection to the current one. To calculate the width of the string on each line that will be filled with the characters, we will use <code>LAG</code> and <code>LEAD</code>, the analytic functions.</p>
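<p>A minimal sketch of the width calculation for the upper half of the circle, where the previous line is fetched with <code>LAG</code> (the lower half works the same way with <code>LEAD</code>). The alias <code>prev_angle</code> is ours. For the first five lines the widths should be 1, 7, 4, 3 and 2, which matches the number of characters drawn on each side of those lines:</p>
<pre class="brush: sql">
-- Width of the filled run = distance to the previous intersection + 1
SELECT  line, angle, prev_angle,
        ABS(angle - prev_angle) + 1 AS width
FROM    (
        SELECT  line, angle,
                -- the first line has no predecessor, so fall back to 0
                COALESCE(LAG(angle) OVER (ORDER BY line), 0) AS prev_angle
        FROM    (
                SELECT  level AS line,
                        ROUND(SQRT(1 - POWER((level - 21) / 20, 2)) * 20) AS angle
                FROM    dual
                CONNECT BY
                        level &lt;= 41
                ) q
        ) q
WHERE   line &lt;= 5
ORDER BY
        line
</pre>
<p>Note that the filter on <code>line</code> sits outside the subquery that calls <code>LAG</code>, so the analytic function still sees all 41 lines.</p>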
<p>Now, let&#8217;s put it all together:</p>
<pre class="brush: sql">
WITH    circle AS
        (
        SELECT  1 AS layer,
                line,
                LPAD(RPAD(SUBSTR(&#039;=/|\=&#039;, sign, 1), width, SUBSTR(&#039;=/|\=&#039;, sign, 1)), 20 + width - angle, &#039; &#039;) ||
                RPAD(&#039; &#039;, (angle - width) * 2, &#039; &#039;) ||
                RPAD(RPAD(SUBSTR(&#039;=\|/=&#039;, sign, 1), width, SUBSTR(&#039;=\|/=&#039;, sign, 1)), 20 + width - angle, &#039; &#039;) AS drawing
        FROM    (
                SELECT  line,
                        angle,
                        ABS
                        (
                        angle -
                        CASE
                        WHEN line &lt; 21 THEN
                                COALESCE(LAG(angle) OVER (ORDER BY line), 0)
                        WHEN line &gt; 21 THEN
                                COALESCE(LEAD(angle) OVER (ORDER BY line), 0)
                        ELSE
                                angle
                        END
                        ) + 1 AS width,
                        ROUND(ACOS(angle / 20) / 3.1415926 * 4)  * SIGN((line - 21) / 20) + 3 AS sign
                FROM    (
                        SELECT  level AS line,
                                ROUND(SQRT(1 - POWER((level - 21) / 20, 2)) * 20) AS angle
                        FROM    dual
                        CONNECT BY
                                level &lt;= 41
                        ) q
                ) q
        )
SELECT  *
FROM    circle
</pre>
<div class="terminal widefont">
<table class="terminal">
<tr>
<th>LAYER</th>
<th>LINE</th>
<th>DRAWING</th>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
<td class="varchar2">                    ==                    </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">2</td>
<td class="varchar2">              ==============              </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">3</td>
<td class="varchar2">           ////          \\\\           </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">4</td>
<td class="varchar2">         ///                \\\         </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">5</td>
<td class="varchar2">        //                    \\        </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">6</td>
<td class="varchar2">       //                      \\       </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">7</td>
<td class="varchar2">      //                        \\      </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">8</td>
<td class="varchar2">     //                          \\     </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">9</td>
<td class="varchar2">    //                            \\    </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">10</td>
<td class="varchar2">   //                              \\   </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">11</td>
<td class="varchar2">   /                                \   </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">12</td>
<td class="varchar2">  //                                \\  </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">13</td>
<td class="varchar2">  /                                  \  </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">14</td>
<td class="varchar2"> ||                                  || </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">15</td>
<td class="varchar2"> |                                    | </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">16</td>
<td class="varchar2"> |                                    | </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">17</td>
<td class="varchar2">||                                    ||</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">18</td>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">19</td>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">20</td>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">21</td>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">22</td>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">23</td>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">24</td>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">25</td>
<td class="varchar2">||                                    ||</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">26</td>
<td class="varchar2"> |                                    | </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">27</td>
<td class="varchar2"> |                                    | </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">28</td>
<td class="varchar2"> ||                                  || </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">29</td>
<td class="varchar2">  \                                  /  </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">30</td>
<td class="varchar2">  \\                                //  </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">31</td>
<td class="varchar2">   \                                /   </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">32</td>
<td class="varchar2">   \\                              //   </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">33</td>
<td class="varchar2">    \\                            //    </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">34</td>
<td class="varchar2">     \\                          //     </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">35</td>
<td class="varchar2">      \\                        //      </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">36</td>
<td class="varchar2">       \\                      //       </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">37</td>
<td class="varchar2">        \\                    //        </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">38</td>
<td class="varchar2">         \\\                ///         </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">39</td>
<td class="varchar2">           \\\\          ////           </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">40</td>
<td class="varchar2">              ==============              </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">41</td>
<td class="varchar2">                    ==                    </td>
</tr>
</table>
</div>
<h3>#2. Dial</h3>
<p>Now, we shall draw the clock dial on a separate layer (yes, you can draw on layers in <strong>SQL</strong>).</p>
<p>We will use Roman numerals for the dial. <strong>Oracle</strong> has an internal function to format the numbers as Roman numerals, but <strong>SQL</strong> graphics is the first time I&#8217;m using this feature in practice.</p>
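<p>The function in question is <code>TO_CHAR</code> with the <code>RN</code> format model. A short sketch of the conversion for all twelve hours (the alias <code>roman</code> is ours); <code>TRIM</code> removes the padding that the format model adds:</p>
<pre class="brush: sql">
-- Hours 1 to 12 as Roman numerals
SELECT  level AS h,
        TRIM(TO_CHAR(level, &#039;RN&#039;)) AS roman
FROM    dual
CONNECT BY
        level &lt;= 12
</pre>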
<p>The principle is the same: we should calculate the angle of each number, then the line and the column that intersect the dial at the given angle and put the number there.</p>
<p>We will only output the lines containing actual numbers. Each line will contain one or two numbers (<strong>12</strong> and <strong>6</strong> go on their own lines, the others go in pairs). We can use <code>MIN</code> and <code>MAX</code> to distinguish them.</p>
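<p>The pairing can be sketched on its own, without the padding logic. The query below computes the line for each hour and groups the hours by line; the aliases <code>right_hour</code> and <code>left_hour</code> are ours. Lines 3 and 39 should hold a single hour each (<strong>12</strong> and <strong>6</strong>), so <code>MIN</code> and <code>MAX</code> coincide there, while lines 5, 12, 21, 30 and 37 should hold the pairs 1/11, 2/10, 3/9, 4/8 and 5/7:</p>
<pre class="brush: sql">
-- Which hours share a line on the dial
SELECT  hline AS line,
        MIN(h) AS right_hour,
        MAX(h) AS left_hour
FROM    (
        SELECT  level AS h,
                ROUND(-COS(3.141592 * level / 6) * 18) + 21 AS hline
        FROM    dual
        CONNECT BY
                level &lt;= 12
        ) hours
GROUP BY
        hline
ORDER BY
        hline
</pre>
<p>The smaller hour of each pair sits on the right half of the dial, which is why the full query pads <code>MAX(h)</code> first and <code>MIN(h)</code> second.</p>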
<p>Here&#8217;s what the lines of our dial will look like:</p>
<pre class="brush: sql">
WITH    dial AS
        (
        SELECT  2,
                line,
                RPAD(&#039; &#039;, 20 - angle, &#039; &#039;) ||
                RPAD(rnf, angle * 2 - LENGTH(rns)) ||
                RPAD(rns, 20 - angle + DECODE(rnf, NULL, 0, LENGTH(rns)) + 1)
        FROM    (
                SELECT  line, angle,
                        DECODE(MAX(h), MIN(h), NULL, TRIM(TO_CHAR(MAX(h), &#039;RN&#039;))) AS rnf,
                        TRIM(TO_CHAR(MIN(h), &#039;RN&#039;)) AS rns
                FROM    (
                        SELECT  level + 2 AS line,
                                ROUND(SQRT(1 - POWER((level - 19) / 18, 2)) * 18) AS angle
                        FROM    dual
                        CONNECT BY
                                level &lt;= 37
                        ) lines
                JOIN    (
                        SELECT  level AS h, ROUND(-COS(3.141592 * level / 6) * 18) + 21 AS hline
                        FROM    dual
                        CONNECT BY
                                level &lt;= 12
                        ) hours
                ON      hline = line
                GROUP BY
                        line, angle
                ) q
        )
SELECT  *
FROM    dial
ORDER BY
        line
</pre>
<div class="terminal widefont">
<table class="terminal">
<tr>
<th>2</th>
<th>LINE</th>
<th>RPAD(&#039;&#039;,20-ANGLE,&#039;&#039;)||RPAD(RNF,ANGLE*2-LENGTH(RNS))||RPAD(RNS,20-ANGLE+DECODE(RNF,NULL,0,LENGTH(RNS))+1)</th>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">3</td>
<td class="varchar2">                    XII                  </td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">5</td>
<td class="varchar2">            XI             I             </td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">12</td>
<td class="varchar2">    X                             II     </td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">21</td>
<td class="varchar2">  IX                               III   </td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">30</td>
<td class="varchar2">    VIII                          IV     </td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">37</td>
<td class="varchar2">            VII            V             </td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">39</td>
<td class="varchar2">                    VI                   </td>
</tr>
</table>
</div>
<p>Note that the shape is distorted: this is because of the gaps in the line numbers. It will be taken care of later.</p>
<h3>#3. Hands and the center pin</h3>
<p>The hands and the center pin are simple: we will use <code>|||</code> for the hour hand, <code>|</code> for the minute hand and <code>O</code> for the pin. On a New Year clock the hands, fortunately, are vertical and point in the same direction.</p>
<p>Here&#8217;s the hour hand:</p>
<pre class="brush: sql">
WITH    hourhand AS
        (
        SELECT  3 AS layer,
                level + 10 AS line,
                RPAD(&#039; &#039;, 20, &#039; &#039;) || &#039;|||&#039; || RPAD(&#039; &#039;, 18, &#039; &#039;) AS drawing
        FROM    dual
        CONNECT BY
                level &lt;= 10
        )
SELECT  *
FROM    hourhand
ORDER BY
        line
</pre>
<div class="terminal widefont">
<table class="terminal">
<tr>
<th>LAYER</th>
<th>LINE</th>
<th>DRAWING</th>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">11</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">12</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">13</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">14</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">15</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">16</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">17</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">18</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">19</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">20</td>
<td class="varchar2">                    |||                  </td>
</tr>
</table>
</div>
<p>Here&#8217;s the minute hand:</p>
<pre class="brush: sql">
WITH    minutehand AS
        (
        SELECT  4 AS layer,
                level + 4 AS line,
                RPAD(&#039; &#039;, 21, &#039; &#039;) || &#039;|&#039; || RPAD(&#039; &#039;, 19, &#039; &#039;) AS drawing
        FROM    dual
        CONNECT BY
                level &lt;= 16
        )
SELECT  *
FROM    minutehand
ORDER BY
        line
</pre>
<div class="terminal widefont">
<table class="terminal">
<tr>
<th>LAYER</th>
<th>LINE</th>
<th>DRAWING</th>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">5</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">6</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">7</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">8</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">9</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">10</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">11</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">12</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">13</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">14</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">15</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">16</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">17</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">18</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">19</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">20</td>
<td class="varchar2">                     |                   </td>
</tr>
</table>
</div>
<p>And here&#8217;s the pin:</p>
<pre class="brush: sql">
WITH    pin AS
        (
        SELECT  5 AS layer,
                21 AS line,
                RPAD(LPAD(&#039;O&#039;, 22, &#039; &#039;), 41, &#039; &#039;) AS drawing
        FROM    dual
        )
SELECT  *
FROM    pin
ORDER BY
        line
</pre>
<div class="terminal widefont">
<table class="terminal">
<tr>
<th>LAYER</th>
<th>LINE</th>
<th>DRAWING</th>
</tr>
<tr>
<td class="double_precision">5</td>
<td class="double_precision">21</td>
<td class="varchar2">                     O                   </td>
</tr>
</table>
</div>
<h3>#4. Merging the layers</h3>
<p>Now we need to merge the layers.</p>
<p>The last layers should go in the foreground, the first ones in the background. The space character should be <q>transparent</q>: if there is a character on a background layer, it should be visible through it.</p>
<p>To do this, we apply the following trick:</p>
<ol>
<li>Output all rows from all layers in a single query, using <code>UNION ALL</code></li>
<li>Split each row into the individual characters, one character per record</li>
<li>
<p>Using <code>ROW_NUMBER()</code>, the analytic function, order the characters within each line and column so that visible characters from higher layers go first, and spaces and characters from the lower layers go last:</p>
<p><code>ROW_NUMBER() OVER (PARTITION BY line, col ORDER BY DECODE(cc, ' ', 1, 0), layer DESC) AS rn</code></p>
</li>
<li>Filter the characters on <code>rn</code> so that only the topmost character from each line and column is returned</li>
<li>Concatenate all characters line-wise, using a hierarchical query (<code>CONNECT BY</code>) and <code>SYS_CONNECT_BY_PATH</code>. This function requires a separator, and we will use a question mark, <code>?</code>, for this purpose, since it&#8217;s not used in our art.</li>
<li>Remove the separator from the concatenated strings, using <code>REPLACE</code></li>
<li>Output the result</li>
</ol>
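<p>The steps above can be sketched on a toy example: two hypothetical five-character layers on a single line. To keep the sketch short, it swaps in <code>LISTAGG</code> for the <code>SYS_CONNECT_BY_PATH</code> step, which assumes <strong>Oracle 11g Release 2</strong> or later; the full query in the next section uses the original technique. The spaces on the upper layer should let the lower layer show through, giving <code>/XI-\</code>:</p>
<pre class="brush: sql">
WITH    layers AS
        (
        -- two made-up layers: a border fragment and a label on top of it
        SELECT  1 AS layer, 1 AS line, &#039;/---\&#039; AS drawing
        FROM    dual
        UNION ALL
        SELECT  2, 1, &#039; XI  &#039;
        FROM    dual
        ),
        cols AS
        (
        SELECT  level AS col
        FROM    dual
        CONNECT BY
                level &lt;= 5
        ),
        m AS
        (
        SELECT  line, col, cc
        FROM    (
                SELECT  line, col,
                        SUBSTR(drawing, col, 1) AS cc,
                        -- non-space characters first, then the highest layer
                        ROW_NUMBER() OVER
                        (
                        PARTITION BY line, col
                        ORDER BY DECODE(SUBSTR(drawing, col, 1), &#039; &#039;, 1, 0), layer DESC
                        ) AS rn
                FROM    layers
                CROSS JOIN
                        cols
                )
        WHERE   rn = 1
        )
SELECT  line,
        LISTAGG(cc) WITHIN GROUP (ORDER BY col) AS drawing
FROM    m
GROUP BY
        line
ORDER BY
        line
</pre>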
<h3>#5. Result</h3>
<p>Here&#8217;s our query and drawing:</p>
<pre class="brush: sql">
WITH    circle AS
        (
        SELECT  1 AS layer,
                line,
                LPAD(RPAD(SUBSTR(&#039;=/|\=&#039;, sign, 1), width, SUBSTR(&#039;=/|\=&#039;, sign, 1)), 20 + width - angle, &#039; &#039;) ||
                RPAD(&#039; &#039;, (angle - width) * 2, &#039; &#039;) ||
                RPAD(RPAD(SUBSTR(&#039;=\|/=&#039;, sign, 1), width, SUBSTR(&#039;=\|/=&#039;, sign, 1)), 20 + width - angle, &#039; &#039;) AS drawing
        FROM    (
                SELECT  line,
                        angle,
                        ABS
                        (
                        angle -
                        CASE
                        WHEN line &lt; 21 THEN
                                COALESCE(LAG(angle) OVER (ORDER BY line), 0)
                        WHEN line &gt; 21 THEN
                                COALESCE(LEAD(angle) OVER (ORDER BY line), 0)
                        ELSE
                                angle
                        END
                        ) + 1 AS width,
                        ROUND(ACOS(angle / 20) / 3.1415926 * 4)  * SIGN((line - 21) / 20) + 3 AS sign
                FROM    (
                        SELECT  level AS line,
                                ROUND(SQRT(1 - POWER((level - 21) / 20, 2)) * 20) AS angle
                        FROM    dual
                        CONNECT BY
                                level &lt;= 41
                        ) q
                ) q
        ),
        dial AS
        (
        SELECT  2,
                line,
                RPAD(&#039; &#039;, 20 - angle, &#039; &#039;) ||
                RPAD(rnf, angle * 2 - LENGTH(rns)) ||
                RPAD(rns, 20 - angle + DECODE(rnf, NULL, 0, LENGTH(rns)) + 1)
        FROM    (
                SELECT  line, angle,
                        DECODE(MAX(h), MIN(h), NULL, TRIM(TO_CHAR(MAX(h), &#039;RN&#039;))) AS rnf,
                        TRIM(TO_CHAR(MIN(h), &#039;RN&#039;)) AS rns
                FROM    (
                        SELECT  level + 2 AS line,
                                ROUND(SQRT(1 - POWER((level - 19) / 18, 2)) * 18) AS angle
                        FROM    dual
                        CONNECT BY
                                level &lt;= 37
                        ) lines
                JOIN    (
                        SELECT  level AS h, ROUND(-COS(3.141592 * level / 6) * 18) + 21 AS hline
                        FROM    dual
                        CONNECT BY
                                level &lt;= 12
                        ) hours
                ON      hline = line
                GROUP BY
                        line, angle
                ) q
        ),
        hourhand AS
        (
        SELECT  3 AS layer,
                level + 10 AS line,
                RPAD(&#039; &#039;, 20, &#039; &#039;) || &#039;|||&#039; || RPAD(&#039; &#039;, 18, &#039; &#039;) AS drawing
        FROM    dual
        CONNECT BY
                level &lt;= 10
        ),
        minutehand AS
        (
        SELECT  4 AS layer,
                level + 4 AS line,
                RPAD(&#039; &#039;, 21, &#039; &#039;) || &#039;|&#039; || RPAD(&#039; &#039;, 19, &#039; &#039;) AS drawing
        FROM    dual
        CONNECT BY
                level &lt;= 16
        ),
        pin AS
        (
        SELECT  5 AS layer,
                21 AS line,
                RPAD(LPAD(&#039;O&#039;, 22, &#039; &#039;), 41, &#039; &#039;) AS drawing
        FROM    dual
        ),
        m AS
        (
        SELECT  line, col, cc
        FROM    (
                SELECT  line, col, cc,
                        ROW_NUMBER() OVER (PARTITION BY line, col ORDER BY DECODE(cc, &#039; &#039;, 1, 0), layer DESC) AS rn
                FROM    (
                        SELECT  cols.*, layers.*, SUBSTR(drawing, col, 1) AS cc
                        FROM    (
                                SELECT  level AS col
                                FROM    dual
                                CONNECT BY
                                        level &lt;= 50
                                ) cols
                        CROSS JOIN
                                (
                                SELECT  *
                                FROM    circle
                                UNION ALL
                                SELECT  *
                                FROM    dial
                                UNION ALL
                                SELECT  *
                                FROM    hourhand
                                UNION ALL
                                SELECT  *
                                FROM    minutehand
                                UNION ALL
                                SELECT  *
                                FROM    pin
                                ) layers
                        )
                )
        WHERE   rn = 1
        )
SELECT  REPLACE(drawing, &#039;?&#039;) AS drawing
FROM    (
        SELECT  SYS_CONNECT_BY_PATH(cc, &#039;?&#039;) AS drawing, line, CONNECT_BY_ISLEAF AS leaf
        FROM    m mi
        START WITH
                mi.col = 1
        CONNECT BY
                mi.line = PRIOR mi.line
                AND mi.col = PRIOR mi.col + 1
        )
WHERE   leaf = 1
ORDER BY
        line
</pre>
<div class="terminal widefont">
<table class="terminal">
<tr>
<th>DRAWING</th>
</tr>
<tr>
<td class="varchar2">                    ==                    </td>
</tr>
<tr>
<td class="varchar2">              ==============              </td>
</tr>
<tr>
<td class="varchar2">           ////     XII  \\\\           </td>
</tr>
<tr>
<td class="varchar2">         ///                \\\         </td>
</tr>
<tr>
<td class="varchar2">        //  XI       |     I  \\        </td>
</tr>
<tr>
<td class="varchar2">       //            |         \\       </td>
</tr>
<tr>
<td class="varchar2">      //             |          \\      </td>
</tr>
<tr>
<td class="varchar2">     //              |           \\     </td>
</tr>
<tr>
<td class="varchar2">    //               |            \\    </td>
</tr>
<tr>
<td class="varchar2">   //                |             \\   </td>
</tr>
<tr>
<td class="varchar2">   /                |||             \   </td>
</tr>
<tr>
<td class="varchar2">  //X               |||           II\\  </td>
</tr>
<tr>
<td class="varchar2">  /                 |||              \  </td>
</tr>
<tr>
<td class="varchar2"> ||                 |||              || </td>
</tr>
<tr>
<td class="varchar2"> |                  |||               | </td>
</tr>
<tr>
<td class="varchar2"> |                  |||               | </td>
</tr>
<tr>
<td class="varchar2">||                  |||               ||</td>
</tr>
<tr>
<td class="varchar2">|                   |||                |</td>
</tr>
<tr>
<td class="varchar2">|                   |||                |</td>
</tr>
<tr>
<td class="varchar2">|                   |||                |</td>
</tr>
<tr>
<td class="varchar2">| IX                 O             III |</td>
</tr>
<tr>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="varchar2">||                                    ||</td>
</tr>
<tr>
<td class="varchar2"> |                                    | </td>
</tr>
<tr>
<td class="varchar2"> |                                    | </td>
</tr>
<tr>
<td class="varchar2"> ||                                  || </td>
</tr>
<tr>
<td class="varchar2">  \                                  /  </td>
</tr>
<tr>
<td class="varchar2">  \\VIII                          IV//  </td>
</tr>
<tr>
<td class="varchar2">   \                                /   </td>
</tr>
<tr>
<td class="varchar2">   \\                              //   </td>
</tr>
<tr>
<td class="varchar2">    \\                            //    </td>
</tr>
<tr>
<td class="varchar2">     \\                          //     </td>
</tr>
<tr>
<td class="varchar2">      \\                        //      </td>
</tr>
<tr>
<td class="varchar2">       \\                      //       </td>
</tr>
<tr>
<td class="varchar2">        \\  VII            V  //        </td>
</tr>
<tr>
<td class="varchar2">         \\\                ///         </td>
</tr>
<tr>
<td class="varchar2">           \\\\     VI   ////           </td>
</tr>
<tr>
<td class="varchar2">              ==============              </td>
</tr>
<tr>
<td class="varchar2">                    ==                    </td>
</tr>
</table>
</div>
<div class="plainnote" style="text-align: center">
<big><strong>Happy New Year!</strong></big>
</div>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2010/12/31/happy-new-year-2/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Things SQL needs: determining range cardinality</title>
		<link>http://explainextended.com/2010/05/19/things-sql-needs-determining-range-cardinality/</link>
		<comments>http://explainextended.com/2010/05/19/things-sql-needs-determining-range-cardinality/#comments</comments>
		<pubDate>Wed, 19 May 2010 19:00:51 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=4624</guid>
		<description><![CDATA[What is the problem with this query? SELECT * FROM orders WHERE quantity &#60;= 4 AND urgency &#60;= 4 The problem is indexing strategy, of course. Which columns should we index? If we index quantity, the optimizer will be able to use the index to filter on it. However, filtering on urgency will require scanning [...]]]></description>
			<content:encoded><![CDATA[<p>What is the problem with this query?</p>
<pre class="brush: sql">
SELECT  *
FROM    orders
WHERE   quantity &lt;= 4
        AND urgency &lt;= 4
</pre>
<p>The problem is the indexing strategy, of course. Which columns should we index?</p>
<p>If we index <code>quantity</code>, the optimizer will be able to use the index to filter on it. However, filtering on <code>urgency</code> will require scanning all records with <code>quantity &lt;= 4</code> and applying the <code>urgency</code> filter to each record found.</p>
<p>Same with <code>urgency</code>. We can use range access on <code>urgency</code> using an index, but this will require filtering on <code>quantity</code>.</p>
<p><q>Why, create a composite index!</q>, some will say.</p>
<p>Unfortunately, that won&#8217;t help much.</p>
<p>A composite <strong>B-Tree</strong> index maintains what is called a <a href="http://en.wikipedia.org/wiki/Lexicographical_order">lexicographical order</a> of the records. This means that an index on <code>(quantity, urgency)</code> will sort on <code>quantity</code>, and only if the quantities are equal, it will take the <code>urgency</code> into account.</p>
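<p>As a rough illustration (the index name below is hypothetical), the entries of such an index are kept in the same order in which this <code>ORDER BY</code> returns the rows:</p>
<pre class="brush: sql">
-- Hypothetical composite index on the two filtered columns
CREATE INDEX ix_orders_quantity_urgency ON orders (quantity, urgency);

-- The index keeps its entries in this order:
-- by quantity first, and by urgency only among equal quantities
SELECT  quantity, urgency
FROM    orders
ORDER BY
        quantity, urgency;
</pre>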
<p>The picture below shows how the records would be ordered in such an index:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2010/05/top.png" alt="" title="Top" width="650" height="450" class="aligncenter size-full wp-image-4752 noborder" /></p>
<p>As we can see, with a single index range scan (i.e. just following the arrows) we cannot select only the records within the dashed rectangle. There is no single index range that could be used to filter on both columns.</p>
<p>Even if we changed the field order in the index, it would just change the direction of the arrows connecting the records:<br />
<span id="more-4624"></span><br />
<img src="http://explainextended.com/wp-content/uploads/2010/05/left.png" alt="" title="Left" width="650" height="450" class="aligncenter size-full wp-image-4754 noborder" /></p>
<p>There is still no single range that contains only the records we need.</p>
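<p>To put a number on this, here is a minimal sketch on a hypothetical 10 by 10 grid of <code>(quantity, urgency)</code> pairs (an assumption for illustration, not the data from the pictures). It numbers the pairs in the order each composite index would store them and measures how long a single index range has to be to cover every matching pair:</p>
<pre class="brush: sql">
WITH    grid AS
        (
        SELECT  q AS quantity,
                u AS urgency,
                -- position in an index on (quantity, urgency)
                ROW_NUMBER() OVER (ORDER BY q, u) AS pos_quantity_first,
                -- position in an index on (urgency, quantity)
                ROW_NUMBER() OVER (ORDER BY u, q) AS pos_urgency_first
        FROM    generate_series(0, 9) q
        CROSS JOIN
                generate_series(0, 9) u
        )
SELECT  COUNT(*) AS matching,
        -- length of the shortest single range covering all matches
        MAX(pos_quantity_first) - MIN(pos_quantity_first) + 1 AS scanned_quantity_first,
        MAX(pos_urgency_first) - MIN(pos_urgency_first) + 1 AS scanned_urgency_first
FROM    grid
WHERE   quantity &lt;= 4
        AND urgency &lt;= 4
</pre>
<p>On this grid the condition matches <strong>25</strong> pairs, but a single range spans <strong>45</strong> index entries in either column order, so <strong>20</strong> of the scanned entries would be read only to be discarded.</p>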
<p>Can we improve it?</p>
<p>If we take a closer look at the table contents, we will see that although there is no single range that contains the records we need (and only them), there are four ranges that do filter our records.</p>
<p>If we rewrote the query condition like this:</p>
<pre class="brush: sql">
SELECT  *
FROM    orders
WHERE   quantity IN (0, 1, 2, 3, 4)
        AND urgency &lt;= 4
</pre>
<p>with the first index, or like this:</p>
<pre class="brush: sql">
SELECT  *
FROM    orders
WHERE   quantity &lt;= 4
        AND urgency IN (0, 1, 2, 3, 4)
</pre>
<p>with the second index, then any decent <strong>SQL</strong> engine would build a decent and efficient plan (<code>Index range</code> in <strong>MySQL</strong>, <code>INLIST ITERATOR</code> in <strong>Oracle</strong> etc.)</p>
<p>The problem is that rewriting this query still requires an <strong>SQL</strong> developer. But this could be done automatically by the optimizer. There are several methods to do that.</p>
<h3>Smallest superset</h3>
<p>A range condition defines a set of the values that could possibly satisfy the condition (i. e. belong to the range). The <q>number</q> of the values that satisfy this condition is called set cardinality. The word number is quoted here because it can be infinite, and not all infinities are created equal: some are more infinite than the others!</p>
<p>However, if we take the column definition into account, we can see that some ranges define finite (and quite constrained) sets of possible values. For instance, a condition like <code>quantity &lt;= 4</code> on a column that is defined as <code>UNSIGNED INT</code> can <em>possibly</em> be satisfied by five values: <strong>0</strong>, <strong>1</strong>, <strong>2</strong>, <strong>3</strong> and <strong>4</strong>.</p>
<p>This set is the smallest superset of all sets of values that satisfy the range condition.</p>
<h3>Loose index scan</h3>
<p>Even with conditions that theoretically define infinite sets of values that <em>could</em> satisfy them, in practice there is always a finite number of values in the table that <em>do</em> satisfy them (the table itself contains a finite number of records, to begin with).</p>
<p>And most engines keep track of that number in their statistics tables: this is what is called <em>field cardinality</em>, a measure of field uniqueness.</p>
<p>If the range cardinality is expected to be low (either from the set of values that can possibly belong to the range, or from the actual number of distinct values that do belong to the range, according to statistics), it would be a wise idea to rewrite the range condition as an <code>IN</code> condition containing all possible values that can belong or do belong to the range.</p>
<p>This will replace a single <q>less than</q> or <q>greater than</q> with a small number of <q>equals to</q>. And an <q>equals to</q> gives the optimizer much more space to, um, optimize. It can use a <code>HASH JOIN</code>, split the index into a number of continuous ranges or do some other interesting things that can only be done with an equijoin.</p>
<p>With an <code>UNSIGNED INTEGER</code> column, it is easy to generate a set of values that could satisfy the range. But what if we know the range cardinality to be low from the statistics, not from the column datatype?</p>
<p>In this case, we could build the set of possible values using what <strong>MySQL</strong> calls a <a href="http://dev.mysql.com/doc/refman/5.5/en/loose-index-scan.html">loose index scan</a>.</p>
<p>Basically, it takes the first record from the index and then recursively searches for the next lowest record whose key value is greater than the previous one, using an <code>index seek</code> (as opposed to an <code>index scan</code>). This means that instead of merely scanning the index and applying the condition to each record, the engine walks up and down the <strong>B-Tree</strong> to locate the first record with a greater key. This is much more efficient when the number of distinct keys is small. In fact, <strong>MySQL</strong> does use this method for queries that involve <code>SELECT DISTINCT</code> on an indexed field.</p>
<h3>MIN / MAX</h3>
<p>This method would be useful for the open ranges (like <code>&gt;</code> or <code>&lt;</code>, as opposed to <code>BETWEEN</code>).</p>
<p>By default, an <code>INT</code> column means <code>SIGNED INT</code>. For a condition like <code>quantity &lt;= 4</code> this would mean all integers from <strong>-2,147,483,648</strong> to <strong>4</strong>, which is way too many.</p>
<p>In real tables, the quantities would be something greater than <strong>0</strong>. But not all developers bother to pick the right datatype or add <code>CHECK</code> constraints for their columns (and some databases like <strong>PostgreSQL</strong> lack the unsigned datatypes anyway).</p>
<p>To work around this, we could find the minimum existing value in the index using a single index seek. It would serve as a lower bound for the range. Since in a real table that would most probably be something like <strong>0</strong> or <strong>1</strong>, that would make the range much more constrained.</p>
<p>All three of these methods could be used at the same time. Since the methods require nothing more than a single lookup of the statistics table and a single index seek, the most efficient method to return the subset of values that could satisfy the range could be chosen at runtime.</p>
<h3>Implementation</h3>
<p>Now, let&#8217;s make a sample table in <strong>PostgreSQL</strong> and see how we could benefit from replacing the low-cardinality ranges with the lists of values:</p>
<p><a href="#" onclick="xcollapse('X5181');return false;"><strong>Table creation details</strong></a><br />
</p>
<div id="X5181" style="display: none; ">
<pre class="brush: sql">
CREATE TABLE t_composite (
        id INT NOT NULL,
        uint1 INT NOT NULL,
        uint2 INT NOT NULL,
        real1 DOUBLE PRECISION NOT NULL,
        real2 DOUBLE PRECISION NOT NULL,
        stuffing VARCHAR(200) NOT NULL
        );

SELECT SETSEED(0.20100518);

INSERT
INTO    t_composite
SELECT  n,
        CEILING(RANDOM() * 40),
        CEILING(RANDOM() * 400000),
        CEILING(RANDOM() * 40) * 0.01,
        CEILING(RANDOM() * 400000) * 0.01,
        RPAD(&#039;&#039;, 200, &#039;*&#039;)
FROM    generate_series(1, 16000000) n;

ALTER TABLE t_composite ADD CONSTRAINT pk_composite_id PRIMARY KEY (id);

CREATE INDEX ix_composite_uint ON t_composite (uint1, uint2);

CREATE INDEX ix_composite_real ON t_composite (real1, real2);
</pre>
</div>
<p>This table contains <strong>16,000,000</strong> records with two integer fields and two <code>double precision</code> fields.</p>
<p>There are composite indexes on the pairs of fields. These indexes are intentionally created with the least selective column leading, to demonstrate the benefits of range transformation.</p>
<p>First, let&#8217;s run a query similar to the original one:</p>
<pre class="brush: sql">
SELECT  SUM(LENGTH(stuffing))
FROM    t_composite
WHERE   1 = 1
        AND uint1 &lt;= 20
        AND uint2 &lt;= 20
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>sum</th>
</tr>
<tr>
<td class="int8">75600</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (5.1249s)</td>
</tr>
</table>
</div>
<pre>
Aggregate  (cost=171331.57..171331.58 rows=1 width=204)
  -&gt;  Index Scan using ix_composite_uint on t_composite  (cost=0.00..171329.57 rows=796 width=204)
        Index Cond: ((uint1 &lt;= 20) AND (uint2 &lt;= 20))
</pre>
<p>The plan says that the index condition involves both fields. However, only the first field is used in the <strong>B-Tree</strong> search: the second is just being filtered on, though no actual table access is performed yet at this step.</p>
<p>The index keys are not too long, but there are several million of them that need to be scanned. That&#8217;s why the query takes more than 5 seconds to complete.</p>
<p>The I/O statistics show the following:</p>
<pre class="brush: sql">
SELECT  pg_stat_reset();

SELECT  SUM(LENGTH(stuffing))
FROM    t_composite
WHERE   1 = 1
        AND uint1 &lt;= 20
        AND uint2 &lt;= 20;

SELECT  pg_stat_get_blocks_fetched(&#039;ix_composite_uint&#039;::regclass);
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>pg_stat_get_blocks_fetched</th>
</tr>
<tr>
<td class="int8">21865</td>
</tr>
</table>
</div>
<p>The query required more than twenty thousand index blocks to be read and examined.</p>
<h4>Smallest superset</h4>
<p>Now, let&#8217;s try to substitute a hard-coded list of possible values for the range condition:</p>
<pre class="brush: sql">
SELECT  SUM(LENGTH(stuffing))
FROM    t_composite
WHERE   1 = 1
        AND uint1 IN
        (
        SELECT  generate_series(0, 20)
        )
        AND uint2 &lt;= 20
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>sum</th>
</tr>
<tr>
<td class="int8">75600</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (0.0063s)</td>
</tr>
</table>
</div>
<pre>
Aggregate  (cost=167.47..167.49 rows=1 width=204)
  -&gt;  Nested Loop  (cost=0.02..167.37 rows=40 width=204)
        -&gt;  HashAggregate  (cost=0.02..0.03 rows=1 width=4)
              -&gt;  Result  (cost=0.00..0.01 rows=1 width=0)
        -&gt;  Index Scan using ix_composite_uint on t_composite  (cost=0.00..166.84 rows=40 width=208)
              Index Cond: ((t_composite.uint1 = (generate_series(0, 20))) AND (t_composite.uint2 &lt;= 20))
</pre>
<p>Instead of a giant single range, there are <strong>21</strong> short ranges examined in a nested loop. This is instant (<strong>6 ms</strong>).</p>
<p>Let&#8217;s look into the I/O statistics again:</p>
<pre class="brush: sql">
SELECT  pg_stat_reset();

SELECT  SUM(LENGTH(stuffing))
FROM    t_composite
WHERE   1 = 1
        AND uint1 IN
        (
        SELECT  generate_series(0, 20)
        )
        AND uint2 &lt;= 20;

SELECT  pg_stat_get_blocks_fetched(&#039;ix_composite_uint&#039;::regclass);
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>pg_stat_get_blocks_fetched</th>
</tr>
<tr>
<td class="int8">64</td>
</tr>
</table>
</div>
<p>Now, only <strong>64</strong> blocks need to be read.</p>
<h4>MIN / MAX</h4>
<p>We took <strong>0</strong> as the initial value, but since theoretically there can be negative numbers in the columns, this assumption is not safe.</p>
<p>We need to get the least value from the table instead of assuming it:</p>
<pre class="brush: sql">
SELECT  SUM(LENGTH(stuffing))
FROM    t_composite
WHERE   1 = 1
        AND uint1 IN
        (
        SELECT  generate_series(
                (
                SELECT  MIN(uint1)
                FROM    t_composite
                ), 20)
        )
        AND uint2 &lt;= 20
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>sum</th>
</tr>
<tr>
<td class="int8">75600</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (0.0066s)</td>
</tr>
</table>
</div>
<pre>
Aggregate  (cost=171.40..171.41 rows=1 width=204)
  -&gt;  Nested Loop  (cost=3.95..171.29 rows=40 width=204)
        -&gt;  HashAggregate  (cost=3.95..3.96 rows=1 width=4)
              -&gt;  Result  (cost=3.92..3.93 rows=1 width=0)
                    InitPlan 2 (returns $1)
                      -&gt;  Result  (cost=3.91..3.92 rows=1 width=0)
                            InitPlan 1 (returns $0)
                              -&gt;  Limit  (cost=0.00..3.91 rows=1 width=4)
                                    -&gt;  Index Scan using ix_composite_uint on t_composite  (cost=0.00..62571556.68 rows=15999664 width=4)
                                          Filter: (uint1 IS NOT NULL)
        -&gt;  Index Scan using ix_composite_uint on t_composite  (cost=0.00..166.84 rows=40 width=208)
              Index Cond: ((&quot;20100519_cardinality&quot;.t_composite.uint1 = (generate_series($1, 20))) AND (&quot;20100519_cardinality&quot;.t_composite.uint2 &lt;= 20))
</pre>
<p>This is instant again.</p>
<p>Now, what about statistics?</p>
<pre class="brush: sql">
SELECT  pg_stat_reset();

SELECT  SUM(LENGTH(stuffing))
FROM    t_composite
WHERE   1 = 1
        AND uint1 IN
        (
        SELECT  generate_series(
                (
                SELECT  MIN(uint1)
                FROM    t_composite
                ), 20)
        )
        AND uint2 &lt;= 20;

SELECT  pg_stat_get_blocks_fetched(&#039;ix_composite_uint&#039;::regclass);
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>pg_stat_get_blocks_fetched</th>
</tr>
<tr>
<td class="int8">67</td>
</tr>
</table>
</div>
<p>The query is only <strong>3</strong> block reads heavier, but this time it is guaranteed to be correct.</p>
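<p>The same trick should work for a range that is open at the upper end, with <code>MAX</code> taking the place of <code>MIN</code>. The query below is only a sketch against the same table: the condition <code>uint1 &gt;= 20</code> is made up for illustration, and no timings or block counts were measured for it. It relies on <code>uint1</code> being an integer column, so that every value in the range can be enumerated:</p>
<pre class="brush: sql">
SELECT  SUM(LENGTH(stuffing))
FROM    t_composite
WHERE   1 = 1
        -- stands in for the hypothetical condition uint1 &gt;= 20
        AND uint1 IN
        (
        SELECT  generate_series(20,
                (
                -- greatest existing value serves as the upper bound
                SELECT  MAX(uint1)
                FROM    t_composite
                ))
        )
        AND uint2 &lt;= 20
</pre>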
<h4>Loose index scan</h4>
<p>The <code>real*</code> columns hold the double precision data.</p>
<p>First, let&#8217;s run the original query:</p>
<pre class="brush: sql">
SELECT  SUM(LENGTH(stuffing))
FROM    t_composite
WHERE   1 = 1
        AND real1 &lt;= 0.2
        AND real2 &lt;= 0.2
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>sum</th>
</tr>
<tr>
<td class="int8">83600</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (6.6561s)</td>
</tr>
</table>
</div>
<pre>
Aggregate  (cost=206210.82..206210.84 rows=1 width=204)
  -&gt;  Index Scan using ix_composite_real on t_composite  (cost=0.00..206208.84 rows=793 width=204)
        Index Cond: ((real1 &lt;= 0.2::double precision) AND (real2 &lt;= 0.2::double precision))
</pre>
<p>Again, there are way too many block reads:</p>
<pre class="brush: sql">
SELECT  pg_stat_reset();

SELECT  SUM(LENGTH(stuffing))
FROM    t_composite
WHERE   1 = 1
        AND real1 &lt;= 0.2
        AND real2 &lt;= 0.2;

SELECT  pg_stat_get_blocks_fetched(&#039;ix_composite_real&#039;::regclass);
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>pg_stat_get_blocks_fetched</th>
</tr>
<tr>
<td class="int8">30657</td>
</tr>
</table>
</div>
<p>The query takes more than <strong>6 seconds</strong>.</p>
<p>For a condition like <code>real1 &lt;= 0.2</code>, the smallest superset of all possible values (that is all possible double-precision values between <strong>0</strong> and <strong>0.2</strong>) would be too large (though still finite of course) to be generated and joined. That&#8217;s why we need to use server-collected statistics to decide whether a loose index scan would be efficient to get the list of all distinct values of <code>real1</code> in the table:</p>
<pre class="brush: sql">
SELECT  n_distinct, most_common_vals, histogram_bounds
FROM    pg_stats
WHERE   schemaname = &#039;20100519_cardinality&#039;
        AND tablename = &#039;t_composite&#039;
        AND attname IN (&#039;real1&#039;, &#039;real2&#039;)
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>n_distinct</th>
<th>most_common_vals</th>
<th>histogram_bounds</th>
</tr>
<tr>
<td class="float4">40</td>
<td class="anyarray">{0.09,0.22,0.01,0.37,0.04,0.25,0.28,0.34,0.27,0.06,0.36,0.11,0.08,0.39,0.12,0.2,0.02,0.16,0.17,0.21,0.29,0.18,0.19,0.26,0.3,0.32,0.35,0.1,0.13,0.4,0.15,0.23,0.38,0.03,0.33,0.24,0.07,0.14,0.05,0.31}</td>
<td class="anyarray"></td>
</tr>
<tr>
<td class="float4">362181</td>
<td class="anyarray">{1781.85,128.6,142.73,257.88,332.62,618.61,705.35,829.91,845.82,874.08,1432.82,1469.16,1486.01,1569.16,1866.43,2111.48,2234.48,2282.78,2340.7,2382.54,2468.19,2491.35,2494.73,2508.51,2587.51,2750.98,2876.07,2956.11,3222.62,3463.2,3564.41,3872.25,3.43,13.22,14.3,15.82,16.12,16.36,19.98,21.79,23.03,25.41,28.24,28.96,31.95,35.1,36.31,38.46,39.72,39.87,42.42,49.32,50.84,56.59,60.31,81.8,84.48,84.74,86.62,88.58,91.82,103.48,112.88,114.99,117.55,119.55,122.52,123.04,127.98,128.26,130.56,131.15,134.57,134.91,135.8,139.05,139.41,141.39,146.13,151.92,158.16,158.58,161.71,169.92,170.43,173.3,173.74,178.06,189.43,192.99,199.69,210.98,218.45,225.44,230.16,233.25,233.67,240.9,246.5,249.77}</td>
<td class="anyarray">{0.07,44.3,86.34,129.18,172.01,212.46,254.83,294.83,336.18,374.44,414.75,458.07,498.72,540.88,582.92,621.95,660.28,700.75,738.46,778.95,822.09,857.34,896.27,937.63,977.05,1017.49,1062.93,1104.88,1147.17,1185.99,1225.41,1267.8,1307.45,1347.49,1386.1,1424.05,1465.05,1503.23,1542.03,1581.45,1619.53,1658.57,1697.74,1741.65,1782.59,1823.19,1867.54,1905.59,1945.39,1986.04,2022.32,2061.52,2102.77,2143.07,2180.29,2218.06,2262.62,2302.28,2342.05,2382.29,2416.18,2455.43,2489.25,2528.07,2572.15,2606.91,2648.59,2685.43,2724.9,2765.49,2805.78,2845.94,2886.33,2925.52,2967.07,3005.12,3046.34,3084.44,3122.81,3158.25,3199.74,3242.4,3279.34,3319.41,3359.39,3401.15,3436.32,3477.28,3517.05,3559.6,3599.6,3639.63,3678.82,3720.27,3760.32,3801.89,3844.98,3883.96,3922.63,3963.01,3999.98}</td>
</tr>
<tr class="statusbar">
<td colspan="100">2 rows fetched in 0.0003s (0.0072s)</td>
</tr>
</table>
</div>
<p>From the server statistics we see that there are only <strong>40</strong> distinct values of <code>real1</code>. Hence, a loose index scan as such would be efficient.</p>
<p>Let&#8217;s look into the stats on <code>real2</code>. We see that there are <code>362181</code> distinct values in the table, and the range in question (<code>&lt;= 0.2</code>) falls into the first histogram bucket (<code>BETWEEN 0.07 AND 44.3</code>). Since the histogram splits the values into <strong>100</strong> percentiles, this means that there are about <strong>3622</strong> distinct values from <strong>0.07</strong> to <strong>44.3</strong>, about <code>((0.2 - 0.07) / (44.3 - 0.07) * 362181 / 100) ≈</code> <strong>11</strong> distinct values inside our range, and about <code>11 * 16000000 / 362181 ≈</code> <strong>486</strong> records with these values.</p>
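<p>The same arithmetic can be spelled out as a query. This is just a sketch: the constants are copied by hand from the statistics output above, the column names are made up for readability, and values are assumed to be spread evenly within the histogram bucket:</p>
<pre class="brush: sql">
WITH    stats AS
        (
        SELECT  362181::NUMERIC AS n_distinct,    -- n_distinct of real2
                16000000::NUMERIC AS n_rows,      -- records in t_composite
                100::NUMERIC AS n_buckets,        -- histogram buckets
                0.07::NUMERIC AS bucket_low,      -- first histogram bound
                44.3::NUMERIC AS bucket_high,     -- second histogram bound
                0.2::NUMERIC AS range_high        -- upper bound of the condition
        ),
        estimate AS
        (
        SELECT  n_distinct / n_buckets AS bucket_values,
                (range_high - bucket_low) / (bucket_high - bucket_low)
                        * n_distinct / n_buckets AS range_values,
                n_rows / n_distinct AS records_per_value
        FROM    stats
        )
SELECT  ROUND(bucket_values) AS values_in_bucket,
        ROUND(range_values) AS values_in_range,
        -- the value count is rounded first, as in the text above
        ROUND(ROUND(range_values) * records_per_value) AS records_in_range
FROM    estimate
</pre>
<p>This returns <strong>3622</strong>, <strong>11</strong> and <strong>486</strong>, the same figures as above.</p>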
<p>Taking into account that the original query would need to scan <strong>8,000,000</strong> records, the loose index scan seems to be a good idea.</p>
<p>Unfortunately, <strong>PostgreSQL</strong> does not support it directly, but with minimal effort it can be emulated:</p>
<pre class="brush: sql">
WITH    RECURSIVE q (real1) AS
        (
        SELECT  MIN(real1)
        FROM    t_composite
        UNION ALL
        SELECT  (
                SELECT  c.real1
                FROM    t_composite c
                WHERE   c.real1 &gt; q.real1
                        AND c.real1 &lt;= 0.2
                ORDER BY
                        c.real1
                LIMIT 1
                )
        FROM    q
        WHERE   q.real1 IS NOT NULL
        )
SELECT  SUM(LENGTH(stuffing))
FROM    q
JOIN    t_composite c
ON      c.real1 = q.real1
        AND c.real2 &lt;= 0.2
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>sum</th>
</tr>
<tr>
<td class="int8">83600</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (0.0082s)</td>
</tr>
</table>
</div>
<pre>
Aggregate  (cost=17319.78..17319.80 rows=1 width=204)
  CTE q
    -&gt;  Recursive Union  (cost=3.92..401.33 rows=101 width=8)
          -&gt;  Result  (cost=3.92..3.93 rows=1 width=0)
                InitPlan 1 (returns $1)
                  -&gt;  Limit  (cost=0.00..3.92 rows=1 width=8)
                        -&gt;  Index Scan using ix_composite_real on t_composite  (cost=0.00..62694798.76 rows=15999664 width=8)
                              Filter: (real1 IS NOT NULL)
          -&gt;  WorkTable Scan on q  (cost=0.00..39.54 rows=10 width=8)
                Filter: (q.real1 IS NOT NULL)
                SubPlan 2
                  -&gt;  Limit  (cost=0.00..3.93 rows=1 width=8)
                        -&gt;  Index Scan using ix_composite_real on t_composite c  (cost=0.00..314695.34 rows=79998 width=8)
                              Index Cond: ((real1 &gt; $2) AND (real1 &lt;= 0.2::double precision))
  -&gt;  Nested Loop  (cost=0.00..16914.48 rows=1588 width=204)
        -&gt;  CTE Scan on q  (cost=0.00..2.02 rows=101 width=8)
        -&gt;  Index Scan using ix_composite_real on t_composite c  (cost=0.00..166.95 rows=40 width=212)
              Index Cond: ((c.real1 = q.real1) AND (c.real2 &lt;= 0.2::double precision))
</pre>
<p>This is almost instant again. Here are the I/O statistics:</p>
<pre class="brush: sql">
SELECT  pg_stat_reset();

WITH    RECURSIVE q (real1) AS
        (
        SELECT  MIN(real1)
        FROM    t_composite
        UNION ALL
        SELECT  (
                SELECT  c.real1
                FROM    t_composite c
                WHERE   c.real1 &gt; q.real1
                        AND c.real1 &lt;= 0.2
                ORDER BY
                        c.real1
                LIMIT 1
                )
        FROM    q
        WHERE   q.real1 IS NOT NULL
        )
SELECT  SUM(LENGTH(stuffing))
FROM    q
JOIN    t_composite c
ON      c.real1 = q.real1
        AND c.real2 &lt;= 0.2;

SELECT  pg_stat_get_blocks_fetched(&#039;ix_composite_real&#039;::regclass);
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>pg_stat_get_blocks_fetched</th>
</tr>
<tr>
<td class="int8">165</td>
</tr>
</table>
</div>
<p>The number of index reads is reduced greatly again.</p>
<h3>Summary</h3>
<p>In some cases, a range predicate (like <q>less than</q>, <q>greater than</q> or <q>between</q>) can be rewritten as an <code>IN</code> predicate against the list of values that could satisfy the range condition.</p>
<p>Depending on the column datatype, check constraints and statistics, that list could consist of all possible values defined by the column&#8217;s domain, all possible values between the column&#8217;s minimal and maximal values, or all actual distinct values contained in the table. In the latter case, a loose index scan could be used to retrieve the list of such values.</p>
<p>Since an equality condition is applied to each value in the list, more access and join methods could be used to build the query plan, including range conditions on secondary index columns, hash lookups etc.</p>
<p>Whenever the optimizer builds a plan for a query that contains a range predicate, it should consider rewriting the range condition as an <code>IN</code> predicate and use the latter method if it proves more efficient.</p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2010/05/19/things-sql-needs-determining-range-cardinality/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Things SQL needs: MERGE JOIN that would seek</title>
		<link>http://explainextended.com/2010/05/07/things-sql-needs-merge-join-that-would-seek/</link>
		<comments>http://explainextended.com/2010/05/07/things-sql-needs-merge-join-that-would-seek/#comments</comments>
		<pubDate>Fri, 07 May 2010 19:00:53 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=4708</guid>
		<description><![CDATA[One of the most known and least used join algorithms in SQL engines is MERGE JOIN. This algorithm operates on two sorted recordsets, keeping two pointers that chase each other. The Wikipedia entry above describes it quite well in terms of algorithms. I&#8217;ll just make an animated GIF to make it more clear: This is [...]]]></description>
			<content:encoded><![CDATA[<p>One of the best known and least used join algorithms in <strong>SQL</strong> engines is <a href="http://en.wikipedia.org/wiki/Merge_join"><code>MERGE JOIN</code></a>.</p>
<p>This algorithm operates on two sorted recordsets, keeping two pointers that chase each other.</p>
<p>The Wikipedia entry above describes it quite well in terms of algorithms. I&#8217;ll just make an animated <strong>GIF</strong> to make it more clear:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2010/05/test.gif" alt="" title="Merge join" width="520" height="640" class="aligncenter size-full wp-image-4709 noborder" /></p>
<p>This is quite a nice and elegant algorithm, which, unfortunately, has two major drawbacks:</p>
<ol>
<li>It needs the recordsets to be sorted</li>
<li>Even with the recordsets sorted, it is no better than a <code>HASH JOIN</code></li>
</ol>
<p>The sorting part is essential for this algorithm and there is nothing that can be done about it: the recordsets should be sorted, period. Databases, however, often provide the records in sorted order: from clustered tables, indexes, previously sorted and ordered subqueries, spool tables etc.</p>
<p>But even when the recordsets are already sorted, on equijoins the <code>MERGE JOIN</code> is hardly faster than a <code>HASH JOIN</code>.</p>
<p>Why?<br />
<span id="more-4708"></span></p>
<h3>MERGE JOIN vs. HASH JOIN</h3>
<p>Let&#8217;s remember how the <code>HASH JOIN</code> works:</p>
<ul>
<li>It takes the smaller table and builds a hash table out of it, with the join key as the hash key.</li>
<li>Then it takes each record from the larger table and looks it up in the hash table. If found, the records are returned.</li>
</ul>
<p>We see that there are four major steps involved:</p>
<ol>
<li>Scan the smaller table</li>
<li>Build a hash table (i. e. copy each record from the smaller table into the hash slot)</li>
<li>Scan the larger table</li>
<li>Look up each record from the larger table in the hash table</li>
</ol>
<p>Since building and looking up the hash table are performed in memory (or, depending on the <strong>SQL</strong> engine implementation, in memory-mapped temporary database, which is almost the same), these steps take negligible time compared to the time required to scan the table.</p>
<p>But we see that <code>MERGE JOIN</code>, as it is implemented now, also requires scanning both recordsets. Each record should be evaluated by the pointer to figure out if its join key is greater than, less than or equal to that of the other pointer.</p>
<p>This means that both <code>MERGE JOIN</code> and <code>HASH JOIN</code> require scanning both recordsets. However, <code>HASH JOIN</code> does not require any special order, which means it can use a table scan, an index fast full scan or any other method to get the records all at once, while <code>MERGE JOIN</code> needs either to sort the records (which is obviously slow) or to traverse the index with subsequent key lookups (which is not fast either).</p>
<p>In some edge cases <code>MERGE JOIN</code> can be more efficient indeed: say, when the hash table does not fit completely into memory and would require either extensive disk writes or several scans over the source tables, while a <code>MERGE JOIN</code> could be performed on a pair of indexes.</p>
<p>It is also efficient for <code>FULL OUTER JOIN</code>: each record is evaluated, returned and forgotten only once, while <code>HASH JOIN</code> would require a second pass over the records that had never been matched.</p>
<h3>Seeks instead of scans</h3>
<p>But does the <code>MERGE JOIN</code> really always need to traverse all records?</p>
<p>Let&#8217;s see some more pictures:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2010/05/scan.png" alt="" title="Scan" width="620" height="480" class="aligncenter size-full wp-image-4713 noborder" /></p>
<p>Here, the right recordset is <strong>100,000</strong> records ahead of the left recordset. With <code>MERGE JOIN</code>, <strong>100,000</strong> records should be scanned from the left recordset and <strong>100,000</strong> comparisons made.</p>
<p>This is unavoidable if the recordset is a result of a sort operation.</p>
<p>However, <code>MERGE JOIN</code> is usually chosen when there is a more efficient sorted row source available: an index or a spool table (temporary index built in runtime). And both these sources allow efficient random seeks.</p>
<p>If an index served as the left recordset, we could see that the right pointer is too far ahead, and just seek for its value in the left recordset instead of scanning <strong>100,000</strong> records:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2010/05/seek.png" alt="" title="Seek" width="620" height="480" class="aligncenter size-full wp-image-4714 noborder" /></p>
<p>Here, we can see that <strong>100,000</strong> is too far away and could advance the left pointer to the position of the right pointer in only several reads, traversing the <strong>B-Tree</strong>.</p>
<p>Since indexes usually come with statistics, all we would need to do to decide whether to seek or scan is to check the histograms and estimate how many records there are between the current pointer and the opposite one. If there are too many, the scan cost would outweigh the seek cost and a seek should be performed. The statistics table itself would not need to be queried too often: since the records are always selected in order, the statistics table could also be read sequentially.</p>
<h3>Emulation</h3>
<p>Let&#8217;s create a couple of <strong>PostgreSQL</strong> tables and see the performance benefit:</p>
<p><a href="#" onclick="xcollapse('X9277');return false;"><strong>Table creation details</strong></a><br />
</p>
<div id="X9277" style="display: none; ">
<pre class="brush: sql">
CREATE TABLE t_left (
        id INT NOT NULL PRIMARY KEY,
        good INT NOT NULL,
        bad INT NOT NULL,
        stuffing VARCHAR(200) NOT NULL
);

INSERT
INTO    t_left
SELECT  s, s, s, RPAD(&#039;&#039;, 200, &#039;*&#039;)
FROM    generate_series(1, 1000000) s;

CREATE UNIQUE INDEX ix_left_good ON t_left (good);

CREATE UNIQUE INDEX ix_left_bad ON t_left (bad);

CREATE TABLE t_right (
        id INT NOT NULL PRIMARY KEY,
        good INT NOT NULL,
        bad INT NOT NULL,
        stuffing VARCHAR(200) NOT NULL
);

INSERT
INTO    t_right
SELECT  s, s, s + 999000, RPAD(&#039;&#039;, 200, &#039;*&#039;)
FROM    generate_series(1, 1000000) s;

CREATE UNIQUE INDEX ix_right_good ON t_right (good);

CREATE UNIQUE INDEX ix_right_bad ON t_right (bad);
</pre>
</div>
<p>These two tables have <strong>1,000,000</strong> records each, and a common column that would return only <strong>1,000</strong> records in a join.</p>
<p>Here&#8217;s how the plain query runs against these tables:</p>
<pre class="brush: sql">
SELECT  SUM(LENGTH(l.stuffing) + LENGTH(r.stuffing))
FROM    t_left l
JOIN    t_right r
ON      r.bad = l.bad
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>sum</th>
</tr>
<tr>
<td class="int8">400000</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (1.4062s)</td>
</tr>
</table>
</div>
<pre>
Aggregate  (cost=71338.33..71338.35 rows=1 width=408)
  -&gt;  Merge Join  (cost=58737.16..68838.33 rows=1000000 width=408)
        Merge Cond: (l.bad = r.bad)
        -&gt;  Index Scan using ix_left_bad on t_left l  (cost=0.00..56287.36 rows=1000000 width=208)
        -&gt;  Index Scan using ix_right_bad on t_right r  (cost=0.00..56287.36 rows=1000000 width=208)
</pre>
<p>Note that <strong>PostgreSQL</strong> used a <code>MERGE JOIN</code> without any tricks from our side. This is because the table records are too large and could not fit into a hash table all at once.</p>
<p>Of course, <strong>PostgreSQL</strong> could store only the record pointers in the hash table and do the record lookups after the join. However, for some reason it would not select this plan.</p>
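<p>One way to see what the planner would do without a <code>MERGE JOIN</code> is to discourage it for the current session with the <code>enable_mergejoin</code> setting and look at the plan again. This is a sketch only: the planner remains free to pick any other join method, so a <code>HASH JOIN</code> is not guaranteed, and no timings are claimed here:</p>
<pre class="brush: sql">
-- Discourage merge joins for this session only
SET enable_mergejoin = off;

-- Run the same query and show the plan actually used, with timings
EXPLAIN ANALYZE
SELECT  SUM(LENGTH(l.stuffing) + LENGTH(r.stuffing))
FROM    t_left l
JOIN    t_right r
ON      r.bad = l.bad;

-- Restore the default planner behavior
RESET enable_mergejoin;
</pre>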
<p><code>MERGE JOIN</code>, in our case, is quite efficient, since the indexes are read first and the actual records are only looked up for the matched records (which are not too numerous). However, it still requires traversing <strong>2,000,000</strong> records which takes more than a second.</p>
<p>Now, let&#8217;s emulate the <code>MERGE JOIN</code> doing the seeks instead of scans. To do that, we will write a recursive query:</p>
<pre class="brush: sql">
WITH    RECURSIVE q (l, r) AS
        (
        SELECT  (
                SELECT  l
                FROM    t_left l
                ORDER BY
                        bad
                LIMIT 1
                ),
                (
                SELECT  r
                FROM    t_right r
                ORDER BY
                        bad
                LIMIT 1
                )
        UNION ALL
        SELECT  CASE
                WHEN (q.l).bad &lt; (q.r).bad THEN
                        (
                        SELECT  li
                        FROM    t_left li
                        WHERE   li.bad &gt;= (q.r).bad
                        ORDER BY
                                bad
                        LIMIT 1
                        )
                WHEN (q.l).bad = (q.r).bad THEN
                        (
                        SELECT  li
                        FROM    t_left li
                        WHERE   li.bad &gt; (q.r).bad
                        ORDER BY
                                bad
                        LIMIT 1
                        )
                ELSE
                        l
                END,
                CASE
                WHEN (q.r).bad &lt; (q.l).bad THEN
                        (
                        SELECT  ri
                        FROM    t_right ri
                        WHERE   ri.bad &gt;= (q.l).bad
                        ORDER BY
                                bad
                        LIMIT 1
                        )
                WHEN (q.r).bad = (q.l).bad THEN
                        (
                        SELECT  ri
                        FROM    t_right ri
                        WHERE   ri.bad &gt; (q.l).bad
                        ORDER BY
                                bad
                        LIMIT 1
                        )
                ELSE
                        r
                END
        FROM    q
        WHERE   l IS NOT NULL
                AND r IS NOT NULL
        )
SELECT  SUM(LENGTH((q.l).stuffing) + LENGTH((q.r).stuffing))
FROM    q
WHERE   (q.l).bad = (q.r).bad
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>sum</th>
</tr>
<tr>
<td class="int8">400000</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (0.0481s)</td>
</tr>
</table>
</div>
<pre>
Aggregate  (cost=30.94..30.96 rows=1 width=64)
  CTE q
    -&gt;  Recursive Union  (cost=0.11..28.66 rows=101 width=64)
          -&gt;  Result  (cost=0.11..0.12 rows=1 width=0)
                InitPlan 1 (returns $1)
                  -&gt;  Limit  (cost=0.00..0.06 rows=1 width=36)
                        -&gt;  Index Scan using ix_left_bad on t_left l  (cost=0.00..56287.36 rows=1000000 width=36)
                InitPlan 2 (returns $2)
                  -&gt;  Limit  (cost=0.00..0.06 rows=1 width=36)
                        -&gt;  Index Scan using ix_right_bad on t_right r  (cost=0.00..56287.36 rows=1000000 width=36)
          -&gt;  WorkTable Scan on q  (cost=0.00..2.65 rows=10 width=64)
                Filter: ((q.l IS NOT NULL) AND (q.r IS NOT NULL))
                SubPlan 3
                  -&gt;  Limit  (cost=0.00..0.06 rows=1 width=36)
                        -&gt;  Index Scan using ix_left_bad on t_left li  (cost=0.00..19598.69 rows=333333 width=36)
                              Index Cond: (bad &gt;= ($3).bad)
                SubPlan 4
                  -&gt;  Limit  (cost=0.00..0.06 rows=1 width=36)
                        -&gt;  Index Scan using ix_left_bad on t_left li  (cost=0.00..19598.69 rows=333333 width=36)
                              Index Cond: (bad &gt; ($3).bad)
                SubPlan 5
                  -&gt;  Limit  (cost=0.00..0.06 rows=1 width=36)
                        -&gt;  Index Scan using ix_right_bad on t_right ri  (cost=0.00..19598.69 rows=333333 width=36)
                              Index Cond: (bad &gt;= ($4).bad)
                SubPlan 6
                  -&gt;  Limit  (cost=0.00..0.06 rows=1 width=36)
                        -&gt;  Index Scan using ix_right_bad on t_right ri  (cost=0.00..19598.69 rows=333333 width=36)
                              Index Cond: (bad &gt; ($4).bad)
  -&gt;  CTE Scan on q  (cost=0.00..2.27 rows=1 width=64)
        Filter: ((l).bad = (r).bad)
</pre>
<p>This query makes a seek each time it needs to advance a pointer. This is not the most efficient way, but despite that, the query completes in only <strong>48 ms</strong>, which is about <strong>29</strong> times as fast as the plain <code>MERGE JOIN</code> query.</p>
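<p>To see what a single step of the recursion does, one of the seek subqueries can be run on its own. This is only an illustrative sketch: the constant <code>999001</code> is an assumption standing in for the current value of <code>(q.r).bad</code> (it is the lowest value of <code>bad</code> that the script above puts into <code>t_right</code>).</p>
<pre class="brush: sql">
-- One advance of the left pointer, written as a standalone query.
-- 999001 stands in for (q.r).bad; ix_left_bad can serve this with a single seek.
SELECT  li.good, li.bad
FROM    t_left li
WHERE   li.bad &gt;= 999001
ORDER BY
        li.bad
LIMIT 1;
</pre>
<p>Prefixing this statement with <code>EXPLAIN</code> should show a <code>Limit</code> over an index scan with an index condition, the same shape as the subplans in the plan above.</p>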
<h3>Summary</h3>
<p>With its current implementation, <code>MERGE JOIN</code> is not the most efficient algorithm. However, for several types of queries it outperforms <code>HASH JOIN</code>.</p>
<p>The main drawback of the <code>MERGE JOIN</code> is its inability to use seeks to advance the record pointers. Even if the opposite pointer is far away, a sequential scan is used instead of a <strong>B-Tree</strong> seek, even when the recordset is an index or a spool table.</p>
<p>To improve this, the accumulated index statistics should be taken into account when deciding whether to perform a seek or a sequential scan to catch up with the opposite pointer. If the statistics show a large number of records in between, an index seek should be used instead of the index scan.</p>
<p>With this improvement, <code>MERGE JOIN</code> would perform much better, especially when joining two large indexed tables. It would require far fewer resources than a <code>HASH JOIN</code>, and, unlike <code>NESTED LOOPS</code>, the seeks would be performed only when really needed, thus preserving the benefits of the sequential access to the tables.</p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2010/05/07/things-sql-needs-merge-join-that-would-seek/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>NoSQL</title>
		<link>http://explainextended.com/2010/04/01/nosql/</link>
		<comments>http://explainextended.com/2010/04/01/nosql/#comments</comments>
		<pubDate>Thu, 01 Apr 2010 09:00:46 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=4625</guid>
		<description><![CDATA[I had a vision tonight. A huge, dark, grim figure approached me, seized me with its long bony arms and made me see all the vanity of the world we are living in. Bloated database engines, useless ACID requirements, meaningless joins are now in the past for me. I decided to move to NoSQL. Where [...]]]></description>
			<content:encoded><![CDATA[<p>I had a vision tonight.</p>
<div id="attachment_4628" class="wp-caption aligncenter" style="width: 510px"><img src="http://explainextended.com/wp-content/uploads/2010/04/angel.jpg" alt="" title="Figure" width="500" height="309" class="size-full wp-image-4628" /><p class="wp-caption-text"><small>Image by <a href='http://www.flickr.com/photos/nataliejohnson/'>nataliej</a></small></p></div>
<p>A huge, dark, grim figure approached me, seized me with its long bony arms and made me see all the vanity of the world we are living in.</p>
<p>Bloated database engines, useless ACID requirements, meaningless joins are now in the past for me.</p>
<p>I decided to move to <strong>NoSQL</strong>.</p>
<p>Where do I begin?</p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2010/04/01/nosql/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Things SQL needs: sargability of monotonic functions</title>
		<link>http://explainextended.com/2010/02/19/things-sql-needs-sargability-of-monotonic-functions/</link>
		<comments>http://explainextended.com/2010/02/19/things-sql-needs-sargability-of-monotonic-functions/#comments</comments>
		<pubDate>Fri, 19 Feb 2010 20:00:28 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=4149</guid>
		<description><![CDATA[I&#8217;m going to write a series of articles about the things SQL needs to work faster and more efficiently. With these things implemented, most of the tricks I explain in my blog will become redundant and I&#8217;ll finally have some more time to spend with the family. Ever wondered why a condition like this: WHERE [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_4352" class="wp-caption alignright" style="width: 410px"><img src="http://explainextended.com/wp-content/uploads/2010/02/graph.jpg" alt="" title="Graph" width="400" height="300" class="size-full wp-image-4352" /><p class="wp-caption-text"><small>Image by <a href='http://www.flickr.com/photos/ndevil/3491395689/'>ndevil</a></small></p></div>
<p>I&#8217;m going to write a series of articles about the things <strong>SQL</strong> needs to work faster and more efficiently.</p>
<p>With these things implemented, most of the tricks I explain in my blog will become redundant and I&#8217;ll finally have some more time to spend with the family.</p>
<p>Ever wondered why a condition like this:</p>
<p><code>WHERE   TRUNC(mydate) = TRUNC(SYSDATE)</code></p>
<p>, which searches for the current day&#8217;s records, is so elegant but so slow?</p>
<p>Of course this is because even if you create an index on <code>mydate</code>, this index cannot be used.</p>
<p>The expression in the left part of the equality is not <code>mydate</code> itself, so the database engine cannot find a way to use the index to search for it. Such an expression is said to be not <a href="http://en.wikipedia.org/wiki/Sargable">sargable</a>.</p>
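<p>The usual manual workaround is to rewrite the predicate as a range over the bare column. A minimal sketch, assuming <strong>Oracle</strong> date arithmetic (adding <code>1</code> to a <code>DATE</code> adds one day) and a hypothetical table <code>mytable</code> with an index on <code>mydate</code>:</p>
<pre class="brush: sql">
-- Same rows as TRUNC(mydate) = TRUNC(SYSDATE), but the column is left bare,
-- so an index on mydate can be used for a range scan.
SELECT  *
FROM    mytable
WHERE   mydate &gt;= TRUNC(SYSDATE)
        AND mydate &lt; TRUNC(SYSDATE) + 1
</pre>
<p>The upper bound is excluded on purpose: midnight of the next day no longer belongs to the current day.</p>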
<p>Now, a little explanation about the indexes and sargability. If you are familiar with these, you can skip this chapter. But beware that this chapter is the only one illustrated, so skipping it will make the article too boring to read.</p>
<p>Ahem.</p>
<p>To locate a record in a <strong>B-Tree</strong> index, the keys of the index should be compared to the value being searched for.</p>
<p>Let&#8217;s consider this sample <strong>B-Tree</strong> index:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2010/02/index.png" alt="" title="B-Tree" width="600" height="120" class="aligncenter size-full wp-image-4207 noborder" /><br />
<span id="more-4149"></span><br />
As you can see, this structure maintains the record order.</p>
<p>Within one page, records are just sorted; the links between the pages obey the sorting order too. A binary search can be used to locate the record on a page; if the search ends at a pair of adjacent records and the search expression is within the range of these records, then we just follow the link between them. All records behind this link will belong to the range, no matter how deep the tree is.</p>
<p>This works well when we search for the exact field that was indexed.</p>
<p>But what happens when we search for a derived expression?</p>
<pre class="brush: sql">
SELECT  *
FROM    mytable
WHERE   value % 3 = 1
</pre>
<p>The <strong>B-Tree</strong> itself of course does not change. But, seen through the expression we are searching for, the stored values will look like this:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2010/02/mod3.png" alt="" title="MOD3" width="600" height="124" class="aligncenter size-full wp-image-4245 noborder" /></p>
<p>As you can see, there is no order anymore. The records are not ordered within a page, nor do the links follow the order. The <strong>1</strong> we are searching for can appear anywhere in the tree. There is no other way than to traverse the whole tree and compare each record.</p>
<p>It is said that this expression is <em>unsargable</em>: the index cannot be used to search for this expression.</p>
<p>But let&#8217;s consider another example:</p>
<pre class="brush: sql">
SELECT  *
FROM    mytable
WHERE   value + 3 = 10
</pre>
<p>Here&#8217;s how the <strong>B-Tree</strong> looks now from the point of view of the expression in the <code>WHERE</code> clause:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2010/02/plus3.png" alt="" title="Plus3" width="600" height="124" class="aligncenter size-full wp-image-4248 noborder" /></p>
<p>Now, we have a perfectly valid <strong>B-Tree</strong>: everything stays in order and the search algorithms can still be used.</p>
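<p>Until an optimizer does this on its own, the same effect can be had by moving the arithmetic to the constant side by hand. A small sketch on the same hypothetical <code>mytable</code>, assuming <code>value</code> is an indexed numeric column:</p>
<pre class="brush: sql">
-- value + 3 = 10 is equivalent to value = 10 - 3;
-- the right-hand side is a constant, so a plain index seek on value is possible.
SELECT  *
FROM    mytable
WHERE   value = 10 - 3
</pre>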
<p>Another example (<code>DIV</code> is the integer division operator):</p>
<pre class="brush: sql">
SELECT  *
FROM    mytable
WHERE   value DIV 3 = 3
</pre>
<p>Here&#8217;s the <strong>B-Tree</strong> as seen from the expression&#8217;s side:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2010/02/div3.png" alt="" title="DIV3" width="600" height="124" class="aligncenter size-full wp-image-4251 noborder" /></p>
<p>The order persists.</p>
<p>This <strong>B-Tree</strong> is a little bit harder to traverse, since the key values are not unique anymore. But the same values of the keys are still contiguous, and the order between the different keys is still maintained.</p>
<p>In this case, the algorithm should be changed just a little. When a matching key is found in the index, we continue searching to the left until we reach the first key with a lower value. Then we go back to the right and return all the records with the correct value of the key.</p>
<p>This problem is of course well known and has long since been solved by all database systems: you can create an index on a non-unique field.</p>
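<p>Written by hand, the same search becomes a range over the bare column. A sketch, assuming <code>value</code> is an integer column of the hypothetical <code>mytable</code>: the values <strong>9</strong>, <strong>10</strong> and <strong>11</strong> are the only ones for which <code>value DIV 3</code> equals <strong>3</strong>.</p>
<pre class="brush: sql">
-- value DIV 3 = 3 is equivalent to 9 &lt;= value &lt; 12 for an integer value
SELECT  *
FROM    mytable
WHERE   value &gt;= 3 * 3
        AND value &lt; (3 + 1) * 3
</pre>
<p>The lower bound is the first key mapped to <strong>3</strong> and the excluded upper bound is the first key mapped to <strong>4</strong>, which is exactly the contiguous run of equal keys described above.</p>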
<p>Finally, let&#8217;s consider one more expression, a very simple one:</p>
<pre class="brush: sql">
SELECT  *
FROM    mytable
WHERE   20 - value = 10
</pre>
<p>and how it sees the <strong>B-Tree</strong>:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2010/02/minus.png" alt="" title="Minus" width="600" height="124" class="aligncenter size-full wp-image-4255 noborder" /></p>
<p>In this case, the order is <em>reversed</em>. But there still is the order. The only difference is that the values are sorted right to left, not left to right, and all traversals should be made according to these new directions.</p>
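<p>The reversal matters most for inequalities: when the predicate is rewritten over the bare column, the comparison sign flips. A sketch on the same hypothetical <code>mytable</code>, using an inequality instead of the equality above:</p>
<pre class="brush: sql">
-- 20 - value &gt; 10 is equivalent to value &lt; 20 - 10:
-- the function is decreasing, so "greater than" becomes "less than".
SELECT  *
FROM    mytable
WHERE   value &lt; 20 - 10
</pre>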
<p>We see that some expressions break the <strong>B-Tree</strong> order, while the other ones maintain it.</p>
<p>What is the difference between the two?</p>
<p>For an expression to maintain the <strong>B-Tree</strong> order, it is necessary and sufficient for this expression to be a <a href="http://en.wikipedia.org/wiki/Monotonic_function">monotonic function</a> of the argument being indexed.</p>
<p>Here&#8217;s the definition of the monotonic function from Wikipedia:</p>
<blockquote><p>In calculus, a function <code>f</code> defined on a subset of the real numbers with real values is called monotonic (also monotonically increasing, increasing or non-decreasing), if for all <code>x</code> and <code>y</code> such that <code>x ≤ y</code> one has <code>f(x) ≤ f(y)</code>, so <code>f</code> <em>preserves</em> the order. Likewise, a function is called monotonically decreasing (also decreasing, or non-increasing) if, whenever <code>x ≤ y</code>, then <code>f(x) ≥ f(y)</code>, so it <em>reverses</em> the order.</p></blockquote>
<p>By definition, the monotonic function preserves (or reverses) the order, but does not break it.</p>
<p>This means that if an index is built over an expression, it also can be used to search over any monotonic function of this expression.</p>
<p>However, most query optimizers do not take this fact into account, while this behavior is relatively easy to implement.</p>
<h3>Sample implementation</h3>
<p>There will be many new keywords. I will introduce them all at the beginning and explain them later in the article. Also, there will be many fake functions, like <code>NEXT_COLLATION_CHARACTER</code> or <code>EXTRACT(calendar_date)</code>, which do not exist in the actual engines but whose names make their purpose clear (and they are quite simple to implement anyway). These functions are used to demonstrate the concept.</p>
<p>A sargable function, as I have shown above, needs to expose some monotony:</p>
<p><code><em>function</em>(<em>arg1</em>, …) <em><a href="#monotony_declaration">monotony_declaration</a></em></code></p>
<p>A function of two or more arguments can be sargable by some or all of its arguments:</p>
<p><code id="monotony_declaration"><em>monotony_declaration</em> := MONOTONIC ( <em><a href="#monotony_domain">monotony_domain</a></em> | OVER(<em>arg1</em>) <em><a href="#monotony_domain">monotony_domain</a></em> [, …])<br />
</code></p>
<p>When the function is a single-argument function, <code>OVER</code> clause should be omitted.</p>
<p><code id="monotony_domain"><em>monotony_domain</em> := (<em><a href="#monotony">monotony</a></em> | PIECEWISE <em><a href="#piecewise_monotony">piecewise_monotony</a></em>)<br />
</code></p>
<p>Monotony defines the direction and uniqueness of the function on the given range of the function&#8217;s domain:</p>
<p><code id="monotony"><em>monotony</em> := (STRICTLY INCREASING | INCREASING | STRICTLY DECREASING | DECREASING | UNDEFINED ) [ <em><a href="#inversion_clause">inversion_clause</a></em> ]</code></p>
<ul>
<li>
If monotony is <code>STRICTLY INCREASING</code>, it can be used for the <strong>B-Tree</strong> searches without changing the search order. A <code>UNIQUE</code> set of values is mapped to a <code>UNIQUE</code> set of the results of the function. The cardinality of the function over the indexed expression is the same as that of the expression.
</li>
<li>
If monotony is <code>INCREASING</code>, it can be used for the <strong>B-Tree</strong> searches without changing the search order. A <code>UNIQUE</code> set of values may be mapped to a non-unique set of the results, and no assumptions should be made about the uniqueness of the results. The cardinality of the function over the indexed expression may be not the same as that of the expression.
</li>
<li>
If monotony is <code>STRICTLY DECREASING</code>, it can be used for the <strong>B-Tree</strong> searches, changing the search order. A <code>UNIQUE</code> set of values is mapped to a <code>UNIQUE</code> set of the results of the function. The same algorithms that are used for scanning the index in the opposite order should be applied. The cardinality of the function over the indexed expression is the same as that of the expression.
</li>
<li>
If monotony is <code>DECREASING</code>, it can be used for the <strong>B-Tree</strong> searches, changing the search order. A <code>UNIQUE</code> set of values may be mapped to a non-unique set of the results, and no assumptions should be made about the uniqueness of the results. The same algorithms that are used for scanning the index in the opposite order should be applied. The cardinality of the function over the indexed expression may be not the same as that of the expression.
</li>
<li>
There is one more flavor, <code>UNDEFINED</code>, which states that the function is not monotonic or its monotony should not be relied upon. This makes no sense on its own, but can be used in the complex constructs we shall discuss below.
</li>
</ul>
<p>If the function is monotonic over the whole domain, then monotony is provided right after the keyword <code>MONOTONIC</code>.</p>
<p>However, a function can be piecewise monotonic. Its domain may consist of a (possibly infinite) number of pieces, with the function being monotonic over each piece but not across them.</p>
<h4>Piecewise monotony</h4>
<p><code id="piecewise_monotony"><em>piecewise_monotony</em> := PIECEWISE (<em><a href="#constant_piece_definition">constant_piece_definition</a></em> | <em><a href="#functional_piece_definition">functional_piece_definition</a></em>)</code></p>
<p>The pieces can be defined either by a set of constants or by another monotonic function, in which the pieces are defined by distinct values of the piece-defining function.</p>
<p><code id="constant_piece_definition"><em>constant_piece_definition</em> := WHEN VALUE [STRICTLY] LESS THAN <em>const</em> <em><a href="#monotony">monotony</a></em>, [ … ], WHEN VALUE LESS THAN MAXVALUE <em><a href="#monotony">monotony</a></em></code></p>
<p>Each <code>const</code> should be greater than the previous one, and the last clause should be <code>WHEN VALUE LESS THAN MAXVALUE</code>. The <code>const</code> comparison can be declared as strict: this may be important for handling discontinuities.</p>
<p><code id="functional_piece_definition"><em>functional_piece_definition</em> := DEFINED BY <em>piece_defining_function</em>(<em>arg</em>) CASE ( (WHEN <em>expression</em>(PIECE) THEN <em><a href="#monotony">monotony</a></em>) [ (…) | ELSE <em><a href="#monotony">monotony</a></em> ] ) END</code></p>
<p>The piece-defining function should itself be monotonic (but not strictly monotonic) over the whole domain of the original function.</p>
<p>The point of having this function is to split the domain into several pieces which themselves are sargable (hence the requirement for non-strict monotony), and then define the monotony within each piece.</p>
<p>The monotony for each piece is defined by <strong>SQL</strong>&#8216;s standard <code>CASE WHEN … THEN … ELSE … END</code> control flow construct (which of course returns monotonies instead of values).</p>
<h4>Two or more arguments</h4>
<p>A function of two or more arguments can be sargable too, as defined by <code><em><a href="#monotony_declaration">monotony_declaration</a></em></code>.</p>
<p>Each <code>OVER</code> clause defines monotony over one of the function&#8217;s arguments.</p>
<p>If an indexed field is used as a corresponding argument to this function and the other arguments are provided as constants or values from the leading table in the join, then the function is sargable.</p>
<h4>Superpositions</h4>
<p>When dealing with the superposition of functions, some assumptions can be made:</p>
<ul>
<li>A superposition of any number of <code>STRICTLY MONOTONIC</code> functions is <code>STRICTLY MONOTONIC</code>
</li>
<li>A superposition of any number of <code>STRICTLY MONOTONIC</code> functions and at least one <code>MONOTONIC</code> function is <code>MONOTONIC</code> (as opposed to <code>STRICTLY MONOTONIC</code>)
</li>
<li>A superposition of an <code>INCREASING</code> and a <code>DECREASING</code> function is <code>DECREASING</code>
</li>
<li>A superposition of two <code>INCREASING</code> functions, or that of two <code>DECREASING</code> functions, is <code>INCREASING</code>
</li>
</ul>
<p>This only concerns functions monotonic over the whole domain (that is, without <code>PIECEWISE</code> expressions).</p>
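<p>As a small worked example of these rules, consider a predicate on a hypothetical integer column <code>value</code> of <code>mytable</code>. Here <code>20 - value</code> is strictly decreasing, division by a positive constant is strictly increasing and <code>FLOOR</code> is increasing but not strictly, so the whole expression is (non-strictly) <code>DECREASING</code>. A sketch in <strong>PostgreSQL</strong> syntax of the predicate and its hand-made sargable equivalent:</p>
<pre class="brush: sql">
-- Original predicate: the column is wrapped into three functions
SELECT  *
FROM    mytable
WHERE   FLOOR((20 - value) / 3.0) &gt;= 2;

-- FLOOR(x) &gt;= 2 means x &gt;= 2, so (20 - value) / 3.0 &gt;= 2 and 20 - value &gt;= 6.
-- The superposition is decreasing, hence the comparison flips:
SELECT  *
FROM    mytable
WHERE   value &lt;= 14;
</pre>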
<h4>Inversions</h4>
<p><code id="inversion_clause">inversion_clause := INVERSE ( <em><a href="#inverse_expression">inverse_expression</a></em> | FROM <em><a href="#inverse_expression">inverse_expression</a></em> [ EXACT [ EXCLUDE ] ] TO <em><a href="#inverse_expression">inverse_expression</a></em> [ EXACT [ EXCLUDE ] ] )</code></p>
<p><code id="inverse_expression">inverse_expression := sql_expression ([ PIECE ], [ RESULT ], [ <em>non-over-arg1</em> [ ,… ] ])</code></p>
<p>Each monotony may be provided with an <em><a href="#inversion_clause">inversion_clause</a></em>, which maps the results back to the values of the arguments.</p>
<p>These expressions can accept the following arguments:</p>
<ul>
<li><code>RESULT</code>. This is the result of the function which needs to be mapped back to the value of the argument</li>
<li><code>PIECE</code>. This is the value defined by the piece-defining function. This only makes sense in a function-defined piecewise monotony</li>
<li><code>non-over-arg</code>. This is the argument to the function not used in the <code>OVER</code> clause. This only makes sense for multiple-argument functions.</li>
</ul>
<p>Inversion clause may come in two forms: scalar inversion and range inversion.</p>
<ul>
<li>
<p>A scalar inversion maps the function result to the single value of the argument.</p>
<p>If the monotony is strict, then the value of the expression is treated as a unique mapping. The decision on whether to return the value or not is made solely on the basis of the inverse result: function is not applied to the values taken from the index for more fine filtering.</p>
<p>If the monotony is not strict, then the value of the expression is treated as a reference point. Key reads backward and forward from this reference point should be made, with the function being applied to the result of each key read.</p>
</li>
<li>
<p>A range inversion maps the function result back to the range of values that are guaranteed to yield the <code>RESULT</code> when provided as an argument to the function. The first expression provides the start value of the range, the second expression provides the end value.</p>
<p>Within the range, the results are returned from the index without any additional checking.</p>
<p>If any of the range expressions is marked as <code>EXACT</code>, then no values are returned outside the range in the corresponding direction. Otherwise, key reads are made in this direction and the function is applied to the results of the key reads to do the fine checking.</p>
<p>With <code>EXACT</code> clause, it is also possible to provide <code>EXCLUDE</code> clause. Depending on its presence, the corresponding range will be treated as including or not including the boundary.</p>
</li>
</ul>
<p>The point of allowing inexact inverse functions is to deal with the rounding errors introduced by floating point operations. Even if in theory the function can uniquely map back to a single value or a range of values, in practice, this may be not the case.</p>
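<p>The effect is easy to observe in an actual engine. The following <strong>PostgreSQL</strong> sketch (the sample values are arbitrary assumptions) applies a function and its mathematical inverse to a few <code>float8</code> values and reports whether the round trip is exact; the outcome depends on the value and on the platform&#8217;s math library, which is precisely why an inverse cannot always be trusted as exact:</p>
<pre class="brush: sql">
-- Does EXP(LN(x)) give back exactly x for a few sample values?
SELECT  x,
        exp(ln(x)) AS round_trip,
        exp(ln(x)) = x AS is_exact
FROM    (
        VALUES
        (10::float8),
        (0.1::float8),
        (3::float8)
        ) AS v (x);
</pre>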
<p>The <strong>B-Tree</strong> search is possible even without providing inverse functions at all. However, if inverse functions are provided, the number of the function evaluations is kept to the minimum: the function is only evaluated for several keys outside of the guaranteed range.</p>
<h3>Examples</h3>
<ul>
<li>
<p><code>EXP(arg FLOAT) MONOTONIC INCREASING INVERSE LN(RESULT)</code></p>
<p>The exponential function is increasing over the whole set of real numbers. Its inverse function is the natural logarithm.</p>
<p>Due to possible rounding errors, this function cannot be declared as strict.</p>
</li>
<li>
<p><code>YEAR(arg DATETIME) MONOTONIC INCREASING<br />
INVERSE<br />
FROM DATE_TRUNC('year', RESULT) EXACT<br />
TO DATE_TRUNC('year', RESULT) + INTERVAL 1 YEAR EXACT EXCLUDE</code></p>
<p><code>YEAR</code> is an increasing function but not a strictly increasing one: two distinct dates can share the same year. That&#8217;s why we provide two inverse functions: the first one provides the minimal possible date for a given year, the second one provides the maximal possible one. The latter function is marked as <code>EXCLUDE</code>, since the first date of the next year does not belong to the current year&#8217;s domain piece, but any value strictly less than it (and greater than the year&#8217;s beginning date) does.</p>
</li>
<li>
<p><code>ABS(arg INT) MONOTONIC PIECEWISE<br />
WHEN VALUE STRICTLY LESS THAN 0 STRICTLY DECREASING INVERSE -RESULT,<br />
WHEN VALUE LESS THAN MAXVALUE STRICTLY INCREASING INVERSE RESULT</code></p>
<p>The absolute value function is piecewise monotonic: it is strictly decreasing for the values below zero and strictly increasing for the values of zero and above.</p>
<p>The inverse functions for each piece are the negation of the result and result itself, accordingly.</p>
</li>
<li>
<p><code>FLOOR(arg FLOAT) MONOTONIC INCREASING<br />
INVERSE<br />
FROM RESULT EXACT<br />
TO RESULT + 1 EXACT EXCLUDE</code></p>
<p>The nearest lower integer function is monotonically increasing, almost the same as <code>YEAR</code> described above.</p>
</li>
<li>
<p><code>OPERATOR_DIVISION(arg1 FLOAT, arg2 FLOAT)<br />
MONOTONIC OVER (arg1)<br />
CASE WHEN arg2 &gt; 0 THEN INCREASING INVERSE RESULT * arg2<br />
WHEN arg2 = 0 THEN UNDEFINED<br />
WHEN arg2 &lt; 0 THEN DECREASING INVERSE RESULT * arg2<br />
END</code></p>
<p>This function defines division operator.</p>
<p>It is monotonic over the first argument. This means that the second argument should be provided as the run-time constant for it to be sargable: no field from the table being searched for can be used as a second argument to this function.</p>
<p>This function is monotonic over the whole domain of <code>arg1</code>, but the monotony depends on the value of <code>arg2</code>.</p>
<p>In theory, this function is strictly monotonic, but in practice, rounding errors can make the function yield same results for different values of <code>arg1</code>. That's why this function should be declared non-strict and a single inverse expression should be provided.</p>
</li>
<li>
<p><code>OPERATOR_PLUS(arg1 FLOAT, arg2 FLOAT) MONOTONIC<br />
OVER (arg1) STRICTLY INCREASING INVERSE RESULT - arg2,<br />
OVER (arg2) STRICTLY INCREASING INVERSE RESULT - arg1</code></p>
<p>This function defines the <code>+</code> operator.</p>
<p>This is quite simple: the function is strictly increasing over both arguments, and the inverse function is just a subtraction.</p>
</li>
<li>
<p><code>MONTH(arg DATETIME) MONOTONIC PIECEWISE<br />
DEFINED BY DATE_TRUNC('year', arg)<br />
INCREASING<br />
INVERSE<br />
FROM PIECE + INTERVAL RESULT - 1 MONTH EXACT<br />
TO PIECE + INTERVAL RESULT MONTH EXACT EXCLUDE</code></p>
<p><code>MONTH</code> returns the month number within a year.</p>
<p>This function is piecewise monotonic: within a year, higher dates always return the same or higher months. Whenever the year changes, monotony breaks. <code>DATE_TRUNC</code> is monotonic and not strict, and, hence, can be used to define the pieces.</p>
<p>The inverse functions use both <code>PIECE</code> and <code>RESULT</code> arguments. The <code>PIECE</code> argument defines the current year, the <code>RESULT</code> is used to find the beginning of the current and the next months.</p>
</li>
<li>
<p><code>COS(arg FLOAT) MONOTONIC PIECEWISE<br />
DEFINED BY FLOOR(arg / PI())<br />
CASE PIECE % 2<br />
WHEN 0 THEN DECREASING INVERSE PIECE * PI() + ACOS(RESULT)<br />
ELSE INCREASING INVERSE (PIECE + 1) * PI() - ACOS(RESULT)<br />
END</code></p>
<p>Cosine is piecewise monotonic too. The pieces are defined by the cosine half-waves. Depending on the piece, it can be increasing or decreasing. The half-wave number is defined by <code>FLOOR(arg / PI())</code>. This piece-defining function, being a superposition of increasing functions, is increasing too.</p>
<p>On the even half-waves, the function is decreasing; on the odd half-waves, it is increasing. In principle, the function monotony is strict, but due to the possible rounding errors, it is marked as non-strict and only one inverse function provided.</p>
</li>
<li>
<p><code>SIN(arg FLOAT) MONOTONIC PIECEWISE<br />
DEFINED BY FLOOR(arg / PI() - 0.5)<br />
CASE PIECE % 2<br />
WHEN 0 THEN DECREASING INVERSE (PIECE + 1) * PI() - ASIN(RESULT)<br />
ELSE INCREASING INVERSE (PIECE + 1) * PI() + ASIN(RESULT)<br />
END</code></p>
<p>Same as cosine above, with a <code>&pi; / 2</code> shift.</p>
</li>
<li>
<p><code>LEFT(arg VARCHAR, length INT) MONOTONIC<br />
OVER(arg) INCREASING<br />
INVERSE<br />
FROM LEFT(arg, length) EXACT<br />
TO LEFT(arg, length - 1) || NEXT_COLLATION_CHARACTER(SUBSTRING(arg, length, 1)) EXACT EXCLUDE<br />
</code></p>
<p>Function <code>LEFT</code> takes the leading characters from the string.</p>
<p>It is sargable over the first argument: if the string is indexed, you can always search for a leading substring. I had to make up the function <code>NEXT_COLLATION_CHARACTER</code>, which would return the next character of the alphabet according to the current collation.</p>
<p>Note that <strong>SQL Server</strong> and <strong>MySQL</strong> have traces of such an optimization. In both systems, the <code>LIKE</code> predicate is sargable if the search string does not begin with a wildcard.</p>
<p>If you build a plan for a predicate like <code>column LIKE 'test%'</code>, you will see that it uses an index seek over this range:</p>
<p><code>column >= 'test' AND column < 'tesU'</code></p>
<p>This is exactly the behavior described above.</p>
</li>
</ul>
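<p>The <code>COS</code> declaration above is easy to check numerically. Here is a minimal sketch in <strong>Python</strong> (it is not part of the proposed syntax, and the names <code>cos_piece</code> and <code>cos_inverse</code> are made up for this illustration) which computes the half-wave number and applies the inverse declared for that piece:</p>
<pre class="brush: python">
import math

def cos_piece(arg):
    # Half-wave number, as in DEFINED BY FLOOR(arg / PI())
    return math.floor(arg / math.pi)

def cos_inverse(piece, result):
    # Even half-waves are decreasing, odd half-waves are increasing
    if piece % 2 == 0:
        return piece * math.pi + math.acos(result)
    return (piece + 1) * math.pi - math.acos(result)

# Round trip: the inverse applied within the same piece restores the argument
for arg in (0.3, 2.0, 4.0, 7.5, -1.0, -5.0):
    piece = cos_piece(arg)
    restored = cos_inverse(piece, math.cos(arg))
    assert math.isclose(restored, arg, abs_tol=1e-9)
</pre>
<p>The round trip holds for negative arguments too, as long as the piece number is taken with <code>FLOOR</code> and not with truncation towards zero.</p>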
<h3>Search algorithm</h3>
<p>To select the values matching this condition:</p>
<p><code>function(col1 [ , arg2, … ]) = const</code></p>
<p>, the following steps should be taken:</p>
<ol>
<li>
<p>If the function is not sargable against the first argument, then just do a full table or index scan.</p>
</li>
<li>
<p>If the function is sargable against the first argument, check its monotony.</p>
</li>
<li>
<p>If the function is monotonic over its whole domain:</p>
<ol>
<li>
<p>If no inverse function is defined, then locate the values in the <strong>B-Tree</strong> using the <strong>B-Tree</strong> search algorithm, applying the function to each key before the comparison. The monotony direction (increasing or decreasing) should of course be taken into account.</p>
</li>
<li>
<p>If the function is strictly monotonic and a single inverse function is defined, apply the inverse function to the value of <code>const</code>, find the corresponding value of <code>col1</code> and locate it in the index using <strong>B-Tree</strong> search algorithm.</p>
</li>
<li>
<p>If the function is monotonic, but not strictly, check the inversion type:</p>
<ol>
<li>
<p>If a scalar inverse expression is defined, then:</p>
<ol>
<li>
<p>Apply the inverse function to <code>const</code> and find the key values in the <strong>B-Tree</strong> closest to the value of <code>inverse_expression(const)</code></p>
</li>
<li>
<p>Starting from <code>inverse_expression(const)</code>, iterate through the index backward and forward, applying the function to each key found until the function exceeds (or falls short of) <code>const</code>, returning the records found.</p>
</li>
</ol>
</li>
<li>
<p>If a range inverse expression is defined, then:</p>
<ol>
<li>
<p>Find the beginning and the ending values of the <code>col1</code> range that yields <code>const</code>.<br />
If the beginning of the range is not exact, iterate the keys backwards, applying the function to each key found until the function equals <code>const</code>.</p>
</li>
<li>
<p>Iterate the keys forward, returning the records found, until the end of the range is reached.</p>
</li>
<li>
<p>If the range end is not exact, then iterate keys beyond the range, applying the function to the keys found, until the function value exceeds or falls short of <code>const</code></p>
</li>
</ol>
</li>
</ol>
</li>
</ol>
</li>
<li>
<p>If the function is piecewise monotonic:</p>
<ol>
<li>
<p>If the pieces are defined by a set of constants, split the domain into the pieces, then apply the steps above to each piece separately with respect to the monotony of each piece.</p>
</li>
<li>
<p>If the pieces are defined by a function:</p>
<ol>
<li>
<p>Use a loose index scan to build a distinct list of the function's values (each defining a piece). The loose index scan will jump over the <strong>B-Tree</strong>, starting from the minimal (or maximal) value of the function and recursively searching for the next greater (or lesser) value. Each distinct value will reveal the least and the greatest value of <code>col1</code> belonging to its piece.</p>
</li>
<li>
<p>Within the range defined by these values, apply the steps above, with respect to the monotony of each piece.</p>
</li>
</ol>
</li>
<li>
<p>If for any of the pieces the monotony is <code>UNDEFINED</code>, then just apply the function to every key in the piece's range, filtering the results.</p>
</li>
</ol>
</li>
</ol>
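<p>The monotonic (non-piecewise) branches of this algorithm can be sketched in a few lines of <strong>Python</strong>. This is only an illustration under simplifying assumptions: a sorted list stands in for the <strong>B-Tree</strong>, the function is assumed to be increasing, and all names (<code>lower_bound</code>, <code>search_without_inverse</code>, <code>search_with_range_inverse</code>) are invented for the example:</p>
<pre class="brush: python">
import bisect

def lower_bound(keys, func, const):
    # Binary search over the sorted keys, comparing func(key) instead of key.
    # Assumes func is (non-strictly) increasing over the keys.
    lo, hi = 0, len(keys)
    while hi > lo:
        mid = (lo + hi) // 2
        if const > func(keys[mid]):
            lo = mid + 1
        else:
            hi = mid
    return lo

def search_without_inverse(keys, func, const):
    # No inverse declared: locate the first match, then scan forward
    # until the function value is no longer equal to const
    found = []
    for key in keys[lower_bound(keys, func, const):]:
        if func(key) != const:
            break
        found.append(key)
    return found

def search_with_range_inverse(keys, range_from, range_to, const):
    # Exact range inverse: all matches lie in [range_from(const), range_to(const))
    start = bisect.bisect_left(keys, range_from(const))
    end = bisect.bisect_left(keys, range_to(const))
    return keys[start:end]

# x // 10 is increasing but not strictly: several keys share one value
keys = [3, 12, 15, 19, 20, 31, 38, 45]
# Both calls print [12, 15, 19]
print(search_without_inverse(keys, lambda x: x // 10, 1))
print(search_with_range_inverse(keys, lambda r: r * 10, lambda r: (r + 1) * 10, 1))
</pre>
<p>With an exact range inverse the function is never applied to the keys at all, which is why declaring the inverse pays off.</p>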
<h3>User-defined functions</h3>
<p>All the examples above dealt with built-in functions. In fact, these need no declaration at all; the clauses above were added only to demonstrate the point.</p>
<p>However, with a little effort, user-defined functions can be made sargable too.</p>
<p>Here are some suggestions on how to do it:</p>
<ol>
<li>To be sargable, a user-defined function should be strictly deterministic, that is, its result should depend only on the values of its arguments (specifically, it should not depend on the database state).</li>
<li>If a function is a composition of monotonic functions (with the additional constraints stated above), it should be automatically treated as monotonic.</li>
<li>
<p>If a function is declared as monotonic by a user, special algorithms should be used to check its monotony (and hence sargability):</p>
<ol>
<li>
<p>Sargability of a monotonic user-defined function should be a property of index-function pair. Function's declared monotony is a necessary but not sufficient condition for sargability.</p>
<p>By default, sargability of index-function pair is <q>unknown</q>.</p>
</li>
<li>
<p>Whenever a query is issued which could be satisfied by using the index to search for the function values, the index-function pair is marked as <q>potentially sargable</q>. This property can be made dynamic: say, <strong>10</strong> or <strong>100</strong> queries should be issued before the sargability of the index-function pair can be checked.</p>
</li>
<li>When the query threshold is reached (this means that a certain number of queries which could possibly use the index have been run), the index checking process is initiated.</li>
<li>The database job which is responsible for collecting the index statistics runs over the index and checks the declared monotony of the function over the given argument: within each piece, higher values of the argument should map to higher (or lower, or not higher, or not lower) values of the function. If the function is piecewise monotonic with pieces defined by a user-defined function as well, the monotony of the piece-defining function should also be checked in the same routine.</li>
<li>If the check succeeds, the index-function pair is marked as <q>sargable</q> both for the function and its piece-defining function (if any).</li>
<li>At the same time, the statistics routine should collect the additional statistics for the function as if it was the index over the function's values.</li>
<li>
<p>If an index is marked as <q>sargable</q> for any given function, all <code>DML</code> operations against this index should perform an additional validation of the function's monotony:</p>
<ol>
<li>Whenever a new record is inserted into the index, the function should be applied to the values of the key inserted and both of its neighbors. The value of the function over the key inserted should be between (or, depending on the function's monotony, strictly between) the values of the function over the neighboring keys.</li>
<li>If the function is declared as piecewise monotonic with a piece defining function, the latter is checked first.</li>
<li>
<p>If any of the neighboring keys belongs to a piece different from that of the key being inserted, that neighboring key will not participate in the check.</p>
</li>
</ol>
</li>
<li>A failure to prove the function's monotony invalidates the function's usage in all indexes and also invalidates the sargability of all functions that use it as a piece-defining function. Additionally, the values of the arguments which are proven to violate the function's monotony are recorded with the function's metadata.
</li>
<li>Any change to the function's definition invalidates all sargability information linked with this function, as well as that linked with the functions that use it to define their pieces.</li>
<li>
<p>For any index-function pair, the validation and statistics collection process can be initiated manually. The database engine should provide means for that.</p>
</li>
</ol>
</li>
</ol>
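<p>The two checks described above, the full pass made by the statistics job and the neighbor check made on insert, could look roughly like this. It is a simplified <strong>Python</strong> sketch, not an engine design: a sorted list plays the role of the index, only non-strictly increasing functions are handled, pieces are ignored, and the function names are made up:</p>
<pre class="brush: python">
import bisect

def is_non_decreasing(keys, func):
    # Statistics job: walk the index in key order and compare neighboring values
    values = [func(key) for key in keys]
    return all(b >= a for a, b in zip(values, values[1:]))

def insert_keeps_monotony(keys, func, new_key):
    # DML check: the function over the new key should lie between
    # the function over its left and right neighbors in the index
    pos = bisect.bisect_left(keys, new_key)
    value = func(new_key)
    if pos > 0 and func(keys[pos - 1]) > value:
        return False
    if len(keys) > pos and value > func(keys[pos]):
        return False
    return True

def square(x):
    return x * x

# square is declared as increasing; this holds for the keys indexed so far
keys = [1, 2, 5]
print(is_non_decreasing(keys, square))          # True
print(insert_keeps_monotony(keys, square, 3))   # True: 4, 9, 25
print(insert_keeps_monotony(keys, square, -3))  # False: 9 would precede 1
</pre>
<p>The last call shows why the declaration alone is not enough: the function behaves monotonically on the data indexed so far, and a single new key breaks it.</p>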
<h3>Points of interest</h3>
<p>Function sargability may be used to improve many queries, specifically:</p>
<ul>
<li>
<p>Searching for dates within a given period (day, month, year) without the need to calculate the ranges manually:</p>
<pre class="brush: sql">
SELECT  *
FROM    mytable
WHERE   EXTRACT(YEAR_MONTH FROM col1) = &#039;201002&#039;
</pre>
</li>
<li>
<p>Searching for birthdates and calendar events:</p>
<pre class="brush: sql">
SELECT  *
FROM    mytable
WHERE   EXTRACT(calendar_date FROM mydate) = &#039;0310&#039;
</pre>
</li>
<li>
<p>Searching for the substrings:</p>
<pre class="brush: sql">
SELECT  *
FROM    mytable
WHERE   LEFT(value, 10) = &#039;abcdefghij&#039;
</pre>
</li>
<li>
<p>Searching for the <code>COALESCE</code>'d values:</p>
<pre class="brush: sql">
SELECT  *
FROM    mytable
WHERE   COALESCE(value, &#039;test&#039;) = &#039;test&#039;
</pre>
</li>
</ul>
<p>and many others.</p>
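<p>To make the first example concrete: with a sargable <code>EXTRACT</code>, the engine could turn the condition into the index range <code>col1 &gt;= start AND col1 &lt; end</code> on its own. The range calculation that is done by hand today is sketched below in <strong>Python</strong> (the helper name <code>year_month_range</code> is hypothetical, and the input is assumed to be a well-formed <code>YYYYMM</code> string):</p>
<pre class="brush: python">
import datetime

def year_month_range(year_month):
    # 'YYYYMM' to a half-open range [first day of month, first day of next month)
    year, month = int(year_month[:4]), int(year_month[4:])
    start = datetime.date(year, month, 1)
    if month == 12:
        end = datetime.date(year + 1, 1, 1)
    else:
        end = datetime.date(year, month + 1, 1)
    return start, end

start, end = year_month_range('201002')
print(start, end)  # prints 2010-02-01 2010-03-01
</pre>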
<p>I really hope that this will be added to the major database engines in the near future.</p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2010/02/19/things-sql-needs-sargability-of-monotonic-functions/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Happy New Year!</title>
		<link>http://explainextended.com/2009/12/31/happy-new-year/</link>
		<comments>http://explainextended.com/2009/12/31/happy-new-year/#comments</comments>
		<pubDate>Thu, 31 Dec 2009 20:00:47 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=3926</guid>
		<description><![CDATA[Oracle New Year tree WITH tree AS ( SELECT /*+ MATERIALIZE */ &#039; ,,,&#34;&#039;&#039;&#039; AS needles, &#039;Oo%$&#38;&#039; AS decorations FROM dual ), branches AS ( SELECT level AS id, level - TRUNC((level - 2) / 6) * 2 AS s FROM dual CONNECT BY level &#60;= 24 ) SELECT RPAD(&#039; &#039;, 18, &#039; &#039;) &#124;&#124; [...]]]></description>
			<content:encoded><![CDATA[<h3>Oracle New Year tree</h3>
<pre class="brush: sql">
WITH    tree AS
        (
        SELECT  /*+ MATERIALIZE */
                &#039;  ,,,&quot;&#039;&#039;&#039; AS needles, &#039;Oo%$&amp;&#039; AS decorations
        FROM    dual
        ),
        branches AS
        (
        SELECT  level AS id, level - TRUNC((level - 2) / 6) * 2 AS s
        FROM    dual
        CONNECT BY
                level &lt;= 24
        )
SELECT  RPAD(&#039; &#039;, 18, &#039; &#039;) || &#039;*&#039; AS tree
FROM    dual
UNION ALL
SELECT  RPAD(&#039; &#039;, 18 - s, &#039; &#039;) || &#039;/&#039; ||
        (
        SELECT  REPLACE(
                SYS_CONNECT_BY_PATH
                (
                CASE
                WHEN DBMS_RANDOM.value &lt; 0.1 THEN
                        SUBSTR(decorations, TRUNC(DBMS_RANDOM.value * LENGTH(decorations)) + 1, 1)
                ELSE
                        SUBSTR(needles, TRUNC(DBMS_RANDOM.value * LENGTH(needles)) + 1, 1)
                END , &#039;/&#039;
                ),
                &#039;/&#039;, &#039;&#039;
                )
        FROM    tree
        WHERE   CONNECT_BY_ISLEAF = 1
        CONNECT BY
                level &lt;= s * 2 - 1
        )
        || &#039;\&#039;
FROM    branches
</pre>
<div class="terminal widefont">
<table class="terminal">
<tr>
<th>TREE</th>
</tr>
<tr>
<td class="varchar2">                  *</td>
</tr>
<tr>
<td class="varchar2">                 /,\</td>
</tr>
<tr>
<td class="varchar2">                /,, \</td>
</tr>
<tr>
<td class="varchar2">               /$,&quot; ,\</td>
</tr>
<tr>
<td class="varchar2">              /, &#39;,,  \</td>
</tr>
<tr>
<td class="varchar2">             / ,&#39;o&#39;&#39;  ,\</td>
</tr>
<tr>
<td class="varchar2">            /,, ,&#39;&quot;,,,, \</td>
</tr>
<tr>
<td class="varchar2">           /,&#39;,,$, &#39;&#39;  ,,\</td>
</tr>
<tr>
<td class="varchar2">            /,, ,&#39;&quot;,,,, \</td>
</tr>
<tr>
<td class="varchar2">           /&quot;,, ,,&quot;&#39; , &quot;,\</td>
</tr>
<tr>
<td class="varchar2">          /  $&#39; ,,  &quot; ,&#39; &#39;\</td>
</tr>
<tr>
<td class="varchar2">         /, , &#39;&quot;O,,&#39;o&quot;,&quot;  &quot;\</td>
</tr>
<tr>
<td class="varchar2">        /,,,,&quot; ,  &#39;,,&#39;,,&#39;,&#39;,\</td>
</tr>
<tr>
<td class="varchar2">       /,&#39;,,&quot;&#39;  &#39; &#39;  ,,,,&amp;&quot;o,\</td>
</tr>
<tr>
<td class="varchar2">        /,&#39; ,,,&quot;   ,&#39;,%&quot; ,,,\</td>
</tr>
<tr>
<td class="varchar2">       /,&#39;,,&quot;&#39;  &#39; &#39;  ,,,,&amp;&quot;o,\</td>
</tr>
<tr>
<td class="varchar2">      /,$,,,&quot; ,,$&#39;%&quot;,&quot;,,%,,&#39;&#39;&quot;\</td>
</tr>
<tr>
<td class="varchar2">     / &#39; ,,  ,&quot; ,,&amp;  , ,, ,,   \</td>
</tr>
<tr>
<td class="varchar2">    /,&quot;,O&#39; ,&#39;,, ,  ,$, &quot;&#39;,&quot;,,,&#39;,\</td>
</tr>
<tr>
<td class="varchar2">   /,,&quot; &quot;,,  ,,,  ,&#39;,&#39;, &quot;   ,&quot;&quot;&#39;,\</td>
</tr>
<tr>
<td class="varchar2">    / ,O &#39;,,,,,&quot; o  , &#39;&#39;,&#39;&#39;, ,&amp;,\</td>
</tr>
<tr>
<td class="varchar2">   /,,&quot; &quot;,,  ,,,  ,&#39;,&#39;, &quot;   ,&quot;&quot;&#39;,\</td>
</tr>
<tr>
<td class="varchar2">  /,&quot;&#39;&#39;,,&quot;&quot;$&quot;,&quot; ,$,%,&#39;  &quot;&quot; ,,,&#39;,&quot;,\</td>
</tr>
<tr>
<td class="varchar2"> /,   &#39;&quot; &quot;O,,o&#39; , ,&#39;% $&#39;,,, ,&quot;&quot;&quot; , \</td>
</tr>
<tr>
<td class="varchar2">/, O&#39;  ,,, &amp;,&#39;,,&#39;  , &#39;,$, ,&#39; ,,, ,,&#39;\</td>
</tr>
</table>
</div>
<h3>SQL Server bottle of champagne</h3>
<pre class="brush: sql">
WITH    lines AS
        (
        SELECT  1 AS n
        UNION ALL
        SELECT  n + 1
        FROM    lines
        WHERE   n &lt; 24
        ),
        bottle_parts (part, lft, body, rgt, w) AS
        (
        SELECT  &#039;mouth&#039;, &#039;.&#039;, &#039;_&#039;, &#039;.&#039;, 2
        UNION ALL
        SELECT  &#039;neck&#039;, &#039;|&#039;, &#039; &#039;, &#039;|&#039;, 2
        UNION ALL
        SELECT  &#039;bell&#039;, &#039;/&#039;, &#039; &#039;, &#039;\&#039;, NULL
        UNION ALL
        SELECT  &#039;border&#039;, &#039;/&#039;, &#039;=&#039;, &#039;\&#039;, NULL
        UNION ALL
        SELECT  &#039;body&#039;, &#039;|&#039;, &#039; &#039;, &#039;|&#039;, 7
        UNION ALL
        SELECT  &#039;bottom&#039;, &#039;(&#039;, &#039;_&#039;, &#039;)&#039;, 7
        ),
        bottle_lines AS
        (
        SELECT  CASE
                WHEN n = 1 THEN &#039;mouth&#039;
                WHEN n BETWEEN 2 AND 8 THEN &#039;neck&#039;
                WHEN n = 11 THEN &#039;border&#039;
                WHEN n BETWEEN 9 AND 13 THEN &#039;bell&#039;
                WHEN n BETWEEN 13 AND 23 THEN &#039;body&#039;
                WHEN n = 24 THEN &#039;bottom&#039;
                END AS part,
                n
        FROM    lines
        ),
        bottle_width AS
        (
        SELECT  bp.*, n, COALESCE(w, (n - 6)) AS width
        FROM    bottle_lines bl
        JOIN
                bottle_parts bp
        ON      bp.part = bl.part
        ),
        bottle AS
        (
        SELECT  n, width, REPLICATE(&#039; &#039;, 18 - width) + lft + REPLICATE(body, width * 2) + rgt AS line
        FROM    bottle_width
        ),
        bubbles (pos, m, bubble) AS
        (
        SELECT  CAST(ROUND(RAND() * 4, 0, 1) AS INTEGER), 1, &#039;:&#039; AS bubble
        UNION ALL
        SELECT  pos + ASCII(SUBSTRING(CAST(NEWID() AS VARCHAR(MAX)), 1, 1)) % 3 - 1, m + 1, SUBSTRING(&#039;:.oO&#039;, m / 3 + 1, 1)
        FROM    bubbles
        WHERE   m &lt; 12
        ),
        limits AS
        (
        SELECT  *,
                18 +
                CASE
                WHEN pos &lt; 1 - width THEN 1 - width
                WHEN pos &gt; width - 1 THEN width - 1
                ELSE pos
                END AS rpos
        FROM    bottle bo
        LEFT HASH JOIN
                bubbles bu
        ON      bu.m = 24 - bo.n
        )
SELECT  SUBSTRING(line, 1, COALESCE(rpos, 1) - 1) + COALESCE(bubble, &#039; &#039;) + SUBSTRING(line, COALESCE(rpos, 1) + 1, 200) AS bottle
FROM    limits
ORDER BY
        n
</pre>
<div class="terminal widefont">
<table class="terminal">
<tr>
<th>bottle</th>
</tr>
<tr>
<td class="varchar">                .____.</td>
</tr>
<tr>
<td class="varchar">                |    |</td>
</tr>
<tr>
<td class="varchar">                |    |</td>
</tr>
<tr>
<td class="varchar">                |    |</td>
</tr>
<tr>
<td class="varchar">                |    |</td>
</tr>
<tr>
<td class="varchar">                |    |</td>
</tr>
<tr>
<td class="varchar">                |    |</td>
</tr>
<tr>
<td class="varchar">                |    |</td>
</tr>
<tr>
<td class="varchar">               /      \</td>
</tr>
<tr>
<td class="varchar">              /        \</td>
</tr>
<tr>
<td class="varchar">             /==========\</td>
</tr>
<tr>
<td class="varchar">            /      O     \</td>
</tr>
<tr>
<td class="varchar">           /        O     \</td>
</tr>
<tr>
<td class="varchar">           |       O      |</td>
</tr>
<tr>
<td class="varchar">           |        o     |</td>
</tr>
<tr>
<td class="varchar">           |        o     |</td>
</tr>
<tr>
<td class="varchar">           |       o      |</td>
</tr>
<tr>
<td class="varchar">           |        .     |</td>
</tr>
<tr>
<td class="varchar">           |       .      |</td>
</tr>
<tr>
<td class="varchar">           |       .      |</td>
</tr>
<tr>
<td class="varchar">           |       :      |</td>
</tr>
<tr>
<td class="varchar">           |       :      |</td>
</tr>
<tr>
<td class="varchar">           |      :       |</td>
</tr>
<tr>
<td class="varchar">           (______________)</td>
</tr>
</table>
</div>
<h3>PostgreSQL fireworks</h3>
<pre class="brush: sql">
WITH    centers AS
        (
        SELECT  angle,
                len,
                ROUND(len * SIN(2 * PI() * angle)) AS x,
                ROUND(len * COS(2 * PI() * angle)) AS y,
                ROUND(len * 0.3)::INTEGER + 1 AS trace
        FROM    (
                SELECT  RANDOM() AS angle,
                        8 * (1 - POWER(RANDOM(), 3)) AS len
                FROM    generate_series (1, 50) s
                ) q
        ),
        traces AS
        (
        SELECT  *,
                generate_series(1, trace) AS part
        FROM    centers
        ),
        parts AS
        (
        SELECT  CASE
                WHEN trace = part THEN
                        LEAST(len * 0.2, 2)::INTEGER
                ELSE
                        TRUNC(angle * 8 - 0.5)::INTEGER % 4 + 3
                END AS symbol,
                TRUNC(x + part * SIN(2 * PI() * angle)) AS x,
                TRUNC(y + part * COS(2 * PI() * angle)) AS y
        FROM    traces
        )
        SELECT  ARRAY_TO_STRING(
        ARRAY(
        SELECT  COALESCE(
                (
                SELECT  SUBSTR(E&#039;.xX\\-/|&#039;, MIN(symbol) + 1, 1)
                FROM    parts
                WHERE   x = col - 14
                        AND y = row - 12
                ), &#039; &#039;)
        FROM    generate_series(1, 25) col
        ), &#039;&#039;
        ) AS FIREWORKS
FROM    generate_series(1, 24) row
</pre>
<div class="terminal widefont">
<table class="terminal">
<tr>
<th>fireworks</th>
</tr>
<tr>
<td class="text">                         </td>
</tr>
<tr>
<td class="text">              X          </td>
</tr>
<tr>
<td class="text">              |  X       </td>
</tr>
<tr>
<td class="text">             x| |  X     </td>
</tr>
<tr>
<td class="text">        x    |  |  /     </td>
</tr>
<tr>
<td class="text">     X\ \x   |    /x     </td>
</tr>
<tr>
<td class="text">    X\ \ \        /      </td>
</tr>
<tr>
<td class="text">    X \   \ x    /  /xx  </td>
</tr>
<tr>
<td class="text">     \\  x  |.     ///   </td>
</tr>
<tr>
<td class="text">   X&#8211;    \  |           </td>
</tr>
<tr>
<td class="text">    xxx&#8211;   .            </td>
</tr>
<tr>
<td class="text">   Xx&#8211;      ..          </td>
</tr>
<tr>
<td class="text">       xxx-/       --x   </td>
</tr>
<tr>
<td class="text">          .  \\      --X </td>
</tr>
<tr>
<td class="text">             . .         </td>
</tr>
<tr>
<td class="text">      /            \     </td>
</tr>
<tr>
<td class="text">    X/       |    \ \x   </td>
</tr>
<tr>
<td class="text">         /   |\   \\     </td>
</tr>
<tr>
<td class="text">        //  |x\ \  \x    </td>
</tr>
<tr>
<td class="text">        x/  |\x \  X     </td>
</tr>
<tr>
<td class="text">        X   x\x  X       </td>
</tr>
<tr>
<td class="text">             X           </td>
</tr>
<tr>
<td class="text">                         </td>
</tr>
<tr>
<td class="text">                         </td>
</tr>
</table>
</div>
<h3>MySQL postcard</h3>
<pre class="brush: sql">
SELECT  CONCAT(COALESCE(border, &#039;|&#039;), RPAD(COALESCE(postcard, &#039;&#039;), 61, COALESCE(filler, &#039; &#039;)), COALESCE(border, &#039;|&#039;)) AS postcard
FROM    (
        SELECT  id2 * 5 + id1 AS line
        FROM    (
                SELECT 1 AS id1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
                ) q1
        CROSS JOIN
                (
                SELECT 0 AS id2 UNION ALL SELECT 1
                ) q2
        ) dummy
LEFT JOIN
        (
        SELECT  3 AS line, CONCAT(&#039; Dear &#039;, USER(), &#039;,&#039;) AS postcard
        UNION ALL
        SELECT  4, &#039; I wish you good luck in the New Year.&#039;
        UNION ALL
        SELECT  5, &#039; May all your plans be optimal, all your queries be answered&#039;
        UNION ALL
        SELECT  6, &#039; and all your estimations be correct.&#039;
        UNION ALL
        SELECT  8, &#039;               Yours sincerely,&#039;
        UNION ALL
        SELECT  9, &#039;                                                    Quassnoi&#039;
        ) ln
ON      ln.line = dummy.line
LEFT JOIN
        (
        SELECT 1 AS line, &#039;.&#039; AS border, &#039;_&#039; AS filler
        UNION ALL
        SELECT 10 AS line, &#039;.&#039; AS border, &#039;_&#039; AS filler
        ) borders
ON      borders.line = dummy.line
ORDER BY
        dummy.line
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>postcard</th>
</tr>
<tr>
<td class="varchar">._____________________________________________________________.</td>
</tr>
<tr>
<td class="varchar">|                                                             |</td>
</tr>
<tr>
<td class="varchar">| Dear dbuser@localhost,                                      |</td>
</tr>
<tr>
<td class="varchar">| I wish you good luck in the New Year.                       |</td>
</tr>
<tr>
<td class="varchar">| May all your plans be optimal, all your queries be answered |</td>
</tr>
<tr>
<td class="varchar">| and all your estimations be correct.                        |</td>
</tr>
<tr>
<td class="varchar">|                                                             |</td>
</tr>
<tr>
<td class="varchar">|               Yours sincerely,                              |</td>
</tr>
<tr>
<td class="varchar">|                                                    Quassnoi |</td>
</tr>
<tr>
<td class="varchar">._____________________________________________________________.</td>
</tr>
</table>
</div>
<div class="plainnote" style="text-align: center">
<big><strong>Happy New Year!</strong></big>
</div>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2009/12/31/happy-new-year/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>What is entity-relationship model?</title>
		<link>http://explainextended.com/2009/10/18/what-is-entity-relationship-model/</link>
		<comments>http://explainextended.com/2009/10/18/what-is-entity-relationship-model/#comments</comments>
		<pubDate>Sun, 18 Oct 2009 19:00:46 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=2959</guid>
		<description><![CDATA[A relational database, as we all know from the previous article, stores relations between integers, strings and other simple types in a very plain way: it just enumerates all values related. This model is extremely flexible, since other relations can be easily derived from existing ones using mathematical formulae, and the relational database takes care [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_3549" class="wp-caption alignright" style="width: 410px"><img src="http://explainextended.com/wp-content/uploads/2009/10/er.jpg" alt="Image by Samuel Mann" title="Entity-relationship diagram" width="400" height="300" class="size-full wp-image-3549" /><p class="wp-caption-text">Image by <a href='http://www.flickr.com/photos/21218849@N03/2827389530/'>Samuel Mann</a></p></div>
<p>A relational database, as we all know from the <a href="/2009/08/23/what-is-a-relational-database/">previous article</a>, stores relations between integers, strings and other simple types in a very plain way: it just enumerates all values related.</p>
<p>This model is extremely flexible, since other relations can be easily derived from existing ones using mathematical formulae, and the relational database takes care of that.</p>
<p>However, a database should reflect the real world in some way to be of any real use. Since relational databases store relations between mathematical abstractions and not real-world things, we need some kind of mapping from one to the other.</p>
<p>This is what the entity-relationship model is for.</p>
<p>An entity-relationship model, as one can easily guess from its name, models relationships between entities.</p>
<p>But since we know that databases do essentially the same, how does it differ from the database model?</p>
<ul>
<li>An entity-relationship model states <strong>which</strong> data and relations between them should be stored</li>
<li>A database model states <strong>how</strong> these relations are stored</li>
</ul>
<p>In other words, the ER model is the design, and the database model is one of the ways to implement it. The ER model is said to be above the database model in <a href="http://en.wikipedia.org/wiki/Waterfall_model">waterfall development</a>.</p>
<p>Note that in both cases I use the word <strong>stored</strong> above. The model says nothing of data and relations between them that may or should exist, only of those that should be stored. Every human being participates in thousands if not millions of relationships, but an ER model should state which of them are to be stored, and a relational model should decide how to store them. Of course it would be nice to store them all (and the police and insurance companies would be just happy), but the technology does not allow it yet. I don&#8217;t know whether that is good or bad.</p>
<p>A typical entity-relationship diagram in <a href="http://csc.lsu.edu/~chen/pdf/Chen_Pioneers.pdf">Peter Chen&#8217;s notation</a> looks like this:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2009/09/er.png" alt="An entity-relationship diagram" title="ER" width="650" height="170" class="aligncenter size-full wp-image-3072 noborder" /></p>
<p>What does it mean?<br />
<span id="more-2959"></span></p>
<h3>Entities</h3>
<p>Square boxes mean <em>entities</em>. What do we describe in our database?</p>
<p>In our case, the model requires that we store information about <em>clients</em>, <em>orders</em> and <em>items</em> ordered.</p>
<p>Square boxes are useless without additional information. Store <em>some</em> information? OK, we do. We just make a file to which a new line is added each time a new customer registers:</p>
<pre>
A customer has registered
A customer has registered
A customer has registered
…
</pre>
<p>, another file to which a new line is added when an order is made:</p>
<pre>
An order is made
An order is made
An order is made
…
</pre>
<p>and a third file to keep the list of items ordered.</p>
<pre>
An item is ordered
An item is ordered
…
</pre>
<p>Quite useless, right? However, there still is <em>some</em> information, as per request. We know that each row in the files describes a customer, an order or an item, and we can get the total number of customers, orders and items we&#8217;ve had so far.</p>
<p>This is better than nothing but still far from being usable.</p>
<p>Note that the diagram tells us nothing of how this information should be stored. We could write it to a file, put it on paper or carve it in stone.</p>
<h3>Relationships</h3>
<p>Diamonds mean <em>relationships</em>. How are the entities related to each other?</p>
<p>We see that we should store the relationships between orders and clients as well as those between items and orders.</p>
<p>More than that: tiny arrows between the diamonds and the boxes (that is, between the entities and relationships) show us <em>which</em> relationships we should store.</p>
<p>In our case, we have a <code>1:(0-N)</code> relationship between clients and orders and a <code>1:(0-N)</code> relationship between orders and items.</p>
<p><code>1:(0-N)</code> is read as <q>one to zero, one or many</q>.</p>
<p>That means that our database should make sure that any order is related to exactly one client.</p>
<p>Want to store an order? Be prepared to link it to the piece of information about the client who made it and make sure this information describes exactly one client. This is the database developer&#8217;s task.</p>
<p>However, a client can have an arbitrary number of orders: no orders at all, one order or many orders. The database should provide an ability to store the clients and orders this way.</p>
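<p>As a purely illustrative sketch (one of many possible ways, with the class and method names invented for this example), the two rules can be expressed in <strong>Python</strong> like this: each order is mapped to a single client, while a client may own any number of orders:</p>
<pre class="brush: python">
class Store:
    def __init__(self):
        self.clients = set()
        # Maps each order to the single client who made it
        self.order_client = {}

    def add_client(self, client):
        self.clients.add(client)

    def add_order(self, order, client):
        # An order must be linked to exactly one client that is already stored
        if client not in self.clients:
            raise ValueError('unknown client')
        if order in self.order_client:
            raise ValueError('order already belongs to a client')
        self.order_client[order] = client

    def orders_of(self, client):
        # A client may have no orders at all, one order or many orders
        return sorted(o for o, c in self.order_client.items() if c == client)

store = Store()
for name in ('John', 'Jim', 'Ann'):
    store.add_client(name)
store.add_order(1, 'John')
store.add_order(2, 'Jim')
store.add_order(3, 'Jim')
print(store.orders_of('Jim'))  # [2, 3]
print(store.orders_of('Ann'))  # []
</pre>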
<p>Again, the entity-relationship model does not prescribe how we should store that data, as long as the storage method satisfies the conditions above.</p>
<p>We could store it in two files and relate them using the row numbers:</p>
<pre>
A customer has registered
A customer has registered
A customer has registered
…
</pre>
<pre>
An order is made by the customer described on line 1
An order is made by the customer described on line 2
An order is made by the customer described on line 2
</pre>
<p>, or we can just keep the information in a single text file:</p>
<pre>
A client has registered
Made an order
A client has registered
Made an order
Made an order
…
</pre>
<p>, or keep it in an <strong>XML</strong> file:</p>
<pre class="brush: xml">
&lt;root&gt;
 &lt;client&gt;
   &lt;order/&gt;
 &lt;/client&gt;
 &lt;client&gt;
   &lt;order/&gt;
   &lt;order/&gt;
 &lt;/client&gt;
&lt;/root&gt;
</pre>
<p>, or even bend down a finger each time a client makes an order. The latter one limits us to only <strong>2</strong> clients and <strong>5</strong> orders, but, you know, every system has its limitations.</p>
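<p>For comparison, here is a minimal sketch of how a relational database could enforce the same two conditions. The table and column names (<code>Client</code>, <code>ClientOrder</code>, <code>client_id</code>) are assumptions made for this illustration and are not part of the diagram:</p>
<pre class="brush: sql">
-- Hypothetical tables, for illustration only
CREATE TABLE Client (
        id INT NOT NULL PRIMARY KEY
);

CREATE TABLE ClientOrder (
        id INT NOT NULL PRIMARY KEY,
        -- NOT NULL plus FOREIGN KEY: every order is linked to exactly one client
        client_id INT NOT NULL,
        FOREIGN KEY (client_id) REFERENCES Client (id)
);
</pre>
<p>Nothing here limits how many rows in <code>ClientOrder</code> may point to the same client, so a client can have no orders, one order or many orders.</p>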
<h3>Attributes</h3>
<p>Ovals are <em>attributes</em>. <em>What information is stored in the database?</em></p>
<p>Merely enumerating the clients is nice but serves no purpose. It is much better to know the names of the clients; the orders need to be assigned unique numbers that help identify them; and it would be great to record which good each item contained (so that not only the number of packages could be checked but their contents too).</p>
<p>This is in fact what the database is for: storing information. Not only the links between the entities but the descriptions of the entities too.</p>
<p>Something like this:</p>
<pre class="brush: xml">
&lt;root&gt;
 &lt;client name=&quot;John&quot;&gt;
   &lt;order number=&quot;1&quot;&gt;
    &lt;item name=&quot;apple&quot;/&gt;
   &lt;/order&gt;
 &lt;/client&gt;
 &lt;client name=&quot;Jim&quot;&gt;
   &lt;order number=&quot;2&quot;&gt;
    &lt;item name=&quot;apple&quot;/&gt;
    &lt;item name=&quot;banana&quot;/&gt;
   &lt;/order&gt;
 &lt;/client&gt;
&lt;/root&gt;
</pre>
<p>That&#8217;s what makes a database a database rather than computer-assisted finger counting.</p>
<h3>Entity-relationship and relational model</h3>
<p>Everything above should be squeezed into the relational model, which, as we all know, stores relations.</p>
<p>Ever since relational databases appeared, they have mostly been used to implement ER models. Multiple database manuals and guides describe relational databases solely from that point of view. Various tools exist to automatically generate a relational structure from a given model.</p>
<p>However, an ER model and a relational database are not the same. There is not even a one-to-one mapping in either direction: the same ER model can be implemented in different ways in a relational database and, vice versa, one relational structure can serve multiple ER models.</p>
<p>Due to the way the data are stored in a relational model, there is no reliable way to tell attributes, entities and relationships apart by looking only at the relational model. These terms belong to the ER model. In a relational model, one thing can be implemented as an entity, a relationship or an attribute.</p>
<p>In this article, I will give several examples.</p>
<h4>Attribute or relationship?</h4>
<p><img src="http://explainextended.com/wp-content/uploads/2009/10/FictionalCharacter.png" alt="Fictional character" title="Fictional character" width="300" height="400" class="alignright size-full wp-image-3484 noborder" /></p>
<p>Imagine a simple model as pictured in the diagram on the right.</p>
<p>The model requires that the database store fictional characters (as the entities). For each fictional character it should store their name, address, town and state (as the attributes).</p>
<p>There are no relationships here: we store only one type of entity and its attributes.</p>
<p>As I said earlier, the ER model specifies <em>what</em> should be stored and the database design (the relational model in this case) decides <em>how</em>. One of the benefits of the relational model is that it can construct the data representation on the fly, using <strong>SQL</strong> statements. That&#8217;s why there is more than one way to design a storage model, and as long as it&#8217;s possible to retrieve the data in the expected form, every storage model is just as good.</p>
<p>One of the possible ways to implement this ER model in a relational database is to store the entities and their attributes in one table, like this:</p>
<table class="excel">
<caption>FictionalCharacter</caption>
<tr>
<th>ID</th>
<th>First Name</th>
<th>Last Name</th>
<th>Address</th>
<th>Town</th>
<th>State</th>
</tr>
<tr>
<td>1</td>
<td>Marty</td>
<td>McFly</td>
<td>9303 Lyon Estates</td>
<td>Hill Valley</td>
<td>CA</td>
</tr>
<tr>
<td>2</td>
<td>Arnold</td>
<td></td>
<td>4040 Vineland</td>
<td>Hillwood</td>
<td>WA</td>
</tr>
<tr>
<td>3</td>
<td>Hank</td>
<td>Hill</td>
<td>123 Rainey Street</td>
<td>Arlen</td>
<td>TX</td>
</tr>
</table>
<p>This table serves two purposes: it both defines an entity and stores its attributes. The entities are defined by being listed in a table with their <code>id</code> (a value being used in the relations) defined as a <code>PRIMARY KEY</code>. </p>
<p>Not all fictional characters, though, have last names. Arnold of <a href="http://en.wikipedia.org/wiki/Hey_Arnold!">Hey, Arnold!</a> does not.</p>
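<p>A minimal sketch of how this single table could be declared. The column types and lengths are assumptions made for the illustration; the only thing taken from the model is that the last name may be left undefined:</p>
<pre class="brush: sql">
CREATE TABLE FictionalCharacter (
        id        INT NOT NULL PRIMARY KEY,
        FirstName VARCHAR(100) NOT NULL,
        LastName  VARCHAR(100),           -- optional: may be left undefined
        Address   VARCHAR(200) NOT NULL,
        Town      VARCHAR(100) NOT NULL,
        State     CHAR(2) NOT NULL
);

-- Arnold has no last name, so the column is left undefined
INSERT
INTO    FictionalCharacter (id, FirstName, LastName, Address, Town, State)
VALUES  (2, 'Arnold', NULL, '4040 Vineland', 'Hillwood', 'WA');
</pre>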
<p>There is one more way to implement this model: store the entities and attributes in two (or even more) separate relational tables. If an attribute is rarely set or rarely used, it can be offloaded into another table:</p>
<table class="excel">
<caption>FictionalCharacter</caption>
<tr>
<th>ID</th>
<th>First Name</th>
<th>Address</th>
<th>Town</th>
<th>State</th>
</tr>
<tr>
<td>1</td>
<td>Marty</td>
<td>9303 Lyon Estates</td>
<td>Hill Valley</td>
<td>CA</td>
</tr>
<tr>
<td>2</td>
<td>Arnold</td>
<td>4040 Vineland</td>
<td>Hillwood</td>
<td>WA</td>
</tr>
<tr>
<td>3</td>
<td>Hank</td>
<td>123 Rainey Street</td>
<td>Arlen</td>
<td>TX</td>
</tr>
</table>
<table class="excel">
<caption>FictionalCharacterLastName</caption>
<tr>
<th>ID</th>
<th>Last Name</th>
</tr>
<tr>
<td>1</td>
<td>McFly</td>
</tr>
<tr>
<td>3</td>
<td>Hill</td>
</tr>
</table>
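<p>A possible declaration of this two-table layout, again with assumed column types. The <code>PRIMARY KEY</code> on the second table makes sure that a character has at most one last name, and the <code>FOREIGN KEY</code> makes sure that every last name belongs to an existing character:</p>
<pre class="brush: sql">
CREATE TABLE FictionalCharacter (
        id        INT NOT NULL PRIMARY KEY,
        FirstName VARCHAR(100) NOT NULL,
        Address   VARCHAR(200) NOT NULL,
        Town      VARCHAR(100) NOT NULL,
        State     CHAR(2) NOT NULL
);

CREATE TABLE FictionalCharacterLastName (
        id       INT NOT NULL PRIMARY KEY,   -- at most one last name per character
        LastName VARCHAR(100) NOT NULL,      -- a missing last name is a missing row
        FOREIGN KEY (id) REFERENCES FictionalCharacter (id)
);
</pre>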
<p>The model remains the same: the last name is an attribute of an entity, possibly undefined; but the relational implementations differ. Nevertheless, we can easily transform one representation to another by issuing <strong>SQL</strong> queries.</p>
<p>The second representation can be transformed into the first one using this query:</p>
<pre class="brush: sql">
SELECT  c.*, ln.LastName
FROM    FictionalCharacter c
LEFT JOIN
        FictionalCharacterLastName ln
ON      ln.id = c.id
</pre>
<p>and vice versa, using two queries, one for each of the tables:</p>
<pre class="brush: sql">
-- The main table: every attribute except the last name
SELECT  c.id, c.FirstName, c.Address, c.Town, c.State
FROM    FictionalCharacter c
</pre>
<pre class="brush: sql">
-- The offloaded attribute: only the characters that have a last name
SELECT  c.id, c.LastName
FROM    FictionalCharacter c
WHERE   c.LastName IS NOT NULL
</pre>
<p>The first model (storing all attributes in one table) seems simpler. Why should one ever use the second, more complex solution?</p>
<p>The reason is that a relational database, itself being an abstraction, should nevertheless be stored on a real-world disk drive and served by real-world software. And when it comes to the real world, as we all know, several trade-offs should be made.</p>
<p>In the world of relational databases there is a convention to use a special value, <code>NULL</code>, to mark data whose value is <em>undefined</em>, that is, not known, not stored or not cared about. This value is not equal (and not even unequal) to any value (including itself), and the results of most operators and functions over this value are also undefined.</p>
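<p>A short illustration of this behavior, assuming the single-table layout with Arnold&#8217;s last name stored as <code>NULL</code> and the engine running with standard <code>NULL</code> comparison settings:</p>
<pre class="brush: sql">
-- Returns no rows: comparing anything to NULL never yields true,
-- not even for the row where LastName is NULL
SELECT  *
FROM    FictionalCharacter
WHERE   LastName = NULL;

-- Returns Arnold: IS NULL is the predicate intended for this test
SELECT  *
FROM    FictionalCharacter
WHERE   LastName IS NULL;
</pre>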
<p>However, it still should be stored in the database somehow. Most databases optimize it to occupy as little space as possible. Oracle, for instance, stores data in rows of dynamic size, the row itself and each column being prepended with their sizes. If the column count in a given row is less than it should be (i. e. there are fewer columns stored in a row than stated in the table definition), all remaining columns are considered to have the value of <code>NULL</code>. This makes storing <code>NULL</code>s in trailing columns free in terms of disk space.</p>
<p>But sometimes <code>NULL</code> still occupies some space. Some systems store the data in rows of fixed length; some use no strict datatypes and therefore should store the datatype along with each value (rather than relying on the table definition); some (like Oracle) should store the <code>NULL</code> if the column is not trailing, i. e. there are some non-<code>NULL</code> values after it.</p>
<p>This makes the table bigger. And the queries that do not return this attribute are slower than they would be were it not for that column, since the data occupy more space and the engine needs to read more data pages.</p>
<p>On the other hand, the queries returning <em>all</em> columns, including that which is rarely used, are faster than they would be were this column stored in a separate table. Extra disk reads and CPU time are required to join these columns into one resulting relation and present it to the querying code.</p>
<p>The relations we get as a result of the queries are exactly the same, since relational databases separate logical presentation from physical storage. But <q>they do it</q> does not mean <q>they do it the best way</q>. They do not know beforehand which query we will run more often: the one that returns all attributes or the one that omits some of them.</p>
<p>The two queries, one returning the rarely used attribute and one omitting it, cannot be optimized together. Making one faster usually makes the other one slower. If the attribute is stored only for some, not for all entities, or the attribute is not queried very often, it may be useful to store it in a separate table on the disk, so that the most used query is faster.</p>
<p>Both these storage methods implement one ER model. Only the way the attribute is stored changes, not the way it is presented logically. We can even create a <em>view</em> or a <em>stored procedure</em> that would hide this storage method from the calling application completely.</p>
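<p>A sketch of such a view over the two-table layout. The view name, <code>FictionalCharacterFull</code>, is an assumption made for this example:</p>
<pre class="brush: sql">
-- Presents the two stored tables as the single relation the ER model describes
CREATE VIEW FictionalCharacterFull AS
SELECT  c.id, c.FirstName, ln.LastName, c.Address, c.Town, c.State
FROM    FictionalCharacter c
LEFT JOIN
        FictionalCharacterLastName ln
ON      ln.id = c.id;
</pre>
<p>An application that selects from the view gets the same columns as with the single-table layout and does not need to know that the last names are stored separately.</p>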
<p>In a relational database, the ER model defines the number and the possible values of the attributes but not the way they are stored.</p>
<p>The entity should be defined by a relation that maps the <code>PRIMARY KEY</code> of the entity to all of its attributes. The database can store this relation in a relational table as is or generate it from several tables on demand. As long as the database is able to do that, the model implementation is considered valid.</p>
<p>Storing something in another table and joining the tables to get the results usually means a relationship (in this case a one-to-one relationship). But in this model it is not a relationship that the separate table defines. It is just a more efficient way to store an attribute.</p>
<h4>Attribute or entity?</h4>
<p>Back to the model that describes fictional characters:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2009/10/FictionalCharacter.png" alt="Fictional character" title="Fictional character" width="300" height="400" class="aligncenter size-full wp-image-3484 noborder" /></p>
<p>and its implementation:</p>
<table class="excel">
<caption>FictionalCharacter</caption>
<tr>
<th>ID</th>
<th>First Name</th>
<th>Last Name</th>
<th>Address</th>
<th>Town</th>
<th>State</th>
</tr>
<tr>
<td>1</td>
<td>Marty</td>
<td>McFly</td>
<td>9303 Lyon Estates</td>
<td>Hill Valley</td>
<td>CA</td>
</tr>
<tr>
<td>2</td>
<td>Arnold</td>
<td></td>
<td>4040 Vineland</td>
<td>Hillwood</td>
<td>WA</td>
</tr>
<tr>
<td>3</td>
<td>Hank</td>
<td>Hill</td>
<td>123 Rainey Street</td>
<td>Arlen</td>
<td>TX</td>
</tr>
</table>
<p>Here we see an attribute, <code>State</code>, which contains a two-letter state code.</p>
<p>Now, what if we want to select all characters who live in states with population less than <strong>10,000,000</strong>?</p>
<p>We don&#8217;t keep information about the population of the U.S. states in our model. We will need to add it to the model somehow.</p>
<p>We could possibly add an additional attribute, <code>StatePopulation</code>, that would show the population of the state the character lives in.</p>
<p>However, this is not the best design: whenever we need to reflect a change in state population in our model, we should update the records for all characters who live in that state. In addition, we should make sure that all these records are consistent: all characters who live in the same state should have the same <code>StatePopulation</code>.</p>
<p>It would be wrong if two Hills lived in the same Texas with different populations. <code>StatePopulation</code> is a property of a state, not that of a person who lives in that state.</p>
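<p>With such a design, consistency would have to be checked by hand. A possible check, assuming the <code>StatePopulation</code> column had been added to <code>FictionalCharacter</code>, could look like this:</p>
<pre class="brush: sql">
-- States recorded with more than one population figure
SELECT  State
FROM    FictionalCharacter
GROUP BY
        State
HAVING  COUNT(DISTINCT StatePopulation) &gt; 1
</pre>
<p>Any row returned by this query points to a state whose residents disagree about its population.</p>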
<p>But the entity-relationship model does not allow adding attributes to attributes. Instead, we should make <code>State</code> an entity, define a many-to-one relationship between the persons and the states and keep the <code>Population</code> as an attribute of a <code>State</code>.</p>
<p>Here&#8217;s how the new model would look:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2009/10/FictionalCharacterState.png" alt="Fictional character and state" title="Fictional character and state" width="400" height="300" class="aligncenter size-full wp-image-3507 noborder" /></p>
<p>This process is usually called normalization. However, normalization is a relational concept, not an ER concept. Here is the definition of the <a href="http://en.wikipedia.org/wiki/Third_normal_form">third normal form</a> (which is what we achieved here) from Wikipedia:</p>
<blockquote><p>
The third normal form (<strong>3NF</strong>) is a normal form used in database normalization. <strong>3NF</strong> was originally defined by E.F. Codd in <strong>1971</strong>. Codd&#8217;s definition states that a table is in <strong>3NF</strong> if and only if both of the following conditions hold:</p>
<ul>
<li>The relation <code>R</code> (table) is in second normal form (<strong>2NF</strong>)</li>
<li>Every non-prime attribute of <code>R</code> is non-transitively dependent (i.e. directly dependent) on every key of <code>R</code>.</li>
</ul>
</blockquote>
<p>As we can see, there are a lot of things in this definition that the ER model knows nothing about: relations, keys, tables. But the ER model has a similar concept: attributes cannot have child attributes, entities can. If you want to define an attribute for an attribute, promote the latter to an entity and define the relationship. <q>Transitive dependence</q> is just another name for an <q>attribute of an attribute</q>.</p>
<p>Here is how this new ER model could be implemented in a relational database:</p>
<table class="excel">
<caption>FictionalCharacter</caption>
<tr>
<th>ID</th>
<th>First Name</th>
<th>Last Name</th>
<th>Address</th>
<th>Town</th>
<th>State</th>
</tr>
<tr>
<td>1</td>
<td>Marty</td>
<td>McFly</td>
<td>9303 Lyon Estates</td>
<td>Hill Valley</td>
<td>CA</td>
</tr>
<tr>
<td>2</td>
<td>Arnold</td>
<td></td>
<td>4040 Vineland</td>
<td>Hillwood</td>
<td>WA</td>
</tr>
<tr>
<td>3</td>
<td>Hank</td>
<td>Hill</td>
<td>123 Rainey Street</td>
<td>Arlen</td>
<td>TX</td>
</tr>
</table>
<table class="excel">
<caption>State</caption>
<tr>
<th>State</th>
<th>Population</th>
</tr>
<tr>
<td>CA</td>
<td>36,756,666</td>
</tr>
<tr>
<td>TX</td>
<td>24,326,974</td>
</tr>
<tr>
<td>WA</td>
<td>6,549,224</td>
</tr>
</table>
<p>To select all characters who live in states with population less than <strong>10,000,000</strong>, we just issue the following query:</p>
<pre class="brush: sql">
SELECT  FictionalCharacter.*
FROM    FictionalCharacter
JOIN    State
ON      State.State = FictionalCharacter.State
WHERE   State.Population &lt; 10000000
</pre>
<p>Now let&#8217;s see what we have changed here.</p>
<p>In the ER model:</p>
<ul>
<li><code>State</code> became an entity</li>
<li>One attribute, <code>State</code>, was removed from <code>FictionalCharacter</code></li>
<li>An additional relationship appeared: <code>FictionalCharacter</code> <em>lives in</em> <code>State</code></li>
<li>An additional attribute, <code>Population</code>, was added to <code>State</code></li>
</ul>
<p>In the relational model:</p>
<ul>
<li>One additional table, <code>State</code>, appeared.</li>
<li>One constraint, a <code>FOREIGN KEY</code> referencing <code>State</code>, was added to the <code>FictionalCharacter</code></li>
</ul>
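<p>The two relational changes listed above could be sketched as follows. The column types and the constraint name, <code>fk_character_state</code>, are assumptions, and the exact <code>ALTER TABLE</code> syntax varies between database systems:</p>
<pre class="brush: sql">
CREATE TABLE State (
        State      CHAR(2) NOT NULL PRIMARY KEY,
        Population INT NOT NULL
);

-- The State table should be populated before this constraint is added,
-- otherwise the existing characters would reference missing states
ALTER TABLE FictionalCharacter
ADD CONSTRAINT fk_character_state
FOREIGN KEY (State) REFERENCES State (State);
</pre>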
<p>Nothing has changed in the original table except for the <code>FOREIGN KEY</code> constraint added to ensure data integrity.</p>
<p>We see that promoting <code>State</code> to an entity didn&#8217;t affect the original table. The field <code>State</code> remained. The only thing that changed for this attribute is that it now refers to a record in the new table, <code>State</code>, that both defines the list of states and holds the state populations.</p>
<p>This is because the relational model (unlike the ER model) does not really distinguish between attributes and entities.</p>
<p>The relational model stores relations between simple types (like integers and strings). A real-world relational database stores relations between the integers and strings that describe some real-world data.</p>
<p>A U.S. state does not become less of an entity, be it listed as an entity or as an attribute in the ER model. If it&#8217;s enough for us to know only the state code (which is used in the relational model to represent the state in the relations stored in the database), we can store it in an unconstrained field; if it is not, we should create a separate table for it and constrain the value of the field using a <code>FOREIGN KEY</code>.</p>
<p>In this very case it&#8217;s easy to tell that there is a relationship by looking at the table structure. There is a <code>FOREIGN KEY</code> that gives us a hint that <code>State</code> is actually a field defining a one-to-many relationship, not a mere attribute.</p>
<p>But let&#8217;s consider another model:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2009/10/GoodsBonuses.png" alt="Goods and Bonuses" title="Goods and Bonuses" width="600" height="300" class="aligncenter size-full wp-image-3511 noborder" /></p>
<p>This model describes goods and price-based bonuses. The price ranges should be contiguous and the last range should have no upper bound.</p>
<p>Here&#8217;s how it would look in a relational database:</p>
<table class="excel">
<caption>Goods</caption>
<tr>
<th>ID</th>
<th>Name</th>
<th>Price</th>
</tr>
<tr>
<td>1</td>
<td>Wormy apple</td>
<td class="double">0.09</td>
</tr>
<tr>
<td>2</td>
<td>Bangkok durian</td>
<td class="double">9.99</td>
</tr>
<tr>
<td>3</td>
<td>Densuke watermelon</td>
<td class="double">999.99</td>
</tr>
<tr>
<td>4</td>
<td>White truffle</td>
<td class="double">99999.99</td>
</tr>
</table>
<table class="excel">
<caption>PriceRange</caption>
<tr>
<th>Price</th>
<th>Bonus</th>
</tr>
<tr>
<td class="double">0.01</td>
<td class="double">1%</td>
</tr>
<tr>
<td class="double">1.00</td>
<td class="double">3%</td>
</tr>
<tr>
<td class="double">100.00</td>
<td class="double">10%</td>
</tr>
<tr>
<td class="double">10000.00</td>
<td class="double">30%</td>
</tr>
</table>
<p>Does the <code>PriceRange</code> table really define the price range entities?</p>
<p>Sure it does. It contains all required information: the <code>StartPrice</code> (as a field), the <code>EndPrice</code> (as the same field in the <em>next</em> record in <code>PriceRange</code>) and the <code>Bonus</code> (as a field in the record that defines the <code>StartPrice</code>). It is possible to write a query that would return a classical relation with all three attributes in one record:</p>
<pre class="brush: sql">
SELECT  Price AS StartPrice,
        (
        SELECT  MIN(Price)
        FROM    PriceRange NextRange
        WHERE   NextRange.Price &gt; CurrentRange.Price
        ) AS EndPrice,
        Bonus
FROM    PriceRange CurrentRange
</pre>
<p>Due to the way the entity is defined, it is impossible to put a <code>FOREIGN KEY</code> constraint on a price range. The entity <q>price range</q> is not even defined by a relation stored in the database: instead, a simpler relation is stored, which maps the lower bound of the price range to the value of the attribute (<code>Bonus</code>).</p>
<p>To build the relation that matches each good with its price range and the bonus, we should issue the following query:</p>
<pre class="brush: sql">
SELECT  *
FROM    Goods
JOIN    PriceRange
ON      PriceRange.Price =
        (
        SELECT  MAX(Price)
        FROM    PriceRange
        WHERE   PriceRange.Price &lt;= Goods.Price
        )
</pre>
<p>We could probably use a more formalistic approach to the problem: store <code>PriceRange</code> in a table with a surrogate <code>PRIMARY KEY</code>, the lower and upper bounds as the attributes; and store a <code>FOREIGN KEY</code> reference to it in <code>Goods</code>.</p>
<p>This design would be more familiar but it could easily lead to data inconsistency. What if the <code>Price</code> of a good is not within the <code>PriceRange</code> the good references? What if the price ranges are not contiguous? What if the price range values are updated and the referencing columns are not?</p>
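<p>To make the first of these questions concrete, here is a sketch of the more formalistic design together with a query that finds the goods violating it. The names <code>PriceRangeFormal</code>, <code>GoodsFormal</code> and <code>PriceRangeId</code> are hypothetical, and the ranges are assumed to include their lower bound and exclude their upper bound:</p>
<pre class="brush: sql">
-- Hypothetical tables, for illustration only
CREATE TABLE PriceRangeFormal (
        id         INT NOT NULL PRIMARY KEY,   -- surrogate key
        StartPrice DECIMAL(10, 2) NOT NULL,
        EndPrice   DECIMAL(10, 2),             -- NULL means no upper bound
        Bonus      DECIMAL(5, 2) NOT NULL
);

CREATE TABLE GoodsFormal (
        id           INT NOT NULL PRIMARY KEY,
        Name         VARCHAR(100) NOT NULL,
        Price        DECIMAL(10, 2) NOT NULL,
        PriceRangeId INT NOT NULL,
        FOREIGN KEY (PriceRangeId) REFERENCES PriceRangeFormal (id)
);

-- Goods whose price falls outside the range they reference.
-- For the unbounded range, the comparison with a NULL EndPrice
-- is never true, so only the lower bound is checked.
SELECT  g.*
FROM    GoodsFormal g
JOIN    PriceRangeFormal r
ON      r.id = g.PriceRangeId
WHERE   g.Price &lt; r.StartPrice
        OR g.Price &gt;= r.EndPrice
</pre>
<p>The <code>FOREIGN KEY</code> only guarantees that the referenced range exists. It cannot guarantee that it is the right one, so a check like this would have to be run by the application or wrapped into a trigger.</p>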
<p>The root of this problem is that the mechanisms to ensure referential integrity work on <em>declared</em> relationships, i. e. those just listed in a table, while what we deal with here is an <em>inferred</em> relationship.</p>
<p>The relationship between a good and a price range is not defined by a mere declaration of the fact that the relationship exists, i. e. it is not taken from the outside world. Instead, whether the relationship exists or not is defined by the values of the attributes of a good and a price range. It is generated from the other information stored in the database. And the relational databases, as I already said above, can transform the relations they store in the tables to other relations, using the relational algebra, mathematical operators and other means.</p>
<p>This relational design implements the ER model more consistently and efficiently, and this implementation is less prone to errors. However, using this design, we cannot use automated tools to map the relational model back to the ER model anymore. We don&#8217;t define the <code>PriceRange</code> and both of its attributes using a single stored relation referenceable with a <code>FOREIGN KEY</code>. From the relational model&#8217;s point of view, the <code>Price</code> is just an attribute of a <code>Good</code> which has no reference to the <code>PriceRange</code>. Nevertheless, the <code>PriceRange</code> is an entity with its attributes (which can only be represented by self-joining the table), and <code>Price</code> is used to reference it.</p>
<p>The fact that this design is more consistent, however, does not mean that the original design should never be used. Database normalization helps to cope with possible inconsistencies in the model by inferring relations rather than storing them, but this can require additional resources (memory, CPU etc.) and therefore be less efficient. There are numerous situations that require storing relations in a way that allows possible logical inconsistencies but is faster to query. Keeping the model consistent then becomes the developer&#8217;s task, but that is the price one has to pay for speed. This process is called <a href="http://en.wikipedia.org/wiki/Denormalization">denormalization</a>.</p>
<p>I will not discuss benefits and drawbacks of normalization and denormalization in this article. What I wanted to say is that usually there are many ways to implement an ER model in a relational database and not all these methods can be easily mapped back to the ER model which clearly distinguishes between entities, attributes and relationships.</p>
<p>A relational model uses the same datatypes to represent entities and to define attributes. Therefore, it is best thought of as a way to store relations between the entities <em>of the real world</em> (represented by numbers and strings), some of which are not interesting enough to be entities <em>of the model</em>.</p>
<p>Whether the values the table relates represent entities or attributes is not a concern of a relational model. It can store and return both with equal efficiency.</p>
<h4>Relationship or entity?</h4>
<p>Consider this model:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2009/10/CouplesSingle.png" alt="Couples (one-to-one)" title="Couples (one-to-one)" width="600" height="300" class="aligncenter size-full wp-image-3523 noborder" /></p>
<p>It describes fictional persons who are married to each other.</p>
<p>This can be implemented in a relational database as this:</p>
<table class="excel">
<caption>FictionalPerson</caption>
<tr>
<th>ID</th>
<th>Gender</th>
<th>First Name</th>
<th>Last Name</th>
</tr>
<tr>
<td>1</td>
<td>M</td>
<td>Desmond</td>
<td>Jones</td>
</tr>
<tr>
<td>2</td>
<td>F</td>
<td>Mary</td>
<td>Jones</td>
</tr>
</table>
<table class="excel">
<caption>Marriage</caption>
<tr>
<th>ID</th>
<th>Husband</th>
<th>Wife</th>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>2</td>
</tr>
</table>
<p>Here we see that Desmond Jones is married to Mary Jones. Marriage, of course, is a relationship (and is called so even outside the database world).</p>
<p>That&#8217;s fine. Now we want to add <a href="http://en.wikipedia.org/wiki/Scarlett_O%27Hara">Scarlett O&#8217;Hara</a> into the model. She was married first to Ashley Wilkes, then to Rhett Butler.</p>
<p>The model as it is now does not let us do it, since a marriage is a one-to-one relationship (at least in the Christian tradition). We should change this relationship to a many-to-many one (which changes its wording from <em>is married to</em> to <em>has ever been married to</em>).</p>
<p>Here&#8217;s how the new model would look:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2009/10/CouplesMultiple.png" alt="Couples (many-to-many)" title="Couples (many-to-many)" width="600" height="300" class="aligncenter size-full wp-image-3525 noborder" /></p>
<table class="excel">
<caption>FictionalPerson</caption>
<tr>
<th>ID</th>
<th>Gender</th>
<th>First Name</th>
<th>Last Name</th>
</tr>
<tr>
<td>1</td>
<td>M</td>
<td>Desmond</td>
<td>Jones</td>
</tr>
<tr>
<td>2</td>
<td>F</td>
<td>Mary</td>
<td>Jones</td>
</tr>
<tr>
<td>3</td>
<td>M</td>
<td>Ashley</td>
<td>Wilkes</td>
</tr>
<tr>
<td>4</td>
<td>M</td>
<td>Rhett</td>
<td>Butler</td>
</tr>
<tr>
<td>5</td>
<td>F</td>
<td>Scarlett</td>
<td>O&#8217;Hara</td>
</tr>
</table>
<table class="excel">
<caption>Marriage</caption>
<tr>
<th>ID</th>
<th>Husband</th>
<th>Wife</th>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>5</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>5</td>
</tr>
</table>
<p>The ER model changed the relationship from <q>one-to-one</q> to <q>many-to-many</q>.</p>
<p>To reflect this, the relational model dropped the separate <code>UNIQUE</code> constraints on <code>Husband</code> and <code>Wife</code> and added a composite <code>UNIQUE</code> constraint on <code>(Husband, Wife)</code>.</p>
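<p>In terms of statements, the change could be sketched like this. The constraint names are assumptions, and the <code>DROP CONSTRAINT</code> syntax shown here is not supported by every database system (some drop the underlying index instead):</p>
<pre class="brush: sql">
-- One-to-one: each person appears in at most one marriage on their side
ALTER TABLE Marriage ADD CONSTRAINT ux_marriage_husband UNIQUE (Husband);
ALTER TABLE Marriage ADD CONSTRAINT ux_marriage_wife UNIQUE (Wife);

-- Many-to-many: a person may appear in several marriages,
-- but the same couple may be listed only once
ALTER TABLE Marriage DROP CONSTRAINT ux_marriage_husband;
ALTER TABLE Marriage DROP CONSTRAINT ux_marriage_wife;
ALTER TABLE Marriage ADD CONSTRAINT ux_marriage_couple UNIQUE (Husband, Wife);
</pre>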
<p>Now, let&#8217;s add two more persons to our model: Ransie and Ariela Bilbro from O. Henry&#8217;s <a href="http://www.gutenberg.org/dirs/etext99/8whrl11h.htm#12">The Whirligig of Life</a>. They married each other <em>twice</em>.</p>
<p>This time even changing the relationship type in the ER model won&#8217;t help. There is no such thing as a <q>double relationship</q>. To reflect the double marriages, the relationship should be promoted to a first-class entity with a one-to-one relation to either party (which now means <em>been in this marriage</em>).</p>
<p>The model would look like this now:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2009/10/CouplesMarriages.png" alt="CouplesMarriages" title="CouplesMarriages" width="600" height="300" class="aligncenter size-full wp-image-3527 noborder" /></p>
<p>We see that the ER model changed significantly. How did the relational model change?</p>
<p>Since the husband and wife do not uniquely define the marriage anymore, the <code>UNIQUE</code> constraint should be dropped from <code>Marriage</code>, but otherwise the tables remain the same:</p>
<table class="excel">
<caption>FictionalPerson</caption>
<tr>
<th>ID</th>
<th>Gender</th>
<th>First Name</th>
<th>Last Name</th>
</tr>
<tr>
<td>1</td>
<td>M</td>
<td>Desmond</td>
<td>Jones</td>
</tr>
<tr>
<td>2</td>
<td>F</td>
<td>Mary</td>
<td>Jones</td>
</tr>
<tr>
<td>3</td>
<td>M</td>
<td>Ashley</td>
<td>Wilkes</td>
</tr>
<tr>
<td>4</td>
<td>M</td>
<td>Rhett</td>
<td>Butler</td>
</tr>
<tr>
<td>5</td>
<td>F</td>
<td>Scarlett</td>
<td>O&#8217;Hara</td>
</tr>
<tr>
<td>6</td>
<td>M</td>
<td>Ransie</td>
<td>Bilbro</td>
</tr>
<tr>
<td>7</td>
<td>F</td>
<td>Ariela</td>
<td>Bilbro</td>
</tr>
</table>
<table class="excel">
<caption>Marriage</caption>
<tr>
<th>ID</th>
<th>Husband</th>
<th>Wife</th>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>5</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>5</td>
</tr>
<tr>
<td>4</td>
<td>6</td>
<td>7</td>
</tr>
<tr>
<td>5</td>
<td>6</td>
<td>7</td>
</tr>
</table>
<p>This happened because the marriage in fact is an entity, with its own attributes like date and place of the wedding, the number of guests invited, awful toasts made, tea sets presented etc. But the ER model was not initially interested in all this and the marriage was declared as a relationship, not an entity.</p>
<p>In the ER model, a relationship between two entities can either exist or not exist, one or the other. There is no such thing as a <q>double relationship</q> or an <q>expired relationship</q>. To track such things, one needs to promote the relationship to an entity and assign attributes to it.</p>
<p>However, the relational model can handle this easily. The way the relationship is built in our implementation, there is no clear distinction between a marriage as a relationship and a marriage as an entity. By adding and removing <code>UNIQUE</code> constraints, <code>Marriage</code> can be easily changed to represent a one-to-one relationship, a many-to-many relationship or even an entity and two one-to-one relationships at once, the relational structure still remaining the same.</p>
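<p>Once the constraints are dropped, the repeated marriages become ordinary rows that can be queried like any other data. A small sketch over the <code>Marriage</code> table above:</p>
<pre class="brush: sql">
-- Couples that married each other more than once
SELECT  Husband, Wife, COUNT(*) AS Marriages
FROM    Marriage
GROUP BY
        Husband, Wife
HAVING  COUNT(*) &gt; 1
</pre>
<p>With the sample data, this returns a single row for husband <strong>6</strong> and wife <strong>7</strong> (the Bilbros), with <strong>2</strong> marriages.</p>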
<h3>Summary</h3>
<p>The entity-relationship model defines what should be stored in a database: about what, which information, how related. This should be described in terms of entities, relationships and attributes.</p>
<p>A relational model describes how to implement the requirements of the ER model: which information should be stored in which relational tables.</p>
<p>The relational model is very flexible and can construct relations on the fly. The relations can be stored in one way and represented in another way. This helps in handling more complex models that require the relationships to be inferred rather than stored. The database can thus be made more maintainable and less prone to logical inconsistencies.</p>
<p>Since the relational model just defines relations between simple types like integers and strings, things like entities, attributes and relationships are not a concern of a relational model. It only implements the requirements established by the entity-relationship model.</p>
<p>Usually it does it using simple and formal mechanisms: a <code>PRIMARY KEY</code>-preserved table to define entities and attributes, and a separate table with <code>FOREIGN KEY</code> references to define the relationships.</p>
<p>However, it can change the storage methods: move the attributes and relationships between the tables, define the entities so that the attributes should be calculated, etc. This can be done both to improve efficiency and to get rid of inconsistencies.</p>
<p>Since the relational model separates storage from presentation by using a built-in relation transformation language, <strong>SQL</strong>, any storage method is valid as long as it is able to construct the entities, attributes and relationships in their canonical form.</p>
<p>That&#8217;s why it makes no sense to speak of the entities, relationships and attributes in respect to the relational model. These things are defined by the entity-relationship model.</p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2009/10/18/what-is-entity-relationship-model/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
	</channel>
</rss>

