<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>EXPLAIN EXTENDED</title>
	<atom:link href="http://explainextended.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://explainextended.com</link>
	<description>How to create fast database queries</description>
	<lastBuildDate>Mon, 02 Jan 2012 00:31:26 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Happy New Year!</title>
		<link>http://explainextended.com/2011/12/31/happy-new-year-3/</link>
		<comments>http://explainextended.com/2011/12/31/happy-new-year-3/#comments</comments>
		<pubDate>Sat, 31 Dec 2011 19:00:56 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=5408</guid>
		<description><![CDATA[A New Year snowflake in PostgreSQL]]></description>
			<content:encoded><![CDATA[<p>This winter is anomalously warm in Europe: there is no snow and no New Year mood. So today we will be drawing a snowflake in <strong>PostgreSQL</strong>.</p>
<h3>#1. A little theory</h3>
<p>The core of a snowflake is six large symmetrical ice crystals growing from a common center. Other, smaller crystals grow out of these larger ones.</p>
<p>The overall shape of the snowflake is defined by how large the crystals grow and where exactly they are attached to each other.</p>
<p>These things are defined by fluctuations in air and temperature conditions around the snowflake. Because the flake itself is very small, at any given moment the conditions are nearly identical around each crystal. That&#8217;s why the offspring crystals pop up in almost the same places and grow to almost the same lengths. Different flakes, though, constantly move relative to each other and are subject to very different fluctuations, and that&#8217;s why each of them grows unique.</p>
<p>Except for the root crystals (of which there are six), the child icicles grow in symmetrical pairs. Moreover, each branch grows its own children (also in pairs), so on each step there are twice as many crystals, but they all share almost the same length and angle. This gives the snowflake its symmetrical look.</p>
<p>So we can easily see that, despite the fact there may be many child crystals, the shape of a snowflake is defined by a relatively small number of parameters: how many children each crystal produces, where they are attached to it, at which angle they grow and to which length.</p>
<p>Now, let&#8217;s try to model it.<br />
<span id="more-5408"></span></p>
<h3>#2. Defining parameters</h3>
<p>To begin with, we will take the length of the initial larger crystal to be <strong>1</strong>, and its angle to be <strong>0</strong>. Later we may easily scale and rotate the crystal.</p>
<p>There can be any number of child crystals (or, rather, pairs of child crystals), each pair attaching to its parent in a random place and growing at a random angle. However, these places and angles, though random, are shared across the snowflake: if one pair grows here and at this angle, the similar pairs will grow in the same place and at the same angle on their parents.</p>
<p>This means that each crystal has <code>6 * (2 ^ level)</code> identical twins (itself included), where <code>level</code> is the number of its ancestors. There are 6 root crystals; 12 first-level crystals in the first child pair (each growing in the same place, at the same angle and to the same length on its root crystal); 12 first-level crystals in the second child pair (again sharing all parameters with each other, but not with the first pair); 24 grandchildren of the first-first kind, 24 of the first-second kind, and so on.</p>
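<p>As a quick sanity check of this count, the sketch below evaluates the formula for the first few levels. It is only an illustration: the alias <code>twins</code> is made up here, and a bit shift (<code>1 &lt;&lt; level</code>) stands in for <code>2 ^ level</code> so that the result stays an integer.</p>
<pre class="brush: sql">
-- Number of identical crystals per level,
-- with level counted as the number of ancestors (0 for the root crystals)
SELECT  level, 6 * (1 &lt;&lt; level) AS twins
FROM    generate_series(0, 3) AS level;
</pre>
<p>For levels 0 to 3 this should give 6, 12, 24 and 48 crystals.</p>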
<p>So, to define the snowflake shape, we need to define the number of pairs on each step, and for each pair define its place on the parent crystal, angle and length.</p>
<p>These parameters are random but have some constraints. Child crystals are shorter than their parents; they grow at acute angles, and the number of pairs is limited. Since the parameters of each crystal depend on its parent, we need a recursive query for that:</p>
<pre class="brush: sql">
SELECT  SETSEED(0.20111231);

WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level &lt;= 3
        )
SELECT  id::TEXT, cut, len, alpha, level, spikes
FROM    params;
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>cut</th>
<th>len</th>
<th>alpha</th>
<th>level</th>
<th>spikes</th>
</tr>
<tr>
<td class="text">{0}</td>
<td class="float8">0</td>
<td class="float8">1</td>
<td class="float8">0</td>
<td class="int4">1</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,0}</td>
<td class="float8">0.356672627385706</td>
<td class="float8">0.285511903231964</td>
<td class="float8">0.895624824686616</td>
<td class="int4">2</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,1}</td>
<td class="float8">0.561545127537102</td>
<td class="float8">0.170363144949079</td>
<td class="float8">0.618425840960865</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,2}</td>
<td class="float8">0.475528724957258</td>
<td class="float8">0.394589512841776</td>
<td class="float8">0.81481895943341</td>
<td class="int4">2</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.667989769019186</td>
<td class="float8">0.0429738523301687</td>
<td class="float8">0.818223701208857</td>
<td class="int4">3</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.325173982884735</td>
<td class="float8">0.00710072719083308</td>
<td class="float8">0.517746497950111</td>
<td class="int4">3</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.181089712772518</td>
<td class="float8">0.0441518332901016</td>
<td class="float8">0.789388660278232</td>
<td class="int4">3</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.272290788125247</td>
<td class="float8">0.0784957594457395</td>
<td class="float8">0.522800933603073</td>
<td class="int4">3</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.559544350020587</td>
<td class="float8">0.0571536974970293</td>
<td class="float8">0.713263419194096</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.642731287050992</td>
<td class="float8">0.0842548574598808</td>
<td class="float8">0.782757860467376</td>
<td class="int4">3</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.0450216801837087</td>
<td class="float8">0.0163872951631622</td>
<td class="float8">0.590039109739568</td>
<td class="int4">3</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.289853980299085</td>
<td class="float8">0.173285180190047</td>
<td class="float8">0.896723437835692</td>
<td class="int4">3</td>
<td class="int4">1</td>
</tr>
<tr class="statusbar">
<td colspan="100">12 rows fetched in 0.0005s (0.0047s)</td>
</tr>
</table>
</div>
<p>Here, <code>id</code> is an array defining the <q>path</q> to the crystal pairs: <code>{0}</code> is the root branch, <code>{0, 0}</code> is the first child pair, <code>{0, 1, 0}</code> is the first child of the second child etc.</p>
<p><code>cut</code> defines the position where the crystals are attached to the parent: <code>0.356672627385706</code> in the second row means the first child will grow at <strong>35.6%</strong> of the root branches&#8217; length. </p>
<p><code>len</code> is the child length (relative to the root branch, not the immediate parent), <code>alpha</code> is the angle at which the children grow (relative to the immediate parent), and <code>spikes</code> controls the number of child pairs for the given crystal: since <code>generate_series(0, spikes)</code> includes both ends, a crystal has <code>spikes + 1</code> child pairs. <code>level</code>, I hope, is self-explanatory.</p>
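<p>To see how the <code>id</code> paths are built, here is a minimal sketch of the same expression the query uses, applied to the root crystal with a hard-coded <code>spikes</code> value of <strong>2</strong> (the literal values are chosen only for illustration):</p>
<pre class="brush: sql">
-- Appending each value of the series to the parent path yields one row per child pair.
-- The series runs from 0 to spikes inclusive, so spikes = 2 produces three children.
SELECT  ARRAY[0] || generate_series(0, 2) AS id;
</pre>
<p>This should return three rows, <code>{0,0}</code>, <code>{0,1}</code> and <code>{0,2}</code>, which are the ids of the second-level rows in the output above.</p>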
<h3>#3. Defining crystal coordinates</h3>
<p>Now that we have the shape of our flake defined we need to build the actual coordinates for each crystal. To do this we again would need a recursive query.</p>
<p>First, we should make a set of branches on each level. We would start from a single root branch (it will be easy to clone it later) and generate a set of branches on each level. Each record in this set would correspond to an actual snowflake crystal:</p>
<pre class="brush: sql">
SELECT  SETSEED(0.20111231);

WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level &lt;= 3
        ),
        tree AS
        (
        SELECT  *, generate_series(0, (1 &lt;&lt; (level - 1)) - 1) AS branch
        FROM    params
        )
SELECT  id::TEXT, cut, len, alpha, level, spikes, branch
FROM    tree;
</pre>
<p><a href="#" onclick="xcollapse('X0001');return false;">View query output</a><br />
</p>
<div id="X0001" style="display: none; background: transparent;">
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>cut</th>
<th>len</th>
<th>alpha</th>
<th>level</th>
<th>spikes</th>
<th>branch</th>
</tr>
<tr>
<td class="text">{0}</td>
<td class="float8">0</td>
<td class="float8">1</td>
<td class="float8">0</td>
<td class="int4">1</td>
<td class="int4">2</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,0}</td>
<td class="float8">0.356672627385706</td>
<td class="float8">0.285511903231964</td>
<td class="float8">0.895624824686616</td>
<td class="int4">2</td>
<td class="int4">2</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,0}</td>
<td class="float8">0.356672627385706</td>
<td class="float8">0.285511903231964</td>
<td class="float8">0.895624824686616</td>
<td class="int4">2</td>
<td class="int4">2</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,1}</td>
<td class="float8">0.561545127537102</td>
<td class="float8">0.170363144949079</td>
<td class="float8">0.618425840960865</td>
<td class="int4">2</td>
<td class="int4">3</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,1}</td>
<td class="float8">0.561545127537102</td>
<td class="float8">0.170363144949079</td>
<td class="float8">0.618425840960865</td>
<td class="int4">2</td>
<td class="int4">3</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,2}</td>
<td class="float8">0.475528724957258</td>
<td class="float8">0.394589512841776</td>
<td class="float8">0.81481895943341</td>
<td class="int4">2</td>
<td class="int4">0</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,2}</td>
<td class="float8">0.475528724957258</td>
<td class="float8">0.394589512841776</td>
<td class="float8">0.81481895943341</td>
<td class="int4">2</td>
<td class="int4">0</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.667989769019186</td>
<td class="float8">0.0429738523301687</td>
<td class="float8">0.818223701208857</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.667989769019186</td>
<td class="float8">0.0429738523301687</td>
<td class="float8">0.818223701208857</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.667989769019186</td>
<td class="float8">0.0429738523301687</td>
<td class="float8">0.818223701208857</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.667989769019186</td>
<td class="float8">0.0429738523301687</td>
<td class="float8">0.818223701208857</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.325173982884735</td>
<td class="float8">0.00710072719083308</td>
<td class="float8">0.517746497950111</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.325173982884735</td>
<td class="float8">0.00710072719083308</td>
<td class="float8">0.517746497950111</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.325173982884735</td>
<td class="float8">0.00710072719083308</td>
<td class="float8">0.517746497950111</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.325173982884735</td>
<td class="float8">0.00710072719083308</td>
<td class="float8">0.517746497950111</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.181089712772518</td>
<td class="float8">0.0441518332901016</td>
<td class="float8">0.789388660278232</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.181089712772518</td>
<td class="float8">0.0441518332901016</td>
<td class="float8">0.789388660278232</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.181089712772518</td>
<td class="float8">0.0441518332901016</td>
<td class="float8">0.789388660278232</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.181089712772518</td>
<td class="float8">0.0441518332901016</td>
<td class="float8">0.789388660278232</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.272290788125247</td>
<td class="float8">0.0784957594457395</td>
<td class="float8">0.522800933603073</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.272290788125247</td>
<td class="float8">0.0784957594457395</td>
<td class="float8">0.522800933603073</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.272290788125247</td>
<td class="float8">0.0784957594457395</td>
<td class="float8">0.522800933603073</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.272290788125247</td>
<td class="float8">0.0784957594457395</td>
<td class="float8">0.522800933603073</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.559544350020587</td>
<td class="float8">0.0571536974970293</td>
<td class="float8">0.713263419194096</td>
<td class="int4">3</td>
<td class="int4">3</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.559544350020587</td>
<td class="float8">0.0571536974970293</td>
<td class="float8">0.713263419194096</td>
<td class="int4">3</td>
<td class="int4">3</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.559544350020587</td>
<td class="float8">0.0571536974970293</td>
<td class="float8">0.713263419194096</td>
<td class="int4">3</td>
<td class="int4">3</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.559544350020587</td>
<td class="float8">0.0571536974970293</td>
<td class="float8">0.713263419194096</td>
<td class="int4">3</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.642731287050992</td>
<td class="float8">0.0842548574598808</td>
<td class="float8">0.782757860467376</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.642731287050992</td>
<td class="float8">0.0842548574598808</td>
<td class="float8">0.782757860467376</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.642731287050992</td>
<td class="float8">0.0842548574598808</td>
<td class="float8">0.782757860467376</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.642731287050992</td>
<td class="float8">0.0842548574598808</td>
<td class="float8">0.782757860467376</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.0450216801837087</td>
<td class="float8">0.0163872951631622</td>
<td class="float8">0.590039109739568</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.0450216801837087</td>
<td class="float8">0.0163872951631622</td>
<td class="float8">0.590039109739568</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.0450216801837087</td>
<td class="float8">0.0163872951631622</td>
<td class="float8">0.590039109739568</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.0450216801837087</td>
<td class="float8">0.0163872951631622</td>
<td class="float8">0.590039109739568</td>
<td class="int4">3</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.289853980299085</td>
<td class="float8">0.173285180190047</td>
<td class="float8">0.896723437835692</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">0</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.289853980299085</td>
<td class="float8">0.173285180190047</td>
<td class="float8">0.896723437835692</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.289853980299085</td>
<td class="float8">0.173285180190047</td>
<td class="float8">0.896723437835692</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.289853980299085</td>
<td class="float8">0.173285180190047</td>
<td class="float8">0.896723437835692</td>
<td class="int4">3</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr class="statusbar">
<td colspan="100">39 rows fetched in 0.0015s (0.0060s)</td>
</tr>
</table>
</div>
</div>
<p>This is just a copy of the parameters recordset with each parameter row duplicated as many times as the crystal level requires. There is one root branch (level 1), two copies of each level 2 branch, four copies of each level 3 branch etc. Each instance is identified by the column <code>branch</code>.</p>
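<p>The duplication rule can be tried in isolation. The sketch below applies the same <code>generate_series</code> expression to a bare list of levels instead of the <code>params</code> recordset (an assumption made here just to keep the example short):</p>
<pre class="brush: sql">
-- One row per crystal instance: level N gets 2 ^ (N - 1) copies,
-- numbered from 0 in the branch column
SELECT  level, generate_series(0, (1 &lt;&lt; (level - 1)) - 1) AS branch
FROM    generate_series(1, 3) AS level;
</pre>
<p>It should return seven rows: one for level 1, two for level 2 and four for level 3.</p>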
<p>To calculate coordinates of each crystal we need to traverse to it from the top.</p>
<p>The coordinates of the root branch are known: they are <code>(0, 0), (1, 0)</code> (by definition).</p>
<p>To build the first pair (two crystals, each <strong>28.5%</strong> of the root length, growing at <strong>35.6%</strong> from the beginning of the root branch at angles <strong>51&deg; 18&#8242;</strong> and <strong>&minus;51&deg; 18&#8242;</strong>, respectively) we would need to take the coordinates of the parent, find the start point, which is <code>(X<sub>s</sub> + (X<sub>e</sub> - X<sub>s</sub>) * cut, Y<sub>s</sub> + (Y<sub>e</sub> - Y<sub>s</sub>) * cut)</code>, and then find the coordinates of the end point (by adding <code>len * COS(&alpha;)</code> and <code>len * SIN(&alpha;)</code> to <code>X</code> and <code>Y</code> of the start point, respectively). <code>&alpha;</code> in this formula is measured relative to the coordinate grid, not to the parent, and to find it we just need to sum the angles of all ancestors.</p>
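<p>As a worked example, the sketch below applies these formulas by hand to the first pair, with the root coordinates and the <code>cut</code>, <code>len</code> and <code>alpha</code> values of <code>{0,0}</code> copied from the output above. The aliases <code>parent</code>, <code>side</code>, <code>direction</code> and the <code>child_*</code> column names are made up for this illustration.</p>
<pre class="brush: sql">
-- Start and end points of the two crystals of the first pair.
-- The root angle is 0, so the absolute angle is just direction * alpha.
SELECT  direction,
        xs + (xe - xs) * cut AS child_xs,
        ys + (ye - ys) * cut AS child_ys,
        xs + (xe - xs) * cut + len * COS(direction * alpha) AS child_xe,
        ys + (ye - ys) * cut + len * SIN(direction * alpha) AS child_ye,
        DEGREES(alpha) AS alpha_degrees
FROM    (
        VALUES  (0::DOUBLE PRECISION, 0::DOUBLE PRECISION,
                 1::DOUBLE PRECISION, 0::DOUBLE PRECISION,
                 0.356672627385706::DOUBLE PRECISION,
                 0.285511903231964::DOUBLE PRECISION,
                 0.895624824686616::DOUBLE PRECISION)
        ) AS parent (xs, ys, xe, ye, cut, len, alpha)
CROSS JOIN
        (
        VALUES  (-1), (1)
        ) AS side (direction);
</pre>
<p>Both crystals should start at about <code>(0.3567, 0)</code> and end at about <code>(0.5351, -0.2229)</code> and <code>(0.5351, 0.2229)</code>. For deeper levels the same computation has to be repeated from each parent's coordinates and accumulated angle.</p>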
<p>This is best done with another recursive query. On each step we should find immediate children of the current parent.</p>
<p>This can be easily achieved using the following join condition: <code>child.id &gt; parent.id AND child.id &lt;= (parent.id || parent.spikes) AND ARRAY_LENGTH(child.id, 1) = ARRAY_LENGTH(parent.id, 1) + 1</code>. This condition relies on <strong>PostgreSQL</strong> array comparison and concatenation: if we have <code>parent.id = {0, 2, 1}</code> with <strong>3</strong> children (<code>spikes = 2</code>), then the condition would return <code>{0, 2, 1, 0}</code>, <code>{0, 2, 1, 1}</code> and <code>{0, 2, 1, 2}</code>. This hierarchy model is called <em>materialized path</em>, and some day I will write a post about it.</p>
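<p>The behavior of the array operators is easy to check on literal values. The sketch below tests a few hand-picked candidate paths against the parent <code>{0, 2, 1}</code> with <code>spikes = 2</code>; the aliases <code>p</code>, <code>c</code> and <code>is_child</code> and the candidate list are assumptions made for this illustration.</p>
<pre class="brush: sql">
-- Arrays are compared element by element; when one array is a prefix
-- of the other, the longer one sorts higher.
SELECT  c.child::TEXT,
        c.child &gt; p.parent
        AND c.child &lt;= (p.parent || p.spikes)
        AND ARRAY_LENGTH(c.child, 1) = ARRAY_LENGTH(p.parent, 1) + 1 AS is_child
FROM    (
        VALUES  (ARRAY[0, 2, 1], 2)
        ) AS p (parent, spikes)
CROSS JOIN
        (
        VALUES  (ARRAY[0, 2, 1, 0]),    -- first child
                (ARRAY[0, 2, 1, 2]),    -- last child
                (ARRAY[0, 2, 1, 3]),    -- beyond the last child
                (ARRAY[0, 2, 2]),       -- sibling of the parent
                (ARRAY[0, 2, 1, 0, 0])  -- grandchild
        ) AS c (child);
</pre>
<p>Only the first two candidates should come out as <code>true</code>: the third one exceeds the upper bound, the sibling sorts above it as well, and the grandchild is rejected by the length check.</p>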
<p>Since the <code>id</code> alone defines the parameters, not the instance, we need to add one more condition to find the instances. Crystals <strong>0</strong> and <strong>1</strong> would have child pairs <strong>0, 1</strong> and <strong>2, 3</strong>, respectively, so we'll include <code>child.branch BETWEEN parent.branch * 2 AND parent.branch * 2 + 1</code> into the join condition.</p>
<p>One more thing to do is to decide whether we should add a positive or a negative angle. It's simple: even branches are negative, odd ones are positive.</p>
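<p>Both rules can be tabulated for the four level 3 instances. In the sketch below, <code>branch / 2</code> (integer division) is just an equivalent way of writing the <code>BETWEEN</code> condition from the child's side, and the column names <code>parent_branch</code> and <code>angle_sign</code> are made up for this illustration.</p>
<pre class="brush: sql">
-- For each child instance: which parent instance it hangs on
-- and whether its angle is subtracted (-1) or added (1)
SELECT  branch,
        branch / 2 AS parent_branch,
        (branch % 2) * 2 - 1 AS angle_sign
FROM    generate_series(0, 3) AS branch;
</pre>
<p>Branches 0 and 1 should map to parent 0, branches 2 and 3 to parent 1, with signs alternating between -1 and 1.</p>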
<p>And here's the query:</p>
<pre class="brush: sql">
SELECT  SETSEED(0.20111231);

WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level &lt;= 3
        ),
        tree AS
        (
        SELECT  *, generate_series(0, (1 &lt;&lt; (level - 1)) - 1) AS branch
        FROM    params
        ),
        points AS
        (
        SELECT  id,
                0::double precision AS xs,
                0::double precision AS ys,
                len AS xe,
                0::double precision AS ye,
                0::double precision AS alpha,
                branch,
                level,
                spikes
        FROM    tree
        WHERE   id = ARRAY[0]
        UNION ALL
        SELECT  id,
                xs + (xe - xs) * cut AS xs,
                ys + (ye - ys) * cut AS ys,
                xs + (xe - xs) * cut + len * COS(alpha) AS xe,
                ys + (ye - ys) * cut + len * SIN(alpha) AS ye,
                alpha,
                branch,
                level,
                spikes
        FROM    (
                SELECT  t.id, p.id || t.id AS uid, xs, xe, ys, ye, cut, len,
                        p.alpha + t.alpha * ((t.branch % 2) * 2 - 1) AS alpha,
                        t.branch, t.level, t.spikes
                FROM    points p
                JOIN    tree t
                ON      t.id &gt; p.id
                        AND t.id &lt;= (p.id || p.spikes)
                        AND ARRAY_LENGTH(t.id, 1) = p.level + 1
                        AND t.branch BETWEEN p.branch * 2 AND p.branch * 2 + 1
                ) q
        )
SELECT  id::TEXT, xs, ys, xe, ye, alpha, branch, level
FROM    points;
</pre>
<p><a href="#" onclick="xcollapse('X0002');return false;">View query output</a><br />
</p>
<div id="X0002" style="display: none; background: transparent;">
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>xs</th>
<th>ys</th>
<th>xe</th>
<th>ye</th>
<th>alpha</th>
<th>branch</th>
<th>level</th>
</tr>
<tr>
<td class="text">{0}</td>
<td class="float8">0</td>
<td class="float8">0</td>
<td class="float8">1</td>
<td class="float8">0</td>
<td class="float8">0</td>
<td class="int4">0</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="text">{0,0}</td>
<td class="float8">0.356672627385706</td>
<td class="float8">0</td>
<td class="float8">0.535126474998426</td>
<td class="float8">-0.222870525550944</td>
<td class="float8">-0.895624824686616</td>
<td class="int4">0</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,0}</td>
<td class="float8">0.356672627385706</td>
<td class="float8">0</td>
<td class="float8">0.535126474998426</td>
<td class="float8">0.222870525550944</td>
<td class="float8">0.895624824686616</td>
<td class="int4">1</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,1}</td>
<td class="float8">0.561545127537102</td>
<td class="float8">0</td>
<td class="float8">0.700355670409353</td>
<td class="float8">-0.0987685898676881</td>
<td class="float8">-0.618425840960865</td>
<td class="int4">0</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,1}</td>
<td class="float8">0.561545127537102</td>
<td class="float8">0</td>
<td class="float8">0.700355670409353</td>
<td class="float8">0.0987685898676881</td>
<td class="float8">0.618425840960865</td>
<td class="int4">1</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,2}</td>
<td class="float8">0.475528724957258</td>
<td class="float8">0</td>
<td class="float8">0.746217182091292</td>
<td class="float8">-0.287103888547519</td>
<td class="float8">-0.81481895943341</td>
<td class="int4">0</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,2}</td>
<td class="float8">0.475528724957258</td>
<td class="float8">0</td>
<td class="float8">0.746217182091292</td>
<td class="float8">0.287103888547519</td>
<td class="float8">0.81481895943341</td>
<td class="int4">1</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.475877971833112</td>
<td class="float8">-0.14887523088396</td>
<td class="float8">0.469751413327762</td>
<td class="float8">-0.191410125558517</td>
<td class="float8">-1.71384852589547</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.475877971833112</td>
<td class="float8">-0.14887523088396</td>
<td class="float8">0.518723161661866</td>
<td class="float8">-0.152198135130716</td>
<td class="float8">-0.0774011234777593</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.475877971833112</td>
<td class="float8">0.14887523088396</td>
<td class="float8">0.518723161661866</td>
<td class="float8">0.152198135130716</td>
<td class="float8">0.0774011234777593</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,0}</td>
<td class="float8">0.475877971833112</td>
<td class="float8">0.14887523088396</td>
<td class="float8">0.469751413327762</td>
<td class="float8">0.191410125558517</td>
<td class="float8">1.71384852589547</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.414701175775039</td>
<td class="float8">-0.0724716964610147</td>
<td class="float8">0.415814396363913</td>
<td class="float8">-0.0794846178607699</td>
<td class="float8">-1.41337132263673</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.414701175775039</td>
<td class="float8">-0.0724716964610147</td>
<td class="float8">0.421300943231765</td>
<td class="float8">-0.0750915048806874</td>
<td class="float8">-0.377878326736505</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.414701175775039</td>
<td class="float8">0.0724716964610147</td>
<td class="float8">0.421300943231765</td>
<td class="float8">0.0750915048806874</td>
<td class="float8">0.377878326736505</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,1}</td>
<td class="float8">0.414701175775039</td>
<td class="float8">0.0724716964610147</td>
<td class="float8">0.415814396363913</td>
<td class="float8">0.0794846178607699</td>
<td class="float8">1.41337132263673</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.388988783393044</td>
<td class="float8">-0.0403595594574808</td>
<td class="float8">0.383956843885346</td>
<td class="float8">-0.0842237130189914</td>
<td class="float8">-1.68501348496485</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.388988783393044</td>
<td class="float8">-0.0403595594574808</td>
<td class="float8">0.432891699422156</td>
<td class="float8">-0.0450412628886793</td>
<td class="float8">-0.106236164408384</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.388988783393044</td>
<td class="float8">0.0403595594574808</td>
<td class="float8">0.432891699422156</td>
<td class="float8">0.0450412628886793</td>
<td class="float8">0.106236164408384</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,0,2}</td>
<td class="float8">0.388988783393044</td>
<td class="float8">0.0403595594574808</td>
<td class="float8">0.383956843885346</td>
<td class="float8">0.0842237130189914</td>
<td class="float8">1.68501348496485</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.59934195965588</td>
<td class="float8">-0.0268937771770921</td>
<td class="float8">0.632033834423018</td>
<td class="float8">-0.0982578127634654</td>
<td class="float8">-1.14122677456394</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.59934195965588</td>
<td class="float8">-0.0268937771770921</td>
<td class="float8">0.677479105058146</td>
<td class="float8">-0.0343884926052126</td>
<td class="float8">-0.0956249073577922</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.59934195965588</td>
<td class="float8">0.0268937771770921</td>
<td class="float8">0.677479105058146</td>
<td class="float8">0.0343884926052126</td>
<td class="float8">0.0956249073577922</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,0}</td>
<td class="float8">0.59934195965588</td>
<td class="float8">0.0268937771770921</td>
<td class="float8">0.632033834423018</td>
<td class="float8">0.0982578127634654</td>
<td class="float8">1.14122677456394</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.63921578252456</td>
<td class="float8">-0.0552654064199655</td>
<td class="float8">0.652751789427464</td>
<td class="float8">-0.11079307208949</td>
<td class="float8">-1.33168926015496</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.63921578252456</td>
<td class="float8">-0.0552654064199655</td>
<td class="float8">0.696112647679181</td>
<td class="float8">-0.0498532097163336</td>
<td class="float8">0.0948375782332311</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.63921578252456</td>
<td class="float8">0.0552654064199655</td>
<td class="float8">0.696112647679181</td>
<td class="float8">0.0498532097163336</td>
<td class="float8">-0.0948375782332311</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,1}</td>
<td class="float8">0.63921578252456</td>
<td class="float8">0.0552654064199655</td>
<td class="float8">0.652751789427464</td>
<td class="float8">0.11079307208949</td>
<td class="float8">1.33168926015496</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.650763006413631</td>
<td class="float8">-0.0634816628858708</td>
<td class="float8">0.664985272342964</td>
<td class="float8">-0.146527482512268</td>
<td class="float8">-1.40118370142824</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.650763006413631</td>
<td class="float8">-0.0634816628858708</td>
<td class="float8">0.733882770016532</td>
<td class="float8">-0.0496981254523029</td>
<td class="float8">0.164332019506511</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.650763006413631</td>
<td class="float8">0.0634816628858708</td>
<td class="float8">0.733882770016532</td>
<td class="float8">0.0496981254523029</td>
<td class="float8">-0.164332019506511</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,2}</td>
<td class="float8">0.650763006413631</td>
<td class="float8">0.0634816628858708</td>
<td class="float8">0.664985272342964</td>
<td class="float8">0.146527482512268</td>
<td class="float8">1.40118370142824</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.567794611404423</td>
<td class="float8">-0.00444672786521894</td>
<td class="float8">0.57360317341221</td>
<td class="float8">-0.0197700450702597</td>
<td class="float8">-1.20846495070043</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.567794611404423</td>
<td class="float8">-0.00444672786521894</td>
<td class="float8">0.584175304516377</td>
<td class="float8">-0.00491184713656396</td>
<td class="float8">-0.028386731221297</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.567794611404423</td>
<td class="float8">0.00444672786521894</td>
<td class="float8">0.584175304516377</td>
<td class="float8">0.00491184713656396</td>
<td class="float8">0.028386731221297</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,1,3}</td>
<td class="float8">0.567794611404423</td>
<td class="float8">0.00444672786521894</td>
<td class="float8">0.57360317341221</td>
<td class="float8">0.0197700450702597</td>
<td class="float8">1.20846495070043</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.553988851678576</td>
<td class="float8">-0.0832182048548433</td>
<td class="float8">0.529680086603184</td>
<td class="float8">-0.25478987388562</td>
<td class="float8">-1.7115423972691</td>
<td class="int4">0</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.553988851678576</td>
<td class="float8">-0.0832182048548433</td>
<td class="float8">0.726693128455993</td>
<td class="float8">-0.0690412356340923</td>
<td class="float8">0.0819044784022824</td>
<td class="int4">1</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.553988851678576</td>
<td class="float8">0.0832182048548433</td>
<td class="float8">0.726693128455993</td>
<td class="float8">0.0690412356340923</td>
<td class="float8">-0.0819044784022824</td>
<td class="int4">2</td>
<td class="int4">3</td>
</tr>
<tr>
<td class="text">{0,2,0}</td>
<td class="float8">0.553988851678576</td>
<td class="float8">0.0832182048548433</td>
<td class="float8">0.529680086603184</td>
<td class="float8">0.25478987388562</td>
<td class="float8">1.7115423972691</td>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
<tr class="statusbar">
<td colspan="100">39 rows fetched in 0.0017s (0.0100s)</td>
</tr>
</table>
</div>
</div>
<p>Now we have the coordinates of each crystal.</p>
<h3>#4. Visualizing</h3>
<p>The most tedious part of <strong>SQL</strong> graphics is visualizing them. To do this, we'll employ <strong>PostgreSQL</strong>'s geometric functions and operators.</p>
<p>Each crystal can be represented as a path between its start and end points. This can be done by constructing a line segment (<code>LSEG(POINT, POINT)</code>) from two point constructors (<code>POINT(DOUBLE PRECISION, DOUBLE PRECISION)</code>) and converting it to a path. Unfortunately, <strong>PostgreSQL</strong> does not provide a direct <code>lseg</code> to <code>path</code> conversion, but a <code>path</code> can easily be constructed from the <code>TEXT</code> representation of an <code>lseg</code>.</p>
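<p>As a minimal sketch of this conversion (the endpoint coordinates are arbitrary example values, not taken from the snowflake), the same segment can be viewed as an <code>lseg</code>, as its <code>TEXT</code> representation and as a <code>path</code> built from that text. It relies on the bracketed form <code>[(x1,y1),(x2,y2)]</code> printed for an <code>lseg</code> also being valid input for an open <code>path</code>:</p>
<pre class="brush: sql">
-- Example endpoints only; any two points would do
SELECT  LSEG(POINT(0, 0), POINT(0.5, 0.25)) AS seg,
        LSEG(POINT(0, 0), POINT(0.5, 0.25))::TEXT AS seg_text,
        PATH(LSEG(POINT(0, 0), POINT(0.5, 0.25))::TEXT) AS seg_path;
</pre>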
<p>We have six root branches, so each crystal should be cloned into six copies. The easiest way to do this is the <code>PATH * POINT</code> operator: it rotates and scales the <code>PATH</code> around <code>(0, 0)</code> so that <code>(1, 0)</code> becomes <code>POINT</code>. To construct the points, we will generate six rotation angles with a step of <strong>60&deg;</strong> and multiply the path by <code>POINT(COS(&alpha;), SIN(&alpha;))</code>. Since these points lie on the unit circle, the multiplications preserve lengths.</p>
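<p>A small sketch of the rotation step, using a unit-length example path along the x axis instead of a real crystal. The <code>@-@</code> length operator is added here only to check the claim about lengths; the reported length should stay at 1, up to floating-point rounding, for every angle:</p>
<pre class="brush: sql">
-- Example path from (0, 0) to (1, 0), rotated in steps of 60 degrees
SELECT  turn,
        line * POINT(COS(RADIANS(turn)), SIN(RADIANS(turn))) AS rotated,
        @-@ (line * POINT(COS(RADIANS(turn)), SIN(RADIANS(turn)))) AS path_length
FROM    (
        VALUES
        (PATH &#039;[(0,0),(1,0)]&#039;)
        ) q (line)
CROSS JOIN
        generate_series(0, 300, 60) AS turn;
</pre>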
<p>Finally, we need to actually display the snowflake on the screen. To do this, we will generate a set of <strong>80 &times; 80</strong> records (<code>x</code> and <code>y</code>), defining a grid that starts at <code>(-1, -1)</code> and covers the square up to <code>(1, 1)</code> with a step of <code>1/40</code> units. Then we'll check whether at least one crystal lies within <code>0.7/40</code> units of each cell on the grid (using the <code>POINT &lt;-&gt; PATH</code> distance operator and <code>EXISTS</code>). If there is one, we will return a number sign (<code>#</code>) for this cell, otherwise a space.</p>
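<p>The cell test can be sketched on a much smaller grid. The example below assumes a scale of <code>4</code> (an <strong>8 &times; 8</strong> grid) and a single made-up path from <code>(0, 0)</code> to <code>(1, 0)</code> in place of the crystals; only the four cells lying on that segment should come out as <code>#</code>:</p>
<pre class="brush: sql">
-- Illustrative values: scale 4 and one horizontal example path
SELECT  x, y,
        CASE
        WHEN    POINT(x::DOUBLE PRECISION / scale, y::DOUBLE PRECISION / scale) &lt;-&gt; line &lt;= (0.7 / scale) THEN
                &#039;#&#039;
        ELSE    &#039; &#039;
        END AS cell
FROM    (
        VALUES
        (4, PATH &#039;[(0,0),(1,0)]&#039;)
        ) q (scale, line)
CROSS JOIN
        generate_series(-4, 3) AS x
CROSS JOIN
        generate_series(-4, 3) AS y
ORDER BY
        y, x;
</pre>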
<p>Then we'll group the cells by line (<code>y</code>), concatenate the columns of each line in order of <code>x</code> (using <code>ARRAY_AGG</code> and <code>ARRAY_TO_STRING</code>) and output the lines.</p>
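<p>The grouping step can be tried in isolation as well. In this sketch a made-up rule (a diagonal cross on a <strong>7 &times; 7</strong> grid) stands in for the crystal test, so that only the aggregation is exercised:</p>
<pre class="brush: sql">
-- The CASE rule is a placeholder pattern, not part of the snowflake
SELECT  ARRAY_TO_STRING(ARRAY_AGG(cell ORDER BY x), &#039;&#039;) AS row_text
FROM    (
        SELECT  x, y,
                CASE WHEN ABS(x) = ABS(y) THEN &#039;#&#039; ELSE &#039; &#039; END AS cell
        FROM    generate_series(-3, 3) AS x
        CROSS JOIN
                generate_series(-3, 3) AS y
        ) q
GROUP BY
        y
ORDER BY
        y;
</pre>
<p>The <code>ORDER BY</code> inside <code>ARRAY_AGG</code> keeps the columns of each line in the right order, and the outer <code>ORDER BY</code> keeps the lines themselves in order.</p>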
<h3>#5. The snowflake</h3>
<p>And here's our snowflake:</p>
<pre class="brush: sql">
SELECT  SETSEED(0.20111231);

WITH    RECURSIVE
        -- params: random shape parameters of every crystal, generated level by level
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level &lt;= 3
        ),
        -- tree: each crystal repeated once per mirrored branch, 2^(level - 1) copies
        tree AS
        (
        SELECT  *, generate_series(0, (1 &lt;&lt; (level - 1)) - 1) AS branch
        FROM    params
        ),
        -- points: start (xs, ys) and end (xe, ye) coordinates of each crystal
        points AS
        (
        SELECT  id,
                0::double precision AS xs,
                0::double precision AS ys,
                len AS xe,
                0::double precision AS ye,
                0::double precision AS alpha,
                branch,
                level,
                spikes
        FROM    tree
        WHERE   id = ARRAY[0]
        UNION ALL
        SELECT  id,
                xs + (xe - xs) * cut AS xs,
                ys + (ye - ys) * cut AS ys,
                xs + (xe - xs) * cut + len * COS(alpha) AS xe,
                ys + (ye - ys) * cut + len * SIN(alpha) AS ye,
                alpha,
                branch,
                level,
                spikes
        FROM    (
                SELECT  t.id, p.id || t.id AS uid, xs, xe, ys, ye, cut, len,
                        p.alpha + t.alpha * ((t.branch % 2) * 2 - 1) AS alpha,
                        t.branch, t.level, t.spikes
                FROM    points p
                JOIN    tree t
                ON      t.id &gt; p.id
                        AND t.id &lt;= (p.id || p.spikes)
                        AND ARRAY_LENGTH(t.id, 1) = p.level + 1
                        AND t.branch BETWEEN p.branch * 2 AND p.branch * 2 + 1
                ) q
        ),
        -- lines: each crystal as a path, cloned at six turns 60 degrees apart
        lines AS
        (
        SELECT  PATH(LSEG(POINT(xs, ys), POINT(xe, ye))::TEXT) * POINT(COS(RADIANS(turn)), SIN(RADIANS(turn))) AS line,
                *
        FROM    points
        CROSS JOIN
                generate_series(0, 300, 60) AS turn
        )
SELECT  ARRAY_TO_STRING
                (
                ARRAY_AGG
                        (
                        CASE
                                EXISTS
                                (
                                SELECT  NULL
                                FROM    lines
                                WHERE   POINT(x::DOUBLE PRECISION / scale, y::DOUBLE PRECISION / scale) &lt;-&gt; line &lt;= (0.7 / scale)
                                )
                        WHEN    TRUE THEN
                                &#039;#&#039;
                        ELSE    &#039; &#039;
                        END
                        ORDER BY
                                x
                        ),
                &#039;&#039;
                )
FROM    (
        SELECT  *,
                generate_series(-scale, scale - 1) AS y
        FROM    (
                SELECT  scale, generate_series(-scale, scale - 1) AS x
                FROM    (
                        VALUES
                        (40)
                        ) q (scale)
                ) q
        ) q
GROUP BY
        y
ORDER BY
        y
</pre>
<div class="terminal widefont smallfont">
<table class="terminal">
<tr>
<th>array_to_string</th>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                    #                                       #                   </td>
</tr>
<tr>
<td class="text">                    ##                                     ##                   </td>
</tr>
<tr>
<td class="text">                     #                                     #                    </td>
</tr>
<tr>
<td class="text">                     ##            #         #            ##                    </td>
</tr>
<tr>
<td class="text">                      #            #         #            #                     </td>
</tr>
<tr>
<td class="text">                      ##          ##         ##          ##                     </td>
</tr>
<tr>
<td class="text">                       ##         #           #         ##                      </td>
</tr>
<tr>
<td class="text">                        #         #           #         #                       </td>
</tr>
<tr>
<td class="text">                        ## ##    ##           ##    ## ##                       </td>
</tr>
<tr>
<td class="text">                         # ########           ######## #                        </td>
</tr>
<tr>
<td class="text">                        ### #### #             # #### ###                       </td>
</tr>
<tr>
<td class="text">                       ## ###### #    #   #    # ###### ##                      </td>
</tr>
<tr>
<td class="text">                       ###########  ###   ###  ###########                      </td>
</tr>
<tr>
<td class="text">                      ################     ################                     </td>
</tr>
<tr>
<td class="text">                       ####### ######       ###### #######                      </td>
</tr>
<tr>
<td class="text">               ####  ######### ##  ###     ###  ## #########  ####              </td>
</tr>
<tr>
<td class="text">                 ######  ## ## #   ##       ##   # ## ##  ######                </td>
</tr>
<tr>
<td class="text">                     ####### ###   #         #   ### #######                    </td>
</tr>
<tr>
<td class="text">                         #######  ##         ##  #######                        </td>
</tr>
<tr>
<td class="text">                       ###    ## ###         ### ##    ###                      </td>
</tr>
<tr>
<td class="text">                      ####     #####         #####     ####                     </td>
</tr>
<tr>
<td class="text">                     ######   # ###           ### #   ######                    </td>
</tr>
<tr>
<td class="text">                    ##  ##########             ##########  ##                   </td>
</tr>
<tr>
<td class="text">          #             #    #####             #####    #             #         </td>
</tr>
<tr>
<td class="text">          ##                     ##           ##                     ##         </td>
</tr>
<tr>
<td class="text">           ##      #              ##         ##              #      ##          </td>
</tr>
<tr>
<td class="text">            ##    ##              ##         ##              ##    ##           </td>
</tr>
<tr>
<td class="text">             ##   ####             ##       ##             ####   ##            </td>
</tr>
<tr>
<td class="text">              ##  ####              #       #              ####  ##             </td>
</tr>
<tr>
<td class="text">             #### ####              ##     ##              #### ####            </td>
</tr>
<tr>
<td class="text">             ## ###  ##              #     #              ##  ### ##            </td>
</tr>
<tr>
<td class="text">            #### ##   ##             ##   ##             ##   ## ####           </td>
</tr>
<tr>
<td class="text">           #########   ###            ## ##            ###   #########          </td>
</tr>
<tr>
<td class="text">          #######  ##  ###             # #             ###  ##  #######         </td>
</tr>
<tr>
<td class="text">             #####  #   ##             ###             ##   #  #####            </td>
</tr>
<tr>
<td class="text">################################################################################</td>
</tr>
<tr>
<td class="text">             #####  #   ##             ###             ##   #  #####            </td>
</tr>
<tr>
<td class="text">          #######  ##  ###             # #             ###  ##  #######         </td>
</tr>
<tr>
<td class="text">           #########   ###            ## ##            ###   #########          </td>
</tr>
<tr>
<td class="text">            #### ##   ##             ##   ##             ##   ## ####           </td>
</tr>
<tr>
<td class="text">             ## ###  ##              #     #              ##  ### ##            </td>
</tr>
<tr>
<td class="text">             #### ####              ##     ##              #### ####            </td>
</tr>
<tr>
<td class="text">              ##  ####              #       #              ####  ##             </td>
</tr>
<tr>
<td class="text">             ##   ####             ##       ##             ####   ##            </td>
</tr>
<tr>
<td class="text">            ##    ##              ##         ##              ##    ##           </td>
</tr>
<tr>
<td class="text">           ##      #              ##         ##              #      ##          </td>
</tr>
<tr>
<td class="text">          ##                     ##           ##                     ##         </td>
</tr>
<tr>
<td class="text">          #             #    #####             #####    #             #         </td>
</tr>
<tr>
<td class="text">                    ##  ##########             ##########  ##                   </td>
</tr>
<tr>
<td class="text">                     ######   # ###           ### #   ######                    </td>
</tr>
<tr>
<td class="text">                      ####     #####         #####     ####                     </td>
</tr>
<tr>
<td class="text">                       ###    ## ###         ### ##    ###                      </td>
</tr>
<tr>
<td class="text">                         #######  ##         ##  #######                        </td>
</tr>
<tr>
<td class="text">                     ####### ###   #         #   ### #######                    </td>
</tr>
<tr>
<td class="text">                 ######  ## ## #   ##       ##   # ## ##  ######                </td>
</tr>
<tr>
<td class="text">               ####  ######### ##  ###     ###  ## #########  ####              </td>
</tr>
<tr>
<td class="text">                       ####### ######       ###### #######                      </td>
</tr>
<tr>
<td class="text">                      ################     ################                     </td>
</tr>
<tr>
<td class="text">                       ###########  ###   ###  ###########                      </td>
</tr>
<tr>
<td class="text">                       ## ###### #    #   #    # ###### ##                      </td>
</tr>
<tr>
<td class="text">                        ### #### #             # #### ###                       </td>
</tr>
<tr>
<td class="text">                         # ########           ######## #                        </td>
</tr>
<tr>
<td class="text">                        ## ##    ##           ##    ## ##                       </td>
</tr>
<tr>
<td class="text">                        #         #           #         #                       </td>
</tr>
<tr>
<td class="text">                       ##         #           #         ##                      </td>
</tr>
<tr>
<td class="text">                      ##          ##         ##          ##                     </td>
</tr>
<tr>
<td class="text">                      #            #         #            #                     </td>
</tr>
<tr>
<td class="text">                     ##            #         #            ##                    </td>
</tr>
<tr>
<td class="text">                     #                                     #                    </td>
</tr>
<tr>
<td class="text">                    ##                                     ##                   </td>
</tr>
<tr>
<td class="text">                    #                                       #                   </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr class="statusbar">
<td colspan="100">80 rows fetched in 0.0010s (2.7344s)</td>
</tr>
</table>
</div>
<h3>#6. Some more unique snowflakes</h3>
<p><a href="#" onclick="xcollapse('X0003');return false;">View query</a><br />
</p>
<div id="X0003" style="display: none; background: transparent;">
<pre class="brush: sql">
SELECT  SETSEED(0.201112311);

WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level &lt;= 3
        ),
        tree AS
        (
        SELECT  *, generate_series(0, (1 &lt;&lt; (level - 1)) - 1) AS branch
        FROM    params
        ),
        points AS
        (
        SELECT  id,
                0::double precision AS xs,
                0::double precision AS ys,
                len AS xe,
                0::double precision AS ye,
                0::double precision AS alpha,
                branch,
                level,
                spikes
        FROM    tree
        WHERE   id = ARRAY[0]
        UNION ALL
        SELECT  id,
                xs + (xe - xs) * cut AS xs,
                ys + (ye - ys) * cut AS ys,
                xs + (xe - xs) * cut + len * COS(alpha) AS xe,
                ys + (ye - ys) * cut + len * SIN(alpha) AS ye,
                alpha,
                branch,
                level,
                spikes
        FROM    (
                SELECT  t.id, p.id || t.id AS uid, xs, xe, ys, ye, cut, len,
                        p.alpha + t.alpha * ((t.branch % 2) * 2 - 1) AS alpha,
                        t.branch, t.level, t.spikes
                FROM    points p
                JOIN    tree t
                ON      t.id &gt; p.id
                        AND t.id &lt;= (p.id || p.spikes)
                        AND ARRAY_LENGTH(t.id, 1) = p.level + 1
                        AND t.branch BETWEEN p.branch * 2 AND p.branch * 2 + 1
                ) q
        ),
        lines AS
        (
        SELECT  PATH(LSEG(POINT(xs, ys), POINT(xe, ye))::TEXT) * POINT(COS(RADIANS(turn)), SIN(RADIANS(turn))) AS line,
                *
        FROM    points
        CROSS JOIN
                generate_series(0, 300, 60) AS turn
        )
SELECT  ARRAY_TO_STRING
                (
                ARRAY_AGG
                        (
                        CASE
                                EXISTS
                                (
                                SELECT  NULL
                                FROM    lines
                                WHERE   POINT(x::DOUBLE PRECISION / scale, y::DOUBLE PRECISION / scale) &lt;-&gt; line &lt;= (0.7 / scale)
                                )
                        WHEN    TRUE THEN
                                &#039;#&#039;
                        ELSE    &#039; &#039;
                        END
                        ORDER BY
                                x
                        ),
                &#039;&#039;
                )
FROM    (
        SELECT  *,
                generate_series(-scale, scale - 1) AS y
        FROM    (
                SELECT  scale, generate_series(-scale, scale - 1) AS x
                FROM    (
                        VALUES
                        (40)
                        ) q (scale)
                ) q
        ) q
GROUP BY
        y
ORDER BY
        y
</pre>
</div>
<div class="terminal widefont smallfont">
<table class="terminal">
<tr>
<th>array_to_string</th>
</tr>
<tr>
<td class="text">                           ####                   ####                          </td>
</tr>
<tr>
<td class="text">                            ######             ######                           </td>
</tr>
<tr>
<td class="text">                            ####                 ####                           </td>
</tr>
<tr>
<td class="text">                            ##                     ##                           </td>
</tr>
<tr>
<td class="text">                            #                       #                           </td>
</tr>
<tr>
<td class="text">                    #       #                       #       #                   </td>
</tr>
<tr>
<td class="text">                    ##     ##                       ##     ##                   </td>
</tr>
<tr>
<td class="text">                     #     ##                       ##     #                    </td>
</tr>
<tr>
<td class="text">           #         ##    #                         #    ##         #          </td>
</tr>
<tr>
<td class="text">           ##         #   ##                         ##   #         ##          </td>
</tr>
<tr>
<td class="text">         ## ##        ##  ##                         ##  ##        ## ##        </td>
</tr>
<tr>
<td class="text">         ######        ## #                           # ##        ######        </td>
</tr>
<tr>
<td class="text">           ########     # #                           # #     ########          </td>
</tr>
<tr>
<td class="text">          ##     ###### ###                           ### ######     ##         </td>
</tr>
<tr>
<td class="text">          #          ######            ###            ######          #         </td>
</tr>
<tr>
<td class="text">                         ##            ###            ##                        </td>
</tr>
<tr>
<td class="text">                          #             #             #                         </td>
</tr>
<tr>
<td class="text">                          ##           ###           ##                         </td>
</tr>
<tr>
<td class="text">                           ##      #   ###   #      ##                          </td>
</tr>
<tr>
<td class="text">                            #      ## ##### ##      #                           </td>
</tr>
<tr>
<td class="text">                            ##    ###  ###  ###    ##                           </td>
</tr>
<tr>
<td class="text">                             #     ##### #####     #                            </td>
</tr>
<tr>
<td class="text">                             ##    ####   ####    ##                            </td>
</tr>
<tr>
<td class="text">                              #    ##### #####    #                             </td>
</tr>
<tr>
<td class="text">                              ##   ###########   ##                             </td>
</tr>
<tr>
<td class="text">                       ####    ##  ###########  ##    ####                      </td>
</tr>
<tr>
<td class="text">                  #     ####    #   #### ####   #    ####     #                 </td>
</tr>
<tr>
<td class="text">  #              ###     ####   ##  #### ####  ##   ####     ###              # </td>
</tr>
<tr>
<td class="text">  #              #######  ####   #  ## # # ##  #   ####  #######              # </td>
</tr>
<tr>
<td class="text"># #                 ######## ##  ## #  ###  # ##  ## ########                 # </td>
</tr>
<tr>
<td class="text">###                  ##  #######  ###  ###  ###  #######  ##                  ##</td>
</tr>
<tr>
<td class="text">###                  ###  ############# # #############  ###                  ##</td>
</tr>
<tr>
<td class="text">  ##                   ## ###### ############### ###### ##                   ## </td>
</tr>
<tr>
<td class="text">   ##             #### #####  ## ## # ## ## # ## ##  ##### ####             ##  </td>
</tr>
<tr>
<td class="text">    ##              #########  #  #############  #  #########              ##   </td>
</tr>
<tr>
<td class="text">     ##             #### ######### ########### ######### ####             ##    </td>
</tr>
<tr>
<td class="text">      ##              #######  ########   ########  #######              ##     </td>
</tr>
<tr>
<td class="text">       ##                ####    ####### #######    ####                ##      </td>
</tr>
<tr>
<td class="text">        ##                  ########   # #   ########                  ##       </td>
</tr>
<tr>
<td class="text">         ##                  ##  ###   ###   ###  ##                  ##        </td>
</tr>
<tr>
<td class="text">################################################################################</td>
</tr>
<tr>
<td class="text">         ##                  ##  ###   ###   ###  ##                  ##        </td>
</tr>
<tr>
<td class="text">        ##                  ########   # #   ########                  ##       </td>
</tr>
<tr>
<td class="text">       ##                ####    ####### #######    ####                ##      </td>
</tr>
<tr>
<td class="text">      ##              #######  ########   ########  #######              ##     </td>
</tr>
<tr>
<td class="text">     ##             #### ######### ########### ######### ####             ##    </td>
</tr>
<tr>
<td class="text">    ##              #########  #  #############  #  #########              ##   </td>
</tr>
<tr>
<td class="text">   ##             #### #####  ## ## # ## ## # ## ##  ##### ####             ##  </td>
</tr>
<tr>
<td class="text">  ##                   ## ###### ############### ###### ##                   ## </td>
</tr>
<tr>
<td class="text">###                  ###  ############# # #############  ###                  ##</td>
</tr>
<tr>
<td class="text">###                  ##  #######  ###  ###  ###  #######  ##                  ##</td>
</tr>
<tr>
<td class="text"># #                 ######## ##  ## #  ###  # ##  ## ########                 # </td>
</tr>
<tr>
<td class="text">  #              #######  ####   #  ## # # ##  #   ####  #######              # </td>
</tr>
<tr>
<td class="text">  #              ###     ####   ##  #### ####  ##   ####     ###              # </td>
</tr>
<tr>
<td class="text">                  #     ####    #   #### ####   #    ####     #                 </td>
</tr>
<tr>
<td class="text">                       ####    ##  ###########  ##    ####                      </td>
</tr>
<tr>
<td class="text">                              ##   ###########   ##                             </td>
</tr>
<tr>
<td class="text">                              #    ##### #####    #                             </td>
</tr>
<tr>
<td class="text">                             ##    ####   ####    ##                            </td>
</tr>
<tr>
<td class="text">                             #     ##### #####     #                            </td>
</tr>
<tr>
<td class="text">                            ##    ###  ###  ###    ##                           </td>
</tr>
<tr>
<td class="text">                            #      ## ##### ##      #                           </td>
</tr>
<tr>
<td class="text">                           ##      #   ###   #      ##                          </td>
</tr>
<tr>
<td class="text">                          ##           ###           ##                         </td>
</tr>
<tr>
<td class="text">                          #             #             #                         </td>
</tr>
<tr>
<td class="text">                         ##            ###            ##                        </td>
</tr>
<tr>
<td class="text">          #          ######            ###            ######          #         </td>
</tr>
<tr>
<td class="text">          ##     ###### ###                           ### ######     ##         </td>
</tr>
<tr>
<td class="text">           ########     # #                           # #     ########          </td>
</tr>
<tr>
<td class="text">         ######        ## #                           # ##        ######        </td>
</tr>
<tr>
<td class="text">         ## ##        ##  ##                         ##  ##        ## ##        </td>
</tr>
<tr>
<td class="text">           ##         #   ##                         ##   #         ##          </td>
</tr>
<tr>
<td class="text">           #         ##    #                         #    ##         #          </td>
</tr>
<tr>
<td class="text">                     #     ##                       ##     #                    </td>
</tr>
<tr>
<td class="text">                    ##     ##                       ##     ##                   </td>
</tr>
<tr>
<td class="text">                    #       #                       #       #                   </td>
</tr>
<tr>
<td class="text">                            #                       #                           </td>
</tr>
<tr>
<td class="text">                            ##                     ##                           </td>
</tr>
<tr>
<td class="text">                            ####                 ####                           </td>
</tr>
<tr>
<td class="text">                            ######             ######                           </td>
</tr>
<tr class="statusbar">
<td colspan="100">80 rows fetched in 0.0010s (2.0781s)</td>
</tr>
</table>
</div>
<p><a href="#" onclick="xcollapse('X0004');return false;">View query</a><br />
</p>
<div id="X0004" style="display: none; background: transparent;">
<pre class="brush: sql">
SELECT  SETSEED(0.201112312);

WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level &lt;= 3
        ),
        tree AS
        (
        SELECT  *, generate_series(0, (1 &lt;&lt; (level - 1)) - 1) AS branch
        FROM    params
        ),
        points AS
        (
        SELECT  id,
                0::double precision AS xs,
                0::double precision AS ys,
                len AS xe,
                0::double precision AS ye,
                0::double precision AS alpha,
                branch,
                level,
                spikes
        FROM    tree
        WHERE   id = ARRAY[0]
        UNION ALL
        SELECT  id,
                xs + (xe - xs) * cut AS xs,
                ys + (ye - ys) * cut AS ys,
                xs + (xe - xs) * cut + len * COS(alpha) AS xe,
                ys + (ye - ys) * cut + len * SIN(alpha) AS ye,
                alpha,
                branch,
                level,
                spikes
        FROM    (
                SELECT  t.id, p.id || t.id AS uid, xs, xe, ys, ye, cut, len,
                        p.alpha + t.alpha * ((t.branch % 2) * 2 - 1) AS alpha,
                        t.branch, t.level, t.spikes
                FROM    points p
                JOIN    tree t
                ON      t.id &gt; p.id
                        AND t.id &lt;= (p.id || p.spikes)
                        AND ARRAY_LENGTH(t.id, 1) = p.level + 1
                        AND t.branch BETWEEN p.branch * 2 AND p.branch * 2 + 1
                ) q
        ),
        lines AS
        (
        SELECT  PATH(LSEG(POINT(xs, ys), POINT(xe, ye))::TEXT) * POINT(COS(RADIANS(turn)), SIN(RADIANS(turn))) AS line,
                *
        FROM    points
        CROSS JOIN
                generate_series(0, 300, 60) AS turn
        )
SELECT  ARRAY_TO_STRING
                (
                ARRAY_AGG
                        (
                        CASE
                                EXISTS
                                (
                                SELECT  NULL
                                FROM    lines
                                WHERE   POINT(x::DOUBLE PRECISION / scale, y::DOUBLE PRECISION / scale) &lt;-&gt; line &lt;= (0.7 / scale)
                                )
                        WHEN    TRUE THEN
                                &#039;#&#039;
                        ELSE    &#039; &#039;
                        END
                        ORDER BY
                                x
                        ),
                &#039;&#039;
                )
FROM    (
        SELECT  *,
                generate_series(-scale, scale - 1) AS y
        FROM    (
                SELECT  scale, generate_series(-scale, scale - 1) AS x
                FROM    (
                        VALUES
                        (40)
                        ) q (scale)
                ) q
        ) q
GROUP BY
        y
ORDER BY
        y
</pre>
</div>
<div class="terminal widefont smallfont">
<table class="terminal">
<tr>
<th>array_to_string</th>
</tr>
<tr>
<td class="text">        ## ##        ##                                   ##        ## ##       </td>
</tr>
<tr>
<td class="text">         #####       ##                                   ##       #####        </td>
</tr>
<tr>
<td class="text">          ####        #                                   #        ####         </td>
</tr>
<tr>
<td class="text">       ########       #                                   #       ########      </td>
</tr>
<tr>
<td class="text">    ##### ##  ##      #                                   #      ##  ## #####   </td>
</tr>
<tr>
<td class="text">    #          ###  # #                                   # #  ###          #   </td>
</tr>
<tr>
<td class="text">                 ## ###                                   ### ##                </td>
</tr>
<tr>
<td class="text">                  #####                                   #####                 </td>
</tr>
<tr>
<td class="text">                    ###                                   ###                   </td>
</tr>
<tr>
<td class="text">                      #                                   #                     </td>
</tr>
<tr>
<td class="text">                      ##                                 ##                     </td>
</tr>
<tr>
<td class="text">                       ##                               ##                      </td>
</tr>
<tr>
<td class="text">                        #                               #                       </td>
</tr>
<tr>
<td class="text">                        ##     #    #       #    #     ##                       </td>
</tr>
<tr>
<td class="text">                         #   # ###  #   #   #  ### #   #                        </td>
</tr>
<tr>
<td class="text">                         ##  ###### #  ###  # ######  ##                        </td>
</tr>
<tr>
<td class="text">                          #   ### # #### #### # ###   #                         </td>
</tr>
<tr>
<td class="text">                          ##   ## ####     #### ##   ##                         </td>
</tr>
<tr>
<td class="text">                       ##  ## ###  ##       ##  ### ##  ##                      </td>
</tr>
<tr>
<td class="text">                      ###   #  #   ##       ##   #  #   ###                     </td>
</tr>
<tr>
<td class="text">                     #####  ## #   ##       ##   # ##  #####                    </td>
</tr>
<tr>
<td class="text">                     ######  # ##  #         #  ## #  ######                    </td>
</tr>
<tr>
<td class="text">                      #  ### ####  #         #  #### ###  #                     </td>
</tr>
<tr>
<td class="text">                  ##  #    ####### #         # #######    #  ##                 </td>
</tr>
<tr>
<td class="text">                   #####     ##### ####   #### #####     #####                  </td>
</tr>
<tr>
<td class="text">                     ####    ########       ########    ####                    </td>
</tr>
<tr>
<td class="text">                    #######  ## # ##         ## # ##  #######                   </td>
</tr>
<tr>
<td class="text">                  ###     ##### ####         #### #####     ###                 </td>
</tr>
<tr>
<td class="text">                  #          ### ###         ### ###          #                 </td>
</tr>
<tr>
<td class="text">                  #        ########           ########        #                 </td>
</tr>
<tr>
<td class="text">               ## #        ##    ###         ###    ##        # ##              </td>
</tr>
<tr>
<td class="text">                ####              #####   #####              ####               </td>
</tr>
<tr>
<td class="text">                 ###              #####   #####              ###                </td>
</tr>
<tr>
<td class="text">             ########             #####   #####             ########            </td>
</tr>
<tr>
<td class="text">             ###  # ##  ##        ####     ####        ##  ## #  ###            </td>
</tr>
<tr>
<td class="text">             ###     ##  #           #     #           #  ##     ###            </td>
</tr>
<tr>
<td class="text">             #####    ####           ##   ##           ####    #####            </td>
</tr>
<tr>
<td class="text">             ######     ##            ## ##            ##     ######            </td>
</tr>
<tr>
<td class="text">##                #########    ##      # #      ##    #########                #</td>
</tr>
<tr>
<td class="text"> ###                ###   ### ####     ###     #### ###   ###                ###</td>
</tr>
<tr>
<td class="text">################################################################################</td>
</tr>
<tr>
<td class="text"> ###                ###   ### ####     ###     #### ###   ###                ###</td>
</tr>
<tr>
<td class="text">##                #########    ##      # #      ##    #########                #</td>
</tr>
<tr>
<td class="text">             ######     ##            ## ##            ##     ######            </td>
</tr>
<tr>
<td class="text">             #####    ####           ##   ##           ####    #####            </td>
</tr>
<tr>
<td class="text">             ###     ##  #           #     #           #  ##     ###            </td>
</tr>
<tr>
<td class="text">             ###  # ##  ##        ####     ####        ##  ## #  ###            </td>
</tr>
<tr>
<td class="text">             ########             #####   #####             ########            </td>
</tr>
<tr>
<td class="text">                 ###              #####   #####              ###                </td>
</tr>
<tr>
<td class="text">                ####              #####   #####              ####               </td>
</tr>
<tr>
<td class="text">               ## #        ##    ###         ###    ##        # ##              </td>
</tr>
<tr>
<td class="text">                  #        ########           ########        #                 </td>
</tr>
<tr>
<td class="text">                  #          ### ###         ### ###          #                 </td>
</tr>
<tr>
<td class="text">                  ###     ##### ####         #### #####     ###                 </td>
</tr>
<tr>
<td class="text">                    #######  ## # ##         ## # ##  #######                   </td>
</tr>
<tr>
<td class="text">                     ####    ########       ########    ####                    </td>
</tr>
<tr>
<td class="text">                   #####     ##### ####   #### #####     #####                  </td>
</tr>
<tr>
<td class="text">                  ##  #    ####### #         # #######    #  ##                 </td>
</tr>
<tr>
<td class="text">                      #  ### ####  #         #  #### ###  #                     </td>
</tr>
<tr>
<td class="text">                     ######  # ##  #         #  ## #  ######                    </td>
</tr>
<tr>
<td class="text">                     #####  ## #   ##       ##   # ##  #####                    </td>
</tr>
<tr>
<td class="text">                      ###   #  #   ##       ##   #  #   ###                     </td>
</tr>
<tr>
<td class="text">                       ##  ## ###  ##       ##  ### ##  ##                      </td>
</tr>
<tr>
<td class="text">                          ##   ## ####     #### ##   ##                         </td>
</tr>
<tr>
<td class="text">                          #   ### # #### #### # ###   #                         </td>
</tr>
<tr>
<td class="text">                         ##  ###### #  ###  # ######  ##                        </td>
</tr>
<tr>
<td class="text">                         #   # ###  #   #   #  ### #   #                        </td>
</tr>
<tr>
<td class="text">                        ##     #    #       #    #     ##                       </td>
</tr>
<tr>
<td class="text">                        #                               #                       </td>
</tr>
<tr>
<td class="text">                       ##                               ##                      </td>
</tr>
<tr>
<td class="text">                      ##                                 ##                     </td>
</tr>
<tr>
<td class="text">                      #                                   #                     </td>
</tr>
<tr>
<td class="text">                    ###                                   ###                   </td>
</tr>
<tr>
<td class="text">                  #####                                   #####                 </td>
</tr>
<tr>
<td class="text">                 ## ###                                   ### ##                </td>
</tr>
<tr>
<td class="text">    #          ###  # #                                   # #  ###          #   </td>
</tr>
<tr>
<td class="text">    ##### ##  ##      #                                   #      ##  ## #####   </td>
</tr>
<tr>
<td class="text">       ########       #                                   #       ########      </td>
</tr>
<tr>
<td class="text">          ####        #                                   #        ####         </td>
</tr>
<tr>
<td class="text">         #####       ##                                   ##       #####        </td>
</tr>
<tr class="statusbar">
<td colspan="100">80 rows fetched in 0.0009s (4.4531s)</td>
</tr>
</table>
</div>
<p><a href="#" onclick="xcollapse('X0005');return false;">View query</a><br />
</p>
<div id="X0005" style="display: none; background: transparent;">
<pre class="brush: sql">
SELECT  SETSEED(0.201112313);

WITH    RECURSIVE
        params (id, cut, len, alpha, level, spikes) AS
        (
        SELECT  ARRAY[0::INT], 0::DOUBLE PRECISION, 1::DOUBLE PRECISION, 0::DOUBLE PRECISION, 1, FLOOR(RANDOM() * 4)::INT AS spikes
        UNION ALL
        SELECT  id, cut, len, alpha, level, FLOOR(RANDOM() * 4)::INT AS spikes
        FROM    (
                SELECT  id || generate_series(0, spikes) AS id,
                        RANDOM() AS cut, len * RANDOM() * 0.5 AS len, (PI() / 2) * (0.3 + RANDOM() * 0.3) AS alpha, level + 1 AS level
                FROM    params
                ) q
        WHERE   level &lt;= 3
        ),
        tree AS
        (
        SELECT  *, generate_series(0, (1 &lt;&lt; (level - 1)) - 1) AS branch
        FROM    params
        ),
        points AS
        (
        SELECT  id,
                0::double precision AS xs,
                0::double precision AS ys,
                len AS xe,
                0::double precision AS ye,
                0::double precision AS alpha,
                branch,
                level,
                spikes
        FROM    tree
        WHERE   id = ARRAY[0]
        UNION ALL
        SELECT  id,
                xs + (xe - xs) * cut AS xs,
                ys + (ye - ys) * cut AS ys,
                xs + (xe - xs) * cut + len * COS(alpha) AS xe,
                ys + (ye - ys) * cut + len * SIN(alpha) AS ye,
                alpha,
                branch,
                level,
                spikes
        FROM    (
                SELECT  t.id, p.id || t.id AS uid, xs, xe, ys, ye, cut, len,
                        p.alpha + t.alpha * ((t.branch % 2) * 2 - 1) AS alpha,
                        t.branch, t.level, t.spikes
                FROM    points p
                JOIN    tree t
                ON      t.id &gt; p.id
                        AND t.id &lt;= (p.id || p.spikes)
                        AND ARRAY_LENGTH(t.id, 1) = p.level + 1
                        AND t.branch BETWEEN p.branch * 2 AND p.branch * 2 + 1
                ) q
        ),
        lines AS
        (
        SELECT  PATH(LSEG(POINT(xs, ys), POINT(xe, ye))::TEXT) * POINT(COS(RADIANS(turn)), SIN(RADIANS(turn))) AS line,
                *
        FROM    points
        CROSS JOIN
                generate_series(0, 300, 60) AS turn
        )
SELECT  ARRAY_TO_STRING
                (
                ARRAY_AGG
                        (
                        CASE
                                EXISTS
                                (
                                SELECT  NULL
                                FROM    lines
                                WHERE   POINT(x::DOUBLE PRECISION / scale, y::DOUBLE PRECISION / scale) &lt;-&gt; line &lt;= (0.7 / scale)
                                )
                        WHEN    TRUE THEN
                                &#039;#&#039;
                        ELSE    &#039; &#039;
                        END
                        ORDER BY
                                x
                        ),
                &#039;&#039;
                )
FROM    (
        SELECT  *,
                generate_series(-scale, scale - 1) AS y
        FROM    (
                SELECT  scale, generate_series(-scale, scale - 1) AS x
                FROM    (
                        VALUES
                        (40)
                        ) q (scale)
                ) q
        ) q
GROUP BY
        y
ORDER BY
        y
</pre>
</div>
<div class="terminal widefont smallfont">
<table class="terminal">
<tr>
<th>array_to_string</th>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                    #                                       #                   </td>
</tr>
<tr>
<td class="text">                    ##                                     ##                   </td>
</tr>
<tr>
<td class="text">                     #                                     #                    </td>
</tr>
<tr>
<td class="text">                     ##                                   ##                    </td>
</tr>
<tr>
<td class="text">                      #                                   #                     </td>
</tr>
<tr>
<td class="text">                      ## #                             # ##                     </td>
</tr>
<tr>
<td class="text">                       ###                             ###                      </td>
</tr>
<tr>
<td class="text">                     ######                           ######                    </td>
</tr>
<tr>
<td class="text">                      ####                             ####                     </td>
</tr>
<tr>
<td class="text">                        ##                             ##                       </td>
</tr>
<tr>
<td class="text">                         ##                           ##                        </td>
</tr>
<tr>
<td class="text">                          #                           #                         </td>
</tr>
<tr>
<td class="text">                          ##                         ##                         </td>
</tr>
<tr>
<td class="text">                           ##                       ##                          </td>
</tr>
<tr>
<td class="text">                            #                       #                           </td>
</tr>
<tr>
<td class="text">                            ##                     ##                           </td>
</tr>
<tr>
<td class="text">                             #                     #                            </td>
</tr>
<tr>
<td class="text">                             ##                   ##                            </td>
</tr>
<tr>
<td class="text">                              #                   #                             </td>
</tr>
<tr>
<td class="text">                              ##                 ##                             </td>
</tr>
<tr>
<td class="text">                               ##               ##                              </td>
</tr>
<tr>
<td class="text">                                #               #                               </td>
</tr>
<tr>
<td class="text">                                ##             ##                               </td>
</tr>
<tr>
<td class="text">                                 #             #                                </td>
</tr>
<tr>
<td class="text">                                 ##           ##                                </td>
</tr>
<tr>
<td class="text">                                  ##         ##                                 </td>
</tr>
<tr>
<td class="text">                                  ##         ##                                 </td>
</tr>
<tr>
<td class="text">                                   ##       ##                                  </td>
</tr>
<tr>
<td class="text">                                    #       #                                   </td>
</tr>
<tr>
<td class="text">                                    ##     ##                                   </td>
</tr>
<tr>
<td class="text">                                     #     #                                    </td>
</tr>
<tr>
<td class="text">                                     ##   ##                                    </td>
</tr>
<tr>
<td class="text">                                      ## ##                                     </td>
</tr>
<tr>
<td class="text">      ###                              # #                              ###     </td>
</tr>
<tr>
<td class="text">       ###                             ###                             ###      </td>
</tr>
<tr>
<td class="text">################################################################################</td>
</tr>
<tr>
<td class="text">       ###                             ###                             ###      </td>
</tr>
<tr>
<td class="text">      ###                              # #                              ###     </td>
</tr>
<tr>
<td class="text">                                      ## ##                                     </td>
</tr>
<tr>
<td class="text">                                     ##   ##                                    </td>
</tr>
<tr>
<td class="text">                                     #     #                                    </td>
</tr>
<tr>
<td class="text">                                    ##     ##                                   </td>
</tr>
<tr>
<td class="text">                                    #       #                                   </td>
</tr>
<tr>
<td class="text">                                   ##       ##                                  </td>
</tr>
<tr>
<td class="text">                                  ##         ##                                 </td>
</tr>
<tr>
<td class="text">                                  ##         ##                                 </td>
</tr>
<tr>
<td class="text">                                 ##           ##                                </td>
</tr>
<tr>
<td class="text">                                 #             #                                </td>
</tr>
<tr>
<td class="text">                                ##             ##                               </td>
</tr>
<tr>
<td class="text">                                #               #                               </td>
</tr>
<tr>
<td class="text">                               ##               ##                              </td>
</tr>
<tr>
<td class="text">                              ##                 ##                             </td>
</tr>
<tr>
<td class="text">                              #                   #                             </td>
</tr>
<tr>
<td class="text">                             ##                   ##                            </td>
</tr>
<tr>
<td class="text">                             #                     #                            </td>
</tr>
<tr>
<td class="text">                            ##                     ##                           </td>
</tr>
<tr>
<td class="text">                            #                       #                           </td>
</tr>
<tr>
<td class="text">                           ##                       ##                          </td>
</tr>
<tr>
<td class="text">                          ##                         ##                         </td>
</tr>
<tr>
<td class="text">                          #                           #                         </td>
</tr>
<tr>
<td class="text">                         ##                           ##                        </td>
</tr>
<tr>
<td class="text">                        ##                             ##                       </td>
</tr>
<tr>
<td class="text">                      ####                             ####                     </td>
</tr>
<tr>
<td class="text">                     ######                           ######                    </td>
</tr>
<tr>
<td class="text">                       ###                             ###                      </td>
</tr>
<tr>
<td class="text">                      ## #                             # ##                     </td>
</tr>
<tr>
<td class="text">                      #                                   #                     </td>
</tr>
<tr>
<td class="text">                     ##                                   ##                    </td>
</tr>
<tr>
<td class="text">                     #                                     #                    </td>
</tr>
<tr>
<td class="text">                    ##                                     ##                   </td>
</tr>
<tr>
<td class="text">                    #                                       #                   </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr>
<td class="text">                                                                                </td>
</tr>
<tr class="statusbar">
<td colspan="100">80 rows fetched in 0.0009s (1.2188s)</td>
</tr>
</table>
</div>
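<p>The two queries behind the <strong>View query</strong> links just above differ only in the value passed to <code>SETSEED</code>: the seed fixes the sequence that <code>RANDOM()</code> returns for the rest of the session, so each seed yields its own flake. A minimal sketch of that behavior (the seed <code>0.5</code> and the column aliases are arbitrary choices for illustration, and the actual numbers depend on the server version):</p>
<pre class="brush: sql">
SELECT  SETSEED(0.5);
SELECT  RANDOM() AS first_draw, RANDOM() AS second_draw;

-- Reseeding with the same value restarts the sequence,
-- so this returns the same pair as the query above
SELECT  SETSEED(0.5);
SELECT  RANDOM() AS first_draw, RANDOM() AS second_draw;
</pre>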
<div class="plainnote" style="text-align: center">
<big><strong>Happy New Year!</strong></big>
</div>
<p>Previous New Year posts:</p>
<ul>
<li><a href="/2009/12/31/happy-new-year/">Happy New 2010 Year!</a></li>
<li><a href="/2010/12/31/happy-new-year/">Happy New 2011 Year!</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2011/12/31/happy-new-year-3/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>What&#8217;s UNPIVOT good for?</title>
		<link>http://explainextended.com/2011/06/30/whats-unpivot-good-for/</link>
		<comments>http://explainextended.com/2011/06/30/whats-unpivot-good-for/#comments</comments>
		<pubDate>Thu, 30 Jun 2011 19:00:56 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=5373</guid>
		<description><![CDATA[A practical use for UNPIVOT]]></description>
			<content:encoded><![CDATA[<p>Answering questions asked on the site.</p>
<p><strong>Karen</strong> asks:</p>
<blockquote>
<p>… I&#8217;ve always thought <code>PIVOT</code> and <code>UNPIVOT</code> are signs of a poorly designed database. I mean, is there a legitimate use for them if your model is OK?</p>
</blockquote>
<p>I&#8217;ve actually put them to use in a project I&#8217;ve been working on for the last several months (which is partly why there have been no updates for so long!)</p>
<p>Part of the project is a task management system where each task has several persons related to it: the creator of the task, the person the task is assigned to, the actual author of the task (on whose behalf the task is created) and, possibly, the person who completed the task and the person who deleted it. That makes a total of 5 person-related fields.</p>
<p>Now, we need to take all tasks within a certain time range and list all people involved in them.</p>
<p>Let&#8217;s create a sample table and see how we would do that.</p>
<p><span id="more-5373"></span></p>
<p><a href="#" onclick="xcollapse('X1598');return false;">Table creation details</a><br />
</p>
<div id="X1598" style="display: none; background: transparent;">
<pre class="brush: sql">
SET NOCOUNT ON
GO

DROP TABLE
        [20110630_unpivot].task

DROP SCHEMA
        [20110630_unpivot]
GO

CREATE SCHEMA
        [20110630_unpivot]

CREATE TABLE
        [20110630_unpivot].Task
        (
        id INT NOT NULL PRIMARY KEY IDENTITY,
        ts DATETIME NOT NULL,
        createdBy INT NOT NULL,
        assignedTo INT NOT NULL,
        onBehalfOf INT NOT NULL,
        completedBy INT,
        deletedBy INT,
        stuffing CHAR(1000) NOT NULL DEFAULT &#039;&#039;
        )
GO

CREATE INDEX
        IX_Task_Ts
ON      [20110630_unpivot].Task(ts)
GO

BEGIN TRANSACTION

SELECT  RAND(20110630)

DECLARE @cnt INT

SET @cnt = 0

WHILE @cnt &lt; 1000000
BEGIN
        INSERT
        INTO    [20110630_unpivot].Task
                (
                ts, createdBy, assignedTo, onBehalfOf, completedBy, deletedBy
                )
        VALUES  (
                DATEADD(s, -@cnt, DATEADD(d, 1, &#039;2011-06-30&#039;)),
                RAND() * 100000,
                RAND() * 100000,
                RAND() * 100000,
                RAND() * 100000,
                RAND() * 100000
                )
        SET @cnt = @cnt + 1
END

COMMIT

GO
</pre>
</div>
<p>There are 5 person fields, a timestamp and a <code>stuffing</code> column that pads each row with 1000 bytes. The timestamp field is indexed.</p>
<p>Now, let&#8217;s find all people involved in the tasks with timestamps from <strong>2011-06-30</strong> (inclusive) to <strong>2011-06-30 04:00:00</strong> (exclusive). To do this, we could just use 5 queries (each selecting one of the person fields) and <code>UNION</code> them:</p>
<pre class="brush: sql">
SELECT  SUM(CAST(id AS BIGINT)), COUNT(*)
FROM    (
        SELECT  createdBy AS id
        FROM    [20110630_unpivot].Task
        WHERE   ts &gt;= &#039;2011-06-30&#039;
                AND ts &lt; &#039;2011-06-30 04:00:00&#039;
        UNION
        SELECT  assignedTo AS id
        FROM    [20110630_unpivot].Task
        WHERE   ts &gt;= &#039;2011-06-30&#039;
                AND ts &lt; &#039;2011-06-30 04:00:00&#039;
        UNION
        SELECT  onBehalfOf AS id
        FROM    [20110630_unpivot].Task
        WHERE   ts &gt;= &#039;2011-06-30&#039;
                AND ts &lt; &#039;2011-06-30 04:00:00&#039;
        UNION
        SELECT  completedBy AS id
        FROM    [20110630_unpivot].Task
        WHERE   ts &gt;= &#039;2011-06-30&#039;
                AND ts &lt; &#039;2011-06-30 04:00:00&#039;
                AND completedBy IS NOT NULL
        UNION
        SELECT  deletedBy AS id
        FROM    [20110630_unpivot].Task
        WHERE   ts &gt;= &#039;2011-06-30&#039;
                AND ts &lt; &#039;2011-06-30 04:00:00&#039;
                AND deletedBy IS NOT NULL
        ) q
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th></th>
<th></th>
</tr>
<tr>
<td class="bigint">2573160101</td>
<td class="int">51331</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0009s (0.4251s)</td>
</tr>
</table>
</div>
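<p>Note that the query relies on <code>UNION</code> rather than <code>UNION ALL</code>: <code>UNION</code> removes duplicates, so a person who shows up in several roles or in several tasks is counted only once. A minimal illustration with made-up ids (the derived tables <code>a</code> and <code>b</code> are hypothetical, and table value constructors assume <strong>SQL Server 2008</strong> or later):</p>
<pre class="brush: sql">
-- UNION removes duplicates: three rows (1, 2 and 3), in no guaranteed order
SELECT  id
FROM    (VALUES (1), (2)) a (id)
UNION
SELECT  id
FROM    (VALUES (2), (3)) b (id);

-- UNION ALL keeps duplicates: four rows (1, 2, 2 and 3)
SELECT  id
FROM    (VALUES (1), (2)) a (id)
UNION ALL
SELECT  id
FROM    (VALUES (2), (3)) b (id);
</pre>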
<pre>
[Microsoft][SQL Server Native Client 10.0][SQL Server]Table 'Task'. Scan count 15, logical reads 220765, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
[Microsoft][SQL Server Native Client 10.0][SQL Server]Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
[Microsoft][SQL Server Native Client 10.0][SQL Server]Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
[Microsoft][SQL Server Native Client 10.0][SQL Server]
 SQL Server Execution Times:
   CPU time = 484 ms,  elapsed time = 410 ms.
</pre>
<pre>
  |--Compute Scalar(DEFINE:([Expr1016]=CASE WHEN [globalagg1019]=(0) THEN NULL ELSE [globalagg1021] END, [Expr1017]=CONVERT_IMPLICIT(int,[globalagg1023],0)))
       |--Stream Aggregate(DEFINE:([globalagg1019]=SUM([partialagg1018]), [globalagg1021]=SUM([partialagg1020]), [globalagg1023]=SUM([partialagg1022])))
            |--Parallelism(Gather Streams)
                 |--Compute Scalar(DEFINE:([partialagg1022]=[partialagg1018]))
                      |--Stream Aggregate(DEFINE:([partialagg1018]=Count(*), [partialagg1020]=SUM(CONVERT(bigint,[Union1015],0))))
                           |--Hash Match(Aggregate, HASH:([Union1015]), RESIDUAL:([Union1015] = [Union1015]))
                                |--Concatenation
                                     |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([ee].[20110630_unpivot].[Task].[createdBy]))
                                     |    |--Nested Loops(Inner Join, OUTER REFERENCES:([ee].[20110630_unpivot].[Task].[id], [Expr1025]) OPTIMIZED WITH UNORDERED PREFETCH)
                                     |         |--Index Seek(OBJECT:([ee].[20110630_unpivot].[Task].[IX_Task_Ts]), SEEK:([ee].[20110630_unpivot].[Task].[ts] &gt;= &#39;2011-06-30 00:00:00.000&#39; AND [ee].[20110630_unpivot].[Task].[ts] &lt; &#39;2011-06-30 04:00:00.000&#39;) ORDERED FORWARD)
                                     |         |--Clustered Index Seek(OBJECT:([ee].[20110630_unpivot].[Task].[PK__Task__3213E83F0B3E4B07]), SEEK:([ee].[20110630_unpivot].[Task].[id]=[ee].[20110630_unpivot].[Task].[id]) LOOKUP ORDERED FORWARD)
                                     |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([ee].[20110630_unpivot].[Task].[assignedTo]))
                                     |    |--Nested Loops(Inner Join, OUTER REFERENCES:([ee].[20110630_unpivot].[Task].[id], [Expr1026]) OPTIMIZED WITH UNORDERED PREFETCH)
                                     |         |--Index Seek(OBJECT:([ee].[20110630_unpivot].[Task].[IX_Task_Ts]), SEEK:([ee].[20110630_unpivot].[Task].[ts] &gt;= &#39;2011-06-30 00:00:00.000&#39; AND [ee].[20110630_unpivot].[Task].[ts] &lt; &#39;2011-06-30 04:00:00.000&#39;) ORDERED FORWARD)
                                     |         |--Clustered Index Seek(OBJECT:([ee].[20110630_unpivot].[Task].[PK__Task__3213E83F0B3E4B07]), SEEK:([ee].[20110630_unpivot].[Task].[id]=[ee].[20110630_unpivot].[Task].[id]) LOOKUP ORDERED FORWARD)
                                     |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([ee].[20110630_unpivot].[Task].[onBehalfOf]))
                                     |    |--Nested Loops(Inner Join, OUTER REFERENCES:([ee].[20110630_unpivot].[Task].[id], [Expr1027]) OPTIMIZED WITH UNORDERED PREFETCH)
                                     |         |--Index Seek(OBJECT:([ee].[20110630_unpivot].[Task].[IX_Task_Ts]), SEEK:([ee].[20110630_unpivot].[Task].[ts] &gt;= &#39;2011-06-30 00:00:00.000&#39; AND [ee].[20110630_unpivot].[Task].[ts] &lt; &#39;2011-06-30 04:00:00.000&#39;) ORDERED FORWARD)
                                     |         |--Clustered Index Seek(OBJECT:([ee].[20110630_unpivot].[Task].[PK__Task__3213E83F0B3E4B07]), SEEK:([ee].[20110630_unpivot].[Task].[id]=[ee].[20110630_unpivot].[Task].[id]) LOOKUP ORDERED FORWARD)
                                     |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([ee].[20110630_unpivot].[Task].[completedBy]))
                                     |    |--Nested Loops(Inner Join, OUTER REFERENCES:([ee].[20110630_unpivot].[Task].[id], [Expr1028]) OPTIMIZED WITH UNORDERED PREFETCH)
                                     |         |--Index Seek(OBJECT:([ee].[20110630_unpivot].[Task].[IX_Task_Ts]), SEEK:([ee].[20110630_unpivot].[Task].[ts] &gt;= &#39;2011-06-30 00:00:00.000&#39; AND [ee].[20110630_unpivot].[Task].[ts] &lt; &#39;2011-06-30 04:00:00.000&#39;) ORDERED FORWARD)
                                     |         |--Clustered Index Seek(OBJECT:([ee].[20110630_unpivot].[Task].[PK__Task__3213E83F0B3E4B07]), SEEK:([ee].[20110630_unpivot].[Task].[id]=[ee].[20110630_unpivot].[Task].[id]),  WHERE:([ee].[20110630_unpivot].[Task].[completedBy] IS NOT NULL) LOOKUP ORDERED FORWARD)
                                     |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([ee].[20110630_unpivot].[Task].[deletedBy]))
                                          |--Nested Loops(Inner Join, OUTER REFERENCES:([ee].[20110630_unpivot].[Task].[id], [Expr1029]) OPTIMIZED WITH UNORDERED PREFETCH)
                                               |--Index Seek(OBJECT:([ee].[20110630_unpivot].[Task].[IX_Task_Ts]), SEEK:([ee].[20110630_unpivot].[Task].[ts] &gt;= &#39;2011-06-30 00:00:00.000&#39; AND [ee].[20110630_unpivot].[Task].[ts] &lt; &#39;2011-06-30 04:00:00.000&#39;) ORDERED FORWARD)
                                               |--Clustered Index Seek(OBJECT:([ee].[20110630_unpivot].[Task].[PK__Task__3213E83F0B3E4B07]), SEEK:([ee].[20110630_unpivot].[Task].[id]=[ee].[20110630_unpivot].[Task].[id]),  WHERE:([ee].[20110630_unpivot].[Task].[deletedBy] IS NOT NULL) LOOKUP ORDERED FORWARD)
</pre>
<p>Now we have <strong>51331</strong> persons returned in <strong>410 ms</strong>, which required <strong>220765</strong> logical reads.</p>
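<p>For reference, read counts and timings like the ones above are what <strong>SQL Server</strong> prints when session statistics are switched on. Here is a minimal sketch, assuming the statements are run in the same session right before the query being measured (the <code>SELECT</code> in the middle is only a placeholder, not one of the queries benchmarked in this article):</p>
<pre class="brush: sql">
-- Report logical and physical reads per table for the statements that follow
SET STATISTICS IO ON;
-- Report CPU time and elapsed time for the statements that follow
SET STATISTICS TIME ON;

-- Placeholder: the query being measured goes here
SELECT  COUNT(*)
FROM    [20110630_unpivot].Task
WHERE   ts &gt;= &#039;2011-06-30&#039;
        AND ts &lt; &#039;2011-06-30 04:00:00&#039;;

-- Switch the reporting off again
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
</pre>
<p>The statistics come back as informational messages rather than as part of the resultset, so where exactly they show up depends on the client.</p>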
<p>As we can see in the plan, the table is scanned 5 times (once for each query). This is of course not very efficient: it would be much better if each matching record in the table were visited only once.</p>
<p>This is where <code>UNPIVOT</code> comes into play.</p>
<p>As its name suggests, it does the reverse of <code>PIVOT</code>, that is, it moves data from multiple columns to multiple rows. And this is exactly what we need in our case.</p>
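<p>To make the transformation concrete, here is a minimal sketch on a tiny table variable. The variable <code>@t</code> and its two rows are hypothetical sample data made up for illustration; they are not part of the <code>Task</code> table used in this article:</p>
<pre class="brush: sql">
-- Hypothetical sample data: two tasks, one of them not completed yet
DECLARE @t TABLE (id INT, createdBy INT, assignedTo INT, completedBy INT);

INSERT
INTO    @t
VALUES  (1, 10, 20, NULL),
        (2, 10, 30, 40);

-- Each of the three person columns becomes a row of its own
SELECT  id, personType, personId
FROM    @t t
UNPIVOT
        (
        personId FOR personType IN
        (createdBy, assignedTo, completedBy)
        ) p;

-- Returns five rows (in no guaranteed order):
--   1  createdBy    10
--   1  assignedTo   20
--   2  createdBy    10
--   2  assignedTo   30
--   2  completedBy  40
-- The NULL in completedBy of the first task does not produce a row
</pre>
<p>Note that <code>UNPIVOT</code> requires all listed columns to be of the same type and silently skips the <code>NULL</code> values, which is why the first task yields two rows and not three.</p>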
<p>Let&#8217;s try it:</p>
<pre class="brush: sql">
SELECT  SUM(CAST(personId AS BIGINT)), COUNT(*)
FROM    (
        SELECT  DISTINCT personId
        FROM    [20110630_unpivot].Task
        UNPIVOT
                (
                personId FOR personType IN
                (createdBy, assignedTo, onBehalfOf, completedBy, deletedBy)
                ) p
        WHERE   ts &gt;= &#039;2011-06-30&#039;
                AND ts &lt; &#039;2011-06-30 04:00:00&#039;
        ) q
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th></th>
<th></th>
</tr>
<tr>
<td class="bigint">2573160101</td>
<td class="int">51331</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0002s (0.1802s)</td>
</tr>
</table>
</div>
<pre>
[Microsoft][SQL Server Native Client 10.0][SQL Server]Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
[Microsoft][SQL Server Native Client 10.0][SQL Server]Table 'Task'. Scan count 3, logical reads 44153, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
[Microsoft][SQL Server Native Client 10.0][SQL Server]Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
[Microsoft][SQL Server Native Client 10.0][SQL Server]
 SQL Server Execution Times:
   CPU time = 165 ms,  elapsed time = 180 ms.
</pre>
<pre>
  |--Compute Scalar(DEFINE:([Expr1013]=CASE WHEN [globalagg1016]=(0) THEN NULL ELSE [globalagg1018] END, [Expr1014]=CONVERT_IMPLICIT(int,[globalagg1020],0)))
       |--Stream Aggregate(DEFINE:([globalagg1016]=SUM([partialagg1015]), [globalagg1018]=SUM([partialagg1017]), [globalagg1020]=SUM([partialagg1019])))
            |--Parallelism(Gather Streams)
                 |--Compute Scalar(DEFINE:([partialagg1019]=[partialagg1015]))
                      |--Stream Aggregate(DEFINE:([partialagg1015]=Count(*), [partialagg1017]=SUM(CONVERT(bigint,[Expr1011],0))))
                           |--Sort(DISTINCT ORDER BY:([Expr1011] ASC))
                                |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([Expr1011]))
                                     |--Hash Match(Partial Aggregate, HASH:([Expr1011]), RESIDUAL:([Expr1011] = [Expr1011]))
                                          |--Filter(WHERE:([Expr1011] IS NOT NULL))
                                               |--Nested Loops(Left Outer Join, OUTER REFERENCES:([ee].[20110630_unpivot].[Task].[createdBy], [ee].[20110630_unpivot].[Task].[assignedTo], [ee].[20110630_unpivot].[Task].[onBehalfOf], [ee].[20110630_unpivot].[Task].[completedBy], [ee].[20110630_unpivot].[Task].[deletedBy]))
                                                    |--Nested Loops(Inner Join, OUTER REFERENCES:([ee].[20110630_unpivot].[Task].[id], [Expr1022]) OPTIMIZED WITH UNORDERED PREFETCH)
                                                    |    |--Index Seek(OBJECT:([ee].[20110630_unpivot].[Task].[IX_Task_Ts]), SEEK:([ee].[20110630_unpivot].[Task].[ts] &gt;= &#39;2011-06-30 00:00:00.000&#39; AND [ee].[20110630_unpivot].[Task].[ts] &lt; &#39;2011-06-30 04:00:00.000&#39;) ORDERED FORWARD)
                                                    |    |--Clustered Index Seek(OBJECT:([ee].[20110630_unpivot].[Task].[PK__Task__3213E83F14C7B541]), SEEK:([ee].[20110630_unpivot].[Task].[id]=[ee].[20110630_unpivot].[Task].[id]) LOOKUP ORDERED FORWARD)
                                                    |--Constant Scan(VALUES:(([ee].[20110630_unpivot].[Task].[createdBy]),([ee].[20110630_unpivot].[Task].[assignedTo]),([ee].[20110630_unpivot].[Task].[onBehalfOf]),([ee].[20110630_unpivot].[Task].[completedBy]),([ee].[20110630_unpivot].[Task].[deletedBy])))
</pre>
<p>As we can see, this is more than twice as fast as the <code>UNION</code> query (<strong>180 ms</strong> against <strong>410 ms</strong>) and takes only <strong>44153</strong> page reads to complete, five times fewer than before.</p>
<p>How does it work internally?</p>
<p>We see in the plan that there is a nested loop between a result of a clustered index seek and something called <code>Constant Scan</code>. The constant scan returns 5 values in each loop and those are the fields listed in the <code>UNPIVOT</code> clause. It just takes each record and outputs fields found there, without rereading the record. This is actually what we wanted.</p>
<p>This behavior can be made clearer if we rewrite the query a little:</p>
<pre class="brush: sql">
SELECT  SUM(CAST(personId AS BIGINT)), COUNT(*)
FROM    (
        SELECT  DISTINCT personId
        FROM    [20110630_unpivot].Task
        CROSS APPLY
                (
                SELECT  createdBy AS personId
                UNION ALL
                SELECT  assignedTo AS personId
                UNION ALL
                SELECT  onBehalfOf AS personId
                UNION ALL
                SELECT  completedBy AS personId
                UNION ALL
                SELECT  deletedBy AS personId
                ) p
        WHERE   ts &gt;= &#039;2011-06-30&#039;
                AND ts &lt; &#039;2011-06-30 04:00:00&#039;
                AND personId IS NOT NULL
        ) q
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th></th>
<th></th>
</tr>
<tr>
<td class="bigint">2573160101</td>
<td class="int">51331</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0002s (0.1766s)</td>
</tr>
</table>
</div>
<pre>
[Microsoft][SQL Server Native Client 10.0][SQL Server]Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
[Microsoft][SQL Server Native Client 10.0][SQL Server]Table 'Task'. Scan count 3, logical reads 44153, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
[Microsoft][SQL Server Native Client 10.0][SQL Server]Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
[Microsoft][SQL Server Native Client 10.0][SQL Server]
 SQL Server Execution Times:
   CPU time = 149 ms,  elapsed time = 176 ms.
</pre>
<pre>
  |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [globalagg1007]=(0) THEN NULL ELSE [globalagg1009] END, [Expr1005]=CONVERT_IMPLICIT(int,[globalagg1011],0)))
       |--Stream Aggregate(DEFINE:([globalagg1007]=SUM([partialagg1006]), [globalagg1009]=SUM([partialagg1008]), [globalagg1011]=SUM([partialagg1010])))
            |--Parallelism(Gather Streams)
                 |--Compute Scalar(DEFINE:([partialagg1010]=[partialagg1006]))
                      |--Stream Aggregate(DEFINE:([partialagg1006]=Count(*), [partialagg1008]=SUM(CONVERT(bigint,[Union1003],0))))
                           |--Sort(DISTINCT ORDER BY:([Union1003] ASC))
                                |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([Union1003]))
                                     |--Hash Match(Partial Aggregate, HASH:([Union1003]), RESIDUAL:([Union1003] = [Union1003]))
                                          |--Filter(WHERE:([Union1003] IS NOT NULL))
                                               |--Nested Loops(Inner Join, OUTER REFERENCES:([ee].[20110630_unpivot].[Task].[createdBy], [ee].[20110630_unpivot].[Task].[assignedTo], [ee].[20110630_unpivot].[Task].[onBehalfOf], [ee].[20110630_unpivot].[Task].[completedBy], [ee].[20110630_unpivot].[Task].[deletedBy]))
                                                    |--Nested Loops(Inner Join, OUTER REFERENCES:([ee].[20110630_unpivot].[Task].[id], [Expr1013]) OPTIMIZED WITH UNORDERED PREFETCH)
                                                    |    |--Index Seek(OBJECT:([ee].[20110630_unpivot].[Task].[IX_Task_Ts]), SEEK:([ee].[20110630_unpivot].[Task].[ts] &gt;= &#39;2011-06-30 00:00:00.000&#39; AND [ee].[20110630_unpivot].[Task].[ts] &lt; &#39;2011-06-30 04:00:00.000&#39;) ORDERED FORWARD)
                                                    |    |--Clustered Index Seek(OBJECT:([ee].[20110630_unpivot].[Task].[PK__Task__3213E83F14C7B541]), SEEK:([ee].[20110630_unpivot].[Task].[id]=[ee].[20110630_unpivot].[Task].[id]) LOOKUP ORDERED FORWARD)
                                                    |--Constant Scan(VALUES:(([ee].[20110630_unpivot].[Task].[createdBy]),([ee].[20110630_unpivot].[Task].[assignedTo]),([ee].[20110630_unpivot].[Task].[onBehalfOf]),([ee].[20110630_unpivot].[Task].[completedBy]),([ee].[20110630_unpivot].[Task].[deletedBy])))
</pre>
<p>Here, we just take each record and explode it into 5 records using <code>CROSS APPLY</code>.</p>
<p>This yields essentially the same plan (the only visible difference is an inner join in place of the left outer join), exactly the same <strong>I/O</strong> and exactly the same output. In fact, that&#8217;s exactly what the <code>UNPIVOT</code> query does.</p>
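<p>The same explosion of a record into rows can also be spelled more compactly with a table value constructor. This is only a sketch: it assumes <strong>SQL Server 2008</strong> or later (where <code>VALUES</code> is allowed as a derived table), and I have not measured it against the sample table, so treat the plan and timings as unverified:</p>
<pre class="brush: sql">
SELECT  SUM(CAST(personId AS BIGINT)), COUNT(*)
FROM    (
        SELECT  DISTINCT personId
        FROM    [20110630_unpivot].Task
        CROSS APPLY
                (
                -- One single-column row per person field of the current record
                VALUES
                (createdBy),
                (assignedTo),
                (onBehalfOf),
                (completedBy),
                (deletedBy)
                ) p (personId)
        WHERE   ts &gt;= &#039;2011-06-30&#039;
                AND ts &lt; &#039;2011-06-30 04:00:00&#039;
                -- UNPIVOT drops NULLs implicitly, here we have to do it ourselves
                AND personId IS NOT NULL
        ) q
</pre>
<p>Unlike <code>UNPIVOT</code>, both <code>CROSS APPLY</code> forms let us decide what to do with the <code>NULL</code> values: removing the last condition would keep them.</p>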
<p>Hope that helps.</p>
<hr/>
<p>I&#8217;m always glad to answer questions regarding database queries.</p>
<p><a href="/ask-a-question"><strong>Ask me a question</strong></a></p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2011/06/30/whats-unpivot-good-for/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Shared Plan and Algorithm Network Cache (SPANC)</title>
		<link>http://explainextended.com/2011/04/01/shared-plan-and-algorithm-network-cache-spanc/</link>
		<comments>http://explainextended.com/2011/04/01/shared-plan-and-algorithm-network-cache-spanc/#comments</comments>
		<pubDate>Fri, 01 Apr 2011 19:00:49 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=5314</guid>
		<description><![CDATA[Analysis of various RDBMSs has shown that these systems have recently started exchanging query plans through a distributed storage engine called Shared Plan and Algorithm Network Cache (SPANC) and also use Stack Overflow as an external optimization engine.]]></description>
			<content:encoded><![CDATA[<p>Due to the nature of my work I have to deal with various database systems.</p>
<p>While <strong>SQL</strong> is more or less standardized, the optimizers are implemented differently in different systems. Some systems cannot join tables with anything other than nested loops, others can only <code>GROUP BY</code> using a sort, etc.</p>
<p>So when you write a join in, say, <strong>MySQL</strong>, you cannot expect it to be a sort merge join (and you should consider this fact when designing the query). Or, when you write a <code>DISTINCT</code> in <strong>SQL Server</strong>, you can&#8217;t expect a loose index scan. These are limitations imposed by their optimizers.</p>
<p>However, in the last three months I have noticed a great improvement in queries where I did not expect any.</p>
<p>It started when I tried to debug this in <strong>SQL Server</strong>:</p>
<pre class="brush: sql">
SELECT  DISTINCT [order]
FROM    orderItem
</pre>
<p>which yielded this plan:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2011/04/sql-server-query-plan.png" alt="" title="SQL Server SPANC query plan" class="aligncenter size-full wp-image-5346 noborder" /></p>
<p>Similar results were obtained on <strong>Oracle</strong>:</p>
<pre>
Plan hash value: 1345318323

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                                         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                                  |             |       |   200 |     2  (50)| 00:00:01 |
|   1 |  REMOTE SPANC QUERY (SQLSERVER, MYSQL, POSTGRESQL, STACKOVERFLOW) |             |       |   200 |     2  (50)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------------------------
</pre>
<p>, <strong>MySQL</strong>:</p>
<pre>
+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-----------------------------------------------------+
| id | select_type | table     | type  | possible_keys | key     | key_len | ref  | rows    | Extra                                               |
+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-----------------------------------------------------+
|  1 | SIMPLE      | orderItem | spanc | NULL          | ALL     | NULL    | NULL |         | Using Oracle, PostgreSQL, SQL Server, StackOverflow |
+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-----------------------------------------------------+
</pre>
<p>and <strong>PostgreSQL</strong>:</p>
<pre>
Seq Scan on OrderItem  (cost=0.00..6.44 width=4)
 -> Remote Scan on SPANC (Oracle, MySQL, SQL Server, StackOverflow)   (cost=0.00..100.00 width=4)
</pre>
<p>Network analysis has shown weird encrypted activity between the machines in my internal network which host the <strong>SQL Server</strong>, <strong>Oracle</strong>, <strong>PostgreSQL</strong> and <strong>MySQL</strong> servers.</p>
<p>Ultimately, there was unencrypted activity outside of the internal network which turned out to be an HTTP <code>POST</code> request followed by several <code>GET</code> polls to <a href="http://stackoverflow.com/questions/5518080/distinct-optimization">http://stackoverflow.com/questions/5518080/distinct-optimization</a>.</p>
<p>It seems that the developers of major database systems agreed to share the knowledge about the most efficient query plans in some kind of a distributed storage (which probably is called <strong>SPANC</strong> as we can see in the query plans) and provide an interface to access each other&#8217;s systems.</p>
<p>It also seems that these systems treat <a href="http://stackoverflow.com"><strong>Stack Overflow</strong></a> as an external optimization engine where the most experienced developers can build their plans for them in a most efficient way.</p>
<p>I would be glad to have further clarification from the companies&#8217; staff.</p>
<p>This also raises a question: how many regular <strong>Stack Overflow</strong> participants are in fact query engines disguised as curious fellow developers?</p>
<p>It would definitely be nice to know.</p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2011/04/01/shared-plan-and-algorithm-network-cache-spanc/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>MySQL: GROUP BY in UNION</title>
		<link>http://explainextended.com/2011/03/30/mysql-group-by-in-union/</link>
		<comments>http://explainextended.com/2011/03/30/mysql-group-by-in-union/#comments</comments>
		<pubDate>Wed, 30 Mar 2011 19:00:29 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=5304</guid>
		<description><![CDATA[In MySQL, a GROUP BY used inside a UNION still sorts, though it should not. This degrades performance and cannot be turned off with ORDER BY NULL unless used along with a LIMIT large enough]]></description>
			<content:encoded><![CDATA[<p>From <a href="http://stackoverflow.com/questions/5486857/using-order-by-null-in-a-union"><strong>Stack Overflow</strong></a>:</p>
<blockquote>
<p>I have a query where I have a custom developed <strong>UDF</strong> that is used to calculate whether or not certain points are within a polygon (first query in <code>UNION</code>) or circular (second query in <code>UNION</code>) shape.</p>
<pre class="brush: sql">
SELECT  a.geo_boundary_id, …
FROM     geo_boundary_vertex a, …
…
GROUP BY
        a.geo_boundary_id
UNION
SELECT  b.geo_boundary_id, …
FROM     geo_boundary b, …
…
GROUP BY
        b.geo_boundary_id
</pre>
<p>When I run an explain for the query I get <code>filesort</code> for both queries within the <code>UNION</code>.</p>
<p>Now, I can split the queries up and use the <code>ORDER BY NULL</code> trick to get rid of the <code>filesort</code> however when I attempt to add that to the end of a <code>UNION</code> it doesn&#8217;t work.</p>
<p>How do I get rid of the <code>filesort</code>?</p>
</blockquote>
<p>In <strong>MySQL</strong>, <code>GROUP BY</code> also implies <code>ORDER BY</code> on the same set of expressions in the same order. That&#8217;s why it adds an additional <code>filesort</code> operation to sort the resultset if it does not come out naturally sorted (say, from an index).</p>
<p>This is not always the desired behavior, and the <strong>MySQL</strong> manual suggests adding <code>ORDER BY NULL</code> to queries where sorting is not required. This can improve the performance of such queries.</p>
<p>Let&#8217;s create a sample table and see:</p>
<p><span id="more-5304"></span></p>
<p><a href="#" onclick="xcollapse('X7998');return false;">Table creation details</a><br />
</p>
<div id="X7998" style="display: none; background: transparent;">
<pre class="brush: sql">
CREATE TABLE filler (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

CREATE TABLE grouping (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
        value1 INT NOT NULL,
        value2 INT NOT NULL
) ENGINE=InnoDB;

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt &lt;= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

START TRANSACTION;
CALL prc_filler(100000);
COMMIT;

INSERT
INTO    grouping (value1, value2)
SELECT  CEILING(RAND(20110330) * 300000),
        CEILING(RAND(20110330 &lt;&lt; 1) * 300000)
FROM    filler
CROSS JOIN
        (
        SELECT  id
        FROM    filler
        LIMIT 30
        ) q;
</pre>
</div>
<p>The table contains <strong>3,000,000</strong> random records with <code>value1</code> and <code>value2</code> between <strong>1</strong> and <strong>300,000</strong>.</p>
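<p>As a baseline, the <code>ORDER BY NULL</code> trick can first be sketched on a single <code>GROUP BY</code> query against this table, outside of any <code>UNION</code>. I am not including its output here, so take the expectation as an assumption: on the <strong>MySQL 5.x</strong> versions discussed in this article, where <code>GROUP BY</code> sorts implicitly, the <code>Extra</code> column should show <code>Using temporary</code> without <code>Using filesort</code>:</p>
<pre class="brush: sql">
-- A standalone GROUP BY with the implicit sort switched off
EXPLAIN
SELECT  value1 AS value
FROM    grouping
GROUP BY
        value1
ORDER BY
        NULL;
</pre>
<p>Running the same <code>EXPLAIN</code> without the <code>ORDER BY NULL</code> clause gives the plan to compare against.</p>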
<p>Here&#8217;s the plan we get with a mere <code>UNION</code> of two <code>GROUP BY</code> queries:</p>
<pre class="brush: sql">
SELECT  value1 AS value
FROM    grouping
GROUP BY
        value1
UNION
SELECT  value2 AS value
FROM    grouping
GROUP BY
        value2
LIMIT 10
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>value</th>
</tr>
<tr>
<td class="integer">1</td>
</tr>
<tr>
<td class="integer">2</td>
</tr>
<tr>
<td class="integer">3</td>
</tr>
<tr>
<td class="integer">4</td>
</tr>
<tr>
<td class="integer">5</td>
</tr>
<tr>
<td class="integer">6</td>
</tr>
<tr>
<td class="integer">7</td>
</tr>
<tr>
<td class="integer">8</td>
</tr>
<tr>
<td class="integer">9</td>
</tr>
<tr>
<td class="integer">10</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0002s (8.4998s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">grouping</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">3000279</td>
<td class="double">100.00</td>
<td class="varchar">Using temporary; Using filesort</td>
</tr>
<tr>
<td class="bigint">2</td>
<td class="varchar">UNION</td>
<td class="varchar">grouping</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">3000279</td>
<td class="double">100.00</td>
<td class="varchar">Using temporary; Using filesort</td>
</tr>
<tr>
<td class="bigint"></td>
<td class="varchar">UNION RESULT</td>
<td class="varchar">&lt;union1,2&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar"></td>
</tr>
</table>
</div>
<pre>
select `20110330_group`.`grouping`.`value1` AS `value` from `20110330_group`.`grouping` group by `20110330_group`.`grouping`.`value1` union select `20110330_group`.`grouping`.`value2` AS `value` from `20110330_group`.`grouping` group by `20110330_group`.`grouping`.`value2` limit 10
</pre>
<p>We see that there is a <code>filesort</code> in each of the queries.</p>
<p><strong>MySQL</strong> does allow using <code>ORDER BY</code> in the queries merged with <code>UNION</code> or <code>UNION ALL</code>. To do this, we just need to wrap each query into a set of parentheses:</p>
<pre class="brush: sql">
(
SELECT  value1 AS value
FROM    grouping
GROUP BY
        value1
ORDER BY
        NULL
)
UNION
(
SELECT  value2 AS value
FROM    grouping
GROUP BY
        value2
ORDER BY
        NULL
)
LIMIT 10
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>value</th>
</tr>
<tr>
<td class="integer">1</td>
</tr>
<tr>
<td class="integer">2</td>
</tr>
<tr>
<td class="integer">3</td>
</tr>
<tr>
<td class="integer">4</td>
</tr>
<tr>
<td class="integer">5</td>
</tr>
<tr>
<td class="integer">6</td>
</tr>
<tr>
<td class="integer">7</td>
</tr>
<tr>
<td class="integer">8</td>
</tr>
<tr>
<td class="integer">9</td>
</tr>
<tr>
<td class="integer">10</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0002s (8.4792s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">grouping</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">3000279</td>
<td class="double">100.00</td>
<td class="varchar">Using temporary; Using filesort</td>
</tr>
<tr>
<td class="bigint">2</td>
<td class="varchar">UNION</td>
<td class="varchar">grouping</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">3000279</td>
<td class="double">100.00</td>
<td class="varchar">Using temporary; Using filesort</td>
</tr>
<tr>
<td class="bigint"></td>
<td class="varchar">UNION RESULT</td>
<td class="varchar">&lt;union1,2&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar"></td>
</tr>
</table>
</div>
<pre>
(select `20110330_group`.`grouping`.`value1` AS `value` from `20110330_group`.`grouping` group by `20110330_group`.`grouping`.`value1` order by NULL) union (select `20110330_group`.`grouping`.`value2` AS `value` from `20110330_group`.`grouping` group by `20110330_group`.`grouping`.`value2` order by NULL) limit 10
</pre>
<p>However, the plan remained the same. Why?</p>
<p><strong>MySQL</strong>&#8217;s <a href="http://dev.mysql.com/doc/refman/5.1/en/union.html"><strong>documentation</strong></a> says:</p>
<blockquote>
<p>Use of <code>ORDER BY</code> for individual <code>SELECT</code> statements implies nothing about the order in which the rows appear in the final result because UNION by default produces an unordered set of rows. Therefore, the use of <code>ORDER BY</code> in this context is typically in conjunction with <code>LIMIT</code>, so that it is used to determine the subset of the selected rows to retrieve for the <code>SELECT</code>, even though it does not necessarily affect the order of those rows in the final <code>UNION</code> result. If <code>ORDER BY</code> appears without <code>LIMIT</code> in a <code>SELECT</code>, it is optimized away because it will have no effect anyway.</p>
</blockquote>
<p>This means that the optimizer just removes <code>ORDER BY</code> from the <code>UNION</code> parts if they are not used along with <code>LIMIT</code>.</p>
<p>This is of course a good idea: since individual <code>ORDER BY</code> have no effect on the order of the final query anyway, there is no use in executing them or even taking them into account.</p>
<p>However, this idea would be much better if the same were also true for <code>GROUP BY</code>. Currently, the optimizer does not optimize away the ordering behavior of the <code>GROUP BY</code> queries which are parts of a <code>UNION</code> and they cannot be cured with <code>ORDER BY NULL</code> (whose only goal is <strong>not</strong> to order) since this is removed by the optimizer.</p>
<p>However, since only the <code>ORDER BY</code> clauses not accompanied by a <code>LIMIT</code> are thrown away, we could just add a <code>LIMIT</code>. Of course it should be large enough to guarantee that all records are returned.</p>
<p>Let&#8217;s see:</p>
<pre class="brush: sql">
(
SELECT  value1 AS value
FROM    grouping
GROUP BY
        value1
ORDER BY
        NULL
LIMIT 10000000000
)
UNION
(
SELECT  value2 AS value
FROM    grouping
GROUP BY
        value2
ORDER BY
        NULL
LIMIT 10000000000
)
LIMIT 10
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>value</th>
</tr>
<tr>
<td class="integer">12462</td>
</tr>
<tr>
<td class="integer">205466</td>
</tr>
<tr>
<td class="integer">89941</td>
</tr>
<tr>
<td class="integer">133309</td>
</tr>
<tr>
<td class="integer">96722</td>
</tr>
<tr>
<td class="integer">83683</td>
</tr>
<tr>
<td class="integer">128249</td>
</tr>
<tr>
<td class="integer">90196</td>
</tr>
<tr>
<td class="integer">66232</td>
</tr>
<tr>
<td class="integer">60571</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0002s (6.9842s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">grouping</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">3000279</td>
<td class="double">100.00</td>
<td class="varchar">Using temporary</td>
</tr>
<tr>
<td class="bigint">2</td>
<td class="varchar">UNION</td>
<td class="varchar">grouping</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">3000279</td>
<td class="double">100.00</td>
<td class="varchar">Using temporary</td>
</tr>
<tr>
<td class="bigint"></td>
<td class="varchar">UNION RESULT</td>
<td class="varchar">&lt;union1,2&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar"></td>
</tr>
</table>
</div>
<pre>
(select `20110330_group`.`grouping`.`value1` AS `value` from `20110330_group`.`grouping` group by `20110330_group`.`grouping`.`value1` order by NULL limit 10000000000) union (select `20110330_group`.`grouping`.`value2` AS `value` from `20110330_group`.`grouping` group by `20110330_group`.`grouping`.`value2` order by NULL limit 10000000000) limit 10
</pre>
<p>Now, there are no <code>filesort</code> operations in the plan and the query runs <strong>20%</strong> faster.</p>
<p>This is not a very elegant solution, of course. Moreover, a similar trick was used in <strong>SQL Server 2000</strong>, which does not allow <code>ORDER BY</code> without a <code>TOP</code> in inline views. In <strong>SQL Server 2000</strong>, <code>TOP 100%</code> forced the order of the nested queries and usually made them spooled or otherwise materialized.</p>
<p>It was quite a nasty surprise when <strong>SQL Server 2005</strong> <q>improved</q> its optimizer to detect such tricks and ignore <code>ORDER BY</code> for <strong>TOP 100%</strong> queries. However, with all the improvements introduced in <strong>SQL Server 2005</strong>, most of these queries could be rewritten in a cleaner and more efficient way.</p>
<p>Nevertheless, this solution is still safe to use, because it does not change the semantics of the query (if <code>LIMIT</code> is chosen large enough), but is just an optimizer hack. In the worst case, the query will just become as slow as it initially was, and an extra <code>filesort</code> is not the worst of all things that can happen to a query.</p>
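<p>If picking a <q>large enough</q> number by hand feels fragile, one option is the largest row count <code>LIMIT</code> accepts. The sketch below assumes the same <code>grouping</code> table as above and uses <code>18446744073709551615</code>, the maximum unsigned 64-bit integer, which the <strong>MySQL</strong> manual itself suggests when all rows up to the end of a resultset are needed. This variant has not been timed here, so treat it as an illustration rather than a measured result. The backticks are only a precaution, because <code>GROUPING</code> is a reserved word in newer <strong>MySQL</strong> versions:</p>
<pre class="brush: sql">
(
SELECT  value1 AS value
FROM    `grouping`
GROUP BY
        value1
ORDER BY
        NULL
-- the largest row count LIMIT accepts, so no row can be cut off
LIMIT 18446744073709551615
)
UNION
(
SELECT  value2 AS value
FROM    `grouping`
GROUP BY
        value2
ORDER BY
        NULL
LIMIT 18446744073709551615
)
LIMIT 10
</pre>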
<p>Meanwhile, I&#8217;ve reported this behavior as <a href="http://bugs.mysql.com/bug.php?id=60702">bug 60702</a> in the <strong>MySQL</strong> bug database and am hoping they&#8217;ll fix it in the next release.</p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2011/03/30/mysql-group-by-in-union/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>MySQL: splitting aggregate queries</title>
		<link>http://explainextended.com/2011/03/28/mysql-splitting-aggregate-queries/</link>
		<comments>http://explainextended.com/2011/03/28/mysql-splitting-aggregate-queries/#comments</comments>
		<pubDate>Mon, 28 Mar 2011 19:00:27 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=5282</guid>
		<description><![CDATA[In MySQL, using MAX and MIN aggregates on different columns in a single query will prevent the GROUP BY optimization. To work around this, the queries should be split so that MAX and MIN on only one column are calculated in each query then joined on the grouping columns.]]></description>
			<content:encoded><![CDATA[<p>Answering questions asked on the site.</p>
<p><strong>Victor</strong> asks:</p>
<blockquote>
<p>I have a table which I will call <code>sale</code> to protect the innocent:</p>
<table class="excel">
<caption>Sale</caption>
<tr>
<th>id</th>
<th>product</th>
<th>price</th>
<th>amount</th>
<th>date</th>
</tr>
</table>
<p>I need to retrieve ultimate values of <code>price</code>, <code>amount</code> and <code>date</code> for each product:</p>
<pre class="brush: sql">
SELECT  product,
        MIN(price), MAX(price),
        MIN(amount), MAX(amount),
        MIN(date), MAX(date)
FROM    sale
GROUP BY
        product
</pre>
<p>The query only returns about 100 records.</p>
<p>I have all these fields indexed (together with <code>product</code>), but this still produces a full table scan over <strong>3,000,000</strong> records.</p>
<p>How do I speed up this query?</p>
</blockquote>
<p>To retrieve the extreme values of the fields, <strong>MySQL</strong> would just need to make a loose index scan over each index and take the max and min value of the field for each <code>product</code>.</p>
<p>However, the optimizer won&#8217;t do that when multiple indexes are involved. Instead, it will fall back to a full scan.</p>
<p>There is a workaround for this. Let&#8217;s create a sample table and see it:</p>
<p><span id="more-5282"></span></p>
<p><a href="#" onclick="xcollapse('X8367');return false;">Table creation details</a><br />
</p>
<div id="X8367" style="display: none; background: transparent;">
<pre class="brush: sql">
CREATE TABLE filler (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

CREATE TABLE sale
        (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
        product INT NOT NULL,
        amount INT NOT NULL,
        price NUMERIC(20, 2) NOT NULL,
        dt DATETIME NOT NULL,
        KEY ix_sale_product_amount (product, amount),
        KEY ix_sale_product_price (product, price),
        KEY ix_sale_product_dt (product, dt)
        )
ENGINE=InnoDB;

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt &lt;= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

START TRANSACTION;
CALL prc_filler(50000);
COMMIT;

INSERT
INTO    sale (product, amount, price, dt)
SELECT  CEILING(RAND(20110227) * 100),
        CEILING(RAND(20110227 &lt;&lt; 1) * 1000) + 100,
        CEILING(RAND(20110227 &lt;&lt; 1 + 1) * 10000) / 100.00 + 20.00,
        &#039;2011-03-27&#039; - INTERVAL CEILING(RAND(20110227 &lt;&lt; 1 + 2) * 10000000) SECOND
FROM    filler
CROSS JOIN
        (
        SELECT  id
        FROM    filler
        ORDER BY
                id
        LIMIT 60
        ) q;
</pre>
</div>
<p>This table contains <strong>3M</strong> random records for <strong>100</strong> distinct products and is indexed appropriately.</p>
<p>Here&#8217;s a straightforward query:</p>
<pre class="brush: sql">
SELECT  product,
        MIN(price), MAX(price),
        MIN(amount), MAX(amount),
        MIN(dt), MAX(dt)
FROM    sale
GROUP BY
        product
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>product</th>
<th>MIN(price)</th>
<th>MAX(price)</th>
<th>MIN(amount)</th>
<th>MAX(amount)</th>
<th>MIN(dt)</th>
<th>MAX(dt)</th>
</tr>
<tr>
<td class="integer">1</td>
<td class="decimal">20.01</td>
<td class="decimal">120.00</td>
<td class="integer">101</td>
<td class="integer">1100</td>
<td class="timestamp">2010-12-01 06:14:22</td>
<td class="timestamp">2011-03-26 23:42:32</td>
</tr>
<tr>
<td class="integer">2</td>
<td class="decimal">20.01</td>
<td class="decimal">120.00</td>
<td class="integer">101</td>
<td class="integer">1100</td>
<td class="timestamp">2010-12-01 06:39:38</td>
<td class="timestamp">2011-03-26 23:58:25</td>
</tr>
<tr>
<td class="integer">3</td>
<td class="decimal">20.01</td>
<td class="decimal">120.00</td>
<td class="integer">101</td>
<td class="integer">1100</td>
<td class="timestamp">2010-12-01 06:13:34</td>
<td class="timestamp">2011-03-26 23:54:32</td>
</tr>
<tr>
<td class="integer">4</td>
<td class="decimal">20.01</td>
<td class="decimal">120.00</td>
<td class="integer">101</td>
<td class="integer">1100</td>
<td class="timestamp">2010-12-01 06:14:30</td>
<td class="timestamp">2011-03-26 23:58:45</td>
</tr>
<tr class="break">
<td colspan="100"/></tr>
<tr>
<td class="integer">100</td>
<td class="decimal">20.01</td>
<td class="decimal">120.00</td>
<td class="integer">101</td>
<td class="integer">1100</td>
<td class="timestamp">2010-12-01 06:32:58</td>
<td class="timestamp">2011-03-26 23:58:24</td>
</tr>
<tr class="statusbar">
<td colspan="100">100 rows fetched in 0.0052s (13.8279s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">SIMPLE</td>
<td class="varchar">sale</td>
<td class="varchar">index</td>
<td class="varchar"></td>
<td class="varchar">ix_sale_product_amount</td>
<td class="varchar">8</td>
<td class="varchar"></td>
<td class="bigint">3000409</td>
<td class="double">100.00</td>
<td class="varchar"></td>
</tr>
</table>
</div>
<pre>
select `20110327_split`.`sale`.`product` AS `product`,min(`20110327_split`.`sale`.`price`) AS `MIN(price)`,max(`20110327_split`.`sale`.`price`) AS `MAX(price)`,min(`20110327_split`.`sale`.`amount`) AS `MIN(amount)`,max(`20110327_split`.`sale`.`amount`) AS `MAX(amount)`,min(`20110327_split`.`sale`.`dt`) AS `MIN(dt)`,max(`20110327_split`.`sale`.`dt`) AS `MAX(dt)` from `20110327_split`.`sale` group by `20110327_split`.`sale`.`product`
</pre>
<p>As expected, the query scans the whole table (or, rather, a whole index) and takes almost <strong>14</strong> seconds to complete.</p>
<p>To make the engine use three separate indexes, we need to split the query into three queries, each searching for the extreme values of one column, and then combine the results.</p>
<p>Each of these queries would be a <code>MIN / MAX</code> query on a trailing column of a composite index, combined with a <code>GROUP BY</code> on the index&#8217;s leading columns, and, as such, would be subject to <a href="http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html#loose-index-scan">loose index scan optimization</a>.</p>
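<p>One way to check this for a single part, before combining anything, is to run <code>EXPLAIN</code> on it alone. A minimal sketch, assuming the sample <code>sale</code> table created above:</p>
<pre class="brush: sql">
-- one aggregate column only: price is the trailing column of ix_sale_product_price
EXPLAIN
SELECT  product, MIN(price), MAX(price)
FROM    sale
GROUP BY
        product
</pre>
<p>If the loose index scan is applicable, the <code>Extra</code> column should read <code>Using index for group-by</code> and the <code>key</code> column should name <code>ix_sale_product_price</code>. Adding an aggregate on another column (say, <code>MIN(amount)</code>) to the same query is what makes the optimizer give the optimization up.</p>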
<p>To combine the results, we will of course just join them on <code>product</code>. This will be quite efficient too since the resultsets are going to be quite small (100 records each, exactly).</p>
<p>And here is the query:</p>
<pre class="brush: sql">
SELECT  *
FROM    (
        SELECT  product, MIN(amount), MAX(amount)
        FROM    sale
        GROUP BY
                product
        ) qa
NATURAL JOIN
        (
        SELECT  product, MIN(price), MAX(price)
        FROM    sale
        GROUP BY
                product
        ) qp
NATURAL JOIN
        (
        SELECT  product, MIN(dt), MAX(dt)
        FROM    sale
        GROUP BY
                product
        ) qd
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>product</th>
<th>MIN(amount)</th>
<th>MAX(amount)</th>
<th>MIN(price)</th>
<th>MAX(price)</th>
<th>MIN(dt)</th>
<th>MAX(dt)</th>
</tr>
<tr>
<td class="integer">1</td>
<td class="integer">101</td>
<td class="integer">1100</td>
<td class="decimal">20.01</td>
<td class="decimal">120.00</td>
<td class="timestamp">2010-12-01 06:14:22</td>
<td class="timestamp">2011-03-26 23:42:32</td>
</tr>
<tr>
<td class="integer">2</td>
<td class="integer">101</td>
<td class="integer">1100</td>
<td class="decimal">20.01</td>
<td class="decimal">120.00</td>
<td class="timestamp">2010-12-01 06:39:38</td>
<td class="timestamp">2011-03-26 23:58:25</td>
</tr>
<tr>
<td class="integer">3</td>
<td class="integer">101</td>
<td class="integer">1100</td>
<td class="decimal">20.01</td>
<td class="decimal">120.00</td>
<td class="timestamp">2010-12-01 06:13:34</td>
<td class="timestamp">2011-03-26 23:54:32</td>
</tr>
<tr>
<td class="integer">4</td>
<td class="integer">101</td>
<td class="integer">1100</td>
<td class="decimal">20.01</td>
<td class="decimal">120.00</td>
<td class="timestamp">2010-12-01 06:14:30</td>
<td class="timestamp">2011-03-26 23:58:45</td>
</tr>
<tr class="break">
<td colspan="100"/></tr>
<tr>
<td class="integer">100</td>
<td class="integer">101</td>
<td class="integer">1100</td>
<td class="decimal">20.01</td>
<td class="decimal">120.00</td>
<td class="timestamp">2010-12-01 06:32:58</td>
<td class="timestamp">2011-03-26 23:58:24</td>
</tr>
<tr class="statusbar">
<td colspan="100">100 rows fetched in 0.0053s (0.0105s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">&lt;derived2&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">100</td>
<td class="double">100.00</td>
<td class="varchar"></td>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">&lt;derived3&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">100</td>
<td class="double">100.00</td>
<td class="varchar">Using where; Using join buffer</td>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">&lt;derived4&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">100</td>
<td class="double">100.00</td>
<td class="varchar">Using where; Using join buffer</td>
</tr>
<tr>
<td class="bigint">4</td>
<td class="varchar">DERIVED</td>
<td class="varchar">sale</td>
<td class="varchar">range</td>
<td class="varchar"></td>
<td class="varchar">ix_sale_product_dt</td>
<td class="varchar">4</td>
<td class="varchar"></td>
<td class="bigint">19</td>
<td class="double">100.00</td>
<td class="varchar">Using index for group-by</td>
</tr>
<tr>
<td class="bigint">3</td>
<td class="varchar">DERIVED</td>
<td class="varchar">sale</td>
<td class="varchar">range</td>
<td class="varchar"></td>
<td class="varchar">ix_sale_product_price</td>
<td class="varchar">4</td>
<td class="varchar"></td>
<td class="bigint">19</td>
<td class="double">100.00</td>
<td class="varchar">Using index for group-by</td>
</tr>
<tr>
<td class="bigint">2</td>
<td class="varchar">DERIVED</td>
<td class="varchar">sale</td>
<td class="varchar">range</td>
<td class="varchar"></td>
<td class="varchar">ix_sale_product_amount</td>
<td class="varchar">4</td>
<td class="varchar"></td>
<td class="bigint">1067</td>
<td class="double">100.00</td>
<td class="varchar">Using index for group-by</td>
</tr>
</table>
</div>
<pre>
select `qa`.`product` AS `product`,`qa`.`MIN(amount)` AS `MIN(amount)`,`qa`.`MAX(amount)` AS `MAX(amount)`,`qp`.`MIN(price)` AS `MIN(price)`,`qp`.`MAX(price)` AS `MAX(price)`,`qd`.`MIN(dt)` AS `MIN(dt)`,`qd`.`MAX(dt)` AS `MAX(dt)` from (select `20110327_split`.`sale`.`product` AS `product`,min(`20110327_split`.`sale`.`amount`) AS `MIN(amount)`,max(`20110327_split`.`sale`.`amount`) AS `MAX(amount)` from `20110327_split`.`sale` group by `20110327_split`.`sale`.`product`) `qa` join (select `20110327_split`.`sale`.`product` AS `product`,min(`20110327_split`.`sale`.`price`) AS `MIN(price)`,max(`20110327_split`.`sale`.`price`) AS `MAX(price)` from `20110327_split`.`sale` group by `20110327_split`.`sale`.`product`) `qp` join (select `20110327_split`.`sale`.`product` AS `product`,min(`20110327_split`.`sale`.`dt`) AS `MIN(dt)`,max(`20110327_split`.`sale`.`dt`) AS `MAX(dt)` from `20110327_split`.`sale` group by `20110327_split`.`sale`.`product`) `qd` where ((`qp`.`product` = `qa`.`product`) and (`qd`.`product` = `qa`.`product`))
</pre>
<p>Each individual query is executed with <code>Using index for group-by</code>, that is, by jumping straight to the min and max values of each group in a loose index scan.</p>
<p>The queries are combined in a join which, despite not using the indexes, is still very fast because only <strong>100</strong> records are joined (of course fitting into a join buffer).</p>
<p>The overall query time is only <strong>10 ms</strong>.</p>
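<p><code>NATURAL JOIN</code> works here only because <code>product</code> is the sole column name the three subqueries share. As a sketch of a more explicit variant, the same split can be written with <code>JOIN</code> and <code>USING</code>. The aliases (<code>min_amount</code>, <code>max_amount</code> and so on) are illustrative names, not part of the original question:</p>
<pre class="brush: sql">
SELECT  product,
        min_price, max_price,
        min_amount, max_amount,
        min_dt, max_dt
FROM    (
        -- each subquery touches a single (product, column) index
        SELECT  product, MIN(amount) AS min_amount, MAX(amount) AS max_amount
        FROM    sale
        GROUP BY
                product
        ) qa
JOIN    (
        SELECT  product, MIN(price) AS min_price, MAX(price) AS max_price
        FROM    sale
        GROUP BY
                product
        ) qp
USING   (product)
JOIN    (
        SELECT  product, MIN(dt) AS min_dt, MAX(dt) AS max_dt
        FROM    sale
        GROUP BY
                product
        ) qd
USING   (product)
</pre>
<p>The plan should stay the same, since the optimizer rewrites the <code>NATURAL JOIN</code> into equality conditions on <code>product</code> anyway (see the rewritten query above), but this variant has not been measured here. Its only advantage is that adding a column to one of the subqueries cannot silently change the join condition.</p>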
<p>Hope that helps.</p>
<hr/>
<p>I&#8217;m always glad to answer questions regarding database queries.</p>
<p><a href="/ask-a-question"><strong>Ask me a question</strong></a></p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2011/03/28/mysql-splitting-aggregate-queries/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Things SQL needs: SERIES()</title>
		<link>http://explainextended.com/2011/02/18/things-sql-needs-series/</link>
		<comments>http://explainextended.com/2011/02/18/things-sql-needs-series/#comments</comments>
		<pubDate>Fri, 18 Feb 2011 20:00:47 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=5239</guid>
		<description><![CDATA[A window function (or an extension for <code>GROUP BY</code> clause) that would allow grouping continuous series of an expression in an ordered dataset would ease writing queries that need to aggregate such series, make them more efficient and even improve certain kinds of grouping queries where the grouping expression shares its order with one of the indexes.]]></description>
			<content:encoded><![CDATA[<p>Recently I had to deal with several scenarios which required processing and aggregating continuous series of data.</p>
<p><img src="http://explainextended.com/wp-content/uploads/2011/02/135779270_de2e30d0b1_z.jpg" alt="" title="Pawns" width="640" height="480" class="aligncenter size-full wp-image-5271 noborder" /></p>
<p>I believe this could be best illustrated with an example:</p>
<table class="excel">
<tr>
<th>id</th>
<th>source</th>
<th>value</th>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">1</td>
<td class="int4">10</td>
</tr>
<tr>
<td class="int4">2</td>
<td class="int4">1</td>
<td class="int4">20</td>
</tr>
<tr class="rowbreak">
<td class="int4">3</td>
<td class="int4">2</td>
<td class="int4">15</td>
</tr>
<tr>
<td class="int4">4</td>
<td class="int4">2</td>
<td class="int4">25</td>
</tr>
<tr class="rowbreak">
<td class="int4">5</td>
<td class="int4">1</td>
<td class="int4">45</td>
</tr>
<tr class="rowbreak">
<td class="int4">6</td>
<td class="int4">3</td>
<td class="int4">50</td>
</tr>
<tr>
<td class="int4">7</td>
<td class="int4">3</td>
<td class="int4">35</td>
</tr>
<tr class="rowbreak">
<td class="int4">8</td>
<td class="int4">1</td>
<td class="int4">40</td>
</tr>
<tr>
<td class="int4">9</td>
<td class="int4">1</td>
<td class="int4">10</td>
</tr>
</table>
<p>The records are ordered by <code>id</code>, and within this order there are continuous series of records which share the same value of <code>source</code>. In the table above, the series are separated by thick lines.</p>
<p>We want to calculate some aggregates across each of the series: <code>MIN</code>, <code>MAX</code>, <code>SUM</code>, <code>AVG</code>, whatever:</p>
<table class="excel">
<tr>
<th>source</th>
<th>min</th>
<th>max</th>
<th>sum</th>
<th>avg</th>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">10</td>
<td class="int4">20</td>
<td class="int8">30</td>
<td class="numeric">15.00</td>
</tr>
<tr>
<td class="int4">2</td>
<td class="int4">15</td>
<td class="int4">25</td>
<td class="int8">40</td>
<td class="numeric">20.00</td>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">45</td>
<td class="int4">45</td>
<td class="int8">45</td>
<td class="numeric">45.00</td>
</tr>
<tr>
<td class="int4">3</td>
<td class="int4">35</td>
<td class="int4">50</td>
<td class="int8">85</td>
<td class="numeric">42.50</td>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">10</td>
<td class="int4">40</td>
<td class="int8">50</td>
<td class="numeric">25.00</td>
</tr>
</table>
<p>This can be used for different things. I have used it for:</p>
<ul>
<li>Reading sensors from a moving elevator (thus tracking its position)</li>
<li>Recording user&#8217;s activity on a site</li>
<li>Tracking the primary node in a server cluster</li>
</ul>
<p>Almost any seasoned database developer can recall a need for such a query.</p>
<p>As you can see, the values of <code>source</code> repeat, so a mere <code>GROUP BY</code> won&#8217;t work here.</p>
<p>In systems supporting window functions, there is a workaround for that:</p>
<p><span id="more-5239"></span></p>
<pre class="brush: sql">
SELECT  source, MIN(value), MAX(value), SUM(value), AVG(value)::NUMERIC(20, 2)
FROM    (
        SELECT  *,
                ROW_NUMBER() OVER (PARTITION BY source ORDER BY id) AS rno,
                ROW_NUMBER() OVER (ORDER BY id) AS rne
        FROM    (
                VALUES
                (1, 1, 10),
                (2, 1, 20),
                (3, 2, 15),
                (4, 2, 25),
                (5, 1, 45),
                (6, 3, 50),
                (7, 3, 35),
                (8, 1, 40),
                (9, 1, 10)
                ) series (id, source, value)
        ) q
GROUP BY
        source, rne - rno
ORDER BY
        MIN(id)
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>source</th>
<th>min</th>
<th>max</th>
<th>sum</th>
<th>avg</th>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">10</td>
<td class="int4">20</td>
<td class="int8">30</td>
<td class="numeric">15.00</td>
</tr>
<tr>
<td class="int4">2</td>
<td class="int4">15</td>
<td class="int4">25</td>
<td class="int8">40</td>
<td class="numeric">20.00</td>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">45</td>
<td class="int4">45</td>
<td class="int8">45</td>
<td class="numeric">45.00</td>
</tr>
<tr>
<td class="int4">3</td>
<td class="int4">35</td>
<td class="int4">50</td>
<td class="int8">85</td>
<td class="numeric">42.50</td>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">10</td>
<td class="int4">40</td>
<td class="int8">50</td>
<td class="numeric">25.00</td>
</tr>
<tr class="statusbar">
<td colspan="100">5 rows fetched in 0.0004s (0.0033s)</td>
</tr>
</table>
</div>
<pre>
Sort  (cost=1.40..1.42 rows=9 width=28)
  Sort Key: (min(q.id))
  -&gt;  HashAggregate  (cost=1.01..1.25 rows=9 width=28)
        -&gt;  Subquery Scan q  (cost=0.58..0.85 rows=9 width=28)
              -&gt;  WindowAgg  (cost=0.58..0.74 rows=9 width=12)
                    -&gt;  Sort  (cost=0.58..0.60 rows=9 width=12)
                          Sort Key: &quot;*VALUES*&quot;.column1
                          -&gt;  WindowAgg  (cost=0.26..0.44 rows=9 width=12)
                                -&gt;  Sort  (cost=0.26..0.28 rows=9 width=12)
                                      Sort Key: &quot;*VALUES*&quot;.column2, &quot;*VALUES*&quot;.column1
                                      -&gt;  Values Scan on &quot;*VALUES*&quot;  (cost=0.00..0.11 rows=9 width=12)
</pre>
<p>(this is <strong>PostgreSQL</strong>).</p>
<p>The idea behind this solution is that the overall order of <code>id</code> is retained within each of the series but is broken whenever a series breaks. Thus, the difference between the overall row number and the <code>source</code>-wise row number is invariant within each series and is guaranteed to differ between any two series of the same <code>source</code>. Hence, we can <code>GROUP BY</code> on it (along with the <code>source</code>).</p>
<p>This workaround is nice and elegant; however, it requires three sorts and one hash aggregate (apart from the support for window functions, of course).</p>
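<p>To see the invariant at work, it helps to look at the two row numbers before any grouping is applied. The query below is just the inner part of the query above with the difference added as a column (the alias <code>grp</code> is an illustrative name):</p>
<pre class="brush: sql">
SELECT  id, source,
        ROW_NUMBER() OVER (ORDER BY id) AS rne,
        ROW_NUMBER() OVER (PARTITION BY source ORDER BY id) AS rno,
        -- overall row number minus source-wise row number
        ROW_NUMBER() OVER (ORDER BY id)
        - ROW_NUMBER() OVER (PARTITION BY source ORDER BY id) AS grp
FROM    (
        VALUES
        (1, 1, 10),
        (2, 1, 20),
        (3, 2, 15),
        (4, 2, 25),
        (5, 1, 45),
        (6, 3, 50),
        (7, 3, 35),
        (8, 1, 40),
        (9, 1, 10)
        ) series (id, source, value)
ORDER BY
        id
</pre>
<p>For the sample data, <code>grp</code> comes out as 0, 0, 2, 2, 2, 5, 5, 4, 4 for the ids 1 to 9. Note that row 5 (<code>source</code> 1) gets the same difference as rows 3 and 4 (<code>source</code> 2): the difference alone does not separate the series, which is why <code>source</code> has to be a part of the grouping key as well.</p>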
<h3>SERIES function</h3>
<p>My proposal is to implement a certain extension to the <code>GROUP BY</code> clause, namely:</p>
<p><code>GROUP BY SERIES (series_expression) OVER ([PARTITION BY partitioning_expression] ORDER BY ordering_expression)</code></p>
<p>The logic is quite simple: it would find continuous blocks of <code>series_expression</code> as if the records were ordered by <code>ordering_expression</code> and assign them to the same group. Within this group, all standard aggregation functions could be supported.</p>
<p>The expression:</p>
<p><code>SERIES (series_expression) OVER ([PARTITION BY partitioning_expression] ORDER BY ordering_expression)</code></p>
<p>could also serve as a normal window function in the systems that support it. In this case, it would return the ordinal number of the series within each partition.</p>
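<p><code>SERIES</code> is only a proposal, so nothing below uses it. As a rough approximation of what the window-function form would return for the sample data, here is a sketch in <strong>PostgreSQL</strong> that combines <code>LAG</code> with a running <code>SUM</code>. The column names <code>is_new</code> and <code>series</code> are illustrative:</p>
<pre class="brush: sql">
SELECT  id, source, value,
        -- running count of series starts = ordinal number of the series
        SUM(is_new) OVER (ORDER BY id) AS series
FROM    (
        SELECT  *,
                -- 1 on the first row and whenever source differs from the previous row
                CASE WHEN source = LAG(source) OVER (ORDER BY id) THEN 0 ELSE 1 END AS is_new
        FROM    (
                VALUES
                (1, 1, 10),
                (2, 1, 20),
                (3, 2, 15),
                (4, 2, 25),
                (5, 1, 45),
                (6, 3, 50),
                (7, 3, 35),
                (8, 1, 40),
                (9, 1, 10)
                ) series (id, source, value)
        ) q
ORDER BY
        id
</pre>
<p>This numbers the series 1, 1, 2, 2, 3, 4, 4, 5, 5 for the ids 1 to 9, and grouping by that column yields the same five groups as the sample result. In this sketch a <code>NULL</code> in <code>source</code> always starts a new series, since the comparison with the previous value is never true. Whether a real <code>SERIES</code> should behave that way is an open design question.</p>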
<h3>Implementation</h3>
<p>Its implementation would be quite simple: the records sharing the same value of <code>partitioning_expression</code> should be ordered on <code>ordering_expression</code>, and a zero-based (or one-based) counter and a variable holding the previous value of the <code>series_expression</code> should be set up. Whenever the value of the <code>series_expression</code> changes, the counter is incremented (and returned as the output of <code>SERIES</code>) and the variable gets the new value of <code>series_expression</code>.</p>
<p>Using session variables, this can be easily emulated in <strong>MySQL</strong>:</p>
<pre class="brush: sql">
SELECT  source,
        MIN(value) AS min,
        MAX(value) AS max,
        SUM(value) AS sum,
        CAST(AVG(value) AS DECIMAL(20, 2)) AS avg
FROM    (
        SELECT  @series := @series + (COALESCE(@source &lt;&gt; source, 0)) AS series,
                @source := source AS newsource,
                q.*
        FROM    (
                SELECT  @series := 1, @source := NULL
                ) vars
        STRAIGHT_JOIN
                (
                SELECT  1 AS id, 1 AS source, 10 AS value
                UNION ALL
                SELECT  2 AS id, 1 AS source, 20 AS value
                UNION ALL
                SELECT  3 AS id, 2 AS source, 15 AS value
                UNION ALL
                SELECT  4 AS id, 2 AS source, 25 AS value
                UNION ALL
                SELECT  5 AS id, 1 AS source, 45 AS value
                UNION ALL
                SELECT  6 AS id, 3 AS source, 50 AS value
                UNION ALL
                SELECT  7 AS id, 3 AS source, 35 AS value
                UNION ALL
                SELECT  8 AS id, 1 AS source, 40 AS value
                UNION ALL
                SELECT  9 AS id, 1 AS source, 10 AS value
                ) q
        ORDER BY
                id
        ) q
GROUP BY
        series
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>source</th>
<th>min</th>
<th>max</th>
<th>sum</th>
<th>avg</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="bigint">10</td>
<td class="bigint">20</td>
<td class="decimal">30</td>
<td class="decimal">15.00</td>
</tr>
<tr>
<td class="bigint">2</td>
<td class="bigint">15</td>
<td class="bigint">25</td>
<td class="decimal">40</td>
<td class="decimal">20.00</td>
</tr>
<tr>
<td class="bigint">1</td>
<td class="bigint">45</td>
<td class="bigint">45</td>
<td class="decimal">45</td>
<td class="decimal">45.00</td>
</tr>
<tr>
<td class="bigint">3</td>
<td class="bigint">35</td>
<td class="bigint">50</td>
<td class="decimal">85</td>
<td class="decimal">42.50</td>
</tr>
<tr>
<td class="bigint">1</td>
<td class="bigint">10</td>
<td class="bigint">40</td>
<td class="decimal">50</td>
<td class="decimal">25.00</td>
</tr>
<tr class="statusbar">
<td colspan="100">5 rows fetched in 0.0004s (0.0026s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">&lt;derived2&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">9</td>
<td class="double">100.00</td>
<td class="varchar">Using temporary; Using filesort</td>
</tr>
<tr>
<td class="bigint">2</td>
<td class="varchar">DERIVED</td>
<td class="varchar">&lt;derived3&gt;</td>
<td class="varchar">system</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">1</td>
<td class="double">100.00</td>
<td class="varchar">Using filesort</td>
</tr>
<tr>
<td class="bigint">2</td>
<td class="varchar">DERIVED</td>
<td class="varchar">&lt;derived4&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">9</td>
<td class="double">100.00</td>
<td class="varchar"></td>
</tr>
<tr>
<td class="bigint">4</td>
<td class="varchar">DERIVED</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint">5</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint">6</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint">7</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint">8</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint">9</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint">10</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint">11</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint">12</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint"></td>
<td class="varchar">UNION RESULT</td>
<td class="varchar">&lt;union4,5,6,7,8,9,10,11,12&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar"></td>
</tr>
<tr>
<td class="bigint">3</td>
<td class="varchar">DERIVED</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
</table>
</div>
<pre>
select `q`.`source` AS `source`,min(`q`.`value`) AS `min`,max(`q`.`value`) AS `max`,sum(`q`.`value`) AS `sum`,cast(avg(`q`.`value`) as decimal(20,2)) AS `avg` from (select (@series:=((@series) + coalesce(((@source) &lt;&gt; `q`.`source`),0))) AS `series`,(@source:=`q`.`source`) AS `newsource`,`q`.`id` AS `id`,`q`.`source` AS `source`,`q`.`value` AS `value` from (select (@series:=1) AS `@series := 1`,(@source:=NULL) AS `@source := NULL`) `vars` straight_join (select 1 AS `id`,1 AS `source`,10 AS `value` union all select 2 AS `id`,1 AS `source`,20 AS `value` union all select 3 AS `id`,2 AS `source`,15 AS `value` union all select 4 AS `id`,2 AS `source`,25 AS `value` union all select 5 AS `id`,1 AS `source`,45 AS `value` union all select 6 AS `id`,3 AS `source`,50 AS `value` union all select 7 AS `id`,3 AS `source`,35 AS `value` union all select 8 AS `id`,1 AS `source`,40 AS `value` union all select 9 AS `id`,1 AS `source`,10 AS `value`) `q` order by `q`.`id`) `q` group by `q`.`series`
</pre>
<p>, with only one filesort (which could be avoided if the values were ordered with an index).</p>
<h3>Grouping on expressions sharing the order</h3>
<p>This clause would also allow running certain grouping queries more efficiently.</p>
<p>Say, we need to build a yearly report for sales on a web site. Let&#8217;s create a sample table:</p>
<p><a href="#" onclick="xcollapse('X4287');return false;"><strong>Table creation details</strong></a><br />
</p>
<div id="X4287" style="display: none; background: transparent;">
<pre class="brush: sql">
CREATE TABLE filler (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

CREATE TABLE sales (
        id INT NOT NULL,
        dt DATETIME NOT NULL,
        amount DECIMAL(20, 2) NOT NULL
) ENGINE=InnoDB;

CREATE INDEX
        ix_sales_dt_amount
ON      sales (dt, amount);

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt &lt;= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

START TRANSACTION;
CALL prc_filler(500000);
COMMIT;

INSERT
INTO    sales
SELECT  id,
        CAST(&#039;2011-02-18&#039; AS DATE) - INTERVAL id * 10 MINUTE,
        (10000 + CEILING(RAND(20110218) * 10000)) / 100
FROM    filler;
</pre>
</div>
<p>Normally, we do it with this query:</p>
<pre class="brush: sql">
SELECT  YEAR(dt), SUM(amount)
FROM    sales
WHERE   dt &gt;= &#039;2005-01-01&#039;
GROUP BY
        YEAR(dt)
ORDER BY
        NULL
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>YEAR(dt)</th>
<th>SUM(amount)</th>
</tr>
<tr>
<td class="integer">2005</td>
<td class="decimal">7885399.58</td>
</tr>
<tr>
<td class="integer">2006</td>
<td class="decimal">7888231.80</td>
</tr>
<tr>
<td class="integer">2007</td>
<td class="decimal">7881056.65</td>
</tr>
<tr>
<td class="integer">2008</td>
<td class="decimal">7894880.23</td>
</tr>
<tr>
<td class="integer">2009</td>
<td class="decimal">7879728.51</td>
</tr>
<tr>
<td class="integer">2010</td>
<td class="decimal">7884989.40</td>
</tr>
<tr>
<td class="integer">2011</td>
<td class="decimal">1034590.31</td>
</tr>
<tr class="statusbar">
<td colspan="100">7 rows fetched in 0.0002s (0.7834s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">SIMPLE</td>
<td class="varchar">sales</td>
<td class="varchar">range</td>
<td class="varchar">ix_sales_dt_amount</td>
<td class="varchar">ix_sales_dt_amount</td>
<td class="varchar">8</td>
<td class="varchar"></td>
<td class="bigint">250224</td>
<td class="double">100.00</td>
<td class="varchar">Using where; Using index; Using temporary</td>
</tr>
</table>
</div>
<pre>
select year(`20110218_series`.`sales`.`dt`) AS `YEAR(dt)`,sum(`20110218_series`.`sales`.`amount`) AS `SUM(amount)` from `20110218_series`.`sales` where (`20110218_series`.`sales`.`dt` &gt;= &#39;2005-01-01&#39;) group by year(`20110218_series`.`sales`.`dt`) order by NULL
</pre>
<p>The query is pretty fast, but we see <code>Using temporary</code> in the plan. Since the order of <code>dt</code> always matches that of <code>YEAR(dt)</code>, and the records are selected from the index on <code>dt</code> (so they naturally come ordered), the temporary table is redundant. A sort-based grouping could be used instead.</p>
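<p>To illustrate why the temporary table is redundant: each year occupies one contiguous range of the index on <code>dt</code>, so the total for a single year can be computed by a plain range scan with no grouping at all. A minimal sketch, taking <strong>2005</strong> as an arbitrary example (the half-open range is an illustrative choice of predicate, not part of the original query):</p>
<pre class="brush: sql">
-- One year is one contiguous range of the index on dt,
-- so a single range scan is enough and no grouping is needed.
SELECT  SUM(amount)
FROM    sales
WHERE   dt &gt;= &#039;2005-01-01&#039;
        AND dt &lt; &#039;2006-01-01&#039;
</pre>
<p>This should return the same total as the <strong>2005</strong> row above. A sort-based grouping would do essentially the same for every year, in a single pass over the index.</p>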
<p>We have no means to tell <strong>MySQL</strong> (or any other database system) that the orders match, but with the proposed extension we could do just that:</p>
<pre class="brush: sql">
SELECT  YEAR(MIN(dt)), SUM(amount)
FROM    sales
WHERE   dt &gt;= &#039;2005-01-01&#039;
GROUP BY
        SERIES(YEAR(dt)) OVER (ORDER BY dt)
</pre>
<p>Since <code>YEAR(dt)</code> always forms continuous series when the rows are ordered by <code>dt</code>, this is the same as the query above, but the temporary table could be avoided.</p>
<p>Let&#8217;s emulate this using session variables:</p>
<pre class="brush: sql">
SELECT  CAST(year AS UNSIGNED), CAST(sum_amount AS DECIMAL(20, 2))
FROM    (
        SELECT  @sum := 0,
                @year := YEAR(MIN(dt))
        FROM    sales
        WHERE   dt &gt;= &#039;2005-01-01&#039;
        ) vars
STRAIGHT_JOIN
        (
        SELECT  @sum AS sum_amount,
                @sum := CASE WHEN @year = YEAR(dt) THEN @sum + amount ELSE amount END,
                @year = YEAR(dt) AS series,
                @year AS year,
                @year := YEAR(dt)
        FROM    sales
        WHERE   dt &gt;= &#039;2005-01-01&#039;
        ORDER BY
                dt
        ) q
WHERE   series = 0
UNION ALL
SELECT  CAST(@year AS UNSIGNED), CAST(@sum AS DECIMAL(20, 2))
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>CAST(year AS UNSIGNED)</th>
<th>CAST(sum_amount AS DECIMAL(20, 2))</th>
</tr>
<tr>
<td class="bigint">2005</td>
<td class="decimal">7885399.58</td>
</tr>
<tr>
<td class="bigint">2006</td>
<td class="decimal">7888231.80</td>
</tr>
<tr>
<td class="bigint">2007</td>
<td class="decimal">7881056.65</td>
</tr>
<tr>
<td class="bigint">2008</td>
<td class="decimal">7894880.23</td>
</tr>
<tr>
<td class="bigint">2009</td>
<td class="decimal">7879728.51</td>
</tr>
<tr>
<td class="bigint">2010</td>
<td class="decimal">7884989.40</td>
</tr>
<tr>
<td class="bigint">2011</td>
<td class="decimal">1034590.31</td>
</tr>
<tr class="statusbar">
<td colspan="100">7 rows fetched in 0.0002s (0.5375s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">&lt;derived2&gt;</td>
<td class="varchar">system</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">1</td>
<td class="double">100.00</td>
<td class="varchar"></td>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">&lt;derived3&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">322416</td>
<td class="double">100.00</td>
<td class="varchar">Using where</td>
</tr>
<tr>
<td class="bigint">3</td>
<td class="varchar">DERIVED</td>
<td class="varchar">sales</td>
<td class="varchar">range</td>
<td class="varchar">ix_sales_dt_amount</td>
<td class="varchar">ix_sales_dt_amount</td>
<td class="varchar">8</td>
<td class="varchar"></td>
<td class="bigint">250224</td>
<td class="double">100.00</td>
<td class="varchar">Using where; Using index</td>
</tr>
<tr>
<td class="bigint">2</td>
<td class="varchar">DERIVED</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">Select tables optimized away</td>
</tr>
<tr>
<td class="bigint">4</td>
<td class="varchar">UNION</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar">No tables used</td>
</tr>
<tr>
<td class="bigint"></td>
<td class="varchar">UNION RESULT</td>
<td class="varchar">&lt;union1,4&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint"></td>
<td class="double"></td>
<td class="varchar"></td>
</tr>
</table>
</div>
<pre>
select cast(`q`.`year` as unsigned) AS `CAST(year AS UNSIGNED)`,cast(`q`.`sum_amount` as decimal(20,2)) AS `CAST(sum_amount AS DECIMAL(20, 2))` from (select (@sum:=0) AS `@sum := 0`,(@year:=year(min(`20110218_series`.`sales`.`dt`))) AS `@year := YEAR(MIN(dt))` from `20110218_series`.`sales` where (`20110218_series`.`sales`.`dt` &gt;= &#39;2005-01-01&#39;)) `vars` straight_join (select (@sum) AS `sum_amount`,(@sum:=(case when ((@year) = year(`20110218_series`.`sales`.`dt`)) then ((@sum) + `20110218_series`.`sales`.`amount`) else `20110218_series`.`sales`.`amount` end)) AS `@sum := CASE WHEN @year = YEAR(dt) THEN @sum + amount ELSE amount END`,((@year) = year(`20110218_series`.`sales`.`dt`)) AS `series`,(@year) AS `year`,(@year:=year(`20110218_series`.`sales`.`dt`)) AS `@year := YEAR(dt)` from `20110218_series`.`sales` where (`20110218_series`.`sales`.`dt` &gt;= &#39;2005-01-01&#39;) order by `20110218_series`.`sales`.`dt`) `q` where (`q`.`series` = 0) union all select cast((@year) as unsigned) AS `CAST(@year AS UNSIGNED)`,cast((@sum) as decimal(20,2)) AS `CAST(@sum AS DECIMAL(20, 2))`
</pre>
<p>We see that although inline view support is pretty inefficient in <strong>MySQL</strong> and the resulting query is not as fast as it could be, there is no <code>Using temporary</code> in the plan and the query is more efficient (<strong>0.54 s</strong> vs <strong>0.78 s</strong>). With native support for <code>SERIES()</code>, it would be faster still.</p>
<h3>Conclusion</h3>
<p>A window function (or an extension to the <code>GROUP BY</code> clause) that would allow grouping continuous series of an expression in an ordered dataset would make queries that need to aggregate such series easier to write and more efficient. It would even improve certain kinds of grouping queries where the grouping expression shares its order with one of the indexes.</p>
<h3>Update of Feb 21 2011</h3>
<p>Another nice solution proposed by <strong>@delostilos</strong>:</p>
<pre class="brush: sql">
SELECT  MIN(source), series, SUM(value)
FROM    (
        SELECT  *,
                SUM(COALESCE((source &lt;&gt; ns)::INTEGER, 0)) OVER (ORDER BY id) AS series
        FROM    (
                SELECT  series.*,
                        LAG(source) OVER (ORDER BY id) AS ns
                FROM    (
                        VALUES
                        (1, 1, 10),
                        (2, 1, 20),
                        (3, 2, 15),
                        (4, 2, 25),
                        (5, 1, 45),
                        (6, 3, 50),
                        (7, 3, 35),
                        (8, 1, 40),
                        (9, 1, 10)
                        ) series (id, source, value)
                ) q
        ) q
GROUP BY
        series
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>min</th>
<th>series</th>
<th>sum</th>
</tr>
<tr>
<td class="int4">2</td>
<td class="int8">1</td>
<td class="int8">40</td>
</tr>
<tr>
<td class="int4">1</td>
<td class="int8">2</td>
<td class="int8">45</td>
</tr>
<tr>
<td class="int4">3</td>
<td class="int8">3</td>
<td class="int8">85</td>
</tr>
<tr>
<td class="int4">1</td>
<td class="int8">4</td>
<td class="int8">50</td>
</tr>
<tr>
<td class="int4">1</td>
<td class="int8">0</td>
<td class="int8">30</td>
</tr>
<tr class="statusbar">
<td colspan="100">5 rows fetched in 0.0005s (0.0030s)</td>
</tr>
</table>
</div>
<pre>
HashAggregate  (cost=0.84..0.98 rows=9 width=16)
  -&gt;  WindowAgg  (cost=0.26..0.68 rows=9 width=16)
        -&gt;  WindowAgg  (cost=0.26..0.41 rows=9 width=12)
              -&gt;  Sort  (cost=0.26..0.28 rows=9 width=12)
                    Sort Key: &quot;*VALUES*&quot;.column1
                    -&gt;  Values Scan on &quot;*VALUES*&quot;  (cost=0.00..0.11 rows=9 width=12)
</pre>
<p>This is more efficient than double <code>ROW_NUMBER</code>, since the engine only needs to sort on the <code>id</code> expression at most once (or does not need to sort at all if it comes sorted from the previous recordset or from the index).</p>
<p>However, the final <code>GROUP BY</code> still cannot use sort aggregation without additional sorting, since the output of the analytic <code>SUM()</code> cannot be assumed to have the same order as <code>id</code>. That&#8217;s why the native support from the system would still be an improvement.</p>
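<p>One way to see this in <strong>PostgreSQL</strong> is to discourage hash aggregation for the session and look at the plan again. This is only a sketch: it relies on the <code>enable_hashagg</code> planner setting, and the exact plan may differ between versions.</p>
<pre class="brush: sql">
-- Make the planner prefer sort-based aggregation for this session
SET enable_hashagg = off;

EXPLAIN
SELECT  MIN(source), series, SUM(value)
FROM    (
        SELECT  *,
                SUM(COALESCE((source &lt;&gt; ns)::INTEGER, 0)) OVER (ORDER BY id) AS series
        FROM    (
                SELECT  series.*,
                        LAG(source) OVER (ORDER BY id) AS ns
                FROM    (
                        VALUES
                        (1, 1, 10),
                        (2, 1, 20),
                        (3, 2, 15),
                        (4, 2, 25),
                        (5, 1, 45),
                        (6, 3, 50),
                        (7, 3, 35),
                        (8, 1, 40),
                        (9, 1, 10)
                        ) series (id, source, value)
                ) q
        ) q
GROUP BY
        series;

-- Restore the default planner behavior
RESET enable_hashagg;
</pre>
<p>With hash aggregation out of the picture, the plan should contain a separate sort on <code>series</code> feeding the aggregation, in addition to the sort on <code>id</code> required by the window functions.</p>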
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2011/02/18/things-sql-needs-series/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Late row lookups: InnoDB</title>
		<link>http://explainextended.com/2011/02/11/late-row-lookups-innodb/</link>
		<comments>http://explainextended.com/2011/02/11/late-row-lookups-innodb/#comments</comments>
		<pubDate>Fri, 11 Feb 2011 20:00:09 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=5214</guid>
		<description><![CDATA[A trick to avoid early row lookups is also useful for InnoDB tables]]></description>
			<content:encoded><![CDATA[<p>Answering questions asked on the site.</p>
<p><strong>Aryé</strong> asks:</p>
<blockquote><p>Thanks for your article about <a href="/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/">late row lookups</a> in <strong>MySQL</strong>.</p>
<p>I have two questions for you please:</p>
<ul>
<li>Is this workaround specific to <strong>MyISAM</strong> engine?</li>
<li>How does <strong>PostgreSQL</strong> handle this?</li>
</ul>
</blockquote>
<p>The question concerns a certain workaround for <strong>MySQL</strong> <code>LIMIT … OFFSET</code> queries like this:</p>
<pre class="brush: sql">
SELECT  *
FROM    mytable
ORDER BY
        id
LIMIT   10
OFFSET  10000
</pre>
<p>which can be improved using a little rewrite:</p>
<pre class="brush: sql">
SELECT  m.*
FROM    (
        SELECT  id
        FROM    mytable
        ORDER BY
                id
        LIMIT   10
        OFFSET  10000
        ) q
JOIN    mytable m
ON      m.id = q.id
ORDER BY
        m.id
</pre>
<p>For the rationale behind this improvement, please read <a href="/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/">the original article</a>.</p>
<p>Now, to the questions.</p>
<p>The second question is easy: <strong>PostgreSQL</strong> won&#8217;t pull the fields from the table until it really needs them. If a query involving an <code>ORDER BY</code> along with <code>LIMIT</code> and <code>OFFSET</code> is optimized to use the index for the <code>ORDER BY</code> part, the table lookups won&#8217;t happen for the records skipped.</p>
<p>Though <strong>PostgreSQL</strong> does not reflect the table lookups in the <code>EXPLAIN</code> output, a simple test would show us that they are done only <code>LIMIT</code> times, not <code>OFFSET + LIMIT</code> times as in <strong>MySQL</strong>.</p>
<p>Now, let&#8217;s try to answer the first question: will this trick improve the queries against an <strong>InnoDB</strong> table?</p>
<p>To do this, we will create a sample table:</p>
<p><span id="more-5214"></span></p>
<p><a href="#" onclick="xcollapse('X10348');return false;"><strong>Table creation details</strong></a><br />
</p>
<div id="X10348" style="display: none; background: transparent;">
<pre class="brush: sql">
CREATE TABLE filler (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

CREATE TABLE lookup (
        id INT NOT NULL PRIMARY KEY,
        value INT NOT NULL,
        shorttxt TEXT NOT NULL,
        longtxt TEXT NOT NULL
) ENGINE=InnoDB ROW_FORMAT=COMPACT;

CREATE INDEX
        ix_lookup_value
ON      lookup (value);

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt &lt;= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

START TRANSACTION;
CALL prc_filler(100000);
COMMIT;

INSERT
INTO    lookup
SELECT  id, CEILING(RAND(20110211) * 1000000),
        RPAD(&#039;&#039;, CEILING(RAND(20110211 &lt;&lt; 1) * 100), &#039;*&#039;),
        RPAD(&#039;&#039;, CEILING(8192 + RAND(20110211 &lt;&lt; 1) * 100), &#039;*&#039;)
FROM    filler;
</pre>
</div>
<p>There is one indexed <code>INT</code> column and two <code>TEXT</code> columns, <code>shorttxt</code> storing short strings (<strong>1</strong> to <strong>100</strong> characters), <code>longtxt</code> storing long strings (<strong>8193</strong> to <strong>8293</strong> characters).</p>
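<p>A quick sanity check of these lengths could look like the query below. It is only a sketch: the column aliases are made up for illustration, and since the query scans the whole table (including the long values), it may take a while to run.</p>
<pre class="brush: sql">
-- Row count and the actual length ranges of both TEXT columns
SELECT  COUNT(*) AS cnt,
        MIN(LENGTH(shorttxt)) AS min_short,
        MAX(LENGTH(shorttxt)) AS max_short,
        MIN(LENGTH(longtxt)) AS min_long,
        MAX(LENGTH(longtxt)) AS max_long
FROM    lookup
</pre>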
<p>Let&#8217;s run some queries against this table.</p>
<h3>PRIMARY KEY and the INT column</h3>
<p>No rewrite:</p>
<pre class="brush: sql">
SELECT  value
FROM    lookup
ORDER BY
        id
LIMIT   10
OFFSET  90000
</pre>
<p><a href="#" onclick="xcollapse('X6751');return false;"><strong>Query results</strong></a><br />
</p>
<div id="X6751" style="display: none; background: transparent;">
<div class="terminal">
<table class="terminal">
<tr>
<th>value</th>
</tr>
<tr>
<td class="integer">12336</td>
</tr>
<tr>
<td class="integer">314476</td>
</tr>
<tr>
<td class="integer">535374</td>
</tr>
<tr>
<td class="integer">733443</td>
</tr>
<tr>
<td class="integer">61089</td>
</tr>
<tr>
<td class="integer">105117</td>
</tr>
<tr>
<td class="integer">342318</td>
</tr>
<tr>
<td class="integer">396237</td>
</tr>
<tr>
<td class="integer">954232</td>
</tr>
<tr>
<td class="integer">582449</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0002s (0.0415s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">SIMPLE</td>
<td class="varchar">lookup</td>
<td class="varchar">index</td>
<td class="varchar"></td>
<td class="varchar">PRIMARY</td>
<td class="varchar">4</td>
<td class="varchar"></td>
<td class="bigint">90010</td>
<td class="double">622.36</td>
<td class="varchar"></td>
</tr>
</table>
</div>
<pre>
select `20110211_late`.`lookup`.`value` AS `value` from `20110211_late`.`lookup` order by `20110211_late`.`lookup`.`id` limit 90000,10
</pre>
</div>
<p>Rewrite:</p>
<pre class="brush: sql">
SELECT  l.value
FROM    (
        SELECT  id
        FROM    lookup
        ORDER BY
                id
        LIMIT   10
        OFFSET  90000
        ) q
JOIN    lookup l
ON      l.id = q.id
ORDER BY
        q.id
</pre>
<p><a href="#" onclick="xcollapse('X5888');return false;"><strong>Query results</strong></a><br />
</p>
<div id="X5888" style="display: none; background: transparent;">
<div class="terminal">
<table class="terminal">
<tr>
<th>value</th>
</tr>
<tr>
<td class="integer">12336</td>
</tr>
<tr>
<td class="integer">314476</td>
</tr>
<tr>
<td class="integer">535374</td>
</tr>
<tr>
<td class="integer">733443</td>
</tr>
<tr>
<td class="integer">61089</td>
</tr>
<tr>
<td class="integer">105117</td>
</tr>
<tr>
<td class="integer">342318</td>
</tr>
<tr>
<td class="integer">396237</td>
</tr>
<tr>
<td class="integer">954232</td>
</tr>
<tr>
<td class="integer">582449</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0002s (0.0407s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">&lt;derived2&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">10</td>
<td class="double">100.00</td>
<td class="varchar">Using filesort</td>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">l</td>
<td class="varchar">eq_ref</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">4</td>
<td class="varchar">q.id</td>
<td class="bigint">1</td>
<td class="double">100.00</td>
<td class="varchar"></td>
</tr>
<tr>
<td class="bigint">2</td>
<td class="varchar">DERIVED</td>
<td class="varchar">lookup</td>
<td class="varchar">index</td>
<td class="varchar"></td>
<td class="varchar">PRIMARY</td>
<td class="varchar">4</td>
<td class="varchar"></td>
<td class="bigint">90010</td>
<td class="double">622.36</td>
<td class="varchar">Using index</td>
</tr>
</table>
</div>
<pre>
select `20110211_late`.`l`.`value` AS `value` from (select `20110211_late`.`lookup`.`id` AS `id` from `20110211_late`.`lookup` order by `20110211_late`.`lookup`.`id` limit 90000,10) `q` join `20110211_late`.`lookup` `l` where (`20110211_late`.`l`.`id` = `q`.`id`) order by `q`.`id`
</pre>
</div>
<p>As you can see, there is almost no difference (<strong>41 ms</strong> vs <strong>40 ms</strong>).</p>
<p><strong>InnoDB</strong> tables are clustered on their <code>PRIMARY KEY</code> columns, which means that the index on <code>id</code> (used to serve the <code>ORDER BY</code> condition) contains all the data the query needs. There is a negligible benefit from not looking up the <code>value</code> column on the index pages, because the column is tiny and the index pages need to be read anyway.</p>
<h3>PRIMARY KEY and the short TEXT column</h3>
<p>Rewrite:</p>
<pre class="brush: sql">
SELECT  LENGTH(l.shorttxt)
FROM    (
        SELECT  id
        FROM    lookup
        ORDER BY
                id
        LIMIT   10
        OFFSET  90000
        ) q
JOIN    lookup l
ON      l.id = q.id
ORDER BY
        q.id
</pre>
<p><a href="#" onclick="xcollapse('X4628');return false;"><strong>Query results</strong></a><br />
</p>
<div id="X4628" style="display: none; background: transparent;">
<div class="terminal">
<table class="terminal">
<tr>
<th>LENGTH(l.shorttxt)</th>
</tr>
<tr>
<td class="bigint">17</td>
</tr>
<tr>
<td class="bigint">41</td>
</tr>
<tr>
<td class="bigint">52</td>
</tr>
<tr>
<td class="bigint">39</td>
</tr>
<tr>
<td class="bigint">36</td>
</tr>
<tr>
<td class="bigint">65</td>
</tr>
<tr>
<td class="bigint">15</td>
</tr>
<tr>
<td class="bigint">78</td>
</tr>
<tr>
<td class="bigint">44</td>
</tr>
<tr>
<td class="bigint">85</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0003s (0.0401s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">&lt;derived2&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">10</td>
<td class="double">100.00</td>
<td class="varchar">Using filesort</td>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">l</td>
<td class="varchar">eq_ref</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">4</td>
<td class="varchar">q.id</td>
<td class="bigint">1</td>
<td class="double">100.00</td>
<td class="varchar"></td>
</tr>
<tr>
<td class="bigint">2</td>
<td class="varchar">DERIVED</td>
<td class="varchar">lookup</td>
<td class="varchar">index</td>
<td class="varchar"></td>
<td class="varchar">PRIMARY</td>
<td class="varchar">4</td>
<td class="varchar"></td>
<td class="bigint">90010</td>
<td class="double">622.36</td>
<td class="varchar">Using index</td>
</tr>
</table>
</div>
<pre>
select length(`20110211_late`.`l`.`shorttxt`) AS `LENGTH(l.shorttxt)` from (select `20110211_late`.`lookup`.`id` AS `id` from `20110211_late`.`lookup` order by `20110211_late`.`lookup`.`id` limit 90000,10) `q` join `20110211_late`.`lookup` `l` where (`20110211_late`.`l`.`id` = `q`.`id`) order by `q`.`id`
</pre>
</div>
<p>No rewrite:</p>
<pre class="brush: sql">
SELECT  LENGTH(shorttxt)
FROM    lookup
ORDER BY
        id
LIMIT   10
OFFSET  90000
</pre>
<p><a href="#" onclick="xcollapse('X966');return false;"><strong>Query results</strong></a><br />
</p>
<div id="X966" style="display: none; background: transparent;">
<div class="terminal">
<table class="terminal">
<tr>
<th>LENGTH(shorttxt)</th>
</tr>
<tr>
<td class="bigint">17</td>
</tr>
<tr>
<td class="bigint">41</td>
</tr>
<tr>
<td class="bigint">52</td>
</tr>
<tr>
<td class="bigint">39</td>
</tr>
<tr>
<td class="bigint">36</td>
</tr>
<tr>
<td class="bigint">65</td>
</tr>
<tr>
<td class="bigint">15</td>
</tr>
<tr>
<td class="bigint">78</td>
</tr>
<tr>
<td class="bigint">44</td>
</tr>
<tr>
<td class="bigint">85</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0002s (0.0925s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">SIMPLE</td>
<td class="varchar">lookup</td>
<td class="varchar">index</td>
<td class="varchar"></td>
<td class="varchar">PRIMARY</td>
<td class="varchar">4</td>
<td class="varchar"></td>
<td class="bigint">90010</td>
<td class="double">622.36</td>
<td class="varchar"></td>
</tr>
</table>
</div>
<pre>
select length(`20110211_late`.`lookup`.`shorttxt`) AS `LENGTH(shorttxt)` from `20110211_late`.`lookup` order by `20110211_late`.`lookup`.`id` limit 90000,10
</pre>
</div>
<p>There is quite a significant difference (<strong>92 ms</strong> vs <strong>40 ms</strong>), the reasons for which we will discuss a little bit later, after we see the results of the third query.</p>
<h3>PRIMARY KEY and the long TEXT column</h3>
<p>Rewrite:</p>
<pre class="brush: sql">
SELECT  LENGTH(l.longtxt)
FROM    (
        SELECT  id
        FROM    lookup
        ORDER BY
                id
        LIMIT   10
        OFFSET  90000
        ) q
JOIN    lookup l
ON      l.id = q.id
ORDER BY
        q.id
</pre>
<p><a href="#" onclick="xcollapse('X1865');return false;"><strong>Query results</strong></a><br />
</p>
<div id="X1865" style="display: none; background: transparent;">
<div class="terminal">
<table class="terminal">
<tr>
<th>LENGTH(l.longtxt)</th>
</tr>
<tr>
<td class="bigint">8209</td>
</tr>
<tr>
<td class="bigint">8233</td>
</tr>
<tr>
<td class="bigint">8244</td>
</tr>
<tr>
<td class="bigint">8231</td>
</tr>
<tr>
<td class="bigint">8228</td>
</tr>
<tr>
<td class="bigint">8257</td>
</tr>
<tr>
<td class="bigint">8207</td>
</tr>
<tr>
<td class="bigint">8270</td>
</tr>
<tr>
<td class="bigint">8236</td>
</tr>
<tr>
<td class="bigint">8277</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0002s (0.0396s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">&lt;derived2&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">10</td>
<td class="double">100.00</td>
<td class="varchar">Using filesort</td>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">l</td>
<td class="varchar">eq_ref</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">4</td>
<td class="varchar">q.id</td>
<td class="bigint">1</td>
<td class="double">100.00</td>
<td class="varchar"></td>
</tr>
<tr>
<td class="bigint">2</td>
<td class="varchar">DERIVED</td>
<td class="varchar">lookup</td>
<td class="varchar">index</td>
<td class="varchar"></td>
<td class="varchar">PRIMARY</td>
<td class="varchar">4</td>
<td class="varchar"></td>
<td class="bigint">90010</td>
<td class="double">622.36</td>
<td class="varchar">Using index</td>
</tr>
</table>
</div>
<pre>
select length(`20110211_late`.`l`.`longtxt`) AS `LENGTH(l.longtxt)` from (select `20110211_late`.`lookup`.`id` AS `id` from `20110211_late`.`lookup` order by `20110211_late`.`lookup`.`id` limit 90000,10) `q` join `20110211_late`.`lookup` `l` where (`20110211_late`.`l`.`id` = `q`.`id`) order by `q`.`id`
</pre>
</div>
<p>No rewrite:</p>
<pre class="brush: sql">
SELECT  LENGTH(longtxt)
FROM    lookup
ORDER BY
        id
LIMIT   10
OFFSET  90000
</pre>
<p><a href="#" onclick="xcollapse('X759');return false;"><strong>Query results</strong></a><br />
</p>
<div id="X759" style="display: none; background: transparent;">
<div class="terminal">
<table class="terminal">
<tr>
<th>LENGTH(longtxt)</th>
</tr>
<tr>
<td class="bigint">8209</td>
</tr>
<tr>
<td class="bigint">8233</td>
</tr>
<tr>
<td class="bigint">8244</td>
</tr>
<tr>
<td class="bigint">8231</td>
</tr>
<tr>
<td class="bigint">8228</td>
</tr>
<tr>
<td class="bigint">8257</td>
</tr>
<tr>
<td class="bigint">8207</td>
</tr>
<tr>
<td class="bigint">8270</td>
</tr>
<tr>
<td class="bigint">8236</td>
</tr>
<tr>
<td class="bigint">8277</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0000s (30.3594s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">SIMPLE</td>
<td class="varchar">lookup</td>
<td class="varchar">index</td>
<td class="varchar"></td>
<td class="varchar">PRIMARY</td>
<td class="varchar">4</td>
<td class="varchar"></td>
<td class="bigint">90010</td>
<td class="double">622.36</td>
<td class="varchar"></td>
</tr>
</table>
</div>
<pre>
select length(`20110211_late`.`lookup`.`longtxt`) AS `LENGTH(longtxt)` from `20110211_late`.`lookup` order by `20110211_late`.`lookup`.`id` limit 90000,10
</pre>
</div>
<p>The query employing the trick runs for the same <strong>40 ms</strong>, while the straightforward query takes as much as <strong>30</strong> seconds (<strong>30,359 ms</strong>, to be exact).</p>
<p>Why such a difference?</p>
<p>The reason is that <code>InnoDB</code>, despite the fact that it stores the data in the clustered index, is still able to move some data out of the index. This is called <q>external storage</q>.</p>
<p>With the <code>COMPACT</code> row format, which I used to create the tables, <code>InnoDB</code>, when inserting a record with two <code>TEXT</code> columns, will first try to fit both of them on the page. If this is not possible, it will split the longest of the columns in two parts: the first <strong>768</strong> bytes will be stored on the page and the remaining data will be stored on a separate page (or pages), with a pointer to these data stored in the original clustered index. This will be repeated until all <code>TEXT</code> columns fit on the page or there is no more space there (in which case an error would be thrown).</p>
<p>This means that all <code>TEXT</code> columns shorter than <strong>768</strong> bytes will be stored completely on the page, while the longer ones can be either split or stored as a whole (with at least the first <strong>768</strong> bytes still being on the page).</p>
<p>With the column lengths chosen (<strong>1</strong> to <strong>100</strong> and <strong>8K</strong> to <strong>8K + 100</strong>, respectively), it can be easily seen that <code>shorttxt</code> will <em>always</em> be stored on-page, while <code>longtxt</code> will <em>always</em> be split, since <code>InnoDB</code> allows at most <code>8K</code> per record (minus some overhead) to be stored on one page.</p>
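<p>The splitting rule described above can be sketched as a toy model. This is a Python illustration, not anything <code>InnoDB</code> exposes: the function name, the flat <strong>8192</strong>-byte budget and the absence of any per-row or pointer overhead are simplifying assumptions.</p>
<pre class="brush: python">
PREFIX_BYTES = 768       # on-page prefix of a split column (figure from the text)
PAGE_BUDGET = 8 * 1024   # assumed flat per-record budget, overhead ignored


def place_columns(lengths, budget=PAGE_BUDGET):
    """Return (bytes kept on-page per column, set of split columns)."""
    on_page = dict(lengths)
    split = set()
    while sum(on_page.values()) > budget:
        # Only columns longer than the prefix and not yet split can be moved.
        candidates = [c for c in on_page
                      if c not in split and lengths[c] > PREFIX_BYTES]
        if not candidates:
            raise ValueError("row does not fit on the page")
        longest = max(candidates, key=lambda c: lengths[c])
        on_page[longest] = PREFIX_BYTES
        split.add(longest)
    return on_page, split


print(place_columns({"shorttxt": 100, "longtxt": 8 * 1024 + 100}))
</pre>
<p>Under these assumptions even the shortest possible <code>longtxt</code> (<strong>8K</strong>) together with a one-byte <code>shorttxt</code> exceeds the budget, so <code>longtxt</code> ends up split for every row.</p>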
<p>Now the picture becomes clearer. As with <strong>MyISAM</strong>, the straightforward query involving <code>longtxt</code> has to perform two page lookups for each record scanned: the first one on the clustered index, the second one on the external storage. This takes a lot of time and may even pollute the <strong>InnoDB</strong> cache with unneeded data (which would lead to an increased cache miss ratio).</p>
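<p>A rough count makes the gap plausible. The helper below is a hypothetical Python sketch, not a measurement: it assumes one external-page read per record whose <code>longtxt</code> is fetched and ignores caching entirely.</p>
<pre class="brush: python">
def external_reads(offset, limit, late_lookup):
    """Count the records whose off-page column has to be fetched."""
    if late_lookup:
        # The subquery skips the rows using the index only;
        # just the final rows are joined back to the table.
        return limit
    # The plain query fetches the whole row, off-page part included,
    # for every record it scans.
    return offset + limit


print(external_reads(90000, 10, late_lookup=False))  # 90010
print(external_reads(90000, 10, late_lookup=True))   # 10
</pre>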
<p>The query on <code>shorttxt</code> is not that bad, since it only requires some extra CPU cycles per record to calculate the string length.</p>
<p>Now, let&#8217;s check one more query which orders by a secondary indexed field rather than <code>id</code>:</p>
<h3>Secondary index and the short text column</h3>
<p>Rewrite:</p>
<pre class="brush: sql">
SELECT  LENGTH(l.shorttxt)
FROM    (
        SELECT  id, value
        FROM    lookup
        ORDER BY
                value
        LIMIT   10
        OFFSET  90000
        ) q
JOIN    lookup l
ON      l.id = q.id
ORDER BY
        q.value
</pre>
<p><a href="#" onclick="xcollapse('X145');return false;"><strong>Query results</strong></a><br />
</p>
<div id="X145" style="display: none; background: transparent;">
<div class="terminal">
<table class="terminal">
<tr>
<th>LENGTH(l.shorttxt)</th>
</tr>
<tr>
<td class="bigint">14</td>
</tr>
<tr>
<td class="bigint">85</td>
</tr>
<tr>
<td class="bigint">16</td>
</tr>
<tr>
<td class="bigint">34</td>
</tr>
<tr>
<td class="bigint">77</td>
</tr>
<tr>
<td class="bigint">78</td>
</tr>
<tr>
<td class="bigint">3</td>
</tr>
<tr>
<td class="bigint">49</td>
</tr>
<tr>
<td class="bigint">53</td>
</tr>
<tr>
<td class="bigint">60</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0002s (0.0262s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">&lt;derived2&gt;</td>
<td class="varchar">ALL</td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="varchar"></td>
<td class="bigint">10</td>
<td class="double">100.00</td>
<td class="varchar">Using filesort</td>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">l</td>
<td class="varchar">eq_ref</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">PRIMARY</td>
<td class="varchar">4</td>
<td class="varchar">q.id</td>
<td class="bigint">1</td>
<td class="double">100.00</td>
<td class="varchar"></td>
</tr>
<tr>
<td class="bigint">2</td>
<td class="varchar">DERIVED</td>
<td class="varchar">lookup</td>
<td class="varchar">index</td>
<td class="varchar"></td>
<td class="varchar">ix_lookup_value</td>
<td class="varchar">4</td>
<td class="varchar"></td>
<td class="bigint">90010</td>
<td class="double">622.36</td>
<td class="varchar">Using index</td>
</tr>
</table>
</div>
<pre>
select length(`20110211_late`.`l`.`shorttxt`) AS `LENGTH(l.shorttxt)` from (select `20110211_late`.`lookup`.`id` AS `id`,`20110211_late`.`lookup`.`value` AS `value` from `20110211_late`.`lookup` order by `20110211_late`.`lookup`.`value` limit 90000,10) `q` join `20110211_late`.`lookup` `l` where (`20110211_late`.`l`.`id` = `q`.`id`) order by `q`.`value`
</pre>
</div>
<p>No rewrite:</p>
<pre class="brush: sql">
SELECT  LENGTH(shorttxt)
FROM    lookup
ORDER BY
        value
LIMIT   10
OFFSET  90000
</pre>
<p><a href="#" onclick="xcollapse('X8905');return false;"><strong>Query results</strong></a><br />
</p>
<div id="X8905" style="display: none; background: transparent;">
<div class="terminal">
<table class="terminal">
<tr>
<th>LENGTH(shorttxt)</th>
</tr>
<tr>
<td class="bigint">14</td>
</tr>
<tr>
<td class="bigint">85</td>
</tr>
<tr>
<td class="bigint">16</td>
</tr>
<tr>
<td class="bigint">34</td>
</tr>
<tr>
<td class="bigint">77</td>
</tr>
<tr>
<td class="bigint">78</td>
</tr>
<tr>
<td class="bigint">3</td>
</tr>
<tr>
<td class="bigint">49</td>
</tr>
<tr>
<td class="bigint">53</td>
</tr>
<tr>
<td class="bigint">60</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0002s (0.2663s)</td>
</tr>
</table>
</div>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
<tr>
<td class="bigint">1</td>
<td class="varchar">SIMPLE</td>
<td class="varchar">lookup</td>
<td class="varchar">index</td>
<td class="varchar"></td>
<td class="varchar">ix_lookup_value</td>
<td class="varchar">4</td>
<td class="varchar"></td>
<td class="bigint">90010</td>
<td class="double">622.36</td>
<td class="varchar"></td>
</tr>
</table>
</div>
<pre>
select length(`20110211_late`.`lookup`.`shorttxt`) AS `LENGTH(shorttxt)` from `20110211_late`.`lookup` order by `20110211_late`.`lookup`.`value` limit 90000,10
</pre>
</div>
<p>The execution times for these queries differ tenfold: <strong>26 ms</strong> vs <strong>266 ms</strong>.</p>
<p>As with <strong>MyISAM</strong>, the secondary index requires an extra lookup to retrieve the actual values from the table (the only difference is that any index is secondary in <strong>MyISAM</strong>, including the one used to police the <code>PRIMARY KEY</code>).</p>
<p>The first query does not perform these row lookups on the skipped records and hence is ten times as fast. It is even faster than the queries ordering on the <code>PRIMARY KEY</code>, because the secondary index contains significantly less data than the <code>PRIMARY KEY</code>, holds many more records per page, and is hence shallower and can be traversed faster.</p>
<p>The second query does perform the early row lookups, as usual.</p>
<h3>Conclusion</h3>
<p>A trick used to avoid early row lookups for the <code>LIMIT … OFFSET</code> queries is useful on <strong>InnoDB</strong> tables too, though to a different extent, depending on the <code>ORDER BY</code> condition and the columns involved:</p>
<ul>
<li>It&#8217;s very useful on queries involving columns stored off-page (long <code>TEXT</code>, <code>BLOB</code> and <code>VARCHAR</code> columns)</li>
<li>It&#8217;s very useful on <code>ORDER BY</code> conditions served by secondary indexes</li>
<li>It&#8217;s quite useful on moderately sized columns (still stored on page) or CPU-intensive expressions</li>
<li>It&#8217;s almost useless on short columns without complex CPU-intensive processing</li>
</ul>
<p>Hope that helps.</p>
<hr/>
<p>I&#8217;m always glad to answer the questions regarding database queries.</p>
<p><a href="/ask-a-question"><strong>Ask me a question</strong></a></p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2011/02/11/late-row-lookups-innodb/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Happy New Year!</title>
		<link>http://explainextended.com/2010/12/31/happy-new-year-2/</link>
		<comments>http://explainextended.com/2010/12/31/happy-new-year-2/#comments</comments>
		<pubDate>Fri, 31 Dec 2010 20:00:17 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=5177</guid>
		<description><![CDATA[A New Year clock in Oracle. Happy New Year!]]></description>
			<content:encoded><![CDATA[<p>Some say <strong>SQL</strong> is not good at graphics.</p>
<p>Well, they have a point. Database engines lack scanner drivers, there is no easy way to do sepia, and the magic wand, let&#8217;s be honest, is just poor.</p>
<p>However, you can make some New Year paintings with <strong>SQL</strong>.</p>
<p>Let&#8217;s make a New Year clock showing 12 o&#8217;clock in <strong>Oracle</strong>.</p>
<p><span id="more-5177"></span></p>
<h3>#1. Circle</h3>
<p>First, we need a circle shape. In <strong>Photoshop</strong>, you could just select it from the toolbar, but in <strong>SQL</strong>, we need to use some math.</p>
<p>In math, a circle is defined by this formula: <code>x<sup>2</sup> + y<sup>2</sup> = R<sup>2</sup></code>.</p>
<p><code>x<sup>2</sup> + y<sup>2</sup></code> is, as we know, the square of the distance from the center of the coordinate grid, and <code>R</code> is the radius. This means <q>all points at distance <code>R</code> from the center</q>, which of course is a circle shape.</p>
<p><strong>SQL</strong> outputs data in tabular format. We will draw our art string by string, and for each line we need to know where to put the <q>paint</q> (just some <strong>ASCII</strong> symbols). Since it&#8217;s a circle, every line intersects it at most twice, and we shall put the symbols at the places where the circle and the line intersect. To find how far from the vertical axis these intersections lie on each line (and hence how many spaces to pad from the beginning of the string), this formula is used:</p>
<p><code>ROUND(SQRT(1 - POWER((level - 21) / 20, 2)) * 20) AS angle</code></p>
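<p>As a cross-check outside the database, the same formula can be evaluated in a few lines of Python. This is only a sketch: <code>half_width</code> and <code>oracle_round</code> are hypothetical helper names, and the latter rounds halves away from zero, which is how <strong>Oracle</strong>&#8217;s <code>ROUND</code> treats numbers (Python&#8217;s built-in <code>round</code> does not).</p>
<pre class="brush: python">
import math

RADIUS = 20   # the circle spans lines 1..41 with the center on line 21


def oracle_round(x):
    """Round half away from zero, like ROUND on an Oracle NUMBER."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def half_width(line, radius=RADIUS):
    """Distance, in characters, from the vertical axis to the circle."""
    y = (line - (radius + 1)) / radius
    return oracle_round(math.sqrt(1 - y ** 2) * radius)


print([half_width(line) for line in (1, 2, 3, 21, 41)])  # [0, 6, 9, 20, 0]
</pre>
<p>The query aliases this value as <code>angle</code>, although it is really a distance: it is zero at the poles and equals the radius on the middle line.</p>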
<p>To make our circle more circle-shaped, we need to choose the characters that correspond to the angle of the circle line in each given place. These characters are the most similar to the lines: <code>=/|\</code></p>
<p>Different characters correspond to different angles. To figure out the angle from the line number, we should use <code>ACOS</code>:</p>
<p><code>ROUND(ACOS(angle / 20) / 3.1415926 * 4)  * SIGN((line - 21) / 20) + 3 AS sign</code></p>
<p>This would split the circle into several sectors, and for each of these sectors a corresponding character will be chosen.</p>
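<p>To see which character each sector gets, here is a small Python sketch of the same expression. The helper names are made up for the illustration, the half-widths passed in (<strong>6</strong>, <strong>9</strong>, <strong>20</strong>) are the values the previous formula gives for those lines, and <code>math.pi</code> stands in for the literal <code>3.1415926</code>.</p>
<pre class="brush: python">
import math

LEFT_CHARS = "=/|\\="    # left half of the circle, top to bottom
RIGHT_CHARS = "=\\|/="   # mirrored set for the right half


def oracle_round(x):
    """Round half away from zero, like ROUND on an Oracle NUMBER."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def sector(line, angle, radius=20):
    """1-based index into the character sets, like the sign column."""
    offset = line - (radius + 1)
    sign = 0 if offset == 0 else int(math.copysign(1, offset))
    return oracle_round(math.acos(angle / radius) / math.pi * 4) * sign + 3


for line, angle in ((2, 6), (3, 9), (21, 20), (39, 9), (40, 6)):
    idx = sector(line, angle) - 1   # SUBSTR is 1-based, Python is 0-based
    print(line, LEFT_CHARS[idx], RIGHT_CHARS[idx])
</pre>
<p>The upper half maps to indexes <strong>1</strong> and <strong>2</strong>, the middle to <strong>3</strong> and the lower half to <strong>4</strong> and <strong>5</strong>, which is why both character sets start and end with <code>=</code>.</p>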
<p>Finally, we need to avoid the gaps in the lines. To do this, we will fill not only the intersections, but all the space from the previous (or next) intersection to the current one. To calculate the width of the string on each line that will be filled with the characters, we will use <code>LAG</code> and <code>LEAD</code>, the analytic functions.</p>
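<p>The gap-filling rule can be sketched in Python as well. The function below is a hypothetical stand-in for the <code>LAG</code> / <code>LEAD</code> logic, and the five-line profile it is run on is a toy example, not the real <strong>41</strong>-line circle.</p>
<pre class="brush: python">
def fill_widths(angles):
    """angles[i] is the half-width on line i; the middle line is the equator."""
    center = len(angles) // 2
    widths = []
    for i, angle in enumerate(angles):
        if center > i:
            # Upper half: compare with the previous line, like LAG(angle)
            neighbour = angles[i - 1] if i > 0 else 0
        elif i > center:
            # Lower half: compare with the next line, like LEAD(angle)
            neighbour = angles[i + 1] if len(angles) - 1 > i else 0
        else:
            neighbour = angle
        widths.append(abs(angle - neighbour) + 1)
    return widths


print(fill_widths([0, 6, 9, 6, 0]))  # [1, 7, 1, 7, 1]
</pre>
<p>A missing neighbour counts as <strong>0</strong>, just like the <code>COALESCE</code> fallback, so the outermost lines always get a width of <strong>1</strong>.</p>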
<p>Now, let&#8217;s put it all together:</p>
<pre class="brush: sql">
WITH    circle AS
        (
        SELECT  1 AS layer,
                line,
                LPAD(RPAD(SUBSTR(&#039;=/|\=&#039;, sign, 1), width, SUBSTR(&#039;=/|\=&#039;, sign, 1)), 20 + width - angle, &#039; &#039;) ||
                RPAD(&#039; &#039;, (angle - width) * 2, &#039; &#039;) ||
                RPAD(RPAD(SUBSTR(&#039;=\|/=&#039;, sign, 1), width, SUBSTR(&#039;=\|/=&#039;, sign, 1)), 20 + width - angle, &#039; &#039;) AS drawing
        FROM    (
                SELECT  line,
                        angle,
                        ABS
                        (
                        angle -
                        CASE
                        WHEN line &lt; 21 THEN
                                COALESCE(LAG(angle) OVER (ORDER BY line), 0)
                        WHEN line &gt; 21 THEN
                                COALESCE(LEAD(angle) OVER (ORDER BY line), 0)
                        ELSE
                                angle
                        END
                        ) + 1 AS width,
                        ROUND(ACOS(angle / 20) / 3.1415926 * 4)  * SIGN((line - 21) / 20) + 3 AS sign
                FROM    (
                        SELECT  level AS line,
                                ROUND(SQRT(1 - POWER((level - 21) / 20, 2)) * 20) AS angle
                        FROM    dual
                        CONNECT BY
                                level &lt;= 41
                        ) q
                ) q
        )
SELECT  *
FROM    circle
</pre>
<div class="terminal widefont">
<table class="terminal">
<tr>
<th>LAYER</th>
<th>LINE</th>
<th>DRAWING</th>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
<td class="varchar2">                    ==                    </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">2</td>
<td class="varchar2">              ==============              </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">3</td>
<td class="varchar2">           ////          \\\\           </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">4</td>
<td class="varchar2">         ///                \\\         </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">5</td>
<td class="varchar2">        //                    \\        </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">6</td>
<td class="varchar2">       //                      \\       </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">7</td>
<td class="varchar2">      //                        \\      </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">8</td>
<td class="varchar2">     //                          \\     </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">9</td>
<td class="varchar2">    //                            \\    </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">10</td>
<td class="varchar2">   //                              \\   </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">11</td>
<td class="varchar2">   /                                \   </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">12</td>
<td class="varchar2">  //                                \\  </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">13</td>
<td class="varchar2">  /                                  \  </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">14</td>
<td class="varchar2"> ||                                  || </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">15</td>
<td class="varchar2"> |                                    | </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">16</td>
<td class="varchar2"> |                                    | </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">17</td>
<td class="varchar2">||                                    ||</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">18</td>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">19</td>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">20</td>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">21</td>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">22</td>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">23</td>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">24</td>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">25</td>
<td class="varchar2">||                                    ||</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">26</td>
<td class="varchar2"> |                                    | </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">27</td>
<td class="varchar2"> |                                    | </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">28</td>
<td class="varchar2"> ||                                  || </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">29</td>
<td class="varchar2">  \                                  /  </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">30</td>
<td class="varchar2">  \\                                //  </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">31</td>
<td class="varchar2">   \                                /   </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">32</td>
<td class="varchar2">   \\                              //   </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">33</td>
<td class="varchar2">    \\                            //    </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">34</td>
<td class="varchar2">     \\                          //     </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">35</td>
<td class="varchar2">      \\                        //      </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">36</td>
<td class="varchar2">       \\                      //       </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">37</td>
<td class="varchar2">        \\                    //        </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">38</td>
<td class="varchar2">         \\\                ///         </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">39</td>
<td class="varchar2">           \\\\          ////           </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">40</td>
<td class="varchar2">              ==============              </td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">41</td>
<td class="varchar2">                    ==                    </td>
</tr>
</table>
</div>
<h3>#2. Dial</h3>
<p>Now, we shall draw the clock dial on a separate layer (yes, you can draw on layers in <strong>SQL</strong>).</p>
<p>We will use Roman numerals for the dial. <strong>Oracle</strong> has an internal function to format the numbers as Roman numerals, but <strong>SQL</strong> graphics is the first time I&#8217;m using this feature in practice.</p>
<p>The principle is the same: we should calculate the angle of each number, then the line and the column that intersect the dial at the given angle and put the number there.</p>
<p>We will only output the lines containing actual numbers. Each line will contain one or two numbers (<strong>12</strong> and <strong>6</strong> go on their own lines, the others go in pairs). We can use <code>MIN</code> and <code>MAX</code> to distinguish them.</p>
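<p>To check which hours share a line, the line formula for the dial can be tried out in Python first. This is a sketch with hypothetical helper names. The radius of <strong>18</strong> and the center on line <strong>21</strong> come from the query, while <code>math.pi</code> replaces the literal <code>3.141592</code>.</p>
<pre class="brush: python">
import math
from collections import defaultdict


def oracle_round(x):
    """Round half away from zero, like ROUND on an Oracle NUMBER."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def hour_lines(radius=18, center=21):
    """Group the hours 1..12 by the line their numeral lands on."""
    lines = defaultdict(list)
    for h in range(1, 13):
        line = oracle_round(-math.cos(math.pi * h / 6) * radius) + center
        lines[line].append(h)
    return dict(sorted(lines.items()))


print(hour_lines())
</pre>
<p>Every line gets either one hour or two, so <code>MIN(h)</code> and <code>MAX(h)</code> are enough to tell the left numeral from the right one.</p>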
<p>Here&#8217;s what the lines of our dial will look like:</p>
<pre class="brush: sql">
WITH    dial AS
        (
        SELECT  2,
                line,
                RPAD(&#039; &#039;, 20 - angle, &#039; &#039;) ||
                RPAD(rnf, angle * 2 - LENGTH(rns)) ||
                RPAD(rns, 20 - angle + DECODE(rnf, NULL, 0, LENGTH(rns)) + 1)
        FROM    (
                SELECT  line, angle,
                        DECODE(MAX(h), MIN(h), NULL, TRIM(TO_CHAR(MAX(h), &#039;RN&#039;))) AS rnf,
                        TRIM(TO_CHAR(MIN(h), &#039;RN&#039;)) AS rns
                FROM    (
                        SELECT  level + 2 AS line,
                                ROUND(SQRT(1 - POWER((level - 19) / 18, 2)) * 18) AS angle
                        FROM    dual
                        CONNECT BY
                                level &lt;= 37
                        ) lines
                JOIN    (
                        SELECT  level AS h, ROUND(-COS(3.141592 * level / 6) * 18) + 21 AS hline
                        FROM    dual
                        CONNECT BY
                                level &lt;= 12
                        ) hours
                ON      hline = line
                GROUP BY
                        line, angle
                ) q
        )
SELECT  *
FROM    dial
ORDER BY
        line
</pre>
<div class="terminal widefont">
<table class="terminal">
<tr>
<th>2</th>
<th>LINE</th>
<th>RPAD(&#039;&#039;,20-ANGLE,&#039;&#039;)||RPAD(RNF,ANGLE*2-LENGTH(RNS))||RPAD(RNS,20-ANGLE+DECODE(RNF,NULL,0,LENGTH(RNS))+1)</th>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">3</td>
<td class="varchar2">                    XII                  </td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">5</td>
<td class="varchar2">            XI             I             </td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">12</td>
<td class="varchar2">    X                             II     </td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">21</td>
<td class="varchar2">  IX                               III   </td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">30</td>
<td class="varchar2">    VIII                          IV     </td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">37</td>
<td class="varchar2">            VII            V             </td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">39</td>
<td class="varchar2">                    VI                   </td>
</tr>
</table>
</div>
<p>Note that the shape is distorted: this is because of the gaps in the line numbers. It will be taken care of later.</p>
<h3>#3. Hands and the center pin</h3>
<p>Hands and the center pin are simple: we will use <code>|||</code> for the hour hand, <code>|</code> for the minute hand and <code>O</code> for the pin. On a New Year clock the hands, fortunately, are both vertical and point the same way.</p>
<p>Here&#8217;s the hour hand:</p>
<pre class="brush: sql">
WITH    hourhand AS
        (
        SELECT  3 AS layer,
                level + 10 AS line,
                RPAD(&#039; &#039;, 20, &#039; &#039;) || &#039;|||&#039; || RPAD(&#039; &#039;, 18, &#039; &#039;) AS drawing
        FROM    dual
        CONNECT BY
                level &lt;= 10
        )
SELECT  *
FROM    hourhand
ORDER BY
        line
</pre>
<div class="terminal widefont">
<table class="terminal">
<tr>
<th>LAYER</th>
<th>LINE</th>
<th>DRAWING</th>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">11</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">12</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">13</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">14</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">15</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">16</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">17</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">18</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">19</td>
<td class="varchar2">                    |||                  </td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">20</td>
<td class="varchar2">                    |||                  </td>
</tr>
</table>
</div>
<p>Here&#8217;s the minute hand:</p>
<pre class="brush: sql">
WITH    minutehand AS
        (
        SELECT  4 AS layer,
                level + 4 AS line,
                RPAD(&#039; &#039;, 21, &#039; &#039;) || &#039;|&#039; || RPAD(&#039; &#039;, 19, &#039; &#039;) AS drawing
        FROM    dual
        CONNECT BY
                level &lt;= 16
        )
SELECT  *
FROM    minutehand
ORDER BY
        line
</pre>
<div class="terminal widefont">
<table class="terminal">
<tr>
<th>LAYER</th>
<th>LINE</th>
<th>DRAWING</th>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">5</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">6</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">7</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">8</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">9</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">10</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">11</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">12</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">13</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">14</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">15</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">16</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">17</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">18</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">19</td>
<td class="varchar2">                     |                   </td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">20</td>
<td class="varchar2">                     |                   </td>
</tr>
</table>
</div>
<p>And here&#8217;s the pin:</p>
<pre class="brush: sql">
WITH    pin AS
        (
        SELECT  5 AS layer,
                21 AS line,
                RPAD(LPAD(&#039;O&#039;, 22, &#039; &#039;), 41, &#039; &#039;) AS drawing
        FROM    dual
        )
SELECT  *
FROM    pin
ORDER BY
        line
</pre>
<div class="terminal widefont">
<table class="terminal">
<tr>
<th>LAYER</th>
<th>LINE</th>
<th>DRAWING</th>
</tr>
<tr>
<td class="double_precision">5</td>
<td class="double_precision">21</td>
<td class="varchar2">                     O                   </td>
</tr>
</table>
</div>
<h3>#4. Merging the layers</h3>
<p>Now we need to merge the layers.</p>
<p>The last layers should go in the foreground, the first ones in the background. The space character should be <q>transparent</q>: if there is a character on a background layer, it should be visible through it.</p>
<p>To do this, we apply this trick:</p>
<ol>
<li>Output all rows from all layers in a single query, using <code>UNION ALL</code></li>
<li>Split each row into the individual characters, one character per record</li>
<li>
<p>Using <code>ROW_NUMBER()</code>, the analytic function, order the characters within each line and column so that visible characters from higher layers go first, and spaces and characters from the lower layers go last:</p>
<p><code>ROW_NUMBER() OVER (PARTITION BY line, col ORDER BY DECODE(cc, ' ', 1, 0), layer DESC) AS rn</code></p>
</li>
<li>Filter the characters on <code>rn</code> so that only the topmost character from each line and column is returned</li>
<li>Concatenate all characters line-wise, using a recursive query and <code>SYS_CONNECT_BY_PATH</code>. This function requires a separator, and we will use a question mark, <code>?</code>, for this purpose, since it&#8217;s not used in our art.</li>
<li>Remove the separator from the concatenated strings, using <code>REPLACE</code></li>
<li>Output the result</li>
</ol>
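<p>The steps above can be prototyped in Python before committing to the full query. This is a sketch, not a translation of the final <strong>SQL</strong>: <code>merge_layers</code> is a hypothetical name, the three input rows are toy data, and <code>min()</code> over a sort key plays the role of the <code>rn = 1</code> filter.</p>
<pre class="brush: python">
from collections import defaultdict


def merge_layers(rows, width):
    """rows: iterable of (layer, line, drawing); returns {line: merged text}."""
    cells = defaultdict(list)
    for layer, line, drawing in rows:
        for col, ch in enumerate(drawing.ljust(width)[:width]):
            # Sort key mirrors ORDER BY DECODE(cc, ' ', 1, 0), layer DESC
            cells[(line, col)].append((ch == " ", -layer, ch))
    merged = {}
    for line in sorted({ln for ln, _ in cells}):
        # The smallest key is the topmost visible character of the cell
        merged[line] = "".join(min(cells[(line, col)])[2] for col in range(width))
    return merged


rows = [(1, 1, "|   |"), (2, 1, " XI  "), (3, 1, "  |  ")]
print(merge_layers(rows, 5))  # {1: '|X| |'}
</pre>
<p>In the third column the <code>|</code> from layer <strong>3</strong> hides the <code>I</code> from layer <strong>2</strong>, while the spaces of the upper layers let the circle border and the <code>X</code> show through.</p>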
<h3>#5. Result</h3>
<p>Here&#8217;s our query and drawing:</p>
<pre class="brush: sql">
WITH    circle AS
        (
        SELECT  1 AS layer,
                line,
                LPAD(RPAD(SUBSTR(&#039;=/|\=&#039;, sign, 1), width, SUBSTR(&#039;=/|\=&#039;, sign, 1)), 20 + width - angle, &#039; &#039;) ||
                RPAD(&#039; &#039;, (angle - width) * 2, &#039; &#039;) ||
                RPAD(RPAD(SUBSTR(&#039;=\|/=&#039;, sign, 1), width, SUBSTR(&#039;=\|/=&#039;, sign, 1)), 20 + width - angle, &#039; &#039;) AS drawing
        FROM    (
                SELECT  line,
                        angle,
                        ABS
                        (
                        angle -
                        CASE
                        WHEN line &lt; 21 THEN
                                COALESCE(LAG(angle) OVER (ORDER BY line), 0)
                        WHEN line &gt; 21 THEN
                                COALESCE(LEAD(angle) OVER (ORDER BY line), 0)
                        ELSE
                                angle
                        END
                        ) + 1 AS width,
                        ROUND(ACOS(angle / 20) / 3.1415926 * 4)  * SIGN((line - 21) / 20) + 3 AS sign
                FROM    (
                        SELECT  level AS line,
                                ROUND(SQRT(1 - POWER((level - 21) / 20, 2)) * 20) AS angle
                        FROM    dual
                        CONNECT BY
                                level &lt;= 41
                        ) q
                ) q
        ),
        dial AS
        (
        SELECT  2,
                line,
                RPAD(&#039; &#039;, 20 - angle, &#039; &#039;) ||
                RPAD(rnf, angle * 2 - LENGTH(rns)) ||
                RPAD(rns, 20 - angle + DECODE(rnf, NULL, 0, LENGTH(rns)) + 1)
        FROM    (
                SELECT  line, angle,
                        DECODE(MAX(h), MIN(h), NULL, TRIM(TO_CHAR(MAX(h), &#039;RN&#039;))) AS rnf,
                        TRIM(TO_CHAR(MIN(h), &#039;RN&#039;)) AS rns
                FROM    (
                        SELECT  level + 2 AS line,
                                ROUND(SQRT(1 - POWER((level - 19) / 18, 2)) * 18) AS angle
                        FROM    dual
                        CONNECT BY
                                level &lt;= 37
                        ) lines
                JOIN    (
                        SELECT  level AS h, ROUND(-COS(3.141592 * level / 6) * 18) + 21 AS hline
                        FROM    dual
                        CONNECT BY
                                level &lt;= 12
                        ) hours
                ON      hline = line
                GROUP BY
                        line, angle
                ) q
        ),
        hourhand AS
        (
        SELECT  3 AS layer,
                level + 10 AS line,
                RPAD(&#039; &#039;, 20, &#039; &#039;) || &#039;|||&#039; || RPAD(&#039; &#039;, 18, &#039; &#039;) AS drawing
        FROM    dual
        CONNECT BY
                level &lt;= 10
        ),
        minutehand AS
        (
        SELECT  4 AS layer,
                level + 4 AS line,
                RPAD(&#039; &#039;, 21, &#039; &#039;) || &#039;|&#039; || RPAD(&#039; &#039;, 19, &#039; &#039;) AS drawing
        FROM    dual
        CONNECT BY
                level &lt;= 16
        ),
        pin AS
        (
        SELECT  5 AS layer,
                21 AS line,
                RPAD(LPAD(&#039;O&#039;, 22, &#039; &#039;), 41, &#039; &#039;) AS drawing
        FROM    dual
        ),
        m AS
        (
        SELECT  line, col, cc
        FROM    (
                SELECT  line, col, cc,
                        ROW_NUMBER() OVER (PARTITION BY line, col ORDER BY DECODE(cc, &#039; &#039;, 1, 0), layer DESC) AS rn
                FROM    (
                        SELECT  cols.*, layers.*, SUBSTR(drawing, col, 1) AS cc
                        FROM    (
                                SELECT  level AS col
                                FROM    dual
                                CONNECT BY
                                        level &lt;= 50
                                ) cols
                        CROSS JOIN
                                (
                                SELECT  *
                                FROM    circle
                                UNION ALL
                                SELECT  *
                                FROM    dial
                                UNION ALL
                                SELECT  *
                                FROM    hourhand
                                UNION ALL
                                SELECT  *
                                FROM    minutehand
                                UNION ALL
                                SELECT  *
                                FROM    pin
                                ) layers
                        )
                )
        WHERE   rn = 1
        )
SELECT  REPLACE(drawing, &#039;?&#039;) AS drawing
FROM    (
        SELECT  SYS_CONNECT_BY_PATH(cc, &#039;?&#039;) AS drawing, line, CONNECT_BY_ISLEAF AS leaf
        FROM    m mi
        START WITH
                mi.col = 1
        CONNECT BY
                mi.line = PRIOR mi.line
                AND mi.col = PRIOR mi.col + 1
        )
WHERE   leaf = 1
ORDER BY
        line
</pre>
<div class="terminal widefont">
<table class="terminal">
<tr>
<th>DRAWING</th>
</tr>
<tr>
<td class="varchar2">                    ==                    </td>
</tr>
<tr>
<td class="varchar2">              ==============              </td>
</tr>
<tr>
<td class="varchar2">           ////     XII  \\\\           </td>
</tr>
<tr>
<td class="varchar2">         ///                \\\         </td>
</tr>
<tr>
<td class="varchar2">        //  XI       |     I  \\        </td>
</tr>
<tr>
<td class="varchar2">       //            |         \\       </td>
</tr>
<tr>
<td class="varchar2">      //             |          \\      </td>
</tr>
<tr>
<td class="varchar2">     //              |           \\     </td>
</tr>
<tr>
<td class="varchar2">    //               |            \\    </td>
</tr>
<tr>
<td class="varchar2">   //                |             \\   </td>
</tr>
<tr>
<td class="varchar2">   /                |||             \   </td>
</tr>
<tr>
<td class="varchar2">  //X               |||           II\\  </td>
</tr>
<tr>
<td class="varchar2">  /                 |||              \  </td>
</tr>
<tr>
<td class="varchar2"> ||                 |||              || </td>
</tr>
<tr>
<td class="varchar2"> |                  |||               | </td>
</tr>
<tr>
<td class="varchar2"> |                  |||               | </td>
</tr>
<tr>
<td class="varchar2">||                  |||               ||</td>
</tr>
<tr>
<td class="varchar2">|                   |||                |</td>
</tr>
<tr>
<td class="varchar2">|                   |||                |</td>
</tr>
<tr>
<td class="varchar2">|                   |||                |</td>
</tr>
<tr>
<td class="varchar2">| IX                 O             III |</td>
</tr>
<tr>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="varchar2">|                                      |</td>
</tr>
<tr>
<td class="varchar2">||                                    ||</td>
</tr>
<tr>
<td class="varchar2"> |                                    | </td>
</tr>
<tr>
<td class="varchar2"> |                                    | </td>
</tr>
<tr>
<td class="varchar2"> ||                                  || </td>
</tr>
<tr>
<td class="varchar2">  \                                  /  </td>
</tr>
<tr>
<td class="varchar2">  \\VIII                          IV//  </td>
</tr>
<tr>
<td class="varchar2">   \                                /   </td>
</tr>
<tr>
<td class="varchar2">   \\                              //   </td>
</tr>
<tr>
<td class="varchar2">    \\                            //    </td>
</tr>
<tr>
<td class="varchar2">     \\                          //     </td>
</tr>
<tr>
<td class="varchar2">      \\                        //      </td>
</tr>
<tr>
<td class="varchar2">       \\                      //       </td>
</tr>
<tr>
<td class="varchar2">        \\  VII            V  //        </td>
</tr>
<tr>
<td class="varchar2">         \\\                ///         </td>
</tr>
<tr>
<td class="varchar2">           \\\\     VI   ////           </td>
</tr>
<tr>
<td class="varchar2">              ==============              </td>
</tr>
<tr>
<td class="varchar2">                    ==                    </td>
</tr>
</table>
</div>
<div class="plainnote" style="text-align: center">
<big><strong>Happy New Year!</strong></big>
</div>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2010/12/31/happy-new-year-2/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>PostgreSQL: parametrizing a recursive CTE</title>
		<link>http://explainextended.com/2010/12/24/postgresql-parametrizing-a-recursive-cte/</link>
		<comments>http://explainextended.com/2010/12/24/postgresql-parametrizing-a-recursive-cte/#comments</comments>
		<pubDate>Fri, 24 Dec 2010 20:00:45 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=5146</guid>
		<description><![CDATA[An anchor part of a recursive CTE cannot be easily parametrized in a view. To work around this, we can wrap the CTE into a set-returning function which would accept the parameter and use it in the anchor part.]]></description>
			<content:encoded><![CDATA[<p>Answering questions asked on the site.</p>
<p><strong>Jan Suchal</strong> asks:</p>
<blockquote>
<p>We&#8217;ve started playing with <strong>PostgreSQL</strong> and recursive queries. Looking at example that does basic graph traversal from <a href="http://www.postgresql.org/docs/9.0/static/queries-with.html">http://www.postgresql.org/docs/9.0/static/queries-with.html</a>.</p>
<p>We would like to have a view called <code>paths</code> defined like this:</p>
<pre class="brush: sql">
WITH RECURSIVE
        search_graph(id, path) AS
        (
        SELECT  id, ARRAY[id]
        FROM    node
        UNION ALL
        SELECT  g.dest, sg.path || g.dest
        FROM    search_graph sg
        JOIN    graph g
        ON      g.source = sg.id
                AND NOT g.dest = ANY(sg.path)
        )
SELECT  path
FROM    search_graph
</pre>
<p>By calling</p>
<pre class="brush: sql">
SELECT  *
FROM    paths
WHERE   path[1] = :node_id
</pre>
<p>we would get all paths from a certain node.</p>
<p>The problem here is with performance. When you want this to be quick you need to add a condition for the anchor part of the <code>UNION</code> like this:</p>
<pre class="brush: sql">
WITH RECURSIVE
        search_graph(id, path) AS
        (
        SELECT  id, ARRAY[id]
        FROM    node
        WHERE   id = :node_id
        UNION ALL
        SELECT  g.dest, sg.path || g.dest
        FROM    search_graph sg
        JOIN    graph g
        ON      g.source = sg.id
                AND NOT g.dest = ANY(sg.path)
        )
SELECT  path
FROM    search_graph
</pre>
<p>Now it&#8217;s perfectly fast, but we cannot create a view because that would only contain paths from one specific node.</p>
<p>Any ideas?</p>
</blockquote>
<p>An often overlooked feature of <strong>PostgreSQL</strong> is its ability to create set-returning functions and use them in the <code>SELECT</code> list.</p>
<p>The record will be cross-joined with the set returned by the function and the result of the join will be added to the resultset.</p>
<p>This is best demonstrated with <code>generate_series</code>, probably the most used <strong>PostgreSQL</strong> set-returning function.<br />
<span id="more-5146"></span></p>
<h3>Emulating CROSS APPLY with set-returning functions</h3>
<p>Let&#8217;s write a simple query that returns numbers from <strong>1</strong> to <strong>3</strong>:</p>
<pre class="brush: sql">
SELECT  id
FROM    (
        VALUES
        (1),
        (2),
        (3)
        ) vals (id)
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
</tr>
<tr>
<td class="int4">1</td>
</tr>
<tr>
<td class="int4">2</td>
</tr>
<tr>
<td class="int4">3</td>
</tr>
</table>
</div>
<p>Now, let&#8217;s add <code>generate_series(1, id)</code> to the <code>SELECT</code> list. As we can see, each of the three values is passed as an argument to <code>generate_series</code>:</p>
<pre class="brush: sql">
SELECT  id, generate_series(1, id)
FROM    (
        VALUES
        (1),
        (2),
        (3)
        ) vals (id)
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>id</th>
<th>generate_series</th>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="int4">2</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="int4">2</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="int4">3</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="int4">3</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
</table>
</div>
<p>We see that each record of the original query was cross-joined with the set returned by the function, and the final resultset now has <strong>6</strong> records instead of <strong>3</strong>, since <strong>3</strong> sets, having <strong>1</strong>, <strong>2</strong> and <strong>3</strong> records, respectively, were results of these cross-joins.</p>
<p>This is almost what <code>CROSS APPLY</code> does in <strong>SQL Server</strong>, but with some limitations and caveats.</p>
<h3>Limitations of set-returning functions</h3>
<ul>
<li>
<p>One should create a function explicitly. Anonymous blocks won&#8217;t work.</p>
</li>
<li>
<p>The functions should be written in <strong>SQL</strong> or <strong>C</strong>. Procedural language set-returning functions cannot be used in <code>SELECT</code> lists.</p>
</li>
<li>
<p>If the function returns an empty set, the result of the cross join will be an empty set too (i.e. the corresponding record won&#8217;t be returned at all). This is exactly how <code>CROSS APPLY</code> behaves in <strong>SQL Server</strong>.</p>
<p>However, the latter supports <code>OUTER APPLY</code> which always returns a single record with a <code>NULL</code> in corresponding fields instead of the empty set, and it&#8217;s impossible to emulate it in <strong>PostgreSQL</strong> using this syntax.</p>
<p>A workaround would be to develop the function so that it would return a single <code>NULL</code> instead of an empty set.</p>
</li>
<li>
<p>The function should be marked <code>VOLATILE</code> so that the optimizer would execute it each time it&#8217;s called. This is especially important when the function is used multiple times in the same <code>SELECT</code> list, like this:</p>
<pre class="brush: sql">
SELECT  generate_series(1, 3), generate_series(1, 2)
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>generate_series</th>
<th>generate_series</th>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="int4">2</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="int4">3</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="int4">2</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="int4">3</td>
<td class="int4">2</td>
</tr>
</table>
</div>
<p>This query correctly returns <strong>6</strong> records.</p>
<pre class="brush: sql">
SELECT  generate_series(1, 3), generate_series(1, 3)
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>generate_series</th>
<th>generate_series</th>
</tr>
<tr>
<td class="int4">1</td>
<td class="int4">1</td>
</tr>
<tr>
<td class="int4">2</td>
<td class="int4">2</td>
</tr>
<tr>
<td class="int4">3</td>
<td class="int4">3</td>
</tr>
</table>
</div>
<p>This query <em>incorrectly</em> returns <strong>3</strong> records (it should return <strong>9</strong>). This is because the function is not reevaluated.</p>
</li>
</ul>
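<p>The workaround for the missing <code>OUTER APPLY</code> mentioned in the list above can be sketched as follows. This is only an illustration: the function name <code>fn_series_or_null</code> is hypothetical, and all it does is wrap <code>generate_series</code> so that an empty series is replaced with a single <code>NULL</code>:</p>
<pre class="brush: sql">
CREATE OR REPLACE FUNCTION fn_series_or_null(cnt INT)
RETURNS SETOF INT
AS
$$
        -- the regular set
        SELECT  generate_series(1, $1)
        UNION ALL
        -- a single NULL, added only when the set above is empty
        SELECT  NULL::INT
        WHERE   $1 &lt; 1;
$$
LANGUAGE &#039;sql&#039;
VOLATILE;

SELECT  id, fn_series_or_null(id)
FROM    (
        VALUES
        (0),
        (2)
        ) vals (id)
</pre>
<p>With this function, the record with <code>id = 0</code> should still be returned, with a <code>NULL</code> in the second column, while the record with <code>id = 2</code> is expanded into two records as before.</p>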
<h3>Parametrizing the path query</h3>
<p>Now, we are informed enough to create our own function and test it. To do this, we will first create some sample tables:</p>
<p><a href="#" onclick="xcollapse('X2468');return false;"><strong>Table creation details</strong></a><br />
</p>
<div id="X2468" style="display: none; background: transparent;">
<pre class="brush: sql">
CREATE TABLE node
        (
        id BIGINT NOT NULL PRIMARY KEY,
        name VARCHAR(100) NOT NULL
        );

CREATE TABLE graph
        (
        source BIGINT NOT NULL,
        dest BIGINT NOT NULL,
        data FLOAT NOT NULL,
        PRIMARY KEY (source, dest)
        );

CREATE INDEX
        ix_graph_dest
ON      graph (dest);

SELECT  SETSEED(0.20101224);

INSERT
INTO    node
SELECT  s, &#039;Node &#039; || s
FROM    generate_series(1, 10000) s;

INSERT
INTO    graph (source, dest, data)
SELECT  source, dest, RANDOM()
FROM    (
        SELECT  DISTINCT
                CEIL(RANDOM() * 10000) AS source,
                CEIL(RANDOM() * 10000) AS dest
        FROM    generate_series(1, 10000)
        ) q
WHERE   source &lt;&gt; dest
</pre>
</div>
<p>There are <strong>10,000</strong> nodes with <strong>9,996</strong> random paths between them.</p>
<p>Here&#8217;s what the function would look like:</p>
<pre class="brush: sql">
CREATE OR REPLACE FUNCTION fn_search_graph_cte(parent BIGINT)
RETURNS TABLE
        (
        path BIGINT[]
        )
AS
$$
        WITH RECURSIVE
                search_graph(id, path) AS
                (
                SELECT  $1, ARRAY[$1]
                UNION ALL
                SELECT  g.dest, sg.path || g.dest
                FROM    search_graph sg
                JOIN    graph g
                ON      g.source = sg.id
                        AND NOT g.dest = ANY(sg.path)
                )
        SELECT  path
        FROM    search_graph;
$$
LANGUAGE &#039;sql&#039;
VOLATILE;
</pre>
<p>The anchor part of the recursive <strong>CTE</strong> is parametrized.</p>
<h3>Paths from a single node</h3>
<p>Now, let&#8217;s compare performance of the queries that select paths from a single node:</p>
<pre class="brush: sql">
WITH RECURSIVE
        search_graph(id, path) AS
        (
        SELECT  id, ARRAY[id]
        FROM    node
        UNION ALL
        SELECT  g.dest, sg.path || g.dest
        FROM    search_graph sg
        JOIN    graph g
        ON      g.source = sg.id
                AND NOT g.dest = ANY(sg.path)
        )
SELECT  path::TEXT
FROM    search_graph
WHERE   path[1] = 69
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>path</th>
</tr>
<tr>
<td class="text">{69}</td>
</tr>
<tr>
<td class="text">{69,3804}</td>
</tr>
<tr>
<td class="text">{69,3642}</td>
</tr>
<tr>
<td class="text">{69,3642,3768}</td>
</tr>
<tr>
<td class="text">{69,3642,2925}</td>
</tr>
<tr>
<td class="text">{69,3642,2925,5683}</td>
</tr>
<tr>
<td class="text">{69,3642,2925,5683,8668}</td>
</tr>
<tr>
<td class="text">{69,3642,2925,5683,8668,3705}</td>
</tr>
<tr class="statusbar">
<td colspan="100">8 rows fetched in 0.0002s (0.5156s)</td>
</tr>
</table>
</div>
<pre>
CTE Scan on search_graph  (cost=151096.75..187130.22 rows=7999 width=32)
  Filter: (path[1] = 69)
  CTE search_graph
    -&gt;  Recursive Union  (cost=0.00..151096.75 rows=1599710 width=40)
          -&gt;  Seq Scan on node  (cost=0.00..164.00 rows=10000 width=8)
          -&gt;  Hash Join  (cost=288.91..11893.85 rows=158971 width=40)
                Hash Cond: (sg.id = g.source)
                Join Filter: (g.dest &lt;&gt; ALL (sg.path))
                -&gt;  WorkTable Scan on search_graph sg  (cost=0.00..2000.00 rows=100000 width=40)
                -&gt;  Hash  (cost=163.96..163.96 rows=9996 width=16)
                      -&gt;  Seq Scan on graph g  (cost=0.00..163.96 rows=9996 width=16)
</pre>
<p>This query materializes the whole <strong>CTE</strong> and then filters it for the paths beginning from <strong>69</strong>. It takes more than <strong>500 ms</strong>.</p>
<pre class="brush: sql">
SELECT  fn_search_graph_cte(id)::TEXT
FROM    node
WHERE   id = 69
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>fn_search_graph_cte</th>
</tr>
<tr>
<td class="text">{69}</td>
</tr>
<tr>
<td class="text">{69,3642}</td>
</tr>
<tr>
<td class="text">{69,3804}</td>
</tr>
<tr>
<td class="text">{69,3642,2925}</td>
</tr>
<tr>
<td class="text">{69,3642,3768}</td>
</tr>
<tr>
<td class="text">{69,3642,2925,5683}</td>
</tr>
<tr>
<td class="text">{69,3642,2925,5683,8668}</td>
</tr>
<tr>
<td class="text">{69,3642,2925,5683,8668,3705}</td>
</tr>
<tr class="statusbar">
<td colspan="100">8 rows fetched in 0.0003s (0.0030s)</td>
</tr>
</table>
</div>
<pre>
Index Scan using node_pkey on node  (cost=0.00..8.52 rows=1 width=8)
  Index Cond: (id = 69)
</pre>
<p>This uses the function. Since the anchor part of the <strong>CTE</strong> has only one record, this is much faster and completes in <strong>3 ms</strong>.</p>
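<p>When the starting node is known in advance, the function can also be called directly in the <code>FROM</code> clause. Here is a minimal sketch, using the same node <strong>69</strong> as above. Note that, unlike the query over <code>node</code>, it does not check that the node actually exists:</p>
<pre class="brush: sql">
SELECT  path::TEXT
FROM    fn_search_graph_cte(69)
</pre>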
<h3>All paths</h3>
<p>Let&#8217;s check how long it takes to return all paths.</p>
<p>First, let&#8217;s use the <strong>CTE</strong>:</p>
<pre class="brush: sql">
WITH RECURSIVE
        search_graph(id, path) AS
        (
        SELECT  id, ARRAY[id]
        FROM    node
        UNION ALL
        SELECT  g.dest, sg.path || g.dest
        FROM    search_graph sg
        JOIN    graph g
        ON      g.source = sg.id
                AND NOT g.dest = ANY(sg.path)
        )
SELECT  COUNT(*), AVG(ARRAY_LENGTH(path, 1))
FROM    search_graph
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>count</th>
<th>avg</th>
</tr>
<tr>
<td class="int8">159522</td>
<td class="numeric">10.3358157495517860</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (0.5469s)</td>
</tr>
</table>
</div>
<pre>
Aggregate  (cost=191089.51..191089.52 rows=1 width=32)
  CTE search_graph
    -&gt;  Recursive Union  (cost=0.00..151096.75 rows=1599710 width=40)
          -&gt;  Seq Scan on node  (cost=0.00..164.00 rows=10000 width=8)
          -&gt;  Hash Join  (cost=288.91..11893.85 rows=158971 width=40)
                Hash Cond: (sg.id = g.source)
                Join Filter: (g.dest &lt;&gt; ALL (sg.path))
                -&gt;  WorkTable Scan on search_graph sg  (cost=0.00..2000.00 rows=100000 width=40)
                -&gt;  Hash  (cost=163.96..163.96 rows=9996 width=16)
                      -&gt;  Seq Scan on graph g  (cost=0.00..163.96 rows=9996 width=16)
  -&gt;  CTE Scan on search_graph  (cost=0.00..31994.20 rows=1599710 width=32)
</pre>
<p>Now, the function:</p>
<pre class="brush: sql">
SELECT  COUNT(*), AVG(ARRAY_LENGTH(path, 1))
FROM    (
        SELECT  fn_search_graph_cte(id) AS path
        FROM    node
        ) q
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>count</th>
<th>avg</th>
</tr>
<tr>
<td class="int8">159522</td>
<td class="numeric">10.3358157495517860</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (2.1093s)</td>
</tr>
</table>
</div>
<pre>
Aggregate  (cost=2814.01..2814.02 rows=1 width=32)
  -&gt;  Seq Scan on node  (cost=0.00..2664.00 rows=10000 width=8)
</pre>
<p>We see that the function introduces some overhead: it needs to be called <strong>10,000</strong> times and this takes about <strong>2</strong> seconds (as opposed to about <strong>500 ms</strong> for the plain <strong>CTE</strong>). That is roughly <strong>0.2 ms</strong> per call, so about <strong>2,500</strong> calls fit into the time the plain <strong>CTE</strong> takes. This means that the function is more efficient if the total number of the root nodes is less than about <strong>25%</strong> of all nodes.</p>
<h3>Conclusion</h3>
<p>A set-returning function is a good replacement for a view over a recursive <strong>CTE</strong> whose anchor part cannot be easily parametrized.</p>
<p>The sets returned by the function, when called in a <code>SELECT</code> list, are cross-joined with the corresponding records. When the filter on the <strong>CTE</strong> is selective, the function will only be applied a few times, to the records satisfying the filter.</p>
<p>Call to a function, however, introduces some overhead which should be taken into account. In some cases, when a filter is not very selective, it is better to use the benefits of set-based operations to generate the view and then filter it than to call a function multiple times, adding the overhead of each call.</p>
<p>Hope that helps.</p>
<hr/>
<p>I&#8217;m always glad to answer the questions regarding database queries.</p>
<p><a href="/ask-a-question"><strong>Ask me a question</strong></a></p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2010/12/24/postgresql-parametrizing-a-recursive-cte/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>10 things in MySQL (that won&#8217;t work as expected)</title>
		<link>http://explainextended.com/2010/11/03/10-things-in-mysql-that-wont-work-as-expected/</link>
		<comments>http://explainextended.com/2010/11/03/10-things-in-mysql-that-wont-work-as-expected/#comments</comments>
		<pubDate>Wed, 03 Nov 2010 20:00:48 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=5003</guid>
		<description><![CDATA[10 things in MySQL (that won't work as expected).]]></description>
			<content:encoded><![CDATA[<p>(I just discovered <a href="http://www.cracked.com/">cracked.com</a>)</p>
<h3 class="cracked">#10. Searching for a NULL</h3>
<p><img src="http://explainextended.com/wp-content/uploads/2010/11/MG_2538-e1288838824800.jpg" alt="" title="LifeView" width="700" height="467" class="aligncenter size-full wp-image-5102 noborder" /></p>
<pre class="brush: sql">
SELECT  *
FROM    a
WHERE   a.column = NULL
</pre>
<p>In <strong>SQL</strong>, a <code>NULL</code> is never equal to anything, even another <code>NULL</code>. This query won&#8217;t return anything and in fact will be thrown out by the optimizer when building the plan.</p>
<p>When searching for <code>NULL</code> values, use this instead:</p>
<pre class="brush: sql">
SELECT  *
FROM    a
WHERE   a.column IS NULL
</pre>
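<p>The difference is easy to check on constants. Here is a minimal sketch; the column aliases are made up for readability, and <code>&lt;=&gt;</code> is <strong>MySQL</strong>&#8217;s own <code>NULL</code>-safe equality operator:</p>
<pre class="brush: sql">
SELECT  NULL = NULL AS equals_null,
        NULL IS NULL AS is_null,
        NULL &lt;=&gt; NULL AS null_safe_equals
</pre>
<p>The first expression evaluates to <code>NULL</code>, while the other two evaluate to <strong>1</strong>.</p>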
<p><span id="more-5003"></span></p>
<h3 class="cracked">#9. LEFT JOIN with additional conditions</h3>
<p><img src="http://explainextended.com/wp-content/uploads/2010/11/MG_2692-e1288839117298.jpg" alt="" title="Street" width="700" height="467" class="aligncenter size-full wp-image-5103 noborder" /></p>
<pre class="brush: sql">
SELECT  *
FROM    a
LEFT JOIN
        b
ON      b.a = a.id
WHERE   b.column = &#039;something&#039;
</pre>
<p>A <code>LEFT JOIN</code> is like <code>INNER JOIN</code> except that it will return each record from <code>a</code> at least once, substituting missing fields from <code>b</code> with <code>NULL</code> values, if there are no actual matching records.</p>
<p>The <code>WHERE</code> condition, however, is evaluated after the <code>LEFT JOIN</code> so the query above checks <code>column</code> <em>after</em> it has been joined. And as we learned earlier, no <code>NULL</code> value can satisfy an equality condition, so the records from <code>a</code> without a corresponding record from <code>b</code> will unavoidably be filtered out.</p>
<p>Essentially, this query is an <code>INNER JOIN</code>, only less efficient.</p>
<p>To match only the records with <code>b.column = 'something'</code> (while still returning all records from <code>a</code>), this condition should be moved into the <code>ON</code> clause:</p>
<pre class="brush: sql">
SELECT  *
FROM    a
LEFT JOIN
        b
ON      b.a = a.id
        AND b.column = &#039;something&#039;
</pre>
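<p>The effect can be seen on a tiny data set. Here is a sketch with two made-up inline tables standing in for <code>a</code> and <code>b</code> (the column is called <code>col</code> here, and the rows are purely illustrative):</p>
<pre class="brush: sql">
-- Condition in WHERE: the record with id = 2 is filtered out
SELECT  a.id, b.col
FROM    (
        SELECT  1 AS id
        UNION ALL
        SELECT  2
        ) a
LEFT JOIN
        (
        SELECT  1 AS a, &#039;something&#039; AS col
        ) b
ON      b.a = a.id
WHERE   b.col = &#039;something&#039;;

-- Condition in ON: both records are returned, the second one with a NULL in col
SELECT  a.id, b.col
FROM    (
        SELECT  1 AS id
        UNION ALL
        SELECT  2
        ) a
LEFT JOIN
        (
        SELECT  1 AS a, &#039;something&#039; AS col
        ) b
ON      b.a = a.id
        AND b.col = &#039;something&#039;;
</pre>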
<h3 class="cracked">#8. Less than a value but not a NULL</h3>
<p><img src="http://explainextended.com/wp-content/uploads/2010/11/MG_2919-e1288839226492.jpg" alt="" title="Restaurant" width="700" height="467" class="aligncenter size-full wp-image-5104 noborder" /></p>
<p>Quite often I see queries like this:</p>
<pre class="brush: sql">
SELECT  *
FROM    b
WHERE   b.column &lt; &#039;something&#039;
        AND b.column IS NOT NULL
</pre>
<p>This is actually not an error: this query is valid and will do what&#8217;s intended. However, <code>IS NOT NULL</code> here is redundant.</p>
<p>If <code>b.column</code> is a <code>NULL</code>, then <code>b.column &lt; 'something'</code> will never be satisfied, since any comparison to <code>NULL</code> evaluates to a boolean <code>NULL</code> and does not pass the filter.</p>
<p>It is interesting that this additional <code>NULL</code> check is never used for <q>greater than</q> queries (like in <code>b.column &gt; 'something'</code>).</p>
<p>This is because <code>NULL</code>s go first in <code>ORDER BY</code> in <strong>MySQL</strong> and hence are incorrectly considered <q>less</q> than any other value by some people.</p>
<p>This query can be simplified:</p>
<pre class="brush: sql">
SELECT  *
FROM    b
WHERE   b.column &lt; &#039;something&#039;
</pre>
<p>and will still never return a <code>NULL</code> in <code>b.column</code>.</p>
<h3 class="cracked">#7. Joining on NULL</h3>
<p><img src="http://explainextended.com/wp-content/uploads/2010/11/MG_3163-e1288839302867.jpg" alt="" title="Helicopter" width="700" height="467" class="aligncenter size-full wp-image-5105 noborder" /></p>
<pre class="brush: sql">
SELECT  *
FROM    a
JOIN    b
ON      a.column = b.column
</pre>
<p>When <code>column</code> is nullable in both tables, this query won't return a match of two <code>NULL</code>s for the reasons described above: no <code>NULL</code>s are equal.</p>
<p>Here's a query to do that:</p>
<pre class="brush: sql">
SELECT  *
FROM    a
JOIN    b
ON      a.column = b.column
        OR (a.column IS NULL AND b.column IS NULL)
</pre>
<p><strong>MySQL</strong>'s optimizer treats this as an equijoin and provides a special join type, <code>ref_or_null</code>.</p>
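<p><strong>MySQL</strong> also has a <code>NULL</code>-safe equality operator, <code>&lt;=&gt;</code>, which treats two <code>NULL</code>s as equal. Here is a sketch of the same join written with it. This is <strong>MySQL</strong>-specific syntax, and whether the optimizer handles it as efficiently as the query above is an assumption best checked with <code>EXPLAIN</code> on the actual tables:</p>
<pre class="brush: sql">
SELECT  *
FROM    a
JOIN    b
ON      a.column &lt;=&gt; b.column
</pre>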
<h3 class="cracked">#6. NOT IN with NULL values</h3>
<p><img src="http://explainextended.com/wp-content/uploads/2010/11/MG_4025-e1288839428333.jpg" alt="" title="Neva" width="700" height="467" class="aligncenter size-full wp-image-5106 noborder" /></p>
<pre class="brush: sql">
SELECT  a.*
FROM    a
WHERE   a.column NOT IN
        (
        SELECT column
        FROM    b
        )
</pre>
<p>This query will never return anything if there is but a single <code>NULL</code> in <code>b.column</code>. As with other predicates, both <code>IN</code> and <code>NOT IN</code> against <code>NULL</code> evaluate to <code>NULL</code>.</p>
<p>This should be rewritten using a <code>NOT EXISTS</code>:</p>
<pre class="brush: sql">
SELECT  a.*
FROM    a
WHERE   NOT EXISTS
        (
        SELECT NULL
        FROM    b
        WHERE   b.column = a.column
        )
</pre>
<p>Unlike <code>IN</code>, <code>EXISTS</code> always evaluates to either <code>true</code> or <code>false</code>.</p>
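<p>The three-valued logic behind this can be checked on constants. Here is a minimal sketch that uses value lists instead of subqueries (the aliases are made up for readability):</p>
<pre class="brush: sql">
SELECT  1 NOT IN (2, 3) AS no_nulls,
        1 NOT IN (2, NULL) AS with_null,
        1 NOT IN (1, NULL) AS with_match
</pre>
<p>The expressions evaluate to <strong>1</strong>, <code>NULL</code> and <strong>0</strong>, respectively. Only the first one would pass a <code>WHERE</code> filter, so as soon as the list contains a <code>NULL</code>, no value can satisfy <code>NOT IN</code>.</p>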
<h3 class="cracked">#5. Ordering random samples</h3>
<p><img src="http://explainextended.com/wp-content/uploads/2010/11/MG_8806-e1288839714383.jpg" alt="" title="Camel" width="700" height="467" class="aligncenter size-full wp-image-5109 noborder" /></p>
<pre class="brush: sql">
SELECT  *
FROM    a
ORDER BY
        RAND(), column
LIMIT 10
</pre>
<p>This query attempts to select <strong>10</strong> random records ordered by <code>column</code>.</p>
<p><code>ORDER BY</code> orders the output lexicographically: that is, the records are only ordered on the second expression when the values of the first expression are equal.</p>
<p>However, the results of <code>RAND()</code> are, well, random. It's extremely unlikely that two values of <code>RAND()</code> will match, so ordering on <code>column</code> after <code>RAND()</code> is quite useless.</p>
<p>To order the randomly sampled records, use this query:</p>
<pre class="brush: sql">
SELECT  *
FROM    (
        SELECT  *
        FROM    a
        ORDER BY
                RAND()
        LIMIT 10
        ) q
ORDER BY
        column
</pre>
<h3 class="cracked">#4. Sampling arbitrary record from a group</h3>
<p><img src="http://explainextended.com/wp-content/uploads/2010/11/MG_8273-e1288839506460.jpg" alt="" title="Inscription" width="700" height="467" class="aligncenter size-full wp-image-5107 noborder" /></p>
<p>This query intends to select one record from each group (defined by <code>grouper</code>):</p>
<pre class="brush: sql">
SELECT  DISTINCT(grouper), a.*
FROM    a
</pre>
<p><code>DISTINCT</code> is not a function, it's a part of <code>SELECT</code> clause. It applies to all columns in the <code>SELECT</code> list, and the parentheses here may just be omitted. This query may and will select the duplicates on <code>grouper</code> (if the values in at least one of the other columns differ).</p>
<p>Sometimes, it's worked around using this query (which relies on <strong>MySQL</strong>'s extensions to <code>GROUP BY</code>):</p>
<pre class="brush: sql">
SELECT  a.*
FROM    a
GROUP BY
        grouper
</pre>
<p>Unaggregated columns returned within each group are arbitrarily taken.</p>
<p>At first, this appears to be a nice solution, but it has quite a serious drawback. It relies on the assumption that all values returned, though taken arbitrarily from the group, will still belong to one record.</p>
<p>Though with the current implementation it seems to be so, it's not documented and can be changed at any moment (especially if <strong>MySQL</strong> ever learns to apply <code>index_union</code> after <code>GROUP BY</code>). So it's not safe to rely on this behavior.</p>
<p>This query would be easy to rewrite in a cleaner way if <strong>MySQL</strong> supported analytic functions. However, it's still possible to make do without them, if the table has a <code>PRIMARY KEY</code> defined:</p>
<pre class="brush: sql">
SELECT  a.*
FROM    (
        SELECT  DISTINCT grouper
        FROM    a
        ) ao
JOIN    a
ON      a.id =
        (
        SELECT  id
        FROM    a ai
        WHERE   ai.grouper = ao.grouper
        LIMIT 1
        )
</pre>
<h3 class="cracked">#3. Sampling first record from a group</h3>
<p><img src="http://explainextended.com/wp-content/uploads/2010/11/MG_8468-e1288839620474.jpg" alt="" title="Thermae" width="700" height="467" class="aligncenter size-full wp-image-5108 noborder" /></p>
<p>This is a variation of the previous query:</p>
<pre class="brush: sql">
SELECT  a.*
FROM    a
GROUP BY
        grouper
ORDER BY
        MIN(id) DESC
</pre>
<p>Unlike the previous query, this one attempts to select the record holding the minimal <code>id</code>.</p>
<p>Again: it is not guaranteed that the unaggregated values returned by <code>a.*</code> will belong to a record holding <code>MIN(id)</code> (or even to a single record at all).</p>
<p>Here's how to do it in a clean way:</p>
<pre class="brush: sql">
SELECT  a.*
FROM    (
        SELECT  DISTINCT grouper
        FROM    a
        ) ao
JOIN    a
ON      a.id =
        (
        SELECT  id
        FROM    a ai
        WHERE   ai.grouper = ao.grouper
        ORDER BY
                ai.grouper, ai.id
        LIMIT 1
        )
</pre>
<p>This query is just like the previous one but with <code>ORDER BY</code> added to ensure that the first record in <code>id</code> order will be returned.</p>
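<p>On servers that do provide window (analytic) functions, which <strong>MySQL</strong> added in version 8.0, the same result can be sketched with <code>ROW_NUMBER()</code>. This is an illustration rather than the method described above; the alias <code>rn</code> and the derived table name <code>q</code> are names chosen for the example:</p>
<pre class="brush: sql">
-- Number the records within each group in id order,
-- then keep only the first record of every group.
SELECT  q.*
FROM    (
        SELECT  a.*,
                ROW_NUMBER() OVER (PARTITION BY grouper ORDER BY id) AS rn
        FROM    a
        ) q
WHERE   q.rn = 1
</pre>
<p>Note that <code>q.*</code> also returns the helper column <code>rn</code>; list the columns explicitly if it should not appear in the output.</p>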
<h3 class="cracked">#2. IN and comma-separated list of values</h3>
<p><img src="http://explainextended.com/wp-content/uploads/2010/11/MG_9734-e1288839801704.jpg" alt="" title="Mushrooms" width="700" height="467" class="aligncenter size-full wp-image-5110 noborder" /></p>
<p>This query attempts to match the value of <code>column</code> against any of those provided in a comma-separated string:</p>
<pre class="brush: sql">
SELECT  *
FROM    a
WHERE   column IN (&#039;1, 2, 3&#039;)
</pre>
<p>This does not work because the string is not expanded in the <code>IN</code> list.</p>
<p>Instead, if <code>column</code> is a <code>VARCHAR</code>, it is compared (as a string) to the whole list (also as a single string), so it will match only a value that is literally equal to that whole string, never an individual item. If <code>column</code> is of a numeric type, the list is cast into the numeric type as well (and only the first item will match, at best).</p>
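<p>Both cases can be checked without any table. The column aliases below are made up for the illustration:</p>
<pre class="brush: sql">
-- The string is cast to a number: &#039;1, 2, 3&#039; becomes 1, the rest is discarded
SELECT  1 IN (&#039;1, 2, 3&#039;) AS first_item,    -- 1: matches the leading 1
        2 IN (&#039;1, 2, 3&#039;) AS second_item,   -- 0: the 2 inside the string is never seen
        &#039;1&#039; IN (&#039;1, 2, 3&#039;) AS as_string    -- 0: &#039;1&#039; is not equal to the whole string
</pre>
<p>For the numeric comparisons <strong>MySQL</strong> usually also reports a warning about the truncated value.</p>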
<p>The correct way to deal with this query would be rewriting it as a proper <code>IN</code> list</p>
<pre class="brush: sql">
SELECT  *
FROM    a
WHERE   column IN (1, 2, 3)
</pre>
<p>or as an inline view:</p>
<pre class="brush: sql">
SELECT  *
FROM    (
        SELECT  1 AS id
        UNION ALL
        SELECT  2 AS id
        UNION ALL
        SELECT  3 AS id
        ) q
JOIN    a
ON      a.column = q.id
</pre>
<p>However, such a rewrite is not always possible.</p>
<p>To work around this without changing the query parameters, one can use <code>FIND_IN_SET</code>:</p>
<pre class="brush: sql">
SELECT  *
FROM    a
WHERE   FIND_IN_SET(column, &#039;1,2,3&#039;)
</pre>
<p>This function, however, is not sargable and a full table scan will be performed on <code>a</code>.</p>
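<p>One more caveat, shown as a small sketch: <code>FIND_IN_SET</code> splits the string on commas only, so the spaces from the original parameter (<code>&#039;1, 2, 3&#039;</code>) become part of the items. Stripping them with <code>REPLACE</code> is one possible workaround, assuming the items themselves never contain spaces. The aliases are made up for the illustration:</p>
<pre class="brush: sql">
SELECT  FIND_IN_SET(&#039;2&#039;, &#039;1,2,3&#039;) AS no_spaces,          -- 2: second item
        FIND_IN_SET(&#039;2&#039;, &#039;1, 2, 3&#039;) AS with_spaces,      -- 0: the item is &#039; 2&#039;, not &#039;2&#039;
        FIND_IN_SET(&#039;2&#039;, REPLACE(&#039;1, 2, 3&#039;, &#039; &#039;, &#039;&#039;)) AS spaces_removed  -- 2
</pre>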
<h3 class="cracked">#1. LEFT JOIN with COUNT(*)</h3>
<p><img src="http://explainextended.com/wp-content/uploads/2010/11/MG_3971-e1288839937884.jpg" alt="" title="Nevsky" width="700" height="467" class="aligncenter size-full wp-image-5125 noborder" /></p>
<pre class="brush: sql">
SELECT  a.id, COUNT(*)
FROM    a
LEFT JOIN
        b
ON      b.a = a.id
GROUP BY
        a.id
</pre>
<p>This query intends to count the number of matches in <code>b</code> for each record in <code>a</code>.</p>
<p>The problem is that <code>COUNT(*)</code> will never return a <strong>0</strong> in such a query. If there is no match for a certain record in <code>a</code>, the record will still be returned (with <code>NULL</code>s in place of the columns of <code>b</code>) and counted as one row.</p>
<p><code>COUNT</code> should be made to count only the actual records in <code>b</code>. Since <code>COUNT</code>, when called with a column or an expression as an argument, ignores <code>NULL</code>s, we can pass <code>b.a</code> to it. As a join key, it can never be <code>NULL</code> in an actual match, but will be <code>NULL</code> if there was no match:</p>
<pre class="brush: sql">
SELECT  a.id, COUNT(b.a)
FROM    a
LEFT JOIN
        b
ON      b.a = a.id
GROUP BY
        a.id
</pre>
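<p>The difference is easy to see side by side. The sketch below is meant for a scratch database; the table definitions, the sample rows and the aliases <code>cnt_star</code> and <code>cnt_b</code> are assumptions made up for the illustration:</p>
<pre class="brush: sql">
-- Hypothetical sample data: a.id = 1 has two matches in b, a.id = 2 has none
CREATE TABLE a (id INT NOT NULL PRIMARY KEY);
CREATE TABLE b (id INT NOT NULL PRIMARY KEY, a INT NOT NULL);

INSERT INTO a (id) VALUES (1), (2);
INSERT INTO b (id, a) VALUES (1, 1), (2, 1);

SELECT  a.id,
        COUNT(*) AS cnt_star,   -- counts joined rows, including the NULL-extended one
        COUNT(b.a) AS cnt_b     -- counts only rows where b.a IS NOT NULL
FROM    a
LEFT JOIN
        b
ON      b.a = a.id
GROUP BY
        a.id;

-- id = 1: cnt_star = 2, cnt_b = 2
-- id = 2: cnt_star = 1, cnt_b = 0
</pre>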
<p><em><strong>P.S.</strong> In case you were wondering: no, the pictures don't have any special meaning. I just liked them.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2010/11/03/10-things-in-mysql-that-wont-work-as-expected/feed/</wfw:commentRss>
		<slash:comments>27</slash:comments>
		</item>
	</channel>
</rss>

