<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>EXPLAIN EXTENDED &#187; Oracle</title>
	<atom:link href="http://explainextended.com/category/oracle/feed/" rel="self" type="application/rss+xml" />
	<link>http://explainextended.com</link>
	<description>How to create fast database queries</description>
	<lastBuildDate>Mon, 02 Jan 2012 00:31:26 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Counting concurrent sessions</title>
		<link>http://explainextended.com/2010/01/25/counting-concurrent-sessions/</link>
		<comments>http://explainextended.com/2010/01/25/counting-concurrent-sessions/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 20:00:22 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=4032</guid>
		<description><![CDATA[Answering questions asked on the site. Steve asks: I am trying to query a log table for a web service application and determine how many concurrent sessions are in progress at each moment a transaction is executed, based on a start date and an elapsed time for each command executed through the web service. (These [...]]]></description>
			<content:encoded><![CDATA[<p>Answering questions asked on the site.</p>
<p><strong>Steve</strong> asks:</p>
<blockquote><p>I am trying to query a log table for a web service application and determine how many concurrent sessions are in progress at each moment a transaction is executed, based on a start date and an elapsed time for each command executed through the web service. (These metrics are logged after the fact, I&#8217;m trying to write daily performance reporting for the site).</p>
<p>Here&#8217;s a simplified view of my base table design:</p>
<pre class="brush: sql">
CREATE TABLE CONNECTIONS
(
USERID VARCHAR2(30),
HANDLE VARCHAR2(40),
PROCESS_DATE DATE,
COMMAND NUMBER(6,0),
COUNT NUMBER(10,0),
ELAPSED_TIME NUMBER(10,0),
NUM_OF_RECS NUMBER(10,0),
COMMAND_TIMESTAMP TIMESTAMP (6)
)
</pre>
</blockquote>
<p>The question is: at the moment each transaction started, how many other transactions were active?</p>
<p>At each given moment, there is some number of active transactions. A transaction is active if it begins before that moment and ends after it. This means that the moment should fall between <code>command_timestamp</code> and <code>command_timestamp + elapsed_time / 86400000</code> (with <code>elapsed_time</code> in milliseconds, dividing by <strong>86,400,000</strong> converts it to days).</p>
<p>Database <strong>B-Tree</strong> indexes are not very good at queries that involve searching for a constant between two columns, so a self-join on the condition described above would be possible but not very efficient.</p>
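<p>For reference, a minimal sketch of what such a self-join could look like against the <code>CONNECTIONS</code> table from the question. It assumes <code>elapsed_time</code> is stored in milliseconds, and the aliases <code>c</code> and <code>o</code> are only illustrative. This is the slow variant described above, not the solution:</p>
<pre class="brush: sql">
-- Sketch only: for every command, count the other commands
-- that had started but not yet finished at that moment.
SELECT  c.handle, c.command_timestamp,
        (
        SELECT  COUNT(*)
        FROM    connections o
        WHERE   o.command_timestamp &lt;= c.command_timestamp
                AND o.command_timestamp + o.elapsed_time / 86400000 &gt; c.command_timestamp
                AND o.rowid &lt;&gt; c.rowid  -- do not count the command itself
        ) AS concurrent
FROM    connections c
</pre>
<p>The inner condition compares one value from the outer row with two different expressions of the inner row, which is exactly the pattern a single <strong>B-Tree</strong> index handles poorly.</p>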
<p>But there is a simpler solution.</p>
<p>Whenever a transaction starts, it increments the count of the open transactions. Whenever the transaction ends, it decrements it.</p>
<p>So we can just build a table of <q>events</q>: the starts and ends of the transactions, ordered chronologically. Each <q>start</q> would be denoted with a <strong>+1</strong>, and each <q>end</q> with a <strong>-1</strong>. Then, at any point, we just take the number of transactions opened so far and subtract the number of transactions closed.</p>
<p>This can be done simply by calculating the partial sum of these <strong>+1</strong>&#8217;s and <strong>-1</strong>&#8217;s, which is an easy task for <strong>Oracle</strong>&#8217;s analytic functions.</p>
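<p>A tiny self-contained illustration of the idea, with made-up data: three hypothetical transactions, <strong>A</strong> (10 to 40), <strong>B</strong> (20 to 30) and <strong>C</strong> (35 to 50), with plain numbers standing in for timestamps:</p>
<pre class="brush: sql">
WITH    events AS
        (
        SELECT  10 AS event_time, 1 AS event FROM dual UNION ALL  -- A starts
        SELECT  20, 1 FROM dual UNION ALL                         -- B starts
        SELECT  30, -1 FROM dual UNION ALL                        -- B ends
        SELECT  35, 1 FROM dual UNION ALL                         -- C starts
        SELECT  40, -1 FROM dual UNION ALL                        -- A ends
        SELECT  50, -1 FROM dual                                  -- C ends
        )
SELECT  event_time, event,
        -- running total of the +1's and -1's in chronological order
        SUM(event) OVER (ORDER BY event_time, event DESC) AS open_transactions
FROM    events
ORDER BY
        event_time, event DESC
</pre>
<p>The running total goes <strong>1, 2, 1, 2, 1, 0</strong>: at each event it equals the number of transactions open at that moment, and it returns to zero once everything is closed.</p>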
<p>Let&#8217;s create a sample table. I&#8217;ll put only the relevant columns there and add a stuffing column that would emulate actual payload, to measure performance:<br />
<span id="more-4032"></span><br />
<a href="#" onclick="xcollapse('X60');return false;"><strong>Table creation details</strong></a><br />
</p>
<div id="X60" style="display: none; ">
<pre class="brush: sql">
BEGIN
        DBMS_RANDOM.seed(20100125);
END;
/

CREATE TABLE t_session
        (
        id NOT NULL PRIMARY KEY,
        command_timestamp NOT NULL,
        elapsed_time NOT NULL,
        stuffing NOT NULL
        )
AS
SELECT  level,
        CAST(TO_DATE(&#039;25.01.2010&#039;, &#039;dd.mm.yyyy&#039;) - level / 1440 - DBMS_RANDOM.value / 2880 AS TIMESTAMP(6)),
        CAST(DBMS_RANDOM.value / 360 AS NUMBER(7, 5)),
        CAST(RPAD(&#039;*&#039;, 200, &#039;*&#039;) AS VARCHAR2(200))
FROM    dual
CONNECT BY
        level &lt;= 1000000
/

CREATE INDEX ix_session_commandtimestamp ON t_session (command_timestamp)
/

CREATE INDEX ix_session_end ON t_session (command_timestamp + elapsed_time)
/
</pre>
</div>
<p>For the sake of brevity, <code>elapsed_time</code> is expressed in days, not in milliseconds.</p>
<p>The table is indexed with two indexes: one on <code>command_timestamp</code> and another one on <code>command_timestamp + elapsed_time</code>.</p>
<p>We will calculate the average of the concurrent queries for the transactions started on <strong>Jan 1st, 2010</strong>.</p>
<p>To do this, we will first need to select all transactions that overlap this date. This includes all transactions whose <code>command_timestamp</code> or <code>command_timestamp + elapsed_time</code> is within this date. Both these conditions are sargable, but not at the same time. An <code>OR</code> clause here would be inefficient, since no single index can be used to filter on both conditions.</p>
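<p>For comparison, here is a sketch of the single-statement form with <code>OR</code>. It is not one of the queries measured in this article, and how the optimizer treats it depends on the version and the statistics; I only include it to show the shape of the condition:</p>
<pre class="brush: sql">
-- Sketch only: both range conditions combined with OR.
SELECT  COUNT(*), SUM(LENGTH(stuffing))
FROM    t_session
WHERE   (
        command_timestamp &gt;= TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;)
        AND command_timestamp &lt; TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;) + 1
        )
        OR
        (
        command_timestamp + elapsed_time &gt;= TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;)
        AND command_timestamp + elapsed_time &lt; TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;) + 1
        )
</pre>
<p>Each branch of the <code>OR</code> matches a different index, so a single range scan cannot serve the whole predicate.</p>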
<p>To work around this, we will just split the query in two parts. The first part will select all transactions that <strong>started</strong> on this date; the second part will select all transactions that <strong>ended</strong> on this date but started earlier. These two sets do not intersect, and their union gives all the transactions we are interested in. So we can just merge these two sets using <code>UNION ALL</code>:</p>
<pre class="brush: sql">
WITH    current_sessions AS
        (
        SELECT  *
        FROM    t_session
        WHERE   command_timestamp &gt;= TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;)
                AND command_timestamp &lt; TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;) + 1
        UNION ALL
        SELECT  *
        FROM    t_session
        WHERE   command_timestamp + elapsed_time &gt;= TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;)
                AND command_timestamp + elapsed_time &lt; TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;) + 1
                AND command_timestamp &lt; TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;)
        )
SELECT  COUNT(*), SUM(LENGTH(stuffing))
FROM    current_sessions
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>COUNT(*)</th>
<th>SUM(LENGTH(STUFFING))</th>
</tr>
<tr>
<td class="double_precision">1444</td>
<td class="double_precision">288800</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (0.0024s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 SORT AGGREGATE
  VIEW
   UNION-ALL PARTITION
    TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION
     INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_COMMANDTIMESTAMP
    TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION
     INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_END
</pre>
<p>Each part of the query used its own index and no extra effort is needed to get rid of the duplicates, so the query completes in <strong>2 ms</strong>.</p>
<p>Now we need to calculate the number of concurrent transactions.</p>
<p>To do this, we will duplicate the recordset we got on the previous step, adding an extra field, <code>event</code>.</p>
<p>The first copy, with <code>event = 1</code>, will hold the beginnings of the transactions; the second copy, with <code>event = -1</code>, will hold the ends. Each transaction will therefore be split into two records. This gives us an equal number of records with opposite signs, so ultimately they add up to zero (since transactions get into the table only after they complete).</p>
<p>These records will then be sorted by <code>event_date</code> (which corresponds to <code>command_timestamp</code> or <code>command_timestamp + elapsed_time</code> respectively, depending on the set the record belongs to).</p>
<p>Then, the partial sum will be calculated for each record, using <strong>Oracle</strong>&#8216;s <code>SUM() OVER ()</code> analytic function.</p>
<p>Since the events are ordered by their date, the value of the partial sum will hold the number of transactions open (since their opening record had already been selected and added a <strong>+1</strong> to the sum), but not yet closed (since their closing record had not yet been selected). We don&#8217;t know which transactions exactly contributed to the sum, but we are not interested in this information. All we know (and all we need to know) is the difference between the numbers of open and closed transactions.</p>
<p>On this step we have the partial sum for each record, but we need to get rid of the extra records. We are interested in the number of concurrent transactions at the moments each transaction began, so we will just filter the resultset so that it only selects the records with <code>event = 1</code>, since they correspond to the beginning of the transactions.</p>
<p>Finally, we just need to subtract <strong>1</strong> from the partial sums. This is because each transaction&#8217;s own start record contributes to its partial sum too, and we need to count the other transactions running concurrently with it, not the total.</p>
<p>And here&#8217;s the query:</p>
<pre class="brush: sql">
WITH    current_sessions AS
        (
        SELECT  *
        FROM    t_session
        WHERE   command_timestamp &gt;= TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;)
                AND command_timestamp &lt; TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;) + 1
        UNION ALL
        SELECT  *
        FROM    t_session
        WHERE   command_timestamp + elapsed_time &gt;= TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;)
                AND command_timestamp + elapsed_time &lt; TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;) + 1
                AND command_timestamp &lt; TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;)
        )
SELECT  AVG(concurrent), SUM(LENGTH(stuffing))
FROM    (
        SELECT  q.*,
                SUM(event) OVER (ORDER BY event_date, event DESC) - 1 AS concurrent
        FROM    (
                SELECT  1 AS event,
                        command_timestamp AS event_date,
                        cb.*
                FROM    current_sessions cb
                UNION ALL
                SELECT  -1 AS event,
                        command_timestamp + elapsed_time AS event_date,
                        ce.*
                FROM    current_sessions ce
                ) q
        )
WHERE   event = 1
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>AVG(CONCURRENT)</th>
<th>SUM(LENGTH(STUFFING))</th>
</tr>
<tr>
<td class="double_precision">1,51662049861495844875346260387811634349</td>
<td class="double_precision">288800</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (0.0691s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 TEMP TABLE TRANSFORMATION
  LOAD AS SELECT
   UNION-ALL
    TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION
     INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_COMMANDTIMESTAMP
    TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION
     INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_END
  SORT AGGREGATE
   VIEW
    WINDOW SORT
     VIEW
      UNION-ALL
       VIEW
        TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D6836_1825AC8
       VIEW
        TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D6836_1825AC8
</pre>
<p>The query completes in only <strong>69 ms</strong>.</p>
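<p>If a per-transaction figure is needed rather than the average, the same query can be adapted. The sketch below is a variation of mine, not something I have timed: it lists the concurrency for each transaction and adds a filter that keeps only the transactions that actually started on the date (those that merely ended on it are still needed as events, but are not reported):</p>
<pre class="brush: sql">
WITH    current_sessions AS
        (
        SELECT  *
        FROM    t_session
        WHERE   command_timestamp &gt;= TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;)
                AND command_timestamp &lt; TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;) + 1
        UNION ALL
        SELECT  *
        FROM    t_session
        WHERE   command_timestamp + elapsed_time &gt;= TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;)
                AND command_timestamp + elapsed_time &lt; TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;) + 1
                AND command_timestamp &lt; TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;)
        )
SELECT  id, command_timestamp, concurrent
FROM    (
        SELECT  q.*,
                -- starts sort before ends on ties, so a transaction ending
                -- at this very moment is still counted as open
                SUM(event) OVER (ORDER BY event_date, event DESC) - 1 AS concurrent
        FROM    (
                SELECT  1 AS event,
                        command_timestamp AS event_date,
                        cb.id, cb.command_timestamp
                FROM    current_sessions cb
                UNION ALL
                SELECT  -1 AS event,
                        command_timestamp + elapsed_time AS event_date,
                        ce.id, ce.command_timestamp
                FROM    current_sessions ce
                ) q
        )
WHERE   event = 1
        AND command_timestamp &gt;= TO_DATE(&#039;01.01.2010&#039;, &#039;dd.mm.yyyy&#039;)
ORDER BY
        command_timestamp, id
</pre>
<p>The ordering <code>event_date, event DESC</code> is the same as in the query above: on equal timestamps the <strong>+1</strong> records come before the <strong>-1</strong> records.</p>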
<p>Hope that helps.</p>
<hr/>
<p>I&#8217;m always glad to answer the questions regarding database queries.</p>
<p><a href="/ask-a-question"><strong>Ask me a question</strong></a></p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2010/01/25/counting-concurrent-sessions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cumulative values</title>
		<link>http://explainextended.com/2010/01/20/cumulative-values/</link>
		<comments>http://explainextended.com/2010/01/20/cumulative-values/#comments</comments>
		<pubDate>Wed, 20 Jan 2010 20:00:19 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=3995</guid>
		<description><![CDATA[From Stack Overflow: There is a table that stores signal statuses that come from different devices. SS1 and SS2 signals are inserted to table in random times If either of SS1 and SS2 signal statuses is up, then the resulting signal should be up If both SS1 and SS2 signal statuses are down, then resulting [...]]]></description>
			<content:encoded><![CDATA[<p>From <a href="http://stackoverflow.com/questions/2100438/can-it-be-done-with-oracle-analytic-functions"><strong>Stack Overflow</strong></a>:</p>
<blockquote><p>There is a table that stores signal statuses that come from different devices.</p>
<ul>
<li><strong>SS1</strong> and <strong>SS2</strong> signals are inserted to table in random times</li>
<li>If either of <strong>SS1</strong> and <strong>SS2</strong> signal statuses is <strong>up</strong>, then the resulting signal should be <strong>up</strong></li>
<li>If both <strong>SS1</strong> and <strong>SS2</strong> signal statuses are <strong>down</strong>, then resulting signal should be <strong>down</strong></li>
</ul>
<p>I want to prepare a query that shows the result signal status changes according to <strong>SS1</strong> and <strong>SS2</strong> signals
</p></blockquote>
<p>Each record deals with only one signal type here: either <strong>SS1</strong> or <strong>SS2</strong>. To obtain the signal statuses we should query the <em>cumulative values</em> of previous records.</p>
<p>If a record describes a change in <strong>SS2</strong>, we should query for the most recent change to <strong>SS1</strong> that had been recorded so far to obtain the <strong>SS1</strong>&#8216;s current status.</p>
<p>In systems other than <strong>Oracle</strong>, the previous value of a signal could be easily queried using subselects with <code>TOP</code> / <code>LIMIT</code> clauses. But <strong>Oracle</strong> does not support correlated subqueries nested more than one level deep, and limiting a subquery&#8217;s result to a single record (which a scalar subquery requires) needs exactly that, because the <code>ORDER BY</code> has to be nested inside another subquery. This makes constructing such a subquery in <strong>Oracle</strong> quite a pain.</p>
<p>However, in <strong>Oracle</strong>, these things can be queried much more efficiently using analytic functions.</p>
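<p>To illustrate the nesting problem, here is a sketch of the subquery one would like to write, using the column names of the sample table <code>t_signal</code> created below. On the <strong>Oracle</strong> releases this article was written for, I would expect it to be rejected (typically with <code>ORA-00904</code>, invalid identifier), because <code>s.ts</code> is referenced two levels below the query that defines <code>s</code>:</p>
<pre class="brush: sql">
-- Sketch of the pattern that does NOT work: the innermost query
-- is correlated to the outermost one, two levels up.
SELECT  s.*,
        (
        SELECT  status
        FROM    (
                SELECT  p.status
                FROM    t_signal p
                WHERE   p.signal = &#039;SS1&#039;
                        AND p.ts &lt;= s.ts
                ORDER BY
                        p.ts DESC, p.id DESC
                )
        WHERE   rownum = 1
        ) AS ss1
FROM    t_signal s
</pre>
<p>The extra level is there only because <code>rownum</code> has to be applied after the <code>ORDER BY</code>.</p>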
<p>Let&#8217;s create a sample table:<br />
<span id="more-3995"></span><br />
<a href="#" onclick="xcollapse('X1244');return false;"><strong>Table creation details</strong></a><br />
</p>
<div id="X1244" style="display: none; ">
<pre class="brush: sql">
BEGIN
        DBMS_RANDOM.seed(20100120);
END;
/
CREATE TABLE t_signal
        (
        id NOT NULL,
        signal NOT NULL,
        status NOT NULL,
        ts NOT NULL
        )
AS
SELECT  level,
        CASE WHEN DBMS_RANDOM.value &lt; 0.5 THEN &#039;SS1&#039; ELSE &#039;SS2&#039; END,
        ROUND(DBMS_RANDOM.value),
        TO_DATE(&#039;20.01.2010&#039;, &#039;dd.mm.yyyy&#039;) - level / 86400
FROM    dual
CONNECT BY
        level &lt;= 1000000
/
ALTER TABLE t_signal ADD CONSTRAINT pk_signal_id PRIMARY KEY (id)
/
CREATE INDEX ix_signal_ts_id ON t_signal (ts, id)
/
</pre>
</div>
<p>This table contains <strong>1,000,000</strong> records holding random states of random signals, one record per second.</p>
<p>Each record describes a state of a certain signal at a given moment of time. To find out the state of another signal at that moment of time we need to know the state held by the latest record for that signal.</p>
<p>To find out that state, we can employ an analytic function, <code>LAST_VALUE</code>.</p>
<p>With a default <code>RANGE</code> (that is <code>BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>), this function just gives the value of the expression held by the current row, and, therefore, is quite useless: it gives the same result as a plain expression would give, without any functions.</p>
<p>If <code>col</code> is unique and defines the order of the rows, then <code>LAST_VALUE(expression) OVER (ORDER BY col)</code> is just a quite expensive synonym for <code>expression</code>.</p>
<p>However, this function&#8217;s behavior can be changed by adding the <code>IGNORE NULLS</code> clause. This clause makes the function return the last value so far that is not <code>NULL</code>.</p>
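<p>A small self-contained comparison on four made-up rows shows the difference. The column names <code>ss1_plain</code> and <code>ss1_filled</code> are only illustrative:</p>
<pre class="brush: sql">
WITH    s AS
        (
        SELECT  1 AS id, &#039;SS1&#039; AS signal, 1 AS status FROM dual UNION ALL
        SELECT  2, &#039;SS2&#039;, 0 FROM dual UNION ALL
        SELECT  3, &#039;SS2&#039;, 1 FROM dual UNION ALL
        SELECT  4, &#039;SS1&#039;, 0 FROM dual
        )
SELECT  id, signal, status,
        -- without IGNORE NULLS: just the current row&#039;s expression
        LAST_VALUE(DECODE(signal, &#039;SS1&#039;, status, NULL)) OVER (ORDER BY id) AS ss1_plain,
        -- with IGNORE NULLS: the latest non-NULL value seen so far
        LAST_VALUE(DECODE(signal, &#039;SS1&#039;, status, NULL) IGNORE NULLS) OVER (ORDER BY id) AS ss1_filled
FROM    s
ORDER BY
        id
</pre>
<p>Here <code>ss1_plain</code> comes out as <strong>1, NULL, NULL, 0</strong>, while <code>ss1_filled</code> comes out as <strong>1, 1, 1, 0</strong>: the last known state of <strong>SS1</strong> is carried over the rows that describe <strong>SS2</strong>.</p>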
<p>Now, returning the cumulative value of any signal becomes quite simple. We should just make two expressions which substitute <code>NULL</code> for the signal status on records of the <q>wrong</q> signal. The results of <code>LAST_VALUE (IGNORE NULLS)</code> over these expressions will persist until overwritten by the new states of their corresponding signals.</p>
<p>Let&#8217;s check it:</p>
<pre class="brush: sql">
SELECT  *
FROM    (
        SELECT  /*+ FIRST_ROWS */
                s.*,
                DECODE(signal, &#039;SS1&#039;, status, NULL) AS exp1,
                DECODE(signal, &#039;SS2&#039;, status, NULL) AS exp2,
                LAST_VALUE(DECODE(signal, &#039;SS1&#039;, status, NULL) IGNORE NULLS) OVER (ORDER BY ts, id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ss1,
                LAST_VALUE(DECODE(signal, &#039;SS2&#039;, status, NULL) IGNORE NULLS) OVER (ORDER BY ts, id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ss2
        FROM    t_signal s
        ORDER BY
                ts, id
        ) s2
WHERE   rownum &lt;= 15
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>SIGNAL</th>
<th>STATUS</th>
<th>TS</th>
<th>EXP1</th>
<th>EXP2</th>
<th>SS1</th>
<th>SS2</th>
</tr>
<tr>
<td class="double_precision">1000000</td>
<td class="char">SS2</td>
<td class="double_precision">1</td>
<td class="date">08.01.2010 10:13:20</td>
<td class="double_precision"></td>
<td class="double_precision">1</td>
<td class="double_precision"></td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">999999</td>
<td class="char">SS2</td>
<td class="double_precision">0</td>
<td class="date">08.01.2010 10:13:21</td>
<td class="double_precision"></td>
<td class="double_precision">0</td>
<td class="double_precision"></td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">999998</td>
<td class="char">SS1</td>
<td class="double_precision">0</td>
<td class="date">08.01.2010 10:13:22</td>
<td class="double_precision">0</td>
<td class="double_precision"></td>
<td class="double_precision">0</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">999997</td>
<td class="char">SS2</td>
<td class="double_precision">1</td>
<td class="date">08.01.2010 10:13:23</td>
<td class="double_precision"></td>
<td class="double_precision">1</td>
<td class="double_precision">0</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">999996</td>
<td class="char">SS1</td>
<td class="double_precision">1</td>
<td class="date">08.01.2010 10:13:24</td>
<td class="double_precision">1</td>
<td class="double_precision"></td>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">999995</td>
<td class="char">SS2</td>
<td class="double_precision">1</td>
<td class="date">08.01.2010 10:13:25</td>
<td class="double_precision"></td>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">999994</td>
<td class="char">SS1</td>
<td class="double_precision">0</td>
<td class="date">08.01.2010 10:13:26</td>
<td class="double_precision">0</td>
<td class="double_precision"></td>
<td class="double_precision">0</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">999993</td>
<td class="char">SS1</td>
<td class="double_precision">1</td>
<td class="date">08.01.2010 10:13:27</td>
<td class="double_precision">1</td>
<td class="double_precision"></td>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">999992</td>
<td class="char">SS1</td>
<td class="double_precision">0</td>
<td class="date">08.01.2010 10:13:28</td>
<td class="double_precision">0</td>
<td class="double_precision"></td>
<td class="double_precision">0</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">999991</td>
<td class="char">SS2</td>
<td class="double_precision">1</td>
<td class="date">08.01.2010 10:13:29</td>
<td class="double_precision"></td>
<td class="double_precision">1</td>
<td class="double_precision">0</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">999990</td>
<td class="char">SS1</td>
<td class="double_precision">0</td>
<td class="date">08.01.2010 10:13:30</td>
<td class="double_precision">0</td>
<td class="double_precision"></td>
<td class="double_precision">0</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">999989</td>
<td class="char">SS2</td>
<td class="double_precision">0</td>
<td class="date">08.01.2010 10:13:31</td>
<td class="double_precision"></td>
<td class="double_precision">0</td>
<td class="double_precision">0</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">999988</td>
<td class="char">SS1</td>
<td class="double_precision">0</td>
<td class="date">08.01.2010 10:13:32</td>
<td class="double_precision">0</td>
<td class="double_precision"></td>
<td class="double_precision">0</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">999987</td>
<td class="char">SS2</td>
<td class="double_precision">0</td>
<td class="date">08.01.2010 10:13:33</td>
<td class="double_precision"></td>
<td class="double_precision">0</td>
<td class="double_precision">0</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">999986</td>
<td class="char">SS1</td>
<td class="double_precision">0</td>
<td class="date">08.01.2010 10:13:34</td>
<td class="double_precision">0</td>
<td class="double_precision"></td>
<td class="double_precision">0</td>
<td class="double_precision">0</td>
</tr>
<tr class="statusbar">
<td colspan="100">15 rows fetched in 0.0012s (0.0007s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 COUNT STOPKEY
  VIEW
   WINDOW NOSORT
    TABLE ACCESS BY INDEX ROWID, 20100120_signal.T_SIGNAL
     INDEX FULL SCAN, 20100120_signal.IX_SIGNAL_TS_ID
</pre>
<p>In the resultset above, <code>EXP1</code> and <code>EXP2</code> show the changes in the signal states of <code>SS1</code> and <code>SS2</code>, with <code>NULL</code>&#8216;s if the current record does not describe a change in the appropriate signals.</p>
<p><code>LAST_VALUE (IGNORE NULLS)</code> over these expressions shows their cumulative values. These values can be used to calculate the resulting signal state.</p>
<p>Note that using <code>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code> made <strong>Oracle</strong> use <code>WINDOW NOSORT</code>, and this query is instant, since the <code>LAST_VALUE</code> can be buffered.</p>
<p>To calculate the resulting signal, we should just return the <code>GREATEST</code> of <code>ss1</code> and <code>ss2</code> (returned by <code>LAST_VALUE</code>):</p>
<pre class="brush: sql">
SELECT  SUM(GREATEST(ss1, ss2))
FROM    (
        SELECT  s.*,
                DECODE(signal, &#039;SS1&#039;, status, NULL) AS exp1,
                DECODE(signal, &#039;SS2&#039;, status, NULL) AS exp2,
                LAST_VALUE(DECODE(signal, &#039;SS1&#039;, status, NULL) IGNORE NULLS) OVER (ORDER BY ts, id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ss1,
                LAST_VALUE(DECODE(signal, &#039;SS2&#039;, status, NULL) IGNORE NULLS) OVER (ORDER BY ts, id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ss2
        FROM    t_signal s
        ORDER BY
                ts, id
        ) s2
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>SUM(GREATEST(SS1,SS2))</th>
</tr>
<tr>
<td class="double_precision">750450</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0003s (3.0781s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 SORT AGGREGATE
  VIEW
   WINDOW SORT
    TABLE ACCESS FULL, 20100120_signal.T_SIGNAL
</pre>
<p>For the sake of brevity, we select the <code>SUM</code> of the resulting signal states, which gives us the number of records with the resulting signal up. As expected, this number amounts to roughly <strong>75%</strong> of the total number of records (two independent signals, each up half of the time, are both down a quarter of the time).</p>
<p>The query uses a single sort (which in case of the whole table is faster than traversing the index), buffers the results and completes in <strong>3 seconds</strong> which is almost as fast as a plain query with <code>ORDER BY</code> over the same dataset would complete.</p>
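<p>The original question asks for the <em>changes</em> of the resulting signal rather than its state on every record. One possible way to get them, sketched here under the assumption that a signal not yet seen counts as down (hence the <code>NVL(…, 0)</code>), is to compare each resulting state with the previous one using <code>LAG</code>. The aliases <code>result</code> and <code>prev_result</code> are mine:</p>
<pre class="brush: sql">
SELECT  ts, id, result
FROM    (
        SELECT  q.*,
                -- resulting state on the previous record, NULL for the very first one
                LAG(result) OVER (ORDER BY ts, id) AS prev_result
        FROM    (
                SELECT  s.ts, s.id,
                        GREATEST(
                        NVL(LAST_VALUE(DECODE(signal, &#039;SS1&#039;, status, NULL) IGNORE NULLS) OVER (ORDER BY ts, id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0),
                        NVL(LAST_VALUE(DECODE(signal, &#039;SS2&#039;, status, NULL) IGNORE NULLS) OVER (ORDER BY ts, id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0)
                        ) AS result
                FROM    t_signal s
                ) q
        )
WHERE   prev_result IS NULL
        OR result &lt;&gt; prev_result
ORDER BY
        ts, id
</pre>
<p>The <code>NVL</code> also matters for <code>GREATEST</code> itself, which returns <code>NULL</code> in <strong>Oracle</strong> as soon as one of its arguments is <code>NULL</code>.</p>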
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2010/01/20/cumulative-values/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oracle: joining timestamps and time intervals</title>
		<link>http://explainextended.com/2009/12/28/oracle-joining-timestamps-and-time-intervals/</link>
		<comments>http://explainextended.com/2009/12/28/oracle-joining-timestamps-and-time-intervals/#comments</comments>
		<pubDate>Mon, 28 Dec 2009 20:00:20 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=3856</guid>
		<description><![CDATA[From Stack Overflow: I am making an inner join of two tables. I have many time intervals represented by intervals and i want to get measure data from measures only within those intervals. The intervals do not overlap. Here are the table layouts: intervals entry_time exit_time measures time measure There are 1295 records in intervals [...]]]></description>
			<content:encoded><![CDATA[<p>From <a href="http://stackoverflow.com/questions/1947693/sql-optimizing-between-clause"><strong>Stack Overflow</strong></a>:</p>
<blockquote><p>I am making an inner join of two tables.</p>
<p>I have many time intervals represented by <code>intervals</code> and i want to get measure data from <code>measures</code> only within those intervals. The intervals do not overlap.</p>
<p>Here are the table layouts:</p>
<table class="excel">
<caption>intervals</caption>
<tr>
<th>entry_time</th>
<th>exit_time</th>
</tr>
</table>
<table class="excel">
<caption>measures</caption>
<tr>
<th>time</th>
<th>measure</th>
</tr>
</table>
<p>There are <strong>1295</strong> records in <code>intervals</code> and about a million records in <code>measures</code>. The intervals do not overlap.</p>
<p>The result I want to get is a table with in the first column the measure, then the time the measure has been done, the begin/end time of the considered interval (it would be repeated for row with a time within the considered range)</p>
<p>How can I make this faster?</p></blockquote>
<h3>Straight query</h3>
<p>The query to get the results would look like this:</p>
<pre class="brush: sql">
SELECT  measures.measure as measure,
        measures.time as time,
        intervals.entry_time as entry_time,
        intervals.exit_time as exit_time
FROM    intervals
JOIN    measures
ON      measures.time BETWEEN intervals.entry_time AND intervals.exit_time
ORDER BY
        time ASC
</pre>
<p>This looks quite simple. However, choosing the execution plan can be a somewhat tricky task for this query.</p>
<p>The table layout gives us no hint on how many rows will be returned from the join. If all intervals begin before the time of the first measure and end after the time of the last measure, then every combination of rows will be returned from both tables and the resulting recordset will in fact be a cartesian product of two tables. If the intervals are short and sparse, few or even no rows can be returned from measures.</p>
<p>However, we know that the intervals do not overlap. This means that each measure may belong to at most one interval and the number of records in the final resultset will be no more than the number of records in <code>measures</code>.</p>
<p>Since the condition we join here is a pair of inequalities, only two methods can be used by the engine to perform the join, that is <code>NESTED LOOPS</code> and <code>MERGE JOIN</code>. <code>HASH JOIN</code> which is most efficient on large tables requires an equijoin and cannot be used for this query.</p>
<p><code>MERGE JOIN</code> sorts both resultsets on the columns it joins on and gradually advances the internal pointer on both sorted row sources so that the values of the columns being joined on in both tables always match. In case of an equijoin, the engine returns only the records holding the current value of the pointer; in case of a join on inequality condition, the engine returns all records greater (or less) than the current value from the corresponding table.</p>
<p><code>MERGE JOIN</code>, however, can only satisfy a single inequality condition while we here have two of them on two different columns. The engine should split this condition into the pair of inequalities: the first one will be used by the join; the second one will be just filtered. The query essentially turns into this one:</p>
<pre class="brush: sql">
SELECT  measures.measure as measure,
        measures.time as time,
        intervals.entry_time as entry_time,
        intervals.exit_time as exit_time
FROM    intervals
JOIN    measures
ON      measures.time &gt;= intervals.entry_time
WHERE   measures.time  &lt;= intervals.exit_time
ORDER BY
        time ASC
</pre>
<p>Here, the <code>MERGE JOIN</code> will be using the predicate in the <code>ON</code> clause and the filtering will use the predicate in the <code>WHERE</code> clause. The predicates are symmetrical and can easily be swapped.</p>
<p>However, the join will have to return all records to the filtering code. And with an inequality condition like the one we see above, there will be lots of records. If we take a normal situation, where the interval bounds are more or less in accordance with the dates of the first and last measure and the intervals are distributed evenly, then the <code>MERGE JOIN</code> will return about <code>(1295 × 1,000,000) / 2 ≈ 650,000,000</code> records, each to be filtered on the next step.</p>
<p>From a performance point of view, this is hardly different from a cartesian join: it only needs to process half as many rows. These rows, however, need to be sorted beforehand (or taken from an index, which is slow to traverse in its sort order), so this can actually be even slower than a cartesian join, which does not require the rows to be in any certain order and can use an <code>INDEX FAST FULL SCAN</code> or a <code>TABLE SCAN</code>.</p>
<p>This means the only efficient way to run this query is using <code>NESTED LOOPS</code>. To benefit from this method, we should make an index on <code>measures.time</code> and convince <strong>Oracle</strong> to use it.</p>
<p>Let&#8217;s create a sample table:<br />
<span id="more-3856"></span><br />
<a href="#" onclick="xcollapse('X3845');return false;"><strong>Table creation details</strong></a><br />
</p>
<div id="X3845" style="display: none; ">
<pre class="brush: sql">
BEGIN
        DBMS_RANDOM.seed(20091228);
END;
/

CREATE TABLE intervals (
        entry_time NOT NULL,
        exit_time NOT NULL
)
AS
SELECT  TO_DATE(&#039;25.12.2009&#039;, &#039;dd.mm.yyyy&#039;) - end - len,
        TO_DATE(&#039;25.12.2009&#039;, &#039;dd.mm.yyyy&#039;) - end
FROM    (
        SELECT  SUM(s) OVER (ORDER BY lvl ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS end,
                s * l AS len
        FROM    (
                SELECT  level AS lvl,
                        DBMS_RANDOM.value * 10 AS s, DBMS_RANDOM.value AS l
                FROM    dual
                CONNECT BY
                        level &lt;= 1500
                )
        )
/

CREATE UNIQUE INDEX ux_intervals_entry ON intervals (entry_time)
/

CREATE UNIQUE INDEX ux_intervals_exit ON intervals (exit_time)
/

CREATE TABLE measures (
        time NOT NULL,
        measure NOT NULL,
        stuffing NOT NULL
)
AS
SELECT  TO_DATE(&#039;28.12.2009&#039;, &#039;dd.mm.yyyy&#039;) - level / 144,
        CAST(DBMS_RANDOM.value * 10000 AS NUMBER(18, 2)),
        CAST(RPAD(&#039;*&#039;, 200, &#039;*&#039;) AS VARCHAR2(200))
FROM    dual
CONNECT BY
        level &lt;= 1080000
/

ALTER TABLE measures ADD CONSTRAINT pk_measures_time PRIMARY KEY (time)
/

CREATE INDEX ix_measures_time_measure ON measures (time, measure)
/
</pre>
</div>
<p>There are <strong>1,500</strong> non-overlapping intervals with random lengths of up to <strong>10</strong> days, and <strong>1,080,000</strong> measures. The overall spans of the intervals and the measures roughly match. <code>entry_time</code> and <code>exit_time</code> in <code>intervals</code> are indexed with <code>UNIQUE</code> indexes, and there is a composite index on <code>measures (time, measure)</code>. <code>measures</code> also contains a <code>stuffing</code> column, which is a <code>VARCHAR2(200)</code> filled with asterisks. This emulates actual tables containing lots of data.</p>
<p>Here&#8217;s a query that selects all measures within the corresponding intervals and finds the total length of their <code>stuffing</code> columns:</p>
<pre class="brush: sql">
SELECT  /*+ ORDERED USE_NL (i, m) */
        SUM(LENGTH(stuffing))
FROM    intervals i
JOIN    measures m
ON      m.time BETWEEN entry_time AND exit_time
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>SUM(LENGTH(STUFFING))</th>
</tr>
<tr>
<td class="double_precision">103878600</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (10.3593s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 SORT AGGREGATE
  TABLE ACCESS BY INDEX ROWID, 20091228_intervals.MEASURES
   NESTED LOOPS
    TABLE ACCESS FULL, 20091228_intervals.INTERVALS
    INDEX RANGE SCAN, 20091228_intervals.PK_MEASURES_TIME
</pre>
<p>This takes a little more than <strong>10</strong> seconds.</p>
<p>The same query using <code>MERGE JOIN</code> ran for more than <strong>10</strong> minutes on my test machine and I had to interrupt it after that.</p>
<p>So <code>NESTED LOOPS</code> is the most efficient algorithm that <strong>Oracle</strong> can use for this query, and it completes in <strong>10.35</strong> seconds.</p>
<h3>Interval hashing</h3>
<p>The <code>NESTED LOOPS</code> algorithm is quite efficient for a query like this, but it requires traversing the index in a loop, and each index hit results in a <code>TABLE ACCESS BY INDEX ROWID</code> needed to find the value of <code>stuffing</code>. In this particular case, a <code>HASH JOIN</code> with a <code>FULL TABLE SCAN</code> would be more efficient.</p>
<p><code>HASH JOIN</code> requires an equality condition so we need to provide it somehow. This can be done by using a technique called <q>interval hashing</q>.</p>
<p>Here&#8217;s how it works:</p>
<h4>Splitting the time axis</h4>
<p>The time axis is mapped into a set of ranges, each assigned an ordinal number. The mapping function applied to a timestamp should uniquely define the range it belongs to. Usually this is done by splitting the time axis into ranges of equal length and taking the integer part of the difference between the timestamp and the beginning of the first range, divided by the range length:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2009/12/axis.png" alt="" title="Axis" width="600" height="60" class="aligncenter size-full wp-image-3865 noborder" /></p>
<p>Here&#8217;s an expression to do this:</p>
<pre class="brush: sql">
TRUNC((timestamp - TO_DATE(1, &#039;J&#039;)) * 2)
</pre>
<p>This expression splits the time axis into a set of ranges, each being <strong>12</strong> hours long.</p>
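<p>As a quick sanity check, here is a minimal sketch that applies this mapping to three hand-picked timestamps (the literal dates and the aliases <code>ts</code> and <code>range_no</code> are assumptions made up for this illustration). Since <code>TO_DATE(1, &#039;J&#039;)</code> falls on a midnight, the first two timestamps should get the same range number and the third one should get the next number:</p>
<pre class="brush: sql">
-- Illustration only: map three sample timestamps to 12-hour range numbers
SELECT  ts,
        TRUNC((ts - TO_DATE(1, &#039;J&#039;)) * 2) AS range_no
FROM    (
        SELECT  TO_DATE(&#039;25.12.2009 00:00&#039;, &#039;dd.mm.yyyy hh24:mi&#039;) AS ts
        FROM    dual
        UNION ALL
        SELECT  TO_DATE(&#039;25.12.2009 11:59&#039;, &#039;dd.mm.yyyy hh24:mi&#039;)
        FROM    dual
        UNION ALL
        SELECT  TO_DATE(&#039;25.12.2009 12:00&#039;, &#039;dd.mm.yyyy hh24:mi&#039;)
        FROM    dual
        )
</pre>
<p>Replacing the multiplier <code>2</code> with another number changes the range length: for instance, <code>24</code> would give one-hour ranges.</p>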
<h4>Finding the ranges</h4>
<p>For each interval, all ranges it overlaps should be found:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2009/12/ranges.png" alt="" title="Ranges" width="600" height="300" class="aligncenter size-full wp-image-3868 noborder" /></p>
<p>This can be easily done by mapping the beginning and the end of each interval to their corresponding ranges and finding all the ranges between them.</p>
<p>This query returns the first range, the last range and total number of ranges each interval overlaps:</p>
<pre class="brush: sql">
SELECT  entry_range, exit_range,
        exit_range - entry_range + 1 AS range_span,
        entry_time, exit_time
FROM    (
        SELECT  TRUNC((entry_time - TO_DATE(1, &#039;J&#039;)) * 2) AS entry_range,
                TRUNC((exit_time - TO_DATE(1, &#039;J&#039;)) * 2) AS exit_range,
                entry_time,
                exit_time
        FROM    intervals
        )
</pre>
<h4>Exploding the ranges</h4>
<p>We should <q>explode</q> the ranges: if an interval spans <code>n</code> ranges, we should generate a recordset of <code>n</code> records for this interval, each corresponding to an individual range.</p>
<p>This would be an easy task if <strong>Oracle</strong> supported <code>generate_series</code> (like <strong>PostgreSQL</strong>) or <code>CROSS APPLY</code> (like <strong>SQL Server</strong>). Unfortunately, it supports neither of these, so we will have to make do with a simple join.</p>
<p>We should do the following:</p>
<ul>
<li>
<p>Find the longest interval in terms of ranges it overlaps:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2009/12/longest.png" alt="" title="Longest" width="270" height="210" class="aligncenter size-full wp-image-3873 noborder" /></p>
<p>On the picture above, the longest interval spans <strong>3</strong> ranges.</p>
<p>Here&#8217;s a query to find the interval that spans most ranges:</p>
<pre class="brush: sql">
WITH    splits AS
        (
        SELECT  /*+ MATERIALIZE */
                entry_range, exit_range,
                exit_range - entry_range + 1 AS range_span,
                entry_time, exit_time
        FROM    (
                SELECT  TRUNC((entry_time - TO_DATE(1, &#039;J&#039;)) * 2) AS entry_range,
                        TRUNC((exit_time - TO_DATE(1, &#039;J&#039;)) * 2) AS exit_range,
                        entry_time,
                        exit_time
                FROM    intervals
                )
        ),
        upper AS
        (
        SELECT  /*+ MATERIALIZE */
                MAX(range_span) AS max_range
        FROM    splits
        )
SELECT  *
FROM    upper
</pre>
<p>I added the <code>MATERIALIZE</code> hint so that the <code>CTE</code> results are stored in the temporary tablespace and not reevaluated each time the <code>CTE</code> is called. This will be used later.</p>
</li>
<li>
<p>Generate a dummy rowset with as many records as there are ranges overlapped by the longest interval:</p>
<pre class="brush: sql">
WITH    splits AS
        (
        SELECT  /*+ MATERIALIZE */
                entry_range, exit_range,
                exit_range - entry_range + 1 AS range_span,
                entry_time, exit_time
        FROM    (
                SELECT  TRUNC((entry_time - TO_DATE(1, &#039;J&#039;)) * 2) AS entry_range,
                        TRUNC((exit_time - TO_DATE(1, &#039;J&#039;)) * 2) AS exit_range,
                        entry_time,
                        exit_time
                FROM    intervals
                )
        ),
        upper AS
        (
        SELECT  /*+ MATERIALIZE */
                MAX(range_span) AS max_range
        FROM    splits
        ),
        ranges AS
        (
        SELECT  /*+ MATERIALIZE */
                level AS chunk
        FROM    upper
        CONNECT BY
                level &lt;= max_range
        )
SELECT  *
FROM    ranges
</pre>
</li>
<li>
<p>Join the intervals to this rowset, generating one record for each range an interval overlaps:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2009/12/join.png" alt="" title="Join" width="600" height="540" class="aligncenter size-full wp-image-3881 noborder" /></p>
<p>This is done using this query:</p>
<pre class="brush: sql">
WITH    splits AS
        (
        SELECT  /*+ MATERIALIZE */
                entry_range, exit_range,
                exit_range - entry_range + 1 AS range_span,
                entry_time, exit_time
        FROM    (
                SELECT  TRUNC((entry_time - TO_DATE(1, &#039;J&#039;)) * 2) AS entry_range,
                        TRUNC((exit_time - TO_DATE(1, &#039;J&#039;)) * 2) AS exit_range,
                        entry_time,
                        exit_time
                FROM    intervals
                )
        ),
        upper AS
        (
        SELECT  /*+ MATERIALIZE */
                MAX(range_span) AS max_range
        FROM    splits
        ),
        ranges AS
        (
        SELECT  /*+ MATERIALIZE */
                level AS chunk
        FROM    upper
        CONNECT BY
                level &lt;= max_range
        ),
        tiles AS
        (
        SELECT  /*+ MATERIALIZE USE_MERGE (r s) */
                entry_range + chunk - 1 AS tile,
                entry_time,
                exit_time
        FROM    ranges r
        JOIN    splits s
        ON      chunk &lt;= range_span
        )
SELECT  *
FROM    tiles
</pre>
<p>Note that I added the hint <code>USE_MERGE</code> to the query that does the actual join. Since this is a pure inequality condition, without extra filtering, a <code>MERGE JOIN</code> is actually the best plan here.</p>
<p>This query returns one record for each time an interval overlaps a range. Note that though the intervals themselves do not overlap, two intervals can overlap the same range, and in this case the range will be returned twice.</p>
</li>
<li>
<p>Now we have a recordset with the range numbers.</p>
</li>
</ul>
<h4>Joining on the ranges</h4>
<p>When we have the intervals mapped into the ranges we can do just the same with the measures: map the timestamps into the ranges and join on them.</p>
<p>Since this would be an equijoin, a <code>HASH JOIN</code> method can be used:</p>
<p><a href="#" onclick="xcollapse('X9903');return false;"><strong>View query details</strong></a><br />
</p>
<div id="X9903" style="display: none; ">
<pre class="brush: sql">
WITH    splits AS
        (
        SELECT  /*+ MATERIALIZE */
                entry_range, exit_range,
                exit_range - entry_range + 1 AS range_span,
                entry_time, exit_time
        FROM    (
                SELECT  TRUNC((entry_time - TO_DATE(1, &#039;J&#039;)) * 2) AS entry_range,
                        TRUNC((exit_time - TO_DATE(1, &#039;J&#039;)) * 2) AS exit_range,
                        entry_time,
                        exit_time
                FROM    intervals
                )
        ),
        upper AS
        (
        SELECT  /*+ MATERIALIZE */
                MAX(range_span) AS max_range
        FROM    splits
        ),
        ranges AS
        (
        SELECT  /*+ MATERIALIZE */
                level AS chunk
        FROM    upper
        CONNECT BY
                level &lt;= max_range
        ),
        tiles AS
        (
        SELECT  /*+ MATERIALIZE USE_MERGE (r s) */
                entry_range + chunk - 1 AS tile,
                entry_time,
                exit_time
        FROM    ranges r
        JOIN    splits s
        ON      chunk &lt;= range_span
        )
SELECT  /*+ LEADING(t) USE_HASH(m t) */
        SUM(LENGTH(stuffing))
FROM    tiles t
JOIN    measures m
ON      TRUNC((m.time - TO_DATE(1, &#039;J&#039;)) * 2) = tile
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>SUM(LENGTH(STUFFING))</th>
</tr>
<tr>
<td class="double_precision">124948800</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (2.6250s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 TEMP TABLE TRANSFORMATION
  LOAD AS SELECT , 20091228_intervals.MEASURES
   TABLE ACCESS FULL, 20091228_intervals.INTERVALS
  LOAD AS SELECT , 20091228_intervals.MEASURES
   SORT AGGREGATE
    VIEW
     TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D67D5_1825AC8
  LOAD AS SELECT , 20091228_intervals.MEASURES
   CONNECT BY WITHOUT FILTERING
    COUNT
     VIEW
      TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D67D6_1825AC8
  LOAD AS SELECT , 20091228_intervals.MEASURES
   MERGE JOIN
    SORT JOIN
     VIEW
      TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D67D7_1825AC8
    SORT JOIN
     VIEW
      TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D67D5_1825AC8
  SORT AGGREGATE
   HASH JOIN
    VIEW
     TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D67D8_1825AC8
    TABLE ACCESS FULL, 20091228_intervals.MEASURES
</pre>
</div>
<p>We see that this query is about <strong>4</strong> times as fast. Unfortunately, it returns incorrect results.</p>
<p>This is because joining on the ranges is not enough. If an interval and a measure overlap each other, they necessarily overlap the same range (and hence the match will be found by the join). But the converse is not true: even if an interval and a measure overlap the same range, they do not necessarily overlap each other.</p>
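<p>Here is a minimal sketch of such a false match, using made-up literal values instead of the sample tables (the names <code>i</code>, <code>m</code>, <code>measure_time</code>, <code>same_range</code> and <code>really_overlap</code> are assumptions for this illustration). The interval lasts from <strong>01:00</strong> to <strong>02:00</strong> and the measure is taken at <strong>05:00</strong> on the same day, so both should land in the same 12-hour range even though the measure is outside the interval:</p>
<pre class="brush: sql">
-- Illustration only: one hypothetical interval and one hypothetical measure
WITH    i AS
        (
        SELECT  TO_DATE(&#039;25.12.2009 01:00&#039;, &#039;dd.mm.yyyy hh24:mi&#039;) AS entry_time,
                TO_DATE(&#039;25.12.2009 02:00&#039;, &#039;dd.mm.yyyy hh24:mi&#039;) AS exit_time
        FROM    dual
        ),
        m AS
        (
        SELECT  TO_DATE(&#039;25.12.2009 05:00&#039;, &#039;dd.mm.yyyy hh24:mi&#039;) AS measure_time
        FROM    dual
        )
SELECT  -- coarse check: do both fall into the same 12-hour range?
        CASE
        WHEN TRUNC((measure_time - TO_DATE(1, &#039;J&#039;)) * 2) = TRUNC((entry_time - TO_DATE(1, &#039;J&#039;)) * 2) THEN &#039;yes&#039;
        ELSE &#039;no&#039;
        END AS same_range,
        -- exact check: is the measure really inside the interval?
        CASE
        WHEN measure_time BETWEEN entry_time AND exit_time THEN &#039;yes&#039;
        ELSE &#039;no&#039;
        END AS really_overlap
FROM    i
CROSS JOIN
        m
</pre>
<p>A pair like this passes the equijoin on the range number, which is why the sum returned by the query above is too large.</p>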
<h4>Adding fine filtering</h4>
<p>This problem, however, can be fixed by adding a simple condition to the <code>ON</code> clause of the query: exactly the same check of the measure time against the interval bounds that we originally used.</p>
<p>Since the query returns everything we need (plus some excess data we don&#8217;t need), we just need to filter the incorrect matches out.</p>
<p>This is essentially what the original query did, but instead of making <code>N×M</code> comparisons required by the cartesian join or half that number required by the <code>MERGE JOIN</code>, the comparisons will only need to be made inside a single range. If we divide the time axis into <code>R</code> ranges, then <code>R</code> ranges should be examined, with <code>(N/R)×(M/R)</code> comparisons within each range. This means that the total number of comparisons will be <code>(N×M)/R</code> which of course is much more efficient.</p>
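<p>To get a feeling for the numbers, we can plug in the figures already shown above. Every <code>stuffing</code> value is exactly <strong>200</strong> characters long, so dividing the sums returned by the queries by <strong>200</strong> gives row counts. This is just arithmetic on those figures, not a new measurement, and the column aliases are made up for this illustration:</p>
<pre class="brush: sql">
-- Illustration only: turn the sums reported above into row counts
SELECT  124948800 / 200 AS candidate_pairs,  -- pairs that pass the join on ranges alone
        103878600 / 200 AS true_matches,     -- pairs that satisfy the original condition
        1500 * 1080000 AS cartesian_pairs    -- all intervals times all measures
FROM    dual
</pre>
<p>That is <strong>624,744</strong> candidate pairs to check for <strong>519,393</strong> true matches, compared to <strong>1,620,000,000</strong> pairs in the full cartesian product.</p>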
<p>Here&#8217;s the final query:</p>
<pre class="brush: sql">
WITH    splits AS
        (
        SELECT  /*+ MATERIALIZE */
                entry_range, exit_range,
                exit_range - entry_range + 1 AS range_span,
                entry_time, exit_time
        FROM    (
                SELECT  TRUNC((entry_time - TO_DATE(1, &#039;J&#039;)) * 2) AS entry_range,
                        TRUNC((exit_time - TO_DATE(1, &#039;J&#039;)) * 2) AS exit_range,
                        entry_time,
                        exit_time
                FROM    intervals
                )
        ),
        upper AS
        (
        SELECT  /*+ MATERIALIZE */
                MAX(range_span) AS max_range
        FROM    splits
        ),
        ranges AS
        (
        SELECT  /*+ MATERIALIZE */
                level AS chunk
        FROM    upper
        CONNECT BY
                level &lt;= max_range
        ),
        tiles AS
        (
        SELECT  /*+ MATERIALIZE USE_MERGE (r s) */
                entry_range + chunk - 1 AS tile,
                entry_time,
                exit_time
        FROM    ranges r
        JOIN    splits s
        ON      chunk &lt;= range_span
        )
SELECT  /*+ LEADING(t) USE_HASH(m t) */
        SUM(LENGTH(stuffing))
FROM    tiles t
JOIN    measures m
ON      TRUNC((m.time - TO_DATE(1, &#039;J&#039;)) * 2) = tile
        AND m.time BETWEEN t.entry_time AND t.exit_time
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>SUM(LENGTH(STUFFING))</th>
</tr>
<tr>
<td class="double_precision">103878600</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (2.5312s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 TEMP TABLE TRANSFORMATION
  LOAD AS SELECT , 20091228_intervals.MEASURES
   TABLE ACCESS FULL, 20091228_intervals.INTERVALS
  LOAD AS SELECT , 20091228_intervals.MEASURES
   SORT AGGREGATE
    VIEW
     TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D67DD_1825AC8
  LOAD AS SELECT , 20091228_intervals.MEASURES
   CONNECT BY WITHOUT FILTERING
    COUNT
     VIEW
      TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D67DE_1825AC8
  LOAD AS SELECT , 20091228_intervals.MEASURES
   MERGE JOIN
    SORT JOIN
     VIEW
      TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D67DF_1825AC8
    SORT JOIN
     VIEW
      TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D67DD_1825AC8
  SORT AGGREGATE
   HASH JOIN
    VIEW
     TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D67E0_1825AC8
    TABLE ACCESS FULL, 20091228_intervals.MEASURES
</pre>
<p>This query completes in only <strong>2.53</strong> seconds.</p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2009/12/28/oracle-joining-timestamps-and-time-intervals/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oracle: Selecting records holding group-wise maximum</title>
		<link>http://explainextended.com/2009/12/16/oracle-selecting-records-holding-group-wise-maximum/</link>
		<comments>http://explainextended.com/2009/12/16/oracle-selecting-records-holding-group-wise-maximum/#comments</comments>
		<pubDate>Wed, 16 Dec 2009 20:00:37 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=3845</guid>
		<description><![CDATA[It was quite a time since I posted my last article, and I have a bunch of unanswered questions that people asked me. Sorry for not posting and answering for so long, had a few urgent things to do, and hope these breaks won&#8217;t be so long in the future :) Now, continuing the series [...]]]></description>
			<content:encoded><![CDATA[<p>It has been quite a while since I posted my last article, and I have a bunch of unanswered questions that people asked me.</p>
<p>Sorry for not posting and answering for so long: I had a few urgent things to do, and I hope these breaks won&#8217;t be so long in the future :)</p>
<p>Now, continuing the series on <a href="/2009/11/24/mysql-selecting-records-holding-group-wise-maximum-on-a-unique-column/">selecting records holding group-wise maximums</a>:</p>
<blockquote><p>How do I select the <em>whole</em> records, grouped on <code>grouper</code> and holding a group-wise maximum (or minimum) on other column?</p></blockquote>
<p>Finally, <strong>Oracle</strong>.</p>
<p>We will create the very same table as we did for the previous systems:<br />
<span id="more-3845"></span><br />
<a href="#" onclick="xcollapse('X6897');return false;"><strong>Table creation details</strong></a><br />
</p>
<div id="X6897" style="display: none; ">
<pre class="brush: sql">
CREATE TABLE t_distinct
        (
        id NOT NULL,
        orderer NOT NULL,
        glow NOT NULL,
        ghigh NOT NULL,
        stuffing NOT NULL
        )
AS
SELECT  level,
        TRUNC(DBMS_RANDOM.value * 10) + 1,
        MOD(level - 1, 10) + 1,
        MOD(level - 1, 10000) + 1,
        CAST(RPAD(&#039;*&#039;, 200, &#039;*&#039;) AS VARCHAR2(200))
FROM    dual
CONNECT BY
        level &lt;= 1000000
/
ALTER TABLE t_distinct ADD CONSTRAINT pk_distinct_id PRIMARY KEY (id)
/
CREATE INDEX ix_distinct_glow_id ON t_distinct (glow, id)
/
CREATE INDEX ix_distinct_ghigh_id ON t_distinct (ghigh, id)
/
CREATE INDEX ix_distinct_glow_orderer_id ON t_distinct (glow, orderer, id)
/
CREATE INDEX ix_distinct_ghigh_orderer_id ON t_distinct (ghigh, orderer, id)
/
</pre>
</div>
<p>There are <strong>1,000,000</strong> records with the following fields:</p>
<ul>
<li><code>id</code> is the <code>PRIMARY KEY</code></li>
<li><code>orderer</code> is filled with random values from <strong>1</strong> to <strong>10</strong></li>
<li><code>glow</code> is a low cardinality grouping field (<strong>10</strong> distinct values)</li>
<li><code>ghigh</code> is a high cardinality grouping field (<strong>10,000</strong> distinct values)</li>
<li><code>stuffing</code> is an asterisk-filled <code>VARCHAR2(200)</code> column added to emulate the payload of actual tables</li>
</ul>
<h3>Analytic functions</h3>
<p>Of course, <strong>Oracle</strong> supports analytic functions just like <strong>SQL Server</strong> and <strong>PostgreSQL</strong> (and actually, it&#8217;s <strong>SQL Server</strong> and <strong>PostgreSQL</strong> that support analytic functions like <strong>Oracle</strong>).</p>
<p>To select records holding the group-wise maximums, we can apply the very same <code>ROW_NUMBER</code> and <code>DENSE_RANK</code> solutions we used for <strong>SQL Server</strong> and <strong>PostgreSQL</strong> before:</p>
<pre class="brush: sql">
SELECT  id, orderer, glow, ghigh
FROM    (
        SELECT  d.*, ROW_NUMBER() OVER (PARTITION BY glow ORDER BY id) AS rn
        FROM    t_distinct d
        ) q
WHERE   rn = 1
</pre>
<p><a href="#" onclick="xcollapse('X8809');return false;"><strong>View query details</strong></a><br />
</p>
<div id="X8809" style="display: none; ">
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>ORDERER</th>
<th>GLOW</th>
<th>GHIGH</th>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">4</td>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">2</td>
<td class="double_precision">3</td>
<td class="double_precision">3</td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">5</td>
<td class="double_precision">4</td>
<td class="double_precision">4</td>
</tr>
<tr>
<td class="double_precision">5</td>
<td class="double_precision">1</td>
<td class="double_precision">5</td>
<td class="double_precision">5</td>
</tr>
<tr>
<td class="double_precision">6</td>
<td class="double_precision">7</td>
<td class="double_precision">6</td>
<td class="double_precision">6</td>
</tr>
<tr>
<td class="double_precision">7</td>
<td class="double_precision">8</td>
<td class="double_precision">7</td>
<td class="double_precision">7</td>
</tr>
<tr>
<td class="double_precision">8</td>
<td class="double_precision">1</td>
<td class="double_precision">8</td>
<td class="double_precision">8</td>
</tr>
<tr>
<td class="double_precision">9</td>
<td class="double_precision">10</td>
<td class="double_precision">9</td>
<td class="double_precision">9</td>
</tr>
<tr>
<td class="double_precision">10</td>
<td class="double_precision">6</td>
<td class="double_precision">10</td>
<td class="double_precision">10</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0005s (6.9528s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 VIEW
  WINDOW SORT PUSHED RANK
   TABLE ACCESS FULL, 20091216_groupwise.T_DISTINCT
</pre>
</div>
<pre class="brush: sql">
SELECT  COUNT(*) AS cnt, SUM(LENGTH(stuffing)) AS psum
FROM    (
        SELECT  d.*, DENSE_RANK() OVER (PARTITION BY glow ORDER BY orderer) AS dr
        FROM    t_distinct d
        ) dd
WHERE   dr = 1
</pre>
<p><a href="#" onclick="xcollapse('X6117');return false;"><strong>View query details</strong></a><br />
</p>
<div id="X6117" style="display: none; ">
<div class="terminal">
<table class="terminal">
<tr>
<th>CNT</th>
<th>PSUM</th>
</tr>
<tr>
<td class="double_precision">100412</td>
<td class="double_precision">20082400</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (6.7498s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 SORT AGGREGATE
  VIEW
   WINDOW SORT PUSHED RANK
    TABLE ACCESS FULL, 20091216_groupwise.T_DISTINCT
</pre>
</div>
<p>This takes a little more than <strong>6</strong> seconds.</p>
<h3><code>CONNECT BY</code> and subqueries</h3>
<p>Unfortunately, in <strong>Oracle</strong> we cannot use the techniques we used to improve the query time in the other systems.</p>
<p><strong>Oracle</strong> does not support loose index scan on composite indexes like <strong>MySQL</strong>.</p>
<p>It does support a similar solution, <code>INDEX SKIP SCAN</code>, but can use it only for the queries that search for the exact value of the <code>orderer</code>, not for <code>MIN</code> or <code>MAX</code>.</p>
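<p>For reference, here is a sketch of the kind of query that can benefit from <code>INDEX SKIP SCAN</code>: it searches for an exact value of <code>orderer</code> without specifying the leading index column <code>glow</code>. The <code>INDEX_SS</code> hint only asks the optimizer to consider a skip scan on <code>ix_distinct_glow_orderer_id</code>; whether the plan actually uses it depends on the version and the statistics, and this query was not measured for the article:</p>
<pre class="brush: sql">
-- Illustration only: exact-value search that omits the leading index column
SELECT  /*+ INDEX_SS(d ix_distinct_glow_orderer_id) */
        COUNT(*)
FROM    t_distinct d
WHERE   orderer = 1
</pre>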
<p><strong>Oracle</strong> does not support <code>CROSS APPLY</code> so we cannot apply a multirecord or a multicolumn query to each result of the rowset.</p>
<p>More than that: <strong>Oracle</strong> does not even support deep correlation of subqueries. You cannot reference the outer table from a subquery nested more than one level deep, and using <code>ROW_NUMBER</code> or <code>ROWNUM</code> to filter the first result requires such nesting. So making an analog of <strong>MySQL</strong>&#8217;s <code>LIMIT</code> in a subquery won&#8217;t work either.</p>
<p><strong>Oracle 10g</strong> (which is what I&#8217;m writing about here) also does not support recursive <strong>CTE</strong>s, so it&#8217;s not possible to use this method either.</p>
<p>However, there still is a way to improve the query.</p>
<p>Instead of jumping over the index values to build the list of distinct groupers, we can build a whole list of the possible groupers and join it with the original table.</p>
<p>The <q>possible</q> groupers are of course those between <code>MIN</code> and <code>MAX</code>.</p>
<p>To build such a list we can use a <code>CONNECT BY</code> query over the <code>dual</code> table (which is a standard way to generate arbitrary rowsets in <strong>Oracle</strong>), and just take <code>MIN(glow)</code> and <code>MAX(glow)</code> as the start and stop conditions.</p>
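<p>Here is a minimal sketch of the generator on its own, with the bounds hard-coded (the values <strong>3</strong> and <strong>7</strong> are assumptions standing in for <code>MIN(glow)</code> and <code>MAX(glow)</code>):</p>
<pre class="brush: sql">
-- Illustration only: generate every integer from 3 to 7
SELECT  level - 1 + 3 AS d
FROM    dual
CONNECT BY
        level &lt;= 7 - 3 + 1
</pre>
<p>This should return five rows, one for each value from <strong>3</strong> to <strong>7</strong>.</p>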
<p>We can then find the <code>MIN(orderer)</code> or <code>MIN(id)</code> for each grouper from the generated list and join the table back on this value. This will use an efficient <code>INDEX RANGE SCAN (MIN / MAX)</code>. There can be gaps: not every value from the generated list will be present in the table, but this is not a problem, since the subquery will return a <code>NULL</code> for such a value and the join will just yield no rows.</p>
<p>Here&#8217;s the query to find the records holding the <code>MAX(id)</code>:</p>
<pre class="brush: sql">
WITH    q AS
        (
        SELECT  level - 1 +
                (
                SELECT  MIN(glow)
                FROM    t_distinct
                ) AS d
        FROM    dual
        CONNECT BY
                level &lt;=
                (
                SELECT  MAX(glow)
                FROM    t_distinct
                ) -
                (
                SELECT  MIN(glow)
                FROM    t_distinct
                ) + 1
        )
SELECT  d.id, d.orderer, d.glow, d.glow
FROM    q
JOIN    t_distinct d
ON      d.id =
        (
        SELECT  MAX(id)
        FROM    t_distinct di
        WHERE   di.glow = q.d
        )
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>ORDERER</th>
<th>GLOW</th>
<th>GLOW</th>
</tr>
<tr>
<td class="double_precision">999991</td>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">999992</td>
<td class="double_precision">6</td>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
</tr>
<tr>
<td class="double_precision">999993</td>
<td class="double_precision">8</td>
<td class="double_precision">3</td>
<td class="double_precision">3</td>
</tr>
<tr>
<td class="double_precision">999994</td>
<td class="double_precision">5</td>
<td class="double_precision">4</td>
<td class="double_precision">4</td>
</tr>
<tr>
<td class="double_precision">999995</td>
<td class="double_precision">10</td>
<td class="double_precision">5</td>
<td class="double_precision">5</td>
</tr>
<tr>
<td class="double_precision">999996</td>
<td class="double_precision">9</td>
<td class="double_precision">6</td>
<td class="double_precision">6</td>
</tr>
<tr>
<td class="double_precision">999997</td>
<td class="double_precision">4</td>
<td class="double_precision">7</td>
<td class="double_precision">7</td>
</tr>
<tr>
<td class="double_precision">999998</td>
<td class="double_precision">3</td>
<td class="double_precision">8</td>
<td class="double_precision">8</td>
</tr>
<tr>
<td class="double_precision">999999</td>
<td class="double_precision">1</td>
<td class="double_precision">9</td>
<td class="double_precision">9</td>
</tr>
<tr>
<td class="double_precision">1000000</td>
<td class="double_precision">4</td>
<td class="double_precision">10</td>
<td class="double_precision">10</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0005s (0.0009s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 NESTED LOOPS
  VIEW
   CONNECT BY WITHOUT FILTERING
    FAST DUAL
  TABLE ACCESS BY INDEX ROWID, 20091216_groupwise.T_DISTINCT
   INDEX UNIQUE SCAN, 20091216_groupwise.PK_DISTINCT_ID
    SORT AGGREGATE
     FIRST ROW
      INDEX RANGE SCAN (MIN/MAX), 20091216_groupwise.IX_DISTINCT_GLOW_ID
</pre>
<p>And this is a query to find the records holding <code>MAX(id)</code> within the <code>MIN(orderer)</code>:</p>
<pre class="brush: sql">
WITH    q AS
        (
        SELECT  level - 1 +
                (
                SELECT  MIN(glow)
                FROM    t_distinct
                ) AS d
        FROM    dual
        CONNECT BY
                level &lt;=
                (
                SELECT  MAX(glow)
                FROM    t_distinct
                ) -
                (
                SELECT  MIN(glow)
                FROM    t_distinct
                ) + 1
        )
SELECT  d.id, d.orderer, d.glow, d.glow
FROM    (
        SELECT  d,
                (
                SELECT  MIN(orderer)
                FROM    t_distinct d1
                WHERE   d1.glow = q.d
                ) AS mo
        FROM    q
        ) q2
JOIN    t_distinct d
ON      d.id =
        (
        SELECT  MAX(id)
        FROM    t_distinct d2
        WHERE   d2.glow = q2.d
                AND d2.orderer = q2.mo
        )
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>ORDERER</th>
<th>GLOW</th>
<th>GLOW</th>
</tr>
<tr>
<td class="double_precision">999991</td>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">999852</td>
<td class="double_precision">1</td>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
</tr>
<tr>
<td class="double_precision">999543</td>
<td class="double_precision">1</td>
<td class="double_precision">3</td>
<td class="double_precision">3</td>
</tr>
<tr>
<td class="double_precision">999974</td>
<td class="double_precision">1</td>
<td class="double_precision">4</td>
<td class="double_precision">4</td>
</tr>
<tr>
<td class="double_precision">999825</td>
<td class="double_precision">1</td>
<td class="double_precision">5</td>
<td class="double_precision">5</td>
</tr>
<tr>
<td class="double_precision">999816</td>
<td class="double_precision">1</td>
<td class="double_precision">6</td>
<td class="double_precision">6</td>
</tr>
<tr>
<td class="double_precision">999707</td>
<td class="double_precision">1</td>
<td class="double_precision">7</td>
<td class="double_precision">7</td>
</tr>
<tr>
<td class="double_precision">999818</td>
<td class="double_precision">1</td>
<td class="double_precision">8</td>
<td class="double_precision">8</td>
</tr>
<tr>
<td class="double_precision">999999</td>
<td class="double_precision">1</td>
<td class="double_precision">9</td>
<td class="double_precision">9</td>
</tr>
<tr>
<td class="double_precision">999960</td>
<td class="double_precision">1</td>
<td class="double_precision">10</td>
<td class="double_precision">10</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0005s (0.0010s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 NESTED LOOPS
  VIEW
   VIEW
    CONNECT BY WITHOUT FILTERING
     FAST DUAL
  TABLE ACCESS BY INDEX ROWID, 20091216_groupwise.T_DISTINCT
   INDEX UNIQUE SCAN, 20091216_groupwise.PK_DISTINCT_ID
    SORT AGGREGATE
     FIRST ROW
      INDEX RANGE SCAN (MIN/MAX), 20091216_groupwise.IX_DISTINCT_GLOW_ORDERER_ID
</pre>
<p>The latter query is a little bit more complex since we need two subqueries here: one to find the <code>MIN(orderer)</code>, another one to find the <code>MAX(id)</code>. The principle is the same, however.</p>
<p>Both queries are very fast and complete within <strong>1</strong> millisecond (read: instantly).</p>
<p>The performance of this query depends not only on the selectivity of the grouper but also on the actual values of the bounds: the greater the difference between <code>MIN</code> and <code>MAX</code> of the groupers, the less efficient the query is.</p>
<p>For this solution to work, the groupers should have a well-ordered datatype. This means you can build a list of all possible values between <code>MIN</code> and <code>MAX</code>. Integers and dates (without time portion) are well-ordered; real numbers, strings and timestamps are not.</p>
<p>For this solution to be efficient, the groupers should be well-ordered, dense and have low cardinality. Though these limitations can be quite strong, the solution is still extremely useful for many real-world scenarios which group by integers (like categories in a blogging system or department codes in an accounting application) or by dates without time portion (like sales reports).</p>
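<p>To make the density requirement concrete, here is a minimal sketch in <strong>Python</strong> (used only for illustration, it is not part of the original solution). The helper names <code>int_candidates</code> and <code>date_candidates</code> are hypothetical. They list every value between the bounds, which is what the <code>CONNECT BY</code> generator does, so the number of probes is defined by the bounds and not by the number of groups that actually exist:</p>
<pre class="brush: python">
from datetime import date, timedelta


def int_candidates(lo, hi):
    """Every integer from lo to hi, inclusive."""
    return list(range(lo, hi + 1))


def date_candidates(lo, hi):
    """Every date (without time portion) from lo to hi, inclusive."""
    return [lo + timedelta(days=i) for i in range((hi - lo).days + 1)]


dense = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
sparse = {1, 2, 1000000}

# 10 probes for 10 groups
print(len(int_candidates(min(dense), max(dense))))
# 1000000 probes for only 3 groups
print(len(int_candidates(min(sparse), max(sparse))))
# 16 probes, one per day
print(len(date_candidates(date(2009, 12, 1), date(2009, 12, 16))))
</pre>
<p>With the dense set each probe finds a group. With the sparse set almost every probe is wasted, which is why the difference between the bounds matters.</p>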
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2009/12/16/oracle-selecting-records-holding-group-wise-maximum/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oracle: nested SUM</title>
		<link>http://explainextended.com/2009/10/08/oracle-nested-sum/</link>
		<comments>http://explainextended.com/2009/10/08/oracle-nested-sum/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 19:00:46 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=3380</guid>
		<description><![CDATA[From Stack Overflow: Suppose I have this table A B C D Datatypes are not important. I want to do this: SELECT a AS a1, b AS b1, c AS c1, ( SELECT SUM(d) FROM source WHERE a = a1 AND b = b1 ) AS total FROM source GROUP BY a, b, c , [...]]]></description>
			<content:encoded><![CDATA[<p>From <a href="http://stackoverflow.com/questions/1538341/select-sum-as-field"><strong>Stack Overflow</strong></a>:</p>
<blockquote><p>Suppose I have this table</p>
<table class="excel">
<tr>
<th>A</th>
<th>B</th>
<th>C</th>
<th>D</th>
</tr>
</table>
<p>Datatypes are not important.</p>
<p>I want to do this:</p>
<pre class="brush: sql">
SELECT  a AS a1, b AS b1, c AS c1,
        (
        SELECT  SUM(d)
        FROM    source
        WHERE   a = a1
                AND b = b1
        ) AS total
FROM    source
GROUP BY
        a, b, c
</pre>
<p>, but I can&#8217;t find a way (<strong>SQL Developer</strong> keeps complaining with <q><code>FROM</code> clause not found</q>.)</p>
<p>Is there a way? Is it possible?
</p></blockquote>
<p>This is of course possible if we just alias the tables and qualify the fields with the aliases:</p>
<pre class="brush: sql">
SELECT  a, b, c,
        (
        SELECT  SUM(d)
        FROM    source si
        WHERE   si.a = so.a
                AND si.b = so.b
        )
FROM    source so
GROUP BY
        a, b, c
</pre>
<p>This works well on this sample set of data:</p>
<pre class="brush: sql">
WITH    source AS
        (
        SELECT  FLOOR(MOD(level - 1, 8) / 4) + 1 AS a,
                FLOOR(MOD(level - 1, 4) / 2) + 1 AS b,
                MOD(level - 1, 2) + 1 AS c,
                level AS d
        FROM    dual
        CONNECT BY
                16 &gt;= level
        )
SELECT  a, b, c,
        (
        SELECT  SUM(d)
        FROM    source si
        WHERE   si.a = so.a
                AND si.b = so.b
        )
FROM    source so
GROUP BY
        a, b, c
</pre>
<p>&nbsp;<br />
<a href="#" onclick="xcollapse('X6348');return false;"><strong>View query details</strong></a><br />
</p>
<div id="X6348" style="display: none; ">
<div class="terminal">
<table class="terminal">
<tr>
<th>A</th>
<th>B</th>
<th>C</th>
<th>(SELECTSUM(D)FROMSOURCESIWHERESI.A=SO.AANDSI.B=SO.B)</th>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">2</td>
<td class="double_precision">1</td>
<td class="double_precision">30</td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
<td class="double_precision">38</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
<td class="double_precision">22</td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">1</td>
<td class="double_precision">2</td>
<td class="double_precision">38</td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
<td class="double_precision">1</td>
<td class="double_precision">46</td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
<td class="double_precision">46</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
<td class="double_precision">2</td>
<td class="double_precision">22</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
<td class="double_precision">30</td>
</tr>
</table>
</div>
</div>
<p>, but it needs to reevaluate the subquery for each group.</p>
<p>In <strong>Oracle</strong>, there is a better way to do this query: nesting an aggregate <code>SUM</code> inside an analytical <code>SUM</code>.<br />
<span id="more-3380"></span><br />
What this query does is in fact the following:</p>
<ul>
<li>Calculate the groupwise <code>SUM(d)</code> for each <code>(a, b)</code></li>
<li>Retrieve the distinct values of <code>a</code>, <code>b</code>, <code>c</code></li>
<li>For each <code>(a, b, c)</code>, return the <code>SUM(d)</code> for the corresponding <code>(a, b)</code></li>
</ul>
<p>This means that <code>total</code> will be the same for each <code>(a, b)</code>, though the values of <code>c</code> may differ.</p>
<p>However, <code>SUM</code> is associative: the sum of groupwise sums is the same as the sum of the separate values.</p>
<p>To calculate the <code>SUM(d)</code> for a given <code>(a, b)</code>, we can calculate the partial sums for each <code>(a, b, c)</code> and add them together. Then we need to return this value for each <code>(a, b, c)</code> (there can be several of them within one <code>(a, b)</code>).</p>
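<p>The arithmetic can be checked outside the database. Below is a minimal sketch in <strong>Python</strong> (an illustration only, not something the database runs). It rebuilds the same 16 sample rows, computes the partial sums per <code>(a, b, c)</code>, adds them up per <code>(a, b)</code> and compares the result with the sums taken directly over the source rows:</p>
<pre class="brush: python">
from collections import defaultdict

# Rebuild the 16 sample rows using the same formulas as the sample query.
rows = []
for level in range(1, 17):
    a = (level - 1) % 8 // 4 + 1
    b = (level - 1) % 4 // 2 + 1
    c = (level - 1) % 2 + 1
    rows.append((a, b, c, level))

# Step 1: partial sums per (a, b, c), as GROUP BY a, b, c would give.
partial = defaultdict(int)
for a, b, c, d in rows:
    partial[(a, b, c)] += d

# Step 2: add the partial sums together per (a, b).
total = defaultdict(int)
for (a, b, c), s in partial.items():
    total[(a, b)] += s

# For comparison: sum the source values per (a, b) directly.
direct = defaultdict(int)
for a, b, c, d in rows:
    direct[(a, b)] += d

print(dict(total))
print(dict(total) == dict(direct))
</pre>
<p>Both ways give <strong>22</strong>, <strong>30</strong>, <strong>38</strong> and <strong>46</strong> for the four <code>(a, b)</code> pairs, the same totals as in the output of the subquery version.</p>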
<p>To return an aggregate value along with each of the records that contribute to it, the analytical functions are used (those with <code>OVER</code> clause).</p>
<p>It is widely known that analytical functions cannot be nested within each other.</p>
<p>However, one can nest an aggregate function within an analytical one. This is possible because aggregate functions work over the source-level values before <code>GROUP BY</code>, while analytical ones work with <code>SELECT</code>-level expressions (after the <code>GROUP BY</code>).</p>
<p>Here&#8217;s the query to do this:</p>
<pre class="brush: sql">
WITH    source AS
        (
        SELECT  FLOOR(MOD(level - 1, 8) / 4) + 1 AS a,
                FLOOR(MOD(level - 1, 4) / 2) + 1 AS b,
                MOD(level - 1, 2) + 1 AS c,
                level AS d
        FROM    dual
        CONNECT BY
                level &lt;= 16
        )
SELECT  a, b, c, SUM(SUM(d)) OVER (PARTITION BY a, b)
FROM    source
GROUP BY
        a, b, c
</pre>
<p><a href="#" onclick="xcollapse('X10959');return false;"><strong>View query details</strong></a><br />
</p>
<div id="X10959" style="display: none; ">
<div class="terminal">
<table class="terminal">
<tr>
<th>A</th>
<th>B</th>
<th>C</th>
<th>SUM(SUM(D))OVER(PARTITIONBYA,B)</th>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
<td class="double_precision">22</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
<td class="double_precision">2</td>
<td class="double_precision">22</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">2</td>
<td class="double_precision">1</td>
<td class="double_precision">30</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
<td class="double_precision">30</td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
<td class="double_precision">38</td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">1</td>
<td class="double_precision">2</td>
<td class="double_precision">38</td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
<td class="double_precision">1</td>
<td class="double_precision">46</td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
<td class="double_precision">46</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 WINDOW BUFFER
  SORT GROUP BY
   VIEW
    CONNECT BY WITHOUT FILTERING
     FAST DUAL
</pre>
</div>
<p>This query returns the same results.</p>
<p>The innermost <code>SUM</code>s are calculated <code>(a, b, c)</code>-wise (like in <code>GROUP BY</code>), while the outermost ones are calculated <code>(a, b)</code>-wise (like in <code>PARTITION BY</code>).</p>
<p>Since ordering by <code>(a, b, c)</code> implies ordering by <code>(a, b)</code>, both these groupings are performed with a single <code>SORT</code>.</p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2009/10/08/oracle-nested-sum/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IN vs. JOIN vs. EXISTS: Oracle</title>
		<link>http://explainextended.com/2009/09/30/in-vs-join-vs-exists-oracle/</link>
		<comments>http://explainextended.com/2009/09/30/in-vs-join-vs-exists-oracle/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 19:00:16 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=3300</guid>
		<description><![CDATA[Answering questions asked on the site. Jason asks: I have a co-worker who swears that Oracle IN queries are slow and refuses to use them. For example: SELECT foo FROM bar WHERE bar.stuff IN ( SELECT stuff FROM asdf ) Typically, this kind of performance advice strikes me as overgeneralized and my instinct is to [...]]]></description>
			<content:encoded><![CDATA[<p>Answering questions asked on the site.</p>
<p><strong>Jason</strong> asks:</p>
<blockquote><p>I have a co-worker who swears that <strong>Oracle</strong> <code>IN</code> queries are slow and refuses to use them.  For example:</p>
<pre class="brush: sql">
SELECT  foo
FROM    bar
WHERE   bar.stuff IN
        (
        SELECT  stuff
        FROM    asdf
        )
</pre>
<p>Typically, this kind of performance advice strikes me as overgeneralized and my instinct is to ignore it.  But I figure I&#8217;ll give him the benefit of the doubt and ask about it.</p>
<p>So, in general is an <code>IN</code> query very expensive?  I&#8217;m having trouble putting together many non-trivial queries to run on <code>EXPLAIN PLAN</code>.</p></blockquote>
<p>Does the <code>IN</code> predicate always have inferior efficiency compared to its counterparts, <code>EXISTS</code> and <code>JOIN</code>?</p>
<p>Let&#8217;s check.</p>
<p>First of all, there are at least three ways to check each row in a table against a list of (possibly duplicate) values and return each row at most once:</p>
<ul>
<li>
<p><code>IN</code> predicate:</p>
<pre class="brush: sql">
SELECT  foo
FROM    bar
WHERE   bar.stuff IN
        (
        SELECT  stuff
        FROM    asdf
        )
</pre>
</li>
<li>
<p><code>EXISTS</code> predicate:</p>
<pre class="brush: sql">
SELECT  foo
FROM    bar
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    asdf
        WHERE   asdf.stuff = bar.stuff
        )
</pre>
</li>
<li>
<p><code>JOIN</code> / <code>DISTINCT</code>:</p>
<pre class="brush: sql">
SELECT  b.foo
FROM    (
        SELECT  DISTINCT stuff
        FROM    asdf
        ) a
JOIN    bar b
ON      b.stuff = a.stuff
</pre>
</li>
</ul>
<p>All these queries are semantically the same.</p>
<p>Common wisdom advises against using <code>IN</code> predicates because it just doesn&#8217;t look like the right thing to do.</p>
<p>This is because less experienced <strong>SQL</strong> developers tend to translate the <strong>SQL</strong> statements into pseudocode just like they see it, which in this case looks something like this:</p>
<pre class="brush: php">
foreach ($bar as $bar_record) {
    foreach ($asdf as $asdf_record) {
        if ($bar_record-&gt;stuff == $asdf_record-&gt;stuff) {
            output ($bar_record);
            break;
        }
    }
}
</pre>
<p>This just <em>radiates</em> inefficiency, since the inner rowset has to be iterated for each row of the outer one.</p>
<p><code>EXISTS</code> looks nicer since it at least hints at the possibility of using an index. However, it still looks like the same nested loops.</p>
<p>Finally, the <code>JOIN</code> looks the most promising here since everybody knows joins are optimized (though few know how exactly). But an <code>IN</code>? No thanks, they say, it will execute once for every row, it&#8217;s slow, it&#8217;s bad, it&#8217;s inefficient!</p>
<p>People, come on. <strong>Oracle</strong> developers have thought it over ages ago and guess what: they turned out to be smart enough to implement an efficient algorithm for an <code>IN</code> construct.</p>
<p>Now, let&#8217;s make two sample tables and see how it&#8217;s done:<br />
<span id="more-3300"></span><br />
<a href="#" onclick="xcollapse('X1443');return false;"><strong>Table creation details</strong></a><br />
</p>
<div id="X1443" style="display: none; ">
<pre class="brush: sql">
CREATE TABLE t_outer (id NOT NULL, value NOT NULL, stuffing NOT NULL)
AS
SELECT  level, level, CAST(RPAD(&#039;*&#039;, 100, &#039;*&#039;) AS VARCHAR2(100))
FROM    dual
CONNECT BY
        level &lt;= 10
/

CREATE TABLE t_inner (id NOT NULL, pvalue NOT NULL, uvalue NOT NULL, stuffing NOT NULL)
AS
SELECT  level, FLOOR((level - 1) / 2) + 1, level, CAST(RPAD(&#039;*&#039;, 100, &#039;*&#039;) AS VARCHAR2(100))
FROM    dual
CONNECT BY
        level &lt;= 1000000
/

ALTER TABLE t_outer ADD CONSTRAINT pk_outer_id PRIMARY KEY (id)
/
ALTER TABLE t_inner ADD CONSTRAINT pk_inner_id PRIMARY KEY (id)
/
CREATE UNIQUE INDEX ux_inner_uvalue ON t_inner (uvalue)
/
</pre>
</div>
<p>Table <code>t_outer</code> contains <strong>10</strong> records.</p>
<p>Table <code>t_inner</code> contains <strong>1,000,000</strong> records. The field <code>uvalue</code> has a <code>UNIQUE</code> index defined on it, while the field <code>pvalue</code> has no index at all, and contains duplicates (each value from <strong>1</strong> to <strong>500,000</strong> is stored twice).</p>
<p>Now, let&#8217;s make a couple of queries.</p>
<h3>IN on UNIQUE column</h3>
<pre class="brush: sql">
SELECT  id, value
FROM    t_outer
WHERE   value IN
        (
        SELECT  uvalue
        FROM    t_inner
        )
</pre>
<p><a href="#" onclick="xcollapse('X315');return false;">View query details</a><br />
</p>
<div id="X315" style="display: none; ">
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>VALUE</th>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">3</td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">4</td>
</tr>
<tr>
<td class="double_precision">5</td>
<td class="double_precision">5</td>
</tr>
<tr>
<td class="double_precision">6</td>
<td class="double_precision">6</td>
</tr>
<tr>
<td class="double_precision">7</td>
<td class="double_precision">7</td>
</tr>
<tr>
<td class="double_precision">8</td>
<td class="double_precision">8</td>
</tr>
<tr>
<td class="double_precision">9</td>
<td class="double_precision">9</td>
</tr>
<tr>
<td class="double_precision">10</td>
<td class="double_precision">10</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0003s (0.0007s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 NESTED LOOPS
  TABLE ACCESS FULL, 20090930_in.T_OUTER
  INDEX UNIQUE SCAN, 20090930_in.UX_INNER_UVALUE
</pre>
</div>
<p>This query returns all <strong>10</strong> values from the <code>t_outer</code> instantly.</p>
<p>If we look into the query plan we will see that this is just a plain <code>NESTED LOOPS</code> join on the index. No whole subquery reevaluation, the index is used and used efficiently.</p>
<p><strong>Oracle</strong> is smart enough to make three logical inferences:</p>
<ol>
<li><code>IN</code> is equivalent to a <code>JOIN</code> / <code>DISTINCT</code></li>
<li><code>DISTINCT</code> on a column marked as <code>UNIQUE</code> and <code>NOT NULL</code> is redundant, so the <code>IN</code> is equivalent to a simple <code>JOIN</code>
</li>
<li><code>IN</code> is equivalent to a simple <code>JOIN</code>, so any valid join method and access method can be used</li>
</ol>
<p>This makes this query super fast.</p>
<h3>JOIN / DISTINCT on UNIQUE column</h3>
<pre class="brush: sql">
SELECT  id, value
FROM    t_outer o
JOIN    (
        SELECT  DISTINCT uvalue
        FROM    t_inner
        )
ON      o.value = uvalue
</pre>
<p><a href="#" onclick="xcollapse('X7719');return false;">View query details</a><br />
</p>
<div id="X7719" style="display: none; ">
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>VALUE</th>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">3</td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">4</td>
</tr>
<tr>
<td class="double_precision">5</td>
<td class="double_precision">5</td>
</tr>
<tr>
<td class="double_precision">6</td>
<td class="double_precision">6</td>
</tr>
<tr>
<td class="double_precision">7</td>
<td class="double_precision">7</td>
</tr>
<tr>
<td class="double_precision">8</td>
<td class="double_precision">8</td>
</tr>
<tr>
<td class="double_precision">9</td>
<td class="double_precision">9</td>
</tr>
<tr>
<td class="double_precision">10</td>
<td class="double_precision">10</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0003s (0.0007s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 VIEW
  NESTED LOOPS
   TABLE ACCESS FULL, 20090930_in.T_OUTER
   INDEX UNIQUE SCAN, 20090930_in.UX_INNER_UVALUE
</pre>
</div>
<p>Essentially the same query plan, with the same <code>NESTED LOOPS</code> over the unique index. From <strong>Oracle</strong>&#8217;s point of view, these queries are identical. <strong>Oracle</strong> sees that they are the same and avoids evaluating the whole subquery: it just uses index access.</p>
<p>Now let&#8217;s see how smart <strong>Oracle</strong> is when it has to search a non-indexed column which, in addition, contains duplicates.</p>
<h3>JOIN / DISTINCT on a non-indexed column</h3>
<pre class="brush: sql">
SELECT  id, value
FROM    t_outer o
JOIN    (
        SELECT  DISTINCT pvalue
        FROM    t_inner
        )
ON      o.value = pvalue
</pre>
<p><a href="#" onclick="xcollapse('X71');return false;">View query details</a><br />
</p>
<div id="X71" style="display: none; ">
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>VALUE</th>
</tr>
<tr>
<td class="double_precision">5</td>
<td class="double_precision">5</td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">4</td>
</tr>
<tr>
<td class="double_precision">9</td>
<td class="double_precision">9</td>
</tr>
<tr>
<td class="double_precision">10</td>
<td class="double_precision">10</td>
</tr>
<tr>
<td class="double_precision">8</td>
<td class="double_precision">8</td>
</tr>
<tr>
<td class="double_precision">6</td>
<td class="double_precision">6</td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">3</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
</tr>
<tr>
<td class="double_precision">7</td>
<td class="double_precision">7</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0003s (2.2187s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 VIEW
  HASH UNIQUE
   HASH JOIN
    TABLE ACCESS FULL, 20090930_in.T_OUTER
    TABLE ACCESS FULL, 20090930_in.T_INNER
</pre>
</div>
<p>Now the query takes quite a significant time, more than <strong>2 seconds</strong>.</p>
<p><strong>Oracle</strong> decided to make the join first and then get rid of the duplicates, despite the fact that the <code>DISTINCT</code> clause is inside the inline view, not outside. However, semantically this is the same and <strong>Oracle</strong> knows it.</p>
<p>Now, how will the <code>IN</code> predicate work? Will it be smart enough to convert the query to the <code>JOIN</code> which completes in <strong>2 seconds</strong>, or will it run the inner query over and over and over again?</p>
<p>Let&#8217;s see.</p>
<h3>IN on a non-indexed column</h3>
<pre class="brush: sql">
SELECT  id, value
FROM    t_outer
WHERE   value IN
        (
        SELECT  pvalue
        FROM    t_inner
        )
</pre>
<p><a href="#" onclick="xcollapse('X6477');return false;"><strong>View query details</strong></a><br />
</p>
<div id="X6477" style="display: none; ">
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>VALUE</th>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">3</td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">4</td>
</tr>
<tr>
<td class="double_precision">5</td>
<td class="double_precision">5</td>
</tr>
<tr>
<td class="double_precision">6</td>
<td class="double_precision">6</td>
</tr>
<tr>
<td class="double_precision">7</td>
<td class="double_precision">7</td>
</tr>
<tr>
<td class="double_precision">8</td>
<td class="double_precision">8</td>
</tr>
<tr>
<td class="double_precision">9</td>
<td class="double_precision">9</td>
</tr>
<tr>
<td class="double_precision">10</td>
<td class="double_precision">10</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0003s (0.0016s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 HASH JOIN SEMI
  TABLE ACCESS FULL, 20090930_in.T_OUTER
  TABLE ACCESS FULL, 20090930_in.T_INNER
</pre>
</div>
<p><strong>1 millisecond</strong>.</p>
<p>The plan says <code>TABLE ACCESS FULL</code> on both tables, and <code>t_inner</code> is very large: it holds <strong>1,000,000</strong> rows, and they are long rows.</p>
<p>It just cannot be scanned in <strong>1 ms</strong>. However, this is what we see. How can it be?</p>
<p>If we look closer at the plan, we will see a special method there, a <code>HASH JOIN SEMI</code>. This is a so-called semi-join. It is designed just for queries like this, using <code>IN</code> or <code>EXISTS</code> with an equality condition on the correlated field.</p>
<p>This method works as follows:</p>
<ul>
<li>It builds a hash table on the outer query (the one that is probed). I repeat: it is the outer query that is hashed, not the inner one. Everything that needs to be returned in <code>SELECT</code> clause goes to the hash table
</li>
<li>Then it takes every row from the inner table and searches the hash table for the value of the column being tested</li>
<li>If the value is found, the engine returns all records from the hash table that match this value and <em>deletes</em> them from the hash table. Unlike the plain <code>JOIN</code>, the hash table is dynamic: it is populated before the <code>JOIN</code> begins and each row is removed from it as it gets returned</li>
<li>When either the hash table or the subquery is out of records, the query completes</li>
</ul>
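<p>The steps above can be sketched in a few lines of <strong>Python</strong>. This is only an illustration of the description given here, not <strong>Oracle</strong>&#8217;s actual implementation. The function name <code>hash_semi_join</code> and its parameters are hypothetical:</p>
<pre class="brush: python">
def hash_semi_join(outer_rows, inner_values, key):
    """Return each outer row whose key occurs in inner_values, at most once."""
    # Build phase: hash the outer rowset on the column being tested.
    table = {}
    for row in outer_rows:
        table.setdefault(key(row), []).append(row)

    result = []
    # Probe phase: read the inner rowset value by value.
    for value in inner_values:
        if not table:
            break  # the hash table is empty, nothing more can be returned
        # A hit returns every row in the bucket and deletes the bucket,
        # so a later duplicate of the same value is just a miss.
        result.extend(table.pop(value, []))
    return result


# The value 42 occurs three times in the outer rowset and five times in the inner one.
outer = [("a", 42), ("b", 42), ("c", 42), ("d", 7)]
inner = [42, 42, 42, 42, 42]
print(hash_semi_join(outer, inner, key=lambda row: row[1]))
</pre>
<p>Only the first <strong>42</strong> from the inner rowset produces a hit, and it returns all three matching outer rows at once. The other four values miss, so no row is returned twice and no <code>DISTINCT</code> is needed.</p>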
<p>Instead of removing the duplicates from the subquery before the iteration begins, the engine removes the values from the hash table as the iteration proceeds. This cannot be easily rewritten in <strong>SQL</strong> since <strong>SQL</strong> operates on immutable sets. However, outside of set theory this can be implemented easily.</p>
<p>The duplicates in the subquery are not a problem anymore. Imagine we have value <strong>42</strong> three times in the outer table and five times in the inner table. The hash table will be built and all three outer query records will get into this hash table.</p>
<p>Now, the first value of <strong>42</strong> is fetched out of the subquery. This will result in a hash hit, all three records sharing this value will be found, returned and, which is most important, deleted.</p>
<p>What happens when the second value of <strong>42</strong> is fetched? Well, nothing will happen. There are no more records in the hash table, they were deleted as they were returned. That&#8217;s why this duplicate will just result in a hash miss and go away. Same is true for all other duplicates.</p>
<p>Now we can see what happened in our query. The engine built the hash table over the <strong>10</strong> values from <code>t_outer</code>. Then it took the first value from <code>t_inner</code> (which happened to be a <strong>1</strong>), found it in the hash table and returned the hashed record from <code>t_outer</code>. The second value of <strong>1</strong> fetched from the subquery resulted in a hash miss and was promptly forgotten. Within the first <strong>20</strong> records the hash table was empty. And when the hash table is empty, there is no point in running the query further: nothing could possibly be returned anymore.</p>
<p>Of course, there was a little cheating here: the first <strong>10</strong> numbers just happened to be fetched out in the first <strong>20</strong> records. If we searched for the values from <strong>499,991</strong> to <strong>500,000</strong>, the query would run for the same <strong>2 seconds</strong>. However, on average, the hash table would be emptied much faster than the <code>DISTINCT</code> set could be built to use with a <code>JOIN</code>.</p>
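<p>The difference between the two cases can be counted with a small <strong>Python</strong> sketch. It is an illustration only: the helper names are hypothetical, and it assumes that the full scan returns the rows of <code>t_inner</code> in <code>id</code> order, which the database does not guarantee:</p>
<pre class="brush: python">
def inner_rows_read(outer_values, inner_values):
    """Count the inner rows read before every outer value has been matched."""
    remaining = set(outer_values)
    rows_read = 0
    for value in inner_values:
        if not remaining:
            break
        rows_read += 1
        remaining.discard(value)
    return rows_read


def pvalues():
    # Same layout as t_inner.pvalue: 1, 1, 2, 2, ..., 500000, 500000.
    # Assumption: the scan returns the rows in id order.
    return ((level - 1) // 2 + 1 for level in range(1, 1000001))


# Values 1 to 10, as in t_outer
print(inner_rows_read(range(1, 11), pvalues()))
# Values 499,991 to 500,000
print(inner_rows_read(range(499991, 500001), pvalues()))
</pre>
<p>Under this assumption, the first call stops after <strong>19</strong> rows (the 20th is a duplicate that is never needed), while the second one has to read <strong>999,999</strong> rows, that is almost the whole table.</p>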
<h3>EXISTS</h3>
<p><strong>Oracle</strong> treats <code>EXISTS</code> predicate with an equality condition for a correlated value exactly the same as an <code>IN</code> predicate. Let&#8217;s see it:</p>
<p><a href="#" onclick="xcollapse('X8508');return false;"><strong><code>EXISTS</code> on unique value</strong></a><br />
</p>
<div id="X8508" style="display: none; ">
<pre class="brush: sql">
SELECT  id, value
FROM    t_outer
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    t_inner
        WHERE   uvalue = value
        )
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>VALUE</th>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">3</td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">4</td>
</tr>
<tr>
<td class="double_precision">5</td>
<td class="double_precision">5</td>
</tr>
<tr>
<td class="double_precision">6</td>
<td class="double_precision">6</td>
</tr>
<tr>
<td class="double_precision">7</td>
<td class="double_precision">7</td>
</tr>
<tr>
<td class="double_precision">8</td>
<td class="double_precision">8</td>
</tr>
<tr>
<td class="double_precision">9</td>
<td class="double_precision">9</td>
</tr>
<tr>
<td class="double_precision">10</td>
<td class="double_precision">10</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0003s (0.0089s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 NESTED LOOPS SEMI
  TABLE ACCESS FULL, 20090930_in.T_OUTER
  INDEX UNIQUE SCAN, 20090930_in.UX_INNER_UVALUE
</pre>
</div>
<p><a href="#" onclick="xcollapse('X8509');return false;"><strong><code>EXISTS</code> on a non-unique value</strong></a><br />
</p>
<div id="X8509" style="display: none; ">
<pre class="brush: sql">
SELECT  id, value
FROM    t_outer
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    t_inner
        WHERE   pvalue = value
        )
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>VALUE</th>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="double_precision">1</td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">2</td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="double_precision">3</td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="double_precision">4</td>
</tr>
<tr>
<td class="double_precision">5</td>
<td class="double_precision">5</td>
</tr>
<tr>
<td class="double_precision">6</td>
<td class="double_precision">6</td>
</tr>
<tr>
<td class="double_precision">7</td>
<td class="double_precision">7</td>
</tr>
<tr>
<td class="double_precision">8</td>
<td class="double_precision">8</td>
</tr>
<tr>
<td class="double_precision">9</td>
<td class="double_precision">9</td>
</tr>
<tr>
<td class="double_precision">10</td>
<td class="double_precision">10</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0003s (0.0016s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 HASH JOIN SEMI
  TABLE ACCESS FULL, 20090930_in.T_OUTER
  TABLE ACCESS FULL, 20090930_in.T_INNER
</pre>
</div>
<p>Exactly the same plans, results and performance.</p>
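<p>For comparison, here is a sketch of the <code>IN</code> form of the second query, written against the same <code>t_outer</code> and <code>t_inner</code> tables. If the two predicates are indeed treated the same, it should produce the same <code>HASH JOIN SEMI</code> plan:</p>
<pre class="brush: sql">
-- Same search as the EXISTS query on the non-unique column,
-- expressed with IN instead of a correlated subquery
SELECT  id, value
FROM    t_outer
WHERE   value IN
        (
        SELECT  pvalue
        FROM    t_inner
        )
</pre>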
<h3>Summary</h3>
<p>The <strong>IN</strong> predicate is the ugly duckling of <strong>SQL</strong>. Because it looks inefficient to an inexperienced developer, it is generally considered slow, and <code>EXISTS</code> or a <code>JOIN</code> is usually recommended instead.</p>
<p>However, <strong>Oracle</strong> optimizes <code>IN</code> fairly well, making good use of indexes, <code>UNIQUE</code> constraints, <code>NOT NULL</code> modifiers and other extra information.</p>
<p>Even when the subquery field contains duplicates and has no index, <strong>Oracle</strong> is able to use the <code>HASH JOIN SEMI</code> method, which is more efficient than a <code>JOIN</code> / <code>DISTINCT</code> solution.</p>
<p><code>EXISTS</code> is optimized in exactly the same way, but <code>IN</code> is more readable and concise.</p>
<p>The <strong>IN</strong> predicate is exactly what should be used to search records against a list of values in <strong>Oracle</strong>.</p>
<p>Hope that helps.</p>
<hr/>
<p>I&#8217;m always glad to answer the questions regarding database queries.</p>
<p><a href="/ask-a-question"><strong>Ask me a question</strong></a></p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2009/09/30/in-vs-join-vs-exists-oracle/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adjacency list vs. nested sets: Oracle</title>
		<link>http://explainextended.com/2009/09/28/adjacency-list-vs-nested-sets-oracle/</link>
		<comments>http://explainextended.com/2009/09/28/adjacency-list-vs-nested-sets-oracle/#comments</comments>
		<pubDate>Mon, 28 Sep 2009 19:00:03 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=3256</guid>
		<description><![CDATA[Continuing the series: Which is better for storing hierarchical data: the nested sets model or the adjacency list (parent-child) model? For detailed explanations of the terms, see the first article in the series: Adjacency list vs. nested sets: PostgreSQL Today is Oracle time. Adjacency lists require recursion, and Oracle has a native way to implement recursion in [...]]]></description>
			<content:encoded><![CDATA[<p>Continuing the series:</p>
<blockquote><p>Which is better for storing hierarchical data: the <strong>nested sets</strong> model or the <strong>adjacency list</strong> (parent-child) model?
</p></blockquote>
<p>For detailed explanations of the terms, see the first article in the series:</p>
<ul>
<li><a href="/2009/09/24/adjacency-list-vs-nested-sets-postgresql/"><strong>Adjacency list vs. nested sets: PostgreSQL</strong></a></li>
</ul>
<p>Today is <strong>Oracle</strong> time.</p>
<p>Adjacency lists require recursion, and <strong>Oracle</strong> has a native way to implement recursion in <strong>SQL</strong> statements: a special clause, <code>CONNECT BY</code>.</p>
<p>Unlike other systems, <strong>Oracle</strong> has supported recursion in <strong>SQL</strong> statements since <strong>version 2</strong> (the name under which the first version of <strong>Oracle</strong> was released).</p>
<p>This was done because by that time there were already several <a href="/2009/08/23/what-is-a-relational-database/">hierarchical</a> databases on the market, which are quite good at storing hierarchical data. If not for this clause, the transition from a hierarchical <strong>DBMS</strong> to a relational <strong>DBMS</strong> would have been too hard.</p>
<p><strong>Oracle</strong>&#8217;s way to query self-relations is somewhat different from the recursive <strong>CTE</strong>s that other systems use:</p>
<ul>
<li>
<p>Recursive <strong>CTE</strong>s can use an arbitrary set in recursive steps. The recursion base and recursion stack are not visible in further steps. The recursive operation is a set operation (usually a <code>JOIN</code>) on the recursion parameter (a set) and the result of the previous operation:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2009/09/recursive.png" alt="Recursive CTE" title="Recursive CTE" class="aligncenter size-full wp-image-3259 noborder" /></p>
</li>
<li>
<p><code>CONNECT BY</code> queries can only use the value of a single row in recursive steps.</p>
<p>The recursion base and recursion stack are visible, the values outside the recursion stack are not.</p>
<p>The recursive operation is a set operation on the current row and the base rowset.</p>
<p>Unlike a recursive <strong>CTE</strong>, <code>CONNECT BY</code> operates row-wise, that is, each row of the result spawns its own recursion branch with its own stack:</p>
<p><img src="http://explainextended.com/wp-content/uploads/2009/09/connect.png" alt="CONNECT BY" title="CONNECT BY" class="aligncenter size-full wp-image-3260 noborder" /></p>
</li>
</ul>
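<p>To make the contrast concrete, here is a rough sketch of the same descendant count written both ways against the <code>t_hierarchy</code> table created further down. The recursive <code>WITH</code> form assumes an <strong>Oracle</strong> release that supports recursive subquery factoring (<strong>11g Release 2</strong> or later, as far as I know); the name <code>q</code> is just an illustrative alias:</p>
<pre class="brush: sql">
-- Recursive CTE: each step joins the previous step (a set) to the table
WITH    q (id, parent) AS
        (
        SELECT  id, parent
        FROM    t_hierarchy
        WHERE   id = 42
        UNION ALL
        SELECT  h.id, h.parent
        FROM    q, t_hierarchy h
        WHERE   h.parent = q.id
        )
SELECT  COUNT(*)
FROM    q
/
-- CONNECT BY: each row looks up its own children via PRIOR
SELECT  COUNT(*)
FROM    t_hierarchy
START WITH
        id = 42
CONNECT BY
        parent = PRIOR id
/
</pre>
<p>Both statements walk the same subtree, so they should return the same count.</p>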
<p>The main difference is that the <code>CONNECT BY</code> operation cannot produce anything that was not in the original rowset, while a recursive <strong>CTE</strong> can.</p>
<p>However, <code>CONNECT BY</code> has numerous benefits too: it&#8217;s very fast, makes it easy to track the hierarchy path (using a built-in function named <code>SYS_CONNECT_BY_PATH</code>) and implicitly sorts the recordset in tree order (with an additional clause, <code>ORDER SIBLINGS BY</code>, to sort the siblings).</p>
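<p>As a small illustration of these features, the sketch below lists a node and its immediate children together with the path leading to each of them. It runs against the <code>t_hierarchy</code> table created further down; the alias <code>id_path</code> and the depth limit of <strong>2</strong> are arbitrary choices made for the example:</p>
<pre class="brush: sql">
-- Node 42 and its direct children, in tree order,
-- with the chain of ids leading to each row
SELECT  id, level, SYS_CONNECT_BY_PATH(id, &#039;/&#039;) AS id_path
FROM    t_hierarchy
START WITH
        id = 42
CONNECT BY
        parent = PRIOR id
        AND level &lt;= 2
ORDER SIBLINGS BY
        id
</pre>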
<p>This method, of course, allows traversing adjacency lists easily.</p>
<p>To compare the efficiency of adjacency lists with that of nested sets, let&#8217;s create a sample table:<br />
<span id="more-3256"></span><br />
<a href="#" onclick="xcollapse('X9189');return false;"><strong>Table creation details</strong></a><br />
</p>
<div id="X9189" style="display: none; ">
<pre class="brush: sql">
CREATE TABLE t_hierarchy (id NOT NULL, parent NOT NULL, lft NOT NULL, rgt NOT NULL, data NOT NULL, stuffing NOT NULL)
AS
WITH    nums AS
        (
        SELECT  level AS num, 100 AS value
        FROM    dual
        CONNECT BY
                level &lt;= 5
        ),
        digits AS
        (
        SELECT  SYS_CONNECT_BY_PATH(num, &#039;/&#039;) AS digit, level AS len
        FROM    nums
        CONNECT BY
                level &lt;= 9
        )
SELECT  id,
        FLOOR((id - 1) / 5),
        lft,
        lft + FLOOR(2929687 / POWER(5, len)) - 1,
        CAST(&#039;Value &#039; || id AS VARCHAR2(100)),
        CAST(RPAD(&#039;*&#039;, 100, &#039;*&#039;) AS VARCHAR2(100))
FROM    (
        SELECT  digit, len,
                (
                SELECT  SUM(SUBSTR(digit, level * 2, 1) * POWER(5, len - level))
                FROM    dual
                CONNECT BY
                        level &lt;= len
                ) AS id,
                (
                SELECT  SUM((SUBSTR(digit, level * 2, 1) - 1) * FLOOR(2929687 / POWER(5, level)) + 1)
                FROM    dual
                CONNECT BY
                        level &lt;= len
                ) AS lft
        FROM    digits
        )
/
ALTER TABLE t_hierarchy ADD CONSTRAINT pk_hierarchy_id PRIMARY KEY (id)
/
CREATE INDEX ix_hierarchy_parent ON t_hierarchy (parent)
/
CREATE INDEX ix_hierarchy_lft ON t_hierarchy (lft)
/
CREATE INDEX ix_hierarchy_rgt ON t_hierarchy (rgt)
/
</pre>
</div>
<p>The table contains both adjacency list data and nested sets data, with <strong>8</strong> levels of nesting, <strong>5</strong> children of each parent node and <strong>2,441,405</strong> records.</p>
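<p>A quick way to sanity-check the generated tree is to count the nodes on each level. This is only a sketch: it assumes the top-level nodes are those with <code>parent = 0</code>, which is what <code>FLOOR((id - 1) / 5)</code> in the creation script yields for the ids from <strong>1</strong> to <strong>5</strong>. With <strong>5</strong> children per node, each level should hold five times as many rows as the previous one:</p>
<pre class="brush: sql">
-- Walks the whole tree from the top-level nodes down,
-- so expect it to touch every row of the table
SELECT  level, COUNT(*) AS nodes
FROM    t_hierarchy
START WITH
        parent = 0
CONNECT BY
        parent = PRIOR id
GROUP BY
        level
ORDER BY
        level
</pre>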
<p>As in the previous articles, we&#8217;ll test performance of the three most used queries:</p>
<ul>
<li>Find all descendants of a given node</li>
<li>Find all ancestors of a given node</li>
<li>Find all descendants of a given node up to a certain depth</li>
</ul>
<h3>All descendants</h3>
<h4>Nested sets</h4>
<pre class="brush: sql">
SELECT  SUM(LENGTH(hc.stuffing))
FROM    t_hierarchy hp
JOIN    t_hierarchy hc
ON      hc.lft BETWEEN hp.lft AND hp.rgt
WHERE   hp.id = 42
</pre>
<p><a href="#" onclick="xcollapse('X10763');return false;"><strong>View query details</strong></a><br />
</p>
<div id="X10763" style="display: none; ">
<div class="terminal">
<table class="terminal">
<tr>
<th>SUM(LENGTH(HC.STUFFING))</th>
</tr>
<tr>
<td class="double_precision">1953100</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (0.0144s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 SORT AGGREGATE
  NESTED LOOPS
   TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
    INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
   TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
    INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_LFT
</pre>
</div>
<p>This is super fast, faster than in any of the systems tested earlier in the series: only about <strong>14 ms</strong>. The plan is quite predictable: a <code>UNIQUE</code> scan to find the parent record and a range scan to find all its descendants.</p>
<h4>Adjacency list</h4>
<pre class="brush: sql">
SELECT  SUM(LENGTH(stuffing))
FROM    t_hierarchy
START WITH
        id = 42
CONNECT BY
        parent = PRIOR id
</pre>
<p><a href="#" onclick="xcollapse('X8132');return false;"><strong>View query details</strong></a><br />
</p>
<div id="X8132" style="display: none; ">
<div class="terminal">
<table class="terminal">
<tr>
<th>SUM(LENGTH(STUFFING))</th>
</tr>
<tr>
<td class="double_precision">1953100</td>
</tr>
<tr class="statusbar">
<td colspan="100">1 row fetched in 0.0001s (0.1479s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 SORT AGGREGATE
  CONNECT BY WITH FILTERING
   TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
    INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
   NESTED LOOPS
    BUFFER SORT
     CONNECT BY PUMP
    TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
     INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_PARENT
   TABLE ACCESS FULL, 20090928_nested.T_HIERARCHY
</pre>
</div>
<p>This is quite fast too: about <strong>150 ms</strong>. The plan loops over the <code>parent</code> ranges for each <code>id</code>.</p>
<p>Note the <code>TABLE ACCESS FULL</code> at the end of the plan. <strong>Oracle</strong> reserves the right to do a <code>FULL SCAN</code> instead of a <code>RANGE SCAN</code> if it considers a certain <code>parent</code> range too large to use the index. However, this is recursion, not a single operation, and it&#8217;s impossible to tell in advance which <code>id</code> values will be returned on each recursion step. That&#8217;s why <strong>Oracle</strong> shows both methods, just in case, and chooses the most efficient one at runtime.</p>
<p>In our table, at most <strong>5</strong> rows (out of more than <strong>2 million</strong>) share a <code>parent</code>, so it&#8217;s very unlikely that this plan branch will ever be chosen.</p>
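<p>To look at the plan on your own instance, one option is <code>EXPLAIN PLAN</code> together with <code>DBMS_XPLAN.DISPLAY</code>. This is a minimal sketch: the output will be laid out differently from the listings in this article, and the plan itself may differ depending on the version and statistics:</p>
<pre class="brush: sql">
-- Store the plan for the adjacency list query in the plan table
EXPLAIN PLAN FOR
SELECT  SUM(LENGTH(stuffing))
FROM    t_hierarchy
START WITH
        id = 42
CONNECT BY
        parent = PRIOR id
/
-- Print the most recently explained plan
SELECT  *
FROM    TABLE(DBMS_XPLAN.DISPLAY)
/
</pre>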
<h3>Find all ancestors</h3>
<h4>Nested sets</h4>
<pre class="brush: sql">
SELECT  hp.id, hp.parent, hp.lft, hp.rgt, hp.data
FROM    t_hierarchy hc
JOIN    t_hierarchy hp
ON      hc.lft BETWEEN hp.lft AND hp.rgt
WHERE   hc.id = 1000000
ORDER BY
        hp.lft
</pre>
<p><a href="#" onclick="xcollapse('X10562');return false;"><strong>View query details</strong></a><br />
</p>
<div id="X10562" style="display: none; ">
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>PARENT</th>
<th>LFT</th>
<th>RGT</th>
<th>DATA</th>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">0</td>
<td class="double_precision">585938</td>
<td class="double_precision">1171874</td>
<td class="varchar2">Value 2</td>
</tr>
<tr>
<td class="double_precision">12</td>
<td class="double_precision">2</td>
<td class="double_precision">703126</td>
<td class="double_precision">820312</td>
<td class="varchar2">Value 12</td>
</tr>
<tr>
<td class="double_precision">63</td>
<td class="double_precision">12</td>
<td class="double_precision">750001</td>
<td class="double_precision">773437</td>
<td class="varchar2">Value 63</td>
</tr>
<tr>
<td class="double_precision">319</td>
<td class="double_precision">63</td>
<td class="double_precision">764063</td>
<td class="double_precision">768749</td>
<td class="varchar2">Value 319</td>
</tr>
<tr>
<td class="double_precision">1599</td>
<td class="double_precision">319</td>
<td class="double_precision">766875</td>
<td class="double_precision">767811</td>
<td class="varchar2">Value 1599</td>
</tr>
<tr>
<td class="double_precision">7999</td>
<td class="double_precision">1599</td>
<td class="double_precision">767437</td>
<td class="double_precision">767623</td>
<td class="varchar2">Value 7999</td>
</tr>
<tr>
<td class="double_precision">39999</td>
<td class="double_precision">7999</td>
<td class="double_precision">767549</td>
<td class="double_precision">767585</td>
<td class="varchar2">Value 39999</td>
</tr>
<tr>
<td class="double_precision">199999</td>
<td class="double_precision">39999</td>
<td class="double_precision">767571</td>
<td class="double_precision">767577</td>
<td class="varchar2">Value 199999</td>
</tr>
<tr>
<td class="double_precision">1000000</td>
<td class="double_precision">199999</td>
<td class="double_precision">767576</td>
<td class="double_precision">767576</td>
<td class="varchar2">Value 1000000</td>
</tr>
<tr class="statusbar">
<td colspan="100">9 rows fetched in 0.0005s (3.2186s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 NESTED LOOPS
  TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
   INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
  TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
   INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_LFT
</pre>
</div>
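<p>The plan has the same shape as the one for the query that selects all descendants.</p>
<p>However, the index on <code>lft</code> can only be used for the <code>hp.lft &lt;= hc.lft</code> half of the condition, so the range on <code>lft</code> is too broad, and the query runs for more than <strong>3 seconds</strong>.</p>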
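<p>One way to see how broad that range is: count the rows the range scan on <code>lft</code> has to visit and compare that with the rows that also pass the check on <code>rgt</code>. This is just a sketch; the literal <strong>767576</strong> is the <code>lft</code> of node <strong>1,000,000</strong> taken from the result above, and the column aliases are made up for the example:</p>
<pre class="brush: sql">
-- lft_candidates: rows the index range scan has to visit
-- ancestors:      rows that survive the additional filter on rgt
SELECT  COUNT(*) AS lft_candidates,
        SUM(CASE WHEN rgt &gt;= 767576 THEN 1 ELSE 0 END) AS ancestors
FROM    t_hierarchy
WHERE   lft &lt;= 767576
</pre>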
<h4>Adjacency list</h4>
<pre class="brush: sql">
SELECT  id, parent, lft, rgt, data, level
FROM    t_hierarchy
START WITH
        id = 1000000
CONNECT BY
        id = PRIOR parent
ORDER BY
        level DESC
</pre>
<p><a href="#" onclick="xcollapse('X3569');return false;"><strong>View query details</strong></a><br />
</p>
<div id="X3569" style="display: none; ">
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>PARENT</th>
<th>LFT</th>
<th>RGT</th>
<th>DATA</th>
<th>LEVEL</th>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="double_precision">0</td>
<td class="double_precision">585938</td>
<td class="double_precision">1171874</td>
<td class="varchar2">Value 2</td>
<td class="double_precision">9</td>
</tr>
<tr>
<td class="double_precision">12</td>
<td class="double_precision">2</td>
<td class="double_precision">703126</td>
<td class="double_precision">820312</td>
<td class="varchar2">Value 12</td>
<td class="double_precision">8</td>
</tr>
<tr>
<td class="double_precision">63</td>
<td class="double_precision">12</td>
<td class="double_precision">750001</td>
<td class="double_precision">773437</td>
<td class="varchar2">Value 63</td>
<td class="double_precision">7</td>
</tr>
<tr>
<td class="double_precision">319</td>
<td class="double_precision">63</td>
<td class="double_precision">764063</td>
<td class="double_precision">768749</td>
<td class="varchar2">Value 319</td>
<td class="double_precision">6</td>
</tr>
<tr>
<td class="double_precision">1599</td>
<td class="double_precision">319</td>
<td class="double_precision">766875</td>
<td class="double_precision">767811</td>
<td class="varchar2">Value 1599</td>
<td class="double_precision">5</td>
</tr>
<tr>
<td class="double_precision">7999</td>
<td class="double_precision">1599</td>
<td class="double_precision">767437</td>
<td class="double_precision">767623</td>
<td class="varchar2">Value 7999</td>
<td class="double_precision">4</td>
</tr>
<tr>
<td class="double_precision">39999</td>
<td class="double_precision">7999</td>
<td class="double_precision">767549</td>
<td class="double_precision">767585</td>
<td class="varchar2">Value 39999</td>
<td class="double_precision">3</td>
</tr>
<tr>
<td class="double_precision">199999</td>
<td class="double_precision">39999</td>
<td class="double_precision">767571</td>
<td class="double_precision">767577</td>
<td class="varchar2">Value 199999</td>
<td class="double_precision">2</td>
</tr>
<tr>
<td class="double_precision">1000000</td>
<td class="double_precision">199999</td>
<td class="double_precision">767576</td>
<td class="double_precision">767576</td>
<td class="varchar2">Value 1000000</td>
<td class="double_precision">1</td>
</tr>
<tr class="statusbar">
<td colspan="100">9 rows fetched in 0.0005s (0.0009s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 SORT ORDER BY
  CONNECT BY WITH FILTERING
   TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
    INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
   NESTED LOOPS
    BUFFER SORT
     CONNECT BY PUMP
    TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
     INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
   TABLE ACCESS FULL, 20090928_nested.T_HIERARCHY
</pre>
</div>
<p>This query completes in less than <strong>1 ms</strong>, which is instant.</p>
<p>Note that, unlike with recursive <strong>CTE</strong>s, <strong>Oracle</strong> supplies a built-in pseudocolumn, <code>level</code>, that returns the current recursion level. This comes in handy in cases like this.</p>
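<p><code>level</code> is not the only hierarchical pseudocolumn. As a sketch, <code>CONNECT_BY_ISLEAF</code> (available in <strong>Oracle 10g</strong> and later) marks the rows where the recursion stops, so the same traversal can return just the topmost ancestor. Judging by the result above, that should be node <strong>2</strong>:</p>
<pre class="brush: sql">
-- Walk up from the node and keep only the row
-- that has no further parent to follow
SELECT  id, level
FROM    t_hierarchy
WHERE   CONNECT_BY_ISLEAF = 1
START WITH
        id = 1000000
CONNECT BY
        id = PRIOR parent
</pre>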
<h3>Descendants up to a given level</h3>
<h4>Nested sets</h4>
<pre class="brush: sql">
SELECT  hc.id, hc.parent, hc.lft, hc.rgt, hc.data
FROM    t_hierarchy hp
JOIN    t_hierarchy hc
ON      hc.lft BETWEEN hp.lft AND hp.rgt
WHERE   hp.id = ?
        AND
        (
        SELECT  COUNT(*)
        FROM    t_hierarchy hn
        WHERE   hc.lft BETWEEN hn.lft AND hn.rgt
                AND hn.lft BETWEEN hp.lft AND hp.rgt
        ) &lt;= 3
</pre>
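<p>The correlated subquery is what computes the depth here: for each candidate node <code>hc</code> it counts the nodes <code>hn</code> inside the subtree of <code>hp</code> whose range contains <code>hc</code>, that is, the ancestors of <code>hc</code> within that subtree plus <code>hc</code> itself. A sketch that exposes this count as a column (the alias <code>relative_depth</code> is made up for the example, and node <strong>31,415</strong> is chosen only because its subtree is small):</p>
<pre class="brush: sql">
-- relative_depth is 1 for the node itself, 2 for its children and so on
SELECT  hc.id,
        (
        SELECT  COUNT(*)
        FROM    t_hierarchy hn
        WHERE   hc.lft BETWEEN hn.lft AND hn.rgt
                AND hn.lft BETWEEN hp.lft AND hp.rgt
        ) AS relative_depth
FROM    t_hierarchy hp
JOIN    t_hierarchy hc
ON      hc.lft BETWEEN hp.lft AND hp.rgt
WHERE   hp.id = 31415
</pre>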
<p><a href="#" onclick="xcollapse('X2978');return false;"><strong>View the query for node 42</strong></a><br />
</p>
<div id="X2978" style="display: none; ">
<pre class="brush: sql">
SELECT  hc.id, hc.parent, hc.lft, hc.rgt, hc.data
FROM    t_hierarchy hp
JOIN    t_hierarchy hc
ON      hc.lft BETWEEN hp.lft AND hp.rgt
WHERE   hp.id = 42
        AND
        (
        SELECT  COUNT(*)
        FROM    t_hierarchy hn
        WHERE   hc.lft BETWEEN hn.lft AND hn.rgt
                AND hn.lft BETWEEN hp.lft AND hp.rgt
        ) &lt;= 3
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>PARENT</th>
<th>LFT</th>
<th>RGT</th>
<th>DATA</th>
</tr>
<tr>
<td class="double_precision">42</td>
<td class="double_precision">8</td>
<td class="double_precision">257814</td>
<td class="double_precision">281250</td>
<td class="varchar2">Value 42</td>
</tr>
<tr>
<td class="double_precision">211</td>
<td class="double_precision">42</td>
<td class="double_precision">257815</td>
<td class="double_precision">262501</td>
<td class="varchar2">Value 211</td>
</tr>
<tr>
<td class="double_precision">1056</td>
<td class="double_precision">211</td>
<td class="double_precision">257816</td>
<td class="double_precision">258752</td>
<td class="varchar2">Value 1056</td>
</tr>
<tr>
<td class="double_precision">1057</td>
<td class="double_precision">211</td>
<td class="double_precision">258753</td>
<td class="double_precision">259689</td>
<td class="varchar2">Value 1057</td>
</tr>
<tr>
<td class="double_precision">1058</td>
<td class="double_precision">211</td>
<td class="double_precision">259690</td>
<td class="double_precision">260626</td>
<td class="varchar2">Value 1058</td>
</tr>
<tr>
<td class="double_precision">1059</td>
<td class="double_precision">211</td>
<td class="double_precision">260627</td>
<td class="double_precision">261563</td>
<td class="varchar2">Value 1059</td>
</tr>
<tr>
<td class="double_precision">1060</td>
<td class="double_precision">211</td>
<td class="double_precision">261564</td>
<td class="double_precision">262500</td>
<td class="varchar2">Value 1060</td>
</tr>
<tr>
<td class="double_precision">212</td>
<td class="double_precision">42</td>
<td class="double_precision">262502</td>
<td class="double_precision">267188</td>
<td class="varchar2">Value 212</td>
</tr>
<tr>
<td class="double_precision">1061</td>
<td class="double_precision">212</td>
<td class="double_precision">262503</td>
<td class="double_precision">263439</td>
<td class="varchar2">Value 1061</td>
</tr>
<tr>
<td class="double_precision">1062</td>
<td class="double_precision">212</td>
<td class="double_precision">263440</td>
<td class="double_precision">264376</td>
<td class="varchar2">Value 1062</td>
</tr>
<tr>
<td class="double_precision">1063</td>
<td class="double_precision">212</td>
<td class="double_precision">264377</td>
<td class="double_precision">265313</td>
<td class="varchar2">Value 1063</td>
</tr>
<tr>
<td class="double_precision">1064</td>
<td class="double_precision">212</td>
<td class="double_precision">265314</td>
<td class="double_precision">266250</td>
<td class="varchar2">Value 1064</td>
</tr>
<tr>
<td class="double_precision">1065</td>
<td class="double_precision">212</td>
<td class="double_precision">266251</td>
<td class="double_precision">267187</td>
<td class="varchar2">Value 1065</td>
</tr>
<tr>
<td class="double_precision">213</td>
<td class="double_precision">42</td>
<td class="double_precision">267189</td>
<td class="double_precision">271875</td>
<td class="varchar2">Value 213</td>
</tr>
<tr>
<td class="double_precision">1066</td>
<td class="double_precision">213</td>
<td class="double_precision">267190</td>
<td class="double_precision">268126</td>
<td class="varchar2">Value 1066</td>
</tr>
<tr>
<td class="double_precision">1067</td>
<td class="double_precision">213</td>
<td class="double_precision">268127</td>
<td class="double_precision">269063</td>
<td class="varchar2">Value 1067</td>
</tr>
<tr>
<td class="double_precision">1068</td>
<td class="double_precision">213</td>
<td class="double_precision">269064</td>
<td class="double_precision">270000</td>
<td class="varchar2">Value 1068</td>
</tr>
<tr>
<td class="double_precision">1069</td>
<td class="double_precision">213</td>
<td class="double_precision">270001</td>
<td class="double_precision">270937</td>
<td class="varchar2">Value 1069</td>
</tr>
<tr>
<td class="double_precision">1070</td>
<td class="double_precision">213</td>
<td class="double_precision">270938</td>
<td class="double_precision">271874</td>
<td class="varchar2">Value 1070</td>
</tr>
<tr>
<td class="double_precision">214</td>
<td class="double_precision">42</td>
<td class="double_precision">271876</td>
<td class="double_precision">276562</td>
<td class="varchar2">Value 214</td>
</tr>
<tr>
<td class="double_precision">1071</td>
<td class="double_precision">214</td>
<td class="double_precision">271877</td>
<td class="double_precision">272813</td>
<td class="varchar2">Value 1071</td>
</tr>
<tr>
<td class="double_precision">1072</td>
<td class="double_precision">214</td>
<td class="double_precision">272814</td>
<td class="double_precision">273750</td>
<td class="varchar2">Value 1072</td>
</tr>
<tr>
<td class="double_precision">1073</td>
<td class="double_precision">214</td>
<td class="double_precision">273751</td>
<td class="double_precision">274687</td>
<td class="varchar2">Value 1073</td>
</tr>
<tr>
<td class="double_precision">1074</td>
<td class="double_precision">214</td>
<td class="double_precision">274688</td>
<td class="double_precision">275624</td>
<td class="varchar2">Value 1074</td>
</tr>
<tr>
<td class="double_precision">1075</td>
<td class="double_precision">214</td>
<td class="double_precision">275625</td>
<td class="double_precision">276561</td>
<td class="varchar2">Value 1075</td>
</tr>
<tr>
<td class="double_precision">215</td>
<td class="double_precision">42</td>
<td class="double_precision">276563</td>
<td class="double_precision">281249</td>
<td class="varchar2">Value 215</td>
</tr>
<tr>
<td class="double_precision">1076</td>
<td class="double_precision">215</td>
<td class="double_precision">276564</td>
<td class="double_precision">277500</td>
<td class="varchar2">Value 1076</td>
</tr>
<tr>
<td class="double_precision">1077</td>
<td class="double_precision">215</td>
<td class="double_precision">277501</td>
<td class="double_precision">278437</td>
<td class="varchar2">Value 1077</td>
</tr>
<tr>
<td class="double_precision">1078</td>
<td class="double_precision">215</td>
<td class="double_precision">278438</td>
<td class="double_precision">279374</td>
<td class="varchar2">Value 1078</td>
</tr>
<tr>
<td class="double_precision">1079</td>
<td class="double_precision">215</td>
<td class="double_precision">279375</td>
<td class="double_precision">280311</td>
<td class="varchar2">Value 1079</td>
</tr>
<tr>
<td class="double_precision">1080</td>
<td class="double_precision">215</td>
<td class="double_precision">280312</td>
<td class="double_precision">281248</td>
<td class="varchar2">Value 1080</td>
</tr>
<tr class="statusbar">
<td colspan="100">31 rows fetched in 0.0000s (112.5589s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 NESTED LOOPS
  TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
   INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
  TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
   INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_LFT
    SORT AGGREGATE
     FILTER
      TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
       INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_LFT
</pre>
</div>
<p><a href="#" onclick="xcollapse('X2570');return false;"><strong>View the query for node 31,415</strong></a><br />
</p>
<div id="X2570" style="display: none; ">
<pre class="brush: sql">
SELECT  hc.id, hc.parent, hc.lft, hc.rgt, hc.data
FROM    t_hierarchy hp
JOIN    t_hierarchy hc
ON      hc.lft BETWEEN hp.lft AND hp.rgt
WHERE   hp.id = 31415
        AND
        (
        SELECT  COUNT(*)
        FROM    t_hierarchy hn
        WHERE   hc.lft BETWEEN hn.lft AND hn.rgt
                AND hn.lft BETWEEN hp.lft AND hp.rgt
        ) &lt;= 3
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>PARENT</th>
<th>LFT</th>
<th>RGT</th>
<th>DATA</th>
</tr>
<tr>
<td class="double_precision">31415</td>
<td class="double_precision">6282</td>
<td class="double_precision">445651</td>
<td class="double_precision">445687</td>
<td class="varchar2">Value 31415</td>
</tr>
<tr>
<td class="double_precision">157076</td>
<td class="double_precision">31415</td>
<td class="double_precision">445652</td>
<td class="double_precision">445658</td>
<td class="varchar2">Value 157076</td>
</tr>
<tr>
<td class="double_precision">785381</td>
<td class="double_precision">157076</td>
<td class="double_precision">445653</td>
<td class="double_precision">445653</td>
<td class="varchar2">Value 785381</td>
</tr>
<tr>
<td class="double_precision">785382</td>
<td class="double_precision">157076</td>
<td class="double_precision">445654</td>
<td class="double_precision">445654</td>
<td class="varchar2">Value 785382</td>
</tr>
<tr>
<td class="double_precision">785383</td>
<td class="double_precision">157076</td>
<td class="double_precision">445655</td>
<td class="double_precision">445655</td>
<td class="varchar2">Value 785383</td>
</tr>
<tr>
<td class="double_precision">785384</td>
<td class="double_precision">157076</td>
<td class="double_precision">445656</td>
<td class="double_precision">445656</td>
<td class="varchar2">Value 785384</td>
</tr>
<tr>
<td class="double_precision">785385</td>
<td class="double_precision">157076</td>
<td class="double_precision">445657</td>
<td class="double_precision">445657</td>
<td class="varchar2">Value 785385</td>
</tr>
<tr>
<td class="double_precision">157077</td>
<td class="double_precision">31415</td>
<td class="double_precision">445659</td>
<td class="double_precision">445665</td>
<td class="varchar2">Value 157077</td>
</tr>
<tr>
<td class="double_precision">785386</td>
<td class="double_precision">157077</td>
<td class="double_precision">445660</td>
<td class="double_precision">445660</td>
<td class="varchar2">Value 785386</td>
</tr>
<tr>
<td class="double_precision">785387</td>
<td class="double_precision">157077</td>
<td class="double_precision">445661</td>
<td class="double_precision">445661</td>
<td class="varchar2">Value 785387</td>
</tr>
<tr>
<td class="double_precision">785388</td>
<td class="double_precision">157077</td>
<td class="double_precision">445662</td>
<td class="double_precision">445662</td>
<td class="varchar2">Value 785388</td>
</tr>
<tr>
<td class="double_precision">785389</td>
<td class="double_precision">157077</td>
<td class="double_precision">445663</td>
<td class="double_precision">445663</td>
<td class="varchar2">Value 785389</td>
</tr>
<tr>
<td class="double_precision">785390</td>
<td class="double_precision">157077</td>
<td class="double_precision">445664</td>
<td class="double_precision">445664</td>
<td class="varchar2">Value 785390</td>
</tr>
<tr>
<td class="double_precision">157078</td>
<td class="double_precision">31415</td>
<td class="double_precision">445666</td>
<td class="double_precision">445672</td>
<td class="varchar2">Value 157078</td>
</tr>
<tr>
<td class="double_precision">785391</td>
<td class="double_precision">157078</td>
<td class="double_precision">445667</td>
<td class="double_precision">445667</td>
<td class="varchar2">Value 785391</td>
</tr>
<tr>
<td class="double_precision">785392</td>
<td class="double_precision">157078</td>
<td class="double_precision">445668</td>
<td class="double_precision">445668</td>
<td class="varchar2">Value 785392</td>
</tr>
<tr>
<td class="double_precision">785393</td>
<td class="double_precision">157078</td>
<td class="double_precision">445669</td>
<td class="double_precision">445669</td>
<td class="varchar2">Value 785393</td>
</tr>
<tr>
<td class="double_precision">785394</td>
<td class="double_precision">157078</td>
<td class="double_precision">445670</td>
<td class="double_precision">445670</td>
<td class="varchar2">Value 785394</td>
</tr>
<tr>
<td class="double_precision">785395</td>
<td class="double_precision">157078</td>
<td class="double_precision">445671</td>
<td class="double_precision">445671</td>
<td class="varchar2">Value 785395</td>
</tr>
<tr>
<td class="double_precision">157079</td>
<td class="double_precision">31415</td>
<td class="double_precision">445673</td>
<td class="double_precision">445679</td>
<td class="varchar2">Value 157079</td>
</tr>
<tr>
<td class="double_precision">785396</td>
<td class="double_precision">157079</td>
<td class="double_precision">445674</td>
<td class="double_precision">445674</td>
<td class="varchar2">Value 785396</td>
</tr>
<tr>
<td class="double_precision">785397</td>
<td class="double_precision">157079</td>
<td class="double_precision">445675</td>
<td class="double_precision">445675</td>
<td class="varchar2">Value 785397</td>
</tr>
<tr>
<td class="double_precision">785398</td>
<td class="double_precision">157079</td>
<td class="double_precision">445676</td>
<td class="double_precision">445676</td>
<td class="varchar2">Value 785398</td>
</tr>
<tr>
<td class="double_precision">785399</td>
<td class="double_precision">157079</td>
<td class="double_precision">445677</td>
<td class="double_precision">445677</td>
<td class="varchar2">Value 785399</td>
</tr>
<tr>
<td class="double_precision">785400</td>
<td class="double_precision">157079</td>
<td class="double_precision">445678</td>
<td class="double_precision">445678</td>
<td class="varchar2">Value 785400</td>
</tr>
<tr>
<td class="double_precision">157080</td>
<td class="double_precision">31415</td>
<td class="double_precision">445680</td>
<td class="double_precision">445686</td>
<td class="varchar2">Value 157080</td>
</tr>
<tr>
<td class="double_precision">785401</td>
<td class="double_precision">157080</td>
<td class="double_precision">445681</td>
<td class="double_precision">445681</td>
<td class="varchar2">Value 785401</td>
</tr>
<tr>
<td class="double_precision">785402</td>
<td class="double_precision">157080</td>
<td class="double_precision">445682</td>
<td class="double_precision">445682</td>
<td class="varchar2">Value 785402</td>
</tr>
<tr>
<td class="double_precision">785403</td>
<td class="double_precision">157080</td>
<td class="double_precision">445683</td>
<td class="double_precision">445683</td>
<td class="varchar2">Value 785403</td>
</tr>
<tr>
<td class="double_precision">785404</td>
<td class="double_precision">157080</td>
<td class="double_precision">445684</td>
<td class="double_precision">445684</td>
<td class="varchar2">Value 785404</td>
</tr>
<tr>
<td class="double_precision">785405</td>
<td class="double_precision">157080</td>
<td class="double_precision">445685</td>
<td class="double_precision">445685</td>
<td class="varchar2">Value 785405</td>
</tr>
<tr class="statusbar">
<td colspan="100">31 rows fetched in 0.0016s (0.0015s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 NESTED LOOPS
  TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
   INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
  TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
   INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_LFT
    SORT AGGREGATE
     FILTER
      TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
       INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_LFT
</pre>
</div>
<p>As in other systems, performance heavily depends on how many descendants the record has.</p>
<p>For node <strong>42</strong>, which is close to the root node and has lots of descendants, the query runs for <strong>122 seconds</strong>. For node <strong>31,415</strong>, the query takes <strong>1.5 ms</strong> (next to instant).</p>
<h4>Adjacency list</h4>
<pre class="brush: sql">
SELECT  id, parent, lft, rgt, data
FROM    t_hierarchy
START WITH
        id = ?
CONNECT BY
        parent = PRIOR id
        AND level &lt;= 3
</pre>
<p>Limiting the recursion depth is very simple in <strong>Oracle</strong>: we just add a condition to the <strong>CONNECT BY</strong> clause that constrains the <code>level</code>.</p>
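<p>As a minimal sketch of the same idea, the depth limit can be passed as a parameter and the depth of each row returned along with the data. The bind variable names <code>:node_id</code> and <code>:max_depth</code> and the alias <code>lvl</code> are our own choices for illustration and do not appear in the queries measured in this article:</p>
<pre class="brush: sql">
-- Hypothetical parameterized variant: :node_id and :max_depth are assumed bind variables
SELECT  id, parent, level AS lvl, data
FROM    t_hierarchy
START WITH
        id = :node_id
CONNECT BY
        parent = PRIOR id
        -- the starting row has LEVEL 1, its children LEVEL 2 and so on
        AND level &lt;= :max_depth
</pre>
<p>Since the starting node itself has <code>LEVEL</code> equal to <strong>1</strong>, a limit of <strong>3</strong> returns the node, its children and its grandchildren.</p>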
<p><a href="#" onclick="xcollapse('X10909');return false;"><strong>View the query for node 42</strong></a><br />
</p>
<div id="X10909" style="display: none; ">
<pre class="brush: sql">
SELECT  id, parent, lft, rgt, data
FROM    t_hierarchy
START WITH
        id = 42
CONNECT BY
        parent = PRIOR id
        AND level &lt;= 3
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>PARENT</th>
<th>LFT</th>
<th>RGT</th>
<th>DATA</th>
</tr>
<tr>
<td class="double_precision">42</td>
<td class="double_precision">8</td>
<td class="double_precision">257814</td>
<td class="double_precision">281250</td>
<td class="varchar2">Value 42</td>
</tr>
<tr>
<td class="double_precision">211</td>
<td class="double_precision">42</td>
<td class="double_precision">257815</td>
<td class="double_precision">262501</td>
<td class="varchar2">Value 211</td>
</tr>
<tr>
<td class="double_precision">1056</td>
<td class="double_precision">211</td>
<td class="double_precision">257816</td>
<td class="double_precision">258752</td>
<td class="varchar2">Value 1056</td>
</tr>
<tr>
<td class="double_precision">1057</td>
<td class="double_precision">211</td>
<td class="double_precision">258753</td>
<td class="double_precision">259689</td>
<td class="varchar2">Value 1057</td>
</tr>
<tr>
<td class="double_precision">1058</td>
<td class="double_precision">211</td>
<td class="double_precision">259690</td>
<td class="double_precision">260626</td>
<td class="varchar2">Value 1058</td>
</tr>
<tr>
<td class="double_precision">1059</td>
<td class="double_precision">211</td>
<td class="double_precision">260627</td>
<td class="double_precision">261563</td>
<td class="varchar2">Value 1059</td>
</tr>
<tr>
<td class="double_precision">1060</td>
<td class="double_precision">211</td>
<td class="double_precision">261564</td>
<td class="double_precision">262500</td>
<td class="varchar2">Value 1060</td>
</tr>
<tr>
<td class="double_precision">212</td>
<td class="double_precision">42</td>
<td class="double_precision">262502</td>
<td class="double_precision">267188</td>
<td class="varchar2">Value 212</td>
</tr>
<tr>
<td class="double_precision">1061</td>
<td class="double_precision">212</td>
<td class="double_precision">262503</td>
<td class="double_precision">263439</td>
<td class="varchar2">Value 1061</td>
</tr>
<tr>
<td class="double_precision">1062</td>
<td class="double_precision">212</td>
<td class="double_precision">263440</td>
<td class="double_precision">264376</td>
<td class="varchar2">Value 1062</td>
</tr>
<tr>
<td class="double_precision">1063</td>
<td class="double_precision">212</td>
<td class="double_precision">264377</td>
<td class="double_precision">265313</td>
<td class="varchar2">Value 1063</td>
</tr>
<tr>
<td class="double_precision">1064</td>
<td class="double_precision">212</td>
<td class="double_precision">265314</td>
<td class="double_precision">266250</td>
<td class="varchar2">Value 1064</td>
</tr>
<tr>
<td class="double_precision">1065</td>
<td class="double_precision">212</td>
<td class="double_precision">266251</td>
<td class="double_precision">267187</td>
<td class="varchar2">Value 1065</td>
</tr>
<tr>
<td class="double_precision">213</td>
<td class="double_precision">42</td>
<td class="double_precision">267189</td>
<td class="double_precision">271875</td>
<td class="varchar2">Value 213</td>
</tr>
<tr>
<td class="double_precision">1066</td>
<td class="double_precision">213</td>
<td class="double_precision">267190</td>
<td class="double_precision">268126</td>
<td class="varchar2">Value 1066</td>
</tr>
<tr>
<td class="double_precision">1067</td>
<td class="double_precision">213</td>
<td class="double_precision">268127</td>
<td class="double_precision">269063</td>
<td class="varchar2">Value 1067</td>
</tr>
<tr>
<td class="double_precision">1068</td>
<td class="double_precision">213</td>
<td class="double_precision">269064</td>
<td class="double_precision">270000</td>
<td class="varchar2">Value 1068</td>
</tr>
<tr>
<td class="double_precision">1069</td>
<td class="double_precision">213</td>
<td class="double_precision">270001</td>
<td class="double_precision">270937</td>
<td class="varchar2">Value 1069</td>
</tr>
<tr>
<td class="double_precision">1070</td>
<td class="double_precision">213</td>
<td class="double_precision">270938</td>
<td class="double_precision">271874</td>
<td class="varchar2">Value 1070</td>
</tr>
<tr>
<td class="double_precision">214</td>
<td class="double_precision">42</td>
<td class="double_precision">271876</td>
<td class="double_precision">276562</td>
<td class="varchar2">Value 214</td>
</tr>
<tr>
<td class="double_precision">1071</td>
<td class="double_precision">214</td>
<td class="double_precision">271877</td>
<td class="double_precision">272813</td>
<td class="varchar2">Value 1071</td>
</tr>
<tr>
<td class="double_precision">1072</td>
<td class="double_precision">214</td>
<td class="double_precision">272814</td>
<td class="double_precision">273750</td>
<td class="varchar2">Value 1072</td>
</tr>
<tr>
<td class="double_precision">1073</td>
<td class="double_precision">214</td>
<td class="double_precision">273751</td>
<td class="double_precision">274687</td>
<td class="varchar2">Value 1073</td>
</tr>
<tr>
<td class="double_precision">1074</td>
<td class="double_precision">214</td>
<td class="double_precision">274688</td>
<td class="double_precision">275624</td>
<td class="varchar2">Value 1074</td>
</tr>
<tr>
<td class="double_precision">1075</td>
<td class="double_precision">214</td>
<td class="double_precision">275625</td>
<td class="double_precision">276561</td>
<td class="varchar2">Value 1075</td>
</tr>
<tr>
<td class="double_precision">215</td>
<td class="double_precision">42</td>
<td class="double_precision">276563</td>
<td class="double_precision">281249</td>
<td class="varchar2">Value 215</td>
</tr>
<tr>
<td class="double_precision">1076</td>
<td class="double_precision">215</td>
<td class="double_precision">276564</td>
<td class="double_precision">277500</td>
<td class="varchar2">Value 1076</td>
</tr>
<tr>
<td class="double_precision">1077</td>
<td class="double_precision">215</td>
<td class="double_precision">277501</td>
<td class="double_precision">278437</td>
<td class="varchar2">Value 1077</td>
</tr>
<tr>
<td class="double_precision">1078</td>
<td class="double_precision">215</td>
<td class="double_precision">278438</td>
<td class="double_precision">279374</td>
<td class="varchar2">Value 1078</td>
</tr>
<tr>
<td class="double_precision">1079</td>
<td class="double_precision">215</td>
<td class="double_precision">279375</td>
<td class="double_precision">280311</td>
<td class="varchar2">Value 1079</td>
</tr>
<tr>
<td class="double_precision">1080</td>
<td class="double_precision">215</td>
<td class="double_precision">280312</td>
<td class="double_precision">281248</td>
<td class="varchar2">Value 1080</td>
</tr>
<tr class="statusbar">
<td colspan="100">31 rows fetched in 0.0015s (0.0011s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 CONNECT BY WITH FILTERING
  TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
   INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
  NESTED LOOPS
   BUFFER SORT
    CONNECT BY PUMP
   FILTER
    TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
     INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_PARENT
  TABLE ACCESS FULL, 20090928_nested.T_HIERARCHY
</pre>
</div>
<p><a href="#" onclick="xcollapse('X8434');return false;"><strong>View the query for node 31,415</strong></a><br />
</p>
<div id="X8434" style="display: none; ">
<pre class="brush: sql">
SELECT  id, parent, lft, rgt, data
FROM    t_hierarchy
START WITH
        id = 31415
CONNECT BY
        parent = PRIOR id
        AND level &lt;= 3
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>PARENT</th>
<th>LFT</th>
<th>RGT</th>
<th>DATA</th>
</tr>
<tr>
<td class="double_precision">31415</td>
<td class="double_precision">6282</td>
<td class="double_precision">445651</td>
<td class="double_precision">445687</td>
<td class="varchar2">Value 31415</td>
</tr>
<tr>
<td class="double_precision">157076</td>
<td class="double_precision">31415</td>
<td class="double_precision">445652</td>
<td class="double_precision">445658</td>
<td class="varchar2">Value 157076</td>
</tr>
<tr>
<td class="double_precision">785381</td>
<td class="double_precision">157076</td>
<td class="double_precision">445653</td>
<td class="double_precision">445653</td>
<td class="varchar2">Value 785381</td>
</tr>
<tr>
<td class="double_precision">785382</td>
<td class="double_precision">157076</td>
<td class="double_precision">445654</td>
<td class="double_precision">445654</td>
<td class="varchar2">Value 785382</td>
</tr>
<tr>
<td class="double_precision">785383</td>
<td class="double_precision">157076</td>
<td class="double_precision">445655</td>
<td class="double_precision">445655</td>
<td class="varchar2">Value 785383</td>
</tr>
<tr>
<td class="double_precision">785384</td>
<td class="double_precision">157076</td>
<td class="double_precision">445656</td>
<td class="double_precision">445656</td>
<td class="varchar2">Value 785384</td>
</tr>
<tr>
<td class="double_precision">785385</td>
<td class="double_precision">157076</td>
<td class="double_precision">445657</td>
<td class="double_precision">445657</td>
<td class="varchar2">Value 785385</td>
</tr>
<tr>
<td class="double_precision">157077</td>
<td class="double_precision">31415</td>
<td class="double_precision">445659</td>
<td class="double_precision">445665</td>
<td class="varchar2">Value 157077</td>
</tr>
<tr>
<td class="double_precision">785386</td>
<td class="double_precision">157077</td>
<td class="double_precision">445660</td>
<td class="double_precision">445660</td>
<td class="varchar2">Value 785386</td>
</tr>
<tr>
<td class="double_precision">785387</td>
<td class="double_precision">157077</td>
<td class="double_precision">445661</td>
<td class="double_precision">445661</td>
<td class="varchar2">Value 785387</td>
</tr>
<tr>
<td class="double_precision">785388</td>
<td class="double_precision">157077</td>
<td class="double_precision">445662</td>
<td class="double_precision">445662</td>
<td class="varchar2">Value 785388</td>
</tr>
<tr>
<td class="double_precision">785389</td>
<td class="double_precision">157077</td>
<td class="double_precision">445663</td>
<td class="double_precision">445663</td>
<td class="varchar2">Value 785389</td>
</tr>
<tr>
<td class="double_precision">785390</td>
<td class="double_precision">157077</td>
<td class="double_precision">445664</td>
<td class="double_precision">445664</td>
<td class="varchar2">Value 785390</td>
</tr>
<tr>
<td class="double_precision">157078</td>
<td class="double_precision">31415</td>
<td class="double_precision">445666</td>
<td class="double_precision">445672</td>
<td class="varchar2">Value 157078</td>
</tr>
<tr>
<td class="double_precision">785391</td>
<td class="double_precision">157078</td>
<td class="double_precision">445667</td>
<td class="double_precision">445667</td>
<td class="varchar2">Value 785391</td>
</tr>
<tr>
<td class="double_precision">785392</td>
<td class="double_precision">157078</td>
<td class="double_precision">445668</td>
<td class="double_precision">445668</td>
<td class="varchar2">Value 785392</td>
</tr>
<tr>
<td class="double_precision">785393</td>
<td class="double_precision">157078</td>
<td class="double_precision">445669</td>
<td class="double_precision">445669</td>
<td class="varchar2">Value 785393</td>
</tr>
<tr>
<td class="double_precision">785394</td>
<td class="double_precision">157078</td>
<td class="double_precision">445670</td>
<td class="double_precision">445670</td>
<td class="varchar2">Value 785394</td>
</tr>
<tr>
<td class="double_precision">785395</td>
<td class="double_precision">157078</td>
<td class="double_precision">445671</td>
<td class="double_precision">445671</td>
<td class="varchar2">Value 785395</td>
</tr>
<tr>
<td class="double_precision">157079</td>
<td class="double_precision">31415</td>
<td class="double_precision">445673</td>
<td class="double_precision">445679</td>
<td class="varchar2">Value 157079</td>
</tr>
<tr>
<td class="double_precision">785396</td>
<td class="double_precision">157079</td>
<td class="double_precision">445674</td>
<td class="double_precision">445674</td>
<td class="varchar2">Value 785396</td>
</tr>
<tr>
<td class="double_precision">785397</td>
<td class="double_precision">157079</td>
<td class="double_precision">445675</td>
<td class="double_precision">445675</td>
<td class="varchar2">Value 785397</td>
</tr>
<tr>
<td class="double_precision">785398</td>
<td class="double_precision">157079</td>
<td class="double_precision">445676</td>
<td class="double_precision">445676</td>
<td class="varchar2">Value 785398</td>
</tr>
<tr>
<td class="double_precision">785399</td>
<td class="double_precision">157079</td>
<td class="double_precision">445677</td>
<td class="double_precision">445677</td>
<td class="varchar2">Value 785399</td>
</tr>
<tr>
<td class="double_precision">785400</td>
<td class="double_precision">157079</td>
<td class="double_precision">445678</td>
<td class="double_precision">445678</td>
<td class="varchar2">Value 785400</td>
</tr>
<tr>
<td class="double_precision">157080</td>
<td class="double_precision">31415</td>
<td class="double_precision">445680</td>
<td class="double_precision">445686</td>
<td class="varchar2">Value 157080</td>
</tr>
<tr>
<td class="double_precision">785401</td>
<td class="double_precision">157080</td>
<td class="double_precision">445681</td>
<td class="double_precision">445681</td>
<td class="varchar2">Value 785401</td>
</tr>
<tr>
<td class="double_precision">785402</td>
<td class="double_precision">157080</td>
<td class="double_precision">445682</td>
<td class="double_precision">445682</td>
<td class="varchar2">Value 785402</td>
</tr>
<tr>
<td class="double_precision">785403</td>
<td class="double_precision">157080</td>
<td class="double_precision">445683</td>
<td class="double_precision">445683</td>
<td class="varchar2">Value 785403</td>
</tr>
<tr>
<td class="double_precision">785404</td>
<td class="double_precision">157080</td>
<td class="double_precision">445684</td>
<td class="double_precision">445684</td>
<td class="varchar2">Value 785404</td>
</tr>
<tr>
<td class="double_precision">785405</td>
<td class="double_precision">157080</td>
<td class="double_precision">445685</td>
<td class="double_precision">445685</td>
<td class="varchar2">Value 785405</td>
</tr>
<tr class="statusbar">
<td colspan="100">31 rows fetched in 0.0015s (0.0011s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 CONNECT BY WITH FILTERING
  TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
   INDEX UNIQUE SCAN, 20090928_nested.PK_HIERARCHY_ID
  NESTED LOOPS
   BUFFER SORT
    CONNECT BY PUMP
   FILTER
    TABLE ACCESS BY INDEX ROWID, 20090928_nested.T_HIERARCHY
     INDEX RANGE SCAN, 20090928_nested.IX_HIERARCHY_PARENT
  TABLE ACCESS FULL, 20090928_nested.T_HIERARCHY
</pre>
</div>
<p>Both queries complete in <strong>1 ms</strong>, that is, instantly.</p>
<h3>Summary</h3>
<p><strong>Oracle</strong> implements a special clause, <code>CONNECT BY</code>, that makes traversing adjacency lists in general and parent-child relationships in particular very easy, since this is exactly what this clause was intended for.</p>
<p>Except for finding all descendants of a single node (without finding out or limiting the depth), the nested sets model is less efficient in <strong>Oracle</strong> than the adjacency list model.</p>
<p>Adjacency lists are not only more efficient, but the queries using them are also more elegant and easier to maintain thanks to the native support.</p>
<p>That&#8217;s why the <strong>adjacency list model</strong> should be used instead of the nested sets model in <strong>Oracle</strong> too, as well as in <a href="/2009/09/25/adjacency-list-vs-nested-sets-sql-server/"><strong>SQL Server</strong></a> and <a href="/2009/09/24/adjacency-list-vs-nested-sets-postgresql/"><strong>PostgreSQL 8.4</strong></a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2009/09/28/adjacency-list-vs-nested-sets-oracle/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building alphabetical index</title>
		<link>http://explainextended.com/2009/09/23/building-alphabetical-index/</link>
		<comments>http://explainextended.com/2009/09/23/building-alphabetical-index/#comments</comments>
		<pubDate>Wed, 23 Sep 2009 19:00:57 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=3157</guid>
		<description><![CDATA[Answering questions asked on the site. Cora asks: I want to build an alphabetical index over the articles in my database. I need to show 5 entries in the index, like this: A-D, E-F etc., so that each entry contains roughly equal number of articles. How do I do it in a single query? This [...]]]></description>
			<content:encoded><![CDATA[<p>Answering questions asked on the site.</p>
<p><strong>Cora</strong> asks:</p>
<blockquote><p>I want to build an alphabetical index over the articles in my database.</p>
<p>I need to show <strong>5</strong> entries in the index, like this: <code>A-D</code>, <code>E-F</code> etc., so that each entry contains roughly equal number of articles.</p>
<p>How do I do it in a single query?</p>
<p>This is in <strong>Oracle</strong>
</p></blockquote>
<p>This is a good task to demonstrate <strong>Oracle</strong>&#8217;s analytic abilities.</p>
<p>To do this, we will create a sample table:<br />
<span id="more-3157"></span><br />
<a href="#" onclick="xcollapse('X10201');return false;"><strong>Table creation details</strong></a><br />
</p>
<div id="X10201" style="display: none; ">
<pre class="brush: sql">
BEGIN
        DBMS_RANDOM.seed(20090923);
END;
/
CREATE TABLE t_name (id NOT NULL PRIMARY KEY, name NOT NULL)
AS
SELECT  level, CAST(SUBSTR(&#039;ETAOINSHRDLUCMFWYPVBGKQJXZ&#039;, 1 + FLOOR(LOG(0.85, POWER(0.85, 26) + DBMS_RANDOM.value() * (1 - POWER(0.85, 26)))), 1) || &#039; article &#039; || level AS VARCHAR(50)) AS name
FROM    dual
CONNECT BY
        level &lt;= 10000
/
CREATE INDEX ix_name_name ON t_name (name)
/
</pre>
</div>
<p>This script creates <strong>10,000</strong> random articles whose initial letters are distributed geometrically (each letter is about <strong>0.85</strong> times as likely as the previous one), in the order of <a href="http://en.wikipedia.org/wiki/ETAOIN_SHRDLU">ETAOIN SHRDLU</a>.</p>
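<p>To see what the generating expression implies, here is a rough sketch that computes the expected number of articles per initial letter. It assumes that <code>DBMS_RANDOM.value()</code> is uniform, so the letter at position <code>n</code> is chosen with probability <code>(0.85^(n-1) - 0.85^n) / (1 - 0.85^26)</code>. The column name <code>expected_cnt</code> is ours:</p>
<pre class="brush: sql">
-- Expected counts under the assumption of a uniform random source;
-- the actual counts differ because of random noise
SELECT  SUBSTR(&#039;ETAOINSHRDLUCMFWYPVBGKQJXZ&#039;, level, 1) AS letter,
        ROUND(10000 * (POWER(0.85, level - 1) - POWER(0.85, level)) / (1 - POWER(0.85, 26))) AS expected_cnt
FROM    dual
CONNECT BY
        level &lt;= 26
</pre>
<p>These values are only the statistical expectation and can be compared with the actual counts in the table.</p>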
<p>Here&#8217;s the distribution:</p>
<p><a href="#" onclick="xcollapse('X3539');return false;"><strong>View the distribution of initial letters</strong></a><br />
</p>
<div id="X3539" style="display: none; ">
<pre class="brush: sql">
SELECT  SUBSTR(name, 1, 1) AS letter, COUNT(*) AS cnt
FROM    t_name
GROUP BY
        SUBSTR(name, 1, 1)
ORDER BY
        cnt DESC
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>LETTER</th>
<th>CNT</th>
</tr>
<tr>
<td class="varchar2">E</td>
<td class="double_precision">1583</td>
</tr>
<tr>
<td class="varchar2">T</td>
<td class="double_precision">1213</td>
</tr>
<tr>
<td class="varchar2">A</td>
<td class="double_precision">1149</td>
</tr>
<tr>
<td class="varchar2">O</td>
<td class="double_precision">908</td>
</tr>
<tr>
<td class="varchar2">I</td>
<td class="double_precision">757</td>
</tr>
<tr>
<td class="varchar2">N</td>
<td class="double_precision">679</td>
</tr>
<tr>
<td class="varchar2">S</td>
<td class="double_precision">570</td>
</tr>
<tr>
<td class="varchar2">H</td>
<td class="double_precision">505</td>
</tr>
<tr>
<td class="varchar2">R</td>
<td class="double_precision">423</td>
</tr>
<tr>
<td class="varchar2">D</td>
<td class="double_precision">333</td>
</tr>
<tr>
<td class="varchar2">L</td>
<td class="double_precision">316</td>
</tr>
<tr>
<td class="varchar2">U</td>
<td class="double_precision">264</td>
</tr>
<tr>
<td class="varchar2">C</td>
<td class="double_precision">236</td>
</tr>
<tr>
<td class="varchar2">M</td>
<td class="double_precision">180</td>
</tr>
<tr>
<td class="varchar2">F</td>
<td class="double_precision">166</td>
</tr>
<tr>
<td class="varchar2">W</td>
<td class="double_precision">122</td>
</tr>
<tr>
<td class="varchar2">Y</td>
<td class="double_precision">117</td>
</tr>
<tr>
<td class="varchar2">P</td>
<td class="double_precision">98</td>
</tr>
<tr>
<td class="varchar2">B</td>
<td class="double_precision">78</td>
</tr>
<tr>
<td class="varchar2">V</td>
<td class="double_precision">65</td>
</tr>
<tr>
<td class="varchar2">G</td>
<td class="double_precision">54</td>
</tr>
<tr>
<td class="varchar2">K</td>
<td class="double_precision">52</td>
</tr>
<tr>
<td class="varchar2">J</td>
<td class="double_precision">41</td>
</tr>
<tr>
<td class="varchar2">Q</td>
<td class="double_precision">36</td>
</tr>
<tr>
<td class="varchar2">X</td>
<td class="double_precision">32</td>
</tr>
<tr>
<td class="varchar2">Z</td>
<td class="double_precision">23</td>
</tr>
</table>
</div>
</div>
<p>To split the letters into <strong>5</strong> <em>tomes</em> (contiguous letter sets), we should calculate the cumulative count of the articles starting with each letter, divide it by the total number of articles and distribute the letters according to this ratio.</p>
<p>This can be done with <strong>Oracle</strong>&#8217;s ability to calculate partial sums and groupwise sums without actual grouping.</p>
<p>To calculate partial sums, we should use <code>SUM(cnt) OVER (ORDER BY letter)</code>. This accumulates the counts of articles starting with each letter, in alphabetical order.</p>
<p>To calculate the total sum and assign it to each row, we just use <code>SUM(cnt) OVER()</code>, without an <code>ORDER BY</code> clause. This is the equivalent of an aggregate <code>SUM</code>, but it is returned for each row.</p>
<p>Here are the results:</p>
<p><a href="#" onclick="xcollapse('X3171');return false;"><strong>View the partial and total sums and their ratios</strong></a><br />
</p>
<div id="X3171" style="display: none; ">
<pre class="brush: sql">
SELECT  letter, cnt,
        SUM(cnt) OVER (ORDER BY letter) AS partial_sum,
        SUM(cnt) OVER () AS total_sum,
        TO_CHAR(SUM(cnt) OVER (ORDER BY letter) / SUM(cnt) OVER (), &#039;FM0.990&#039;) AS partial_ratio
FROM    (
        SELECT  SUBSTR(name, 1, 1) AS letter, COUNT(*) AS cnt
        FROM    t_name
        GROUP BY
                SUBSTR(name, 1, 1)
        )
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>LETTER</th>
<th>CNT</th>
<th>PARTIAL_SUM</th>
<th>TOTAL_SUM</th>
<th>PARTIAL_RATIO</th>
</tr>
<tr>
<td class="varchar2">A</td>
<td class="double_precision">1149</td>
<td class="double_precision">1149</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.115</td>
</tr>
<tr>
<td class="varchar2">B</td>
<td class="double_precision">78</td>
<td class="double_precision">1227</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.123</td>
</tr>
<tr>
<td class="varchar2">C</td>
<td class="double_precision">236</td>
<td class="double_precision">1463</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.146</td>
</tr>
<tr>
<td class="varchar2">D</td>
<td class="double_precision">333</td>
<td class="double_precision">1796</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.180</td>
</tr>
<tr>
<td class="varchar2">E</td>
<td class="double_precision">1583</td>
<td class="double_precision">3379</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.338</td>
</tr>
<tr>
<td class="varchar2">F</td>
<td class="double_precision">166</td>
<td class="double_precision">3545</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.355</td>
</tr>
<tr>
<td class="varchar2">G</td>
<td class="double_precision">54</td>
<td class="double_precision">3599</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.360</td>
</tr>
<tr>
<td class="varchar2">H</td>
<td class="double_precision">505</td>
<td class="double_precision">4104</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.410</td>
</tr>
<tr>
<td class="varchar2">I</td>
<td class="double_precision">757</td>
<td class="double_precision">4861</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.486</td>
</tr>
<tr>
<td class="varchar2">J</td>
<td class="double_precision">41</td>
<td class="double_precision">4902</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.490</td>
</tr>
<tr>
<td class="varchar2">K</td>
<td class="double_precision">52</td>
<td class="double_precision">4954</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.495</td>
</tr>
<tr>
<td class="varchar2">L</td>
<td class="double_precision">316</td>
<td class="double_precision">5270</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.527</td>
</tr>
<tr>
<td class="varchar2">M</td>
<td class="double_precision">180</td>
<td class="double_precision">5450</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.545</td>
</tr>
<tr>
<td class="varchar2">N</td>
<td class="double_precision">679</td>
<td class="double_precision">6129</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.613</td>
</tr>
<tr>
<td class="varchar2">O</td>
<td class="double_precision">908</td>
<td class="double_precision">7037</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.704</td>
</tr>
<tr>
<td class="varchar2">P</td>
<td class="double_precision">98</td>
<td class="double_precision">7135</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.714</td>
</tr>
<tr>
<td class="varchar2">Q</td>
<td class="double_precision">36</td>
<td class="double_precision">7171</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.717</td>
</tr>
<tr>
<td class="varchar2">R</td>
<td class="double_precision">423</td>
<td class="double_precision">7594</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.759</td>
</tr>
<tr>
<td class="varchar2">S</td>
<td class="double_precision">570</td>
<td class="double_precision">8164</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.816</td>
</tr>
<tr>
<td class="varchar2">T</td>
<td class="double_precision">1213</td>
<td class="double_precision">9377</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.938</td>
</tr>
<tr>
<td class="varchar2">U</td>
<td class="double_precision">264</td>
<td class="double_precision">9641</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.964</td>
</tr>
<tr>
<td class="varchar2">V</td>
<td class="double_precision">65</td>
<td class="double_precision">9706</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.971</td>
</tr>
<tr>
<td class="varchar2">W</td>
<td class="double_precision">122</td>
<td class="double_precision">9828</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.983</td>
</tr>
<tr>
<td class="varchar2">X</td>
<td class="double_precision">32</td>
<td class="double_precision">9860</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.986</td>
</tr>
<tr>
<td class="varchar2">Y</td>
<td class="double_precision">117</td>
<td class="double_precision">9977</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.998</td>
</tr>
<tr>
<td class="varchar2">Z</td>
<td class="double_precision">23</td>
<td class="double_precision">10000</td>
<td class="double_precision">10000</td>
<td class="varchar2">1.000</td>
</tr>
</table>
</div>
</div>
<p>Now we should just take each value and assign it to the closest multiple of <code>1/5</code>: <code>0.0</code>, <code>0.2</code>, <code>0.4</code> etc.</p>
<p>However, there is a problem.</p>
<p>The query above returns the following results for <strong>D</strong> and <strong>E</strong>:</p>
<div class="terminal">
<table class="terminal">
<tr>
<th>LETTER</th>
<th>CNT</th>
<th>PARTIAL_SUM</th>
<th>TOTAL_SUM</th>
<th>PARTIAL_RATIO</th>
</tr>
<tr>
<td class="varchar2">D</td>
<td class="double_precision">333</td>
<td class="double_precision">1796</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.180</td>
</tr>
<tr>
<td class="varchar2">E</td>
<td class="double_precision">1583</td>
<td class="double_precision">3379</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.338</td>
</tr>
</table>
</div>
<p><code>0.0</code> means tome <strong>1</strong>, <code>0.2</code> means tome <strong>2</strong>.</p>
<p><code>0.2</code> is inside the <strong>E</strong> range which is from <code>0.180</code> to <code>0.338</code>.</p>
<p>Should we assign <strong>E</strong> to tome <strong>1</strong> or tome <strong>2</strong>?</p>
<p>Assigning <strong>E</strong> to tome <strong>1</strong> makes that tome hold <code>0.338</code> of all articles, which is <code>0.138</code> more than its fair share of <code>0.2</code>, while assigning it to tome <strong>2</strong> leaves tome <strong>1</strong> with <code>0.180</code>, only <code>0.020</code> short.</p>
<p>The second option seems fairer, since the deviation is <code>0.020</code> rather than <code>0.138</code>.</p>
<p>So to calculate the tome the letter should go to, we should minimize the difference.</p>
<p>To do this, we should take the average of the previous and the current values of the ratio (in this case, <code>(0.338 + 0.180) / 2 = 0.259</code>) and round it down to the nearest multiple of <code>0.2</code> (here, <code>0.2</code>).</p>
<p>To combine the current row with the previous one, one could use the <code>LAG</code> function. However, analytic functions cannot be nested, so this would require wrapping the whole query in another one.</p>
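<p>For comparison, here is a rough sketch of what the <code>LAG</code> variant could look like. It assumes the same <code>t_name</code> table as the other queries; the analytic <code>SUM</code> is computed in an inline view and <code>LAG</code> is applied one level above it, with <code>0</code> as the default for the first row:</p>
<pre class="brush: sql">
SELECT  letter, cnt,
        -- average of the current and the previous partial sums, as a share of the total
        (partial_sum + LAG(partial_sum, 1, 0) OVER (ORDER BY letter)) / 2 / total_sum AS fair_ratio
FROM    (
        SELECT  letter, cnt,
                SUM(cnt) OVER (ORDER BY letter) AS partial_sum,
                SUM(cnt) OVER () AS total_sum
        FROM    (
                SELECT  SUBSTR(name, 1, 1) AS letter, COUNT(*) AS cnt
                FROM    t_name
                GROUP BY
                        SUBSTR(name, 1, 1)
                )
        )
</pre>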
<p>But there is a more elegant way.</p>
<p>Remember the algorithm the ratios are calculated with? They are the <strong>partial sums</strong> divided by the <strong>total sum</strong>.</p>
<p>The average of two consecutive ratios is the sum of all values but the current one, plus the sum of all values including the current one, divided by two and by the total sum:</p>
<p><code>fair_ratio = AVG(partial_ratio, previous_partial_ratio) = (SUM(1:N-1) + SUM(1:N)) / (2 * SUM())</code></p>
<p>But the sum of all values including the current one is in fact the sum of all preceding values plus the current value.</p>
<p>So to get the average of the ratios, we can just take the sum of all preceding values <strong>excluding</strong> the current one and add a <em>half</em> of the current value to it.</p>
<p>And <strong>Oracle</strong> allows us to calculate the sum of all values but the current one merely by adding a <code>ROWS</code> windowing clause to the analytic function. This expression:</p>
<p><code>SUM(cnt) OVER (ORDER BY letter ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) + cnt / 2</code></p>
<p>returns the average of the current and the previous partial sums by calculating the previous partial sum and adding half the current value to it.</p>
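<p>As a quick sanity check, the two ways of computing the value can be compared for letter <strong>E</strong>, using the numbers from the table above (previous partial sum <code>1796</code>, current partial sum <code>3379</code>, current count <code>1583</code>, total <code>10000</code>). This is only an arithmetic illustration with the figures typed in as literals:</p>
<pre class="brush: sql">
SELECT  -- average of the two consecutive partial sums, divided by the total
        ((1796 + 3379) / 2) / 10000 AS avg_of_partial_sums,
        -- previous partial sum plus half of the current count, divided by the total
        (1796 + 1583 / 2) / 10000 AS prev_sum_plus_half
FROM    dual
</pre>
<p>Both columns evaluate to <code>0.25875</code>, which is shown as <code>0.259</code> after formatting.</p>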
<p>Let&#8217;s use it in a query:</p>
<p><a href="#" onclick="xcollapse('X5618');return false;"><strong>View the query with fair ratios</strong></a><br />
</p>
<div id="X5618" style="display: none; ">
<pre class="brush: sql">
SELECT  letter, cnt,
        SUM(cnt) OVER (ORDER BY letter) AS partial_sum,
        SUM(cnt) OVER () AS total_sum,
        TO_CHAR(SUM(cnt) OVER (ORDER BY letter) / SUM(cnt) OVER (), &#039;FM0.990&#039;) AS partial_ratio,
        TO_CHAR((COALESCE(SUM(cnt) OVER (ORDER BY letter ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) + cnt / 2) / SUM(cnt) OVER (), &#039;FM0.990&#039;) AS fair_ratio
FROM    (
        SELECT  SUBSTR(name, 1, 1) AS letter, COUNT(*) AS cnt
        FROM    t_name
        GROUP BY
                SUBSTR(name, 1, 1)
        )
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>LETTER</th>
<th>CNT</th>
<th>PARTIAL_SUM</th>
<th>TOTAL_SUM</th>
<th>PARTIAL_RATIO</th>
<th>FAIR_RATIO</th>
</tr>
<tr>
<td class="varchar2">A</td>
<td class="double_precision">1149</td>
<td class="double_precision">1149</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.115</td>
<td class="varchar2">0.057</td>
</tr>
<tr>
<td class="varchar2">B</td>
<td class="double_precision">78</td>
<td class="double_precision">1227</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.123</td>
<td class="varchar2">0.119</td>
</tr>
<tr>
<td class="varchar2">C</td>
<td class="double_precision">236</td>
<td class="double_precision">1463</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.146</td>
<td class="varchar2">0.135</td>
</tr>
<tr>
<td class="varchar2">D</td>
<td class="double_precision">333</td>
<td class="double_precision">1796</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.180</td>
<td class="varchar2">0.163</td>
</tr>
<tr>
<td class="varchar2">E</td>
<td class="double_precision">1583</td>
<td class="double_precision">3379</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.338</td>
<td class="varchar2">0.259</td>
</tr>
<tr>
<td class="varchar2">F</td>
<td class="double_precision">166</td>
<td class="double_precision">3545</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.355</td>
<td class="varchar2">0.346</td>
</tr>
<tr>
<td class="varchar2">G</td>
<td class="double_precision">54</td>
<td class="double_precision">3599</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.360</td>
<td class="varchar2">0.357</td>
</tr>
<tr>
<td class="varchar2">H</td>
<td class="double_precision">505</td>
<td class="double_precision">4104</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.410</td>
<td class="varchar2">0.385</td>
</tr>
<tr>
<td class="varchar2">I</td>
<td class="double_precision">757</td>
<td class="double_precision">4861</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.486</td>
<td class="varchar2">0.448</td>
</tr>
<tr>
<td class="varchar2">J</td>
<td class="double_precision">41</td>
<td class="double_precision">4902</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.490</td>
<td class="varchar2">0.488</td>
</tr>
<tr>
<td class="varchar2">K</td>
<td class="double_precision">52</td>
<td class="double_precision">4954</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.495</td>
<td class="varchar2">0.493</td>
</tr>
<tr>
<td class="varchar2">L</td>
<td class="double_precision">316</td>
<td class="double_precision">5270</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.527</td>
<td class="varchar2">0.511</td>
</tr>
<tr>
<td class="varchar2">M</td>
<td class="double_precision">180</td>
<td class="double_precision">5450</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.545</td>
<td class="varchar2">0.536</td>
</tr>
<tr>
<td class="varchar2">N</td>
<td class="double_precision">679</td>
<td class="double_precision">6129</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.613</td>
<td class="varchar2">0.579</td>
</tr>
<tr>
<td class="varchar2">O</td>
<td class="double_precision">908</td>
<td class="double_precision">7037</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.704</td>
<td class="varchar2">0.658</td>
</tr>
<tr>
<td class="varchar2">P</td>
<td class="double_precision">98</td>
<td class="double_precision">7135</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.714</td>
<td class="varchar2">0.709</td>
</tr>
<tr>
<td class="varchar2">Q</td>
<td class="double_precision">36</td>
<td class="double_precision">7171</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.717</td>
<td class="varchar2">0.715</td>
</tr>
<tr>
<td class="varchar2">R</td>
<td class="double_precision">423</td>
<td class="double_precision">7594</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.759</td>
<td class="varchar2">0.738</td>
</tr>
<tr>
<td class="varchar2">S</td>
<td class="double_precision">570</td>
<td class="double_precision">8164</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.816</td>
<td class="varchar2">0.788</td>
</tr>
<tr>
<td class="varchar2">T</td>
<td class="double_precision">1213</td>
<td class="double_precision">9377</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.938</td>
<td class="varchar2">0.877</td>
</tr>
<tr>
<td class="varchar2">U</td>
<td class="double_precision">264</td>
<td class="double_precision">9641</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.964</td>
<td class="varchar2">0.951</td>
</tr>
<tr>
<td class="varchar2">V</td>
<td class="double_precision">65</td>
<td class="double_precision">9706</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.971</td>
<td class="varchar2">0.967</td>
</tr>
<tr>
<td class="varchar2">W</td>
<td class="double_precision">122</td>
<td class="double_precision">9828</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.983</td>
<td class="varchar2">0.977</td>
</tr>
<tr>
<td class="varchar2">X</td>
<td class="double_precision">32</td>
<td class="double_precision">9860</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.986</td>
<td class="varchar2">0.984</td>
</tr>
<tr>
<td class="varchar2">Y</td>
<td class="double_precision">117</td>
<td class="double_precision">9977</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.998</td>
<td class="varchar2">0.992</td>
</tr>
<tr>
<td class="varchar2">Z</td>
<td class="double_precision">23</td>
<td class="double_precision">10000</td>
<td class="double_precision">10000</td>
<td class="varchar2">1.000</td>
<td class="varchar2">0.999</td>
</tr>
</table>
</div>
</div>
<p>We see that for letter <strong>E</strong> the <code>fair_ratio</code> is <code>0.259</code>, which is the average of the <code>partial_ratios</code> from the preceding and the current rows:</p>
<div class="terminal">
<table class="terminal">
<tr>
<th>LETTER</th>
<th>CNT</th>
<th>PARTIAL_SUM</th>
<th>TOTAL_SUM</th>
<th>PARTIAL_RATIO</th>
<th>FAIR_RATIO</th>
</tr>
<tr>
<td class="varchar2">D</td>
<td class="double_precision">333</td>
<td class="double_precision">1796</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.180</td>
<td class="varchar2">0.163</td>
</tr>
<tr>
<td class="varchar2">E</td>
<td class="double_precision">1583</td>
<td class="double_precision">3379</td>
<td class="double_precision">10000</td>
<td class="varchar2">0.338</td>
<td class="varchar2">0.259</td>
</tr>
</table>
</div>
<p>Now we can easily distribute the letters between the tomes.</p>
<p>To do this, we should just multiply the <code>fair_ratio</code> by <strong>5</strong>, round it down to an integer using <code>TRUNC</code> and add <strong>1</strong>. This will give us the tome number.</p>
<p>The first and the last letters of each tome can be easily calculated using <code>MIN</code> and <code>MAX</code>.</p>
<p>Here&#8217;s the query to do this:</p>
<pre class="brush: sql">
SELECT  tome, MIN(letter) || &#039;-&#039; || MAX(letter) AS range, SUM(cnt)
FROM    (
        SELECT  letter, cnt,
                TRUNC((COALESCE(SUM(cnt) OVER (ORDER BY letter ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) + cnt / 2) / SUM(cnt) OVER () * 5) + 1 AS tome
        FROM    (
                SELECT  SUBSTR(name, 1, 1) AS letter, COUNT(*) AS cnt
                FROM    t_name
                GROUP BY
                        SUBSTR(name, 1, 1)
                )
        )
GROUP BY
        tome
ORDER BY
        tome
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>TOME</th>
<th>RANGE</th>
<th>SUM(CNT)</th>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="varchar2">A-D</td>
<td class="double_precision">1796</td>
</tr>
<tr>
<td class="double_precision">2</td>
<td class="varchar2">E-H</td>
<td class="double_precision">2308</td>
</tr>
<tr>
<td class="double_precision">3</td>
<td class="varchar2">I-N</td>
<td class="double_precision">2025</td>
</tr>
<tr>
<td class="double_precision">4</td>
<td class="varchar2">O-S</td>
<td class="double_precision">2035</td>
</tr>
<tr>
<td class="double_precision">5</td>
<td class="varchar2">T-Z</td>
<td class="double_precision">1836</td>
</tr>
</table>
</div>
<p>As you can see, the distribution is reasonably fair.</p>
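<p>To put a number on how fair the split is, the same query can be extended to report how far each tome is from the ideal size. The sketch below is a variation on the query above, not part of the original solution: the bind variable <code>:tomes</code> and the column aliases <code>letters</code>, <code>articles</code>, <code>total</code> and <code>deviation</code> are introduced here for illustration only.</p>
<pre class="brush: sql">
SELECT  tome, MIN(letter) || &#039;-&#039; || MAX(letter) AS letters, SUM(cnt) AS articles,
        -- difference between the tome size and an equal share of all articles
        SUM(cnt) - MAX(total) / :tomes AS deviation
FROM    (
        SELECT  letter, cnt,
                SUM(cnt) OVER () AS total,
                TRUNC((COALESCE(SUM(cnt) OVER (ORDER BY letter ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) + cnt / 2) / SUM(cnt) OVER () * :tomes) + 1 AS tome
        FROM    (
                SELECT  SUBSTR(name, 1, 1) AS letter, COUNT(*) AS cnt
                FROM    t_name
                GROUP BY
                        SUBSTR(name, 1, 1)
                )
        )
GROUP BY
        tome
ORDER BY
        tome
</pre>
<p>With <code>:tomes</code> set to <strong>5</strong> and the sums shown above, the deviations from the ideal <strong>2000</strong> articles per tome would be <code>-204</code>, <code>308</code>, <code>25</code>, <code>35</code> and <code>-164</code>.</p>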
<p>Hope that helps.</p>
<hr/>
<p>I&#8217;m always glad to answer the questions regarding database queries.</p>
<p><a href="/ask-a-question"><strong>Ask me a question</strong></a></p>
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2009/09/23/building-alphabetical-index/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle</title>
		<link>http://explainextended.com/2009/09/17/not-in-vs-not-exists-vs-left-join-is-null-oracle/</link>
		<comments>http://explainextended.com/2009/09/17/not-in-vs-not-exists-vs-left-join-is-null-oracle/#comments</comments>
		<pubDate>Thu, 17 Sep 2009 19:00:41 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=3053</guid>
		<description><![CDATA[Which method is best to select values present in one table but missing in another one? This: SELECT l.* FROM t_left l LEFT JOIN t_right r ON r.value = l.value WHERE r.value IS NULL , this: SELECT l.* FROM t_left l WHERE l.value NOT IN ( SELECT value FROM t_right r ) or this: SELECT [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>Which method is best to select values present in one table but missing in another one?</p>
<p>This:</p>
<pre class="brush: sql">
SELECT  l.*
FROM    t_left l
LEFT JOIN
        t_right r
ON      r.value = l.value
WHERE   r.value IS NULL
</pre>
<p>, this:</p>
<pre class="brush: sql">
SELECT  l.*
FROM    t_left l
WHERE   l.value NOT IN
        (
        SELECT  value
        FROM    t_right r
        )
</pre>
<p>or this:</p>
<pre class="brush: sql">
SELECT  l.*
FROM    t_left l
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    t_right r
        WHERE   r.value = l.value
        )
</pre>
</blockquote>
<p>Today, we will see how Oracle copes with these queries.</p>
<p>And to do this, we, of course, should create sample tables:</p>
<p><span id="more-3053"></span><br />
<a href="#" onclick="xcollapse('X561');return false;"><strong>Table creation script</strong></a><br />
</p>
<div id="X561" style="display: none; ">
<pre class="brush: sql">
CREATE TABLE t_left (id NOT NULL PRIMARY KEY, value NOT NULL, stuffing NOT NULL)
AS
SELECT  level, MOD(level, 10000), RPAD(&#039;Value &#039; || level || &#039; &#039;, 200, &#039;*&#039;)
FROM    dual
CONNECT BY
        level &lt;= 100000
/

CREATE TABLE t_right (id NOT NULL PRIMARY KEY, value NOT NULL, stuffing NOT NULL) AS
SELECT  level, MOD(level, 10000) + 1, CAST(RPAD(&#039;Value &#039; || level || &#039; &#039;, 200, &#039;*&#039;) AS VARCHAR2(200))
FROM    dual
CONNECT BY
        level &lt;= 1000000
/

CREATE INDEX ix_left_value ON t_left (value)
/

CREATE INDEX ix_right_value ON t_right (value)
/
</pre>
</div>
<p>Table <code>t_left</code> contains <strong>100,000</strong> rows with <strong>10,000</strong> distinct values.</p>
<p>Table <code>t_right</code> contains <strong>1,000,000</strong> rows with <strong>10,000</strong> distinct values.</p>
<p>There are <strong>10</strong> rows in <code>t_left</code> with values not present in <code>t_right</code>.</p>
<p>In both tables the field <code>value</code> is indexed.</p>
<h3>LEFT JOIN / IS NULL</h3>
<pre class="brush: sql">
SELECT  l.id, l.value
FROM    t_left l
LEFT JOIN
        t_right r
ON      r.value = l.value
WHERE   r.value IS NULL
</pre>
<p><a href="#" onclick="xcollapse('X8073');return false;"><strong>View query results and execution plan</strong></a><br />
</p>
<div id="X8073" style="display: none; ">
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>VALUE</th>
</tr>
<tr>
<td class="double_precision">100000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">90000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">80000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">70000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">60000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">50000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">40000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">30000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">20000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">10000</td>
<td class="double_precision">0</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0003s (0.2781s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 HASH JOIN ANTI
  VIEW , 20090917_anti.index$_join$_001
   HASH JOIN
    INDEX FAST FULL SCAN, 20090917_anti.SYS_C0013641
    INDEX FAST FULL SCAN, 20090917_anti.IX_LEFT_VALUE
  INDEX FAST FULL SCAN, 20090917_anti.IX_RIGHT_VALUE
</pre>
</div>
<p>This execution plan is quite interesting.</p>
<p>First, <strong>Oracle</strong>&#8217;s optimizer, unlike <strong>SQL Server</strong>&#8217;s, is smart enough to see an opportunity to use an <code>ANTI JOIN</code> for such a query.</p>
<p>Since all rows from <code>t_left</code> should be examined, <strong>Oracle</strong> decided to use a <code>HASH JOIN ANTI</code> to do this: a hash table is built over the values from <code>t_right</code>, eliminating duplicates, and every row from <code>t_left</code> is searched for in the hash table. Rows not found in the hash table are returned; those that are found are not.</p>
<p>Second, <strong>Oracle</strong> decided to optimize the query a little further.</p>
<p>The table actually has three fields: <code>id</code> (the <code>PRIMARY KEY</code>), <code>value</code>, which we are searching on, and <code>stuffing</code>. The latter is a <code>VARCHAR2(200)</code> filled with asterisks. It emulates the payload columns that make actual tables large.</p>
<p>Since tables in <strong>Oracle</strong> are heap-organized by default, no index contains the values of both <code>id</code> and <code>value</code>.</p>
<p>However, there are two indexes that contain them separately: the index on <code>value</code> created explicitly, and the index on <code>id</code> created implicitly to enforce the <code>PRIMARY KEY</code>.</p>
<p>These indexes, even taken together, are still much smaller than the table itself. That&#8217;s why <strong>Oracle</strong> decided to retrieve the values not from the table but from the indexes joined on <code>rowid</code>.</p>
<p>Indeed, each index record corresponds to a record in the table and contains a <code>rowid</code>: a pointer to the original record in the table. Equal <code>rowid</code>s mean the same row. And a <code>rowid</code> is of course unique within each index, since one table record corresponds to exactly one index record.</p>
<p>Since we don&#8217;t use any columns other than <code>id</code> and <code>value</code>, these two indexes contain all the information we need. Joining them, though it takes some extra time, is still faster than scanning the whole table (which, as we know, is quite large and occupies a lot of data blocks).</p>
<p>The <code>HASH JOIN</code> we see in the plan does the things described above. Here are the steps it performs:</p>
<ol>
<li>First, it scans one of the two indexes with a <code>FAST FULL SCAN</code>. This scan method is faster since it does not follow the <code>B-Tree</code> links but instead just reads all index pages sequentially. This does not preserve the index order, but we don&#8217;t need it anyway.</li>
<li>Then it builds a hash table from the index records read on step <strong>1</strong> (using the <code>ROWID</code>s of the records as the hash key)</li>
<li>Finally, it reads the second index (again, with a <code>FAST FULL SCAN</code>) and probes it against the hash table</li>
</ol>
<p>This results in the same rowset of <code>(id, value)</code> we would obtain had we scanned the table, only the <code>HASH JOIN</code> method is faster.</p>
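<p>Conceptually, the index join behaves much like a self-join of the table on <code>rowid</code> in which each side needs only one indexed column. The query below is just an illustration of that idea, not what <strong>Oracle</strong> executes literally; the aliases <code>i</code>, <code>v</code> and <code>rid</code> are made up for this sketch:</p>
<pre class="brush: sql">
SELECT  i.id, v.value
FROM    (
        -- this side can be answered from the PRIMARY KEY index alone
        SELECT  rowid AS rid, id
        FROM    t_left
        ) i
JOIN    (
        -- this side can be answered from the index on value alone
        SELECT  rowid AS rid, value
        FROM    t_left
        ) v
ON      v.rid = i.rid
</pre>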
<p>The query completes in <strong>0.28 s</strong>.</p>
<h3>NOT IN</h3>
<pre class="brush: sql">
SELECT  l.id, l.value
FROM    t_left l
WHERE   value NOT IN
        (
        SELECT  value
        FROM    t_right
        )
</pre>
<p><a href="#" onclick="xcollapse('X8074');return false;"><strong>View query results and execution plan</strong></a><br />
</p>
<div id="X8074" style="display: none; ">
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>VALUE</th>
</tr>
<tr>
<td class="double_precision">100000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">90000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">80000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">70000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">60000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">50000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">40000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">30000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">20000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">10000</td>
<td class="double_precision">0</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0003s (0.2807s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 HASH JOIN ANTI
  VIEW , 20090917_anti.index$_join$_001
   HASH JOIN
    INDEX FAST FULL SCAN, 20090917_anti.PK_LEFT_ID
    INDEX FAST FULL SCAN, 20090917_anti.IX_LEFT_VALUE
  INDEX FAST FULL SCAN, 20090917_anti.IX_RIGHT_VALUE
</pre>
</div>
<p><code>NOT IN</code> is exactly what <code>ANTI JOIN</code> is designed for, that&#8217;s why it&#8217;s no surprise to see that <strong>Oracle</strong> uses <code>ANTI JOIN</code> for <code>NOT IN</code>.</p>
<p><code>NOT IN</code> is semantically different from the <code>LEFT JOIN / IS NULL</code> and <code>NOT EXISTS</code> since its logic is trivalent and filtering on it will never return anything if there are <code>NULL</code> values in the list.</p>
<p>This behaviour was explained in the first article of the series (that concerning <a href="/2009/09/15/not-in-vs-not-exists-vs-left-join-is-null-sql-server/"><strong>SQL Server</strong></a>) and this is a reason why <a href="/2009/09/16/not-in-vs-not-exists-vs-left-join-is-null-postgresql/"><strong>PostgreSQL</strong></a> is reluctant to use an <code>Anti Join</code> for <code>NOT IN</code>.</p>
<p>However, <strong>Oracle</strong> is able to take into account the fact that <code>t_right.value</code> is declared as <code>NOT NULL</code>, and, therefore, <code>NOT IN</code> is semantically equivalent to <code>LEFT JOIN / IS NULL</code> and <code>NOT EXISTS</code>. And <strong>Oracle</strong> uses exactly the same plan for <code>NOT IN</code>, with an <code>ANTI JOIN</code> and a <code>HASH JOIN</code> to get <code>(id, value)</code> for <code>t_left</code>.</p>
<p>The query completes in <strong>0.28 s</strong>, same as for <code>LEFT JOIN / IS NULL</code>.</p>
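<p>To see which plan your own instance picks, one option is <code>EXPLAIN PLAN</code> together with <code>DBMS_XPLAN.DISPLAY</code>. This is a minimal sketch that assumes the tables from the creation script above and a default <code>PLAN_TABLE</code>; the plan you get may differ depending on the version, the statistics and the optimizer settings:</p>
<pre class="brush: sql">
EXPLAIN PLAN FOR
SELECT  l.id, l.value
FROM    t_left l
WHERE   value NOT IN
        (
        SELECT  value
        FROM    t_right
        )
/

-- show the plan that was just explained
SELECT  *
FROM    TABLE(DBMS_XPLAN.DISPLAY)
/
</pre>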
<h3>NOT EXISTS</h3>
<pre class="brush: sql">
SELECT  l.id, l.value
FROM    t_left l
WHERE   NOT EXISTS
        (
        SELECT  value
        FROM    t_right r
        WHERE   r.value = l.value
        )
</pre>
<p><a href="#" onclick="xcollapse('X8075');return false;"><strong>View query results and execution plan</strong></a><br />
</p>
<div id="X8075" style="display: none; ">
<div class="terminal">
<table class="terminal">
<tr>
<th>ID</th>
<th>VALUE</th>
</tr>
<tr>
<td class="double_precision">100000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">90000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">80000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">70000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">60000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">50000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">40000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">30000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">20000</td>
<td class="double_precision">0</td>
</tr>
<tr>
<td class="double_precision">10000</td>
<td class="double_precision">0</td>
</tr>
<tr class="statusbar">
<td colspan="100">10 rows fetched in 0.0003s (0.2788s)</td>
</tr>
</table>
</div>
<pre>
SELECT STATEMENT
 HASH JOIN ANTI
  VIEW , 20090917_anti.index$_join$_001
   HASH JOIN
    INDEX FAST FULL SCAN, 20090917_anti.PK_LEFT_ID
    INDEX FAST FULL SCAN, 20090917_anti.IX_LEFT_VALUE
  INDEX FAST FULL SCAN, 20090917_anti.IX_RIGHT_VALUE
</pre>
</div>
<p>Guess what?</p>
<p>Same execution plan, same results, same performance.</p>
<p>The query completes in <strong>0.28 s</strong>.</p>
<h3>Summary</h3>
<p><strong>Oracle</strong>&#8217;s optimizer is able to see that <code>NOT EXISTS</code>, <code>NOT IN</code> and <code>LEFT JOIN / IS NULL</code> are semantically equivalent as long as the list values are declared as <code>NOT NULL</code>.</p>
<p>It uses the same execution plan for all three methods, and they yield the same results in the same time.</p>
<p>In <strong>Oracle</strong>, it is safe to use any method of the three described above to select values from a table that are missing in another table.</p>
<p>However, if the values are not guaranteed to be <code>NOT NULL</code>, <code>LEFT JOIN / IS NULL</code> or <code>NOT EXISTS</code> should be used rather than <code>NOT IN</code>, since the latter will produce different results depending on whether or not there are <code>NULL</code> values in the subquery resultset.</p>
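<p>The difference is easy to reproduce without any tables. In this small sketch (the literals and aliases are chosen only for illustration), the list contains <code>2</code> and a <code>NULL</code>, and we look for <code>1</code>. The <code>NOT IN</code> query returns no rows, because <code>1 &lt;&gt; NULL</code> is unknown rather than true:</p>
<pre class="brush: sql">
SELECT  &#039;returned&#039; AS not_in_result
FROM    dual
WHERE   1 NOT IN
        (
        SELECT  2 FROM dual
        UNION ALL
        SELECT  CAST(NULL AS NUMBER) FROM dual
        )
</pre>
<p>The <code>NOT EXISTS</code> query over the same list returns one row, since no row of the list satisfies the equality:</p>
<pre class="brush: sql">
SELECT  &#039;returned&#039; AS not_exists_result
FROM    dual
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    (
                SELECT  2 AS value FROM dual
                UNION ALL
                SELECT  CAST(NULL AS NUMBER) FROM dual
                ) r
        WHERE   r.value = 1
        )
</pre>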
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2009/09/17/not-in-vs-not-exists-vs-left-join-is-null-oracle/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oracle: generating a list of dates and counting ranges for each date</title>
		<link>http://explainextended.com/2009/09/09/oracle-generating-a-list-of-dates-and-counting-ranges-for-each-date/</link>
		<comments>http://explainextended.com/2009/09/09/oracle-generating-a-list-of-dates-and-counting-ranges-for-each-date/#comments</comments>
		<pubDate>Wed, 09 Sep 2009 19:00:04 +0000</pubDate>
		<dc:creator>Quassnoi</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://explainextended.com/?p=2947</guid>
		<description><![CDATA[From Stack Overflow: I have a table with data such as below: Group Start Date End Date A 2001.01.01 2001.01.03 A 2001.01.01 2001.01.02 A 2001.01.03 2001.01.03 B 2001.01.01 2001.01.01 I am looking to produce a view that gives a count for each day We have a list of ranges here, and for each date we [...]]]></description>
			<content:encoded><![CDATA[<p>From <a href="http://stackoverflow.com/questions/1394153/count-number-of-rows-that-occur-for-each-date-in-column-date-range"><strong>Stack Overflow</strong></a>:</p>
<blockquote><p>I have a table with data such as below:</p>
<table class="excel">
<tr>
<th>Group</th>
<th>Start Date</th>
<th>End Date</th>
</tr>
<tr>
<td>A</td>
<td>2001.01.01</td>
<td>2001.01.03</td>
</tr>
<tr>
<td>A</td>
<td>2001.01.01</td>
<td>2001.01.02</td>
</tr>
<tr>
<td>A</td>
<td>2001.01.03</td>
<td>2001.01.03</td>
</tr>
<tr>
<td>B</td>
<td>2001.01.01</td>
<td>2001.01.01</td>
</tr>
</table>
<p>I am looking to produce a view that gives a count for each day</p></blockquote>
<p>We have a list of ranges here, and for each date we should count the number of ranges that contain this date.</p>
<p>To make this query, we will employ one simple fact: the number of ranges containing a given date is the number of ranges started on or before this date minus the number of ranges that ended before this date.</p>
<p>This can easily be calculated using window functions.</p>
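<p>Here is a small check of that fact on the four rows from the question, for group <strong>A</strong> and the date <code>2001.01.03</code>. The rows are typed in as literals, and the names <code>r</code>, <code>grp</code>, <code>by_difference</code> and <code>by_definition</code> are made up for this sketch. It relies on every range starting no later than it ends:</p>
<pre class="brush: sql">
WITH    r AS
        (
        SELECT  &#039;A&#039; AS grp, DATE &#039;2001-01-01&#039; AS start_date, DATE &#039;2001-01-03&#039; AS end_date FROM dual
        UNION ALL
        SELECT  &#039;A&#039;, DATE &#039;2001-01-01&#039;, DATE &#039;2001-01-02&#039; FROM dual
        UNION ALL
        SELECT  &#039;A&#039;, DATE &#039;2001-01-03&#039;, DATE &#039;2001-01-03&#039; FROM dual
        UNION ALL
        SELECT  &#039;B&#039;, DATE &#039;2001-01-01&#039;, DATE &#039;2001-01-01&#039; FROM dual
        )
SELECT  -- ranges started on or before the date minus ranges ended before it
        (SELECT COUNT(*) FROM r WHERE grp = &#039;A&#039; AND start_date &lt;= DATE &#039;2001-01-03&#039;)
        - (SELECT COUNT(*) FROM r WHERE grp = &#039;A&#039; AND end_date &lt; DATE &#039;2001-01-03&#039;) AS by_difference,
        -- ranges that contain the date, counted directly
        (SELECT COUNT(*) FROM r WHERE grp = &#039;A&#039; AND DATE &#039;2001-01-03&#039; BETWEEN start_date AND end_date) AS by_definition
FROM    dual
</pre>
<p>Both columns return <code>2</code>: three ranges of group <strong>A</strong> have started by that date, and one of them ended the day before.</p>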
<p>Let&#8217;s create a sample table:</p>
<p><span id="more-2947"></span><br />
<a href="#" onclick="xcollapse('X2263');return false;"><strong>Table creation details</strong></a><br />
</p>
<div id="X2263" style="display: none; ">
<pre class="brush: sql">
BEGIN
        DBMS_RANDOM.seed(20090909);
END;
/

CREATE TABLE t_range (
        id NOT NULL PRIMARY KEY,
        grouper NOT NULL,
        start_date NOT NULL,
        end_date NOT NULL
)
AS
SELECT  level, MOD((level - 1), 10) + 1,
        TO_DATE(&#039;09.09.2009&#039;, &#039;dd.mm.yyyy&#039;) - TRUNC((level - 1) / 14400),
        TO_DATE(&#039;09.09.2009&#039;, &#039;dd.mm.yyyy&#039;) - TRUNC((level - 1) / 14400) +
        TRUNC(DBMS_RANDOM.value * 10) + 1
FROM    dual
CONNECT BY
        level &lt;= 1000000
/
CREATE INDEX ix_range_grouper_startdate ON t_range (grouper, start_date)
/
CREATE INDEX ix_range_grouper_enddate ON t_range (grouper, end_date)
/
</pre>
</div>
<p>This table contains <strong>1,000,000</strong> rows in <strong>10</strong> groups.</p>
<p>Ranges start at the rate of <strong>14,400</strong> per day (<strong>1,440</strong> per group), and each range ends from <strong>1</strong> to <strong>10</strong> days after it starts. Range lengths are random.</p>
<p>To generate the count of ranges in each group that contain each date, we need to do the following:</p>
<ol>
<li>Generate a list of dates, from minimal to maximal. This can be easily done using a <code>CONNECT BY</code> query on <code>dual</code> table:
<pre class="brush: sql">
SELECT  cur_date
FROM    (
        SELECT  (
                SELECT  MIN(start_date)
                FROM    t_range
                ) + level - 1 AS cur_date
        FROM    dual
        CONNECT BY
                level &lt;=
                (
                SELECT  MAX(end_date)
                FROM    t_range
                ) -
                (
                SELECT  MIN(start_date)
                FROM    t_range
                ) + 1
        ) dates
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>CUR_DATE</th>
</tr>
<tr>
<td class="date">02.07.2009 00:00:00</td>
</tr>
<tr>
<td class="date">03.07.2009 00:00:00</td>
</tr>
<tr>
<td class="date">04.07.2009 00:00:00</td>
</tr>
<tr class="break">
<td colspan="100"></td>
</tr>
<tr>
<td class="date">18.09.2009 00:00:00</td>
</tr>
<tr>
<td class="date">19.09.2009 00:00:00</td>
</tr>
<tr class="statusbar">
<td colspan="100">80 rows fetched in 0.0017s (0.0011s)</td>
</tr>
</table>
</div>
<p>Since <code>MIN</code> and <code>MAX</code> are instant on indexed fields, this query is instant too.</p>
</li>
<li>
<p>Build a rowset that contains the number of ranges starting on each given date. We can use a simple <code>GROUP BY</code> to do this, since missing dates will be handled by a <code>LEFT JOIN</code> with the list of generated dates that we built in the previous step:</p>
<pre class="brush: sql">
SELECT  grouper AS sgrp, start_date, COUNT(*) AS scnt
FROM    t_range
GROUP BY
        grouper, start_date
ORDER BY
        grouper, start_date
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>SGRP</th>
<th>START_DATE</th>
<th>SCNT</th>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="date">02.07.2009 00:00:00</td>
<td class="double_precision">640</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="date">03.07.2009 00:00:00</td>
<td class="double_precision">1440</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="date">04.07.2009 00:00:00</td>
<td class="double_precision">1440</td>
</tr>
<tr class="break">
<td colspan="100"></td>
</tr>
<tr>
<td class="double_precision">10</td>
<td class="date">08.09.2009 00:00:00</td>
<td class="double_precision">1440</td>
</tr>
<tr>
<td class="double_precision">10</td>
<td class="date">09.09.2009 00:00:00</td>
<td class="double_precision">1440</td>
</tr>
<tr class="statusbar">
<td colspan="100">700 rows fetched in 0.0247s (0.3159s)</td>
</tr>
</table>
</div>
<p>We see that within group <strong>1</strong>, <strong>640</strong> ranges started on <strong>02.07.2009</strong>, <strong>1,440</strong> ranges started on <strong>03.07.2009</strong> etc.</p>
</li>
<li>
<p>Do the same with <code>end_date</code>:</p>
<pre class="brush: sql">
SELECT  grouper AS egrp, end_date, COUNT(*) AS ecnt
FROM    t_range
GROUP BY
        grouper, end_date
ORDER BY
        grouper, end_date
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>EGRP</th>
<th>END_DATE</th>
<th>ECNT</th>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="date">03.07.2009 00:00:00</td>
<td class="double_precision">56</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="date">04.07.2009 00:00:00</td>
<td class="double_precision">197</td>
</tr>
<tr>
<td class="double_precision">1</td>
<td class="date">05.07.2009 00:00:00</td>
<td class="double_precision">359</td>
</tr>
<tr class="break">
<td colspan="100"></td>
</tr>
<tr>
<td class="double_precision">10</td>
<td class="date">18.09.2009 00:00:00</td>
<td class="double_precision">287</td>
</tr>
<tr>
<td class="double_precision">10</td>
<td class="date">19.09.2009 00:00:00</td>
<td class="double_precision">139</td>
</tr>
<tr class="statusbar">
<td colspan="100">790 rows fetched in 0.0278s (0.6250s)</td>
</tr>
</table>
</div>
<p>Within group <strong>1</strong>, <strong>56</strong> ranges ended on <strong>03.07.2009</strong>, <strong>197</strong> ended on <strong>04.07.2009</strong> etc.</p>
</li>
<li>
<p>Finally, we should calculate the number of ranges within each group containing each date:</p>
<ol>
<li>
<p>Before and on Jul 2nd, <strong>640</strong> ranges started. <strong>0</strong> ranges ended before that date. This means that this day is contained by <strong>640</strong> ranges.</p>
</li>
<li>
<p>Before and on Jul 3rd, <strong>2,080</strong> ranges started. This includes <strong>1,440</strong> ranges that started on that date and <strong>640</strong> ranges that started before. No ranges ended before this date either. The date is contained by <strong>2,080</strong> ranges.</p>
</li>
<li>
<p>Before and on Jul 4th, <strong>3,520</strong> ranges started. This includes <strong>1,440</strong> ranges that started on that date and <strong>2,080</strong> ranges that started before that date. <strong>56</strong> ranges ended before that date. This date is therefore contained by <strong>3,464</strong> ranges.</p>
</li>
</ol>
<p>Et cetera. Now we see that to calculate the number of ranges containing any given date we should <code>LEFT JOIN</code> the resultsets containing the counts to the list of dates, and calculate partial sums of these counts. The difference between these sums will be the result we&#8217;re after.</p>
</li>
</ol>
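<p>The partial sums used in the walkthrough above can be looked at in isolation. As a sketch, this adds a running total of started ranges to the starts rowset (the alias <code>started_so_far</code> is made up for the example):</p>
<pre class="brush: sql">
SELECT  grouper AS sgrp, start_date, COUNT(*) AS scnt,
        -- analytic SUM over the grouped counts: a running total within each group
        SUM(COUNT(*)) OVER (PARTITION BY grouper ORDER BY start_date) AS started_so_far
FROM    t_range
GROUP BY
        grouper, start_date
ORDER BY
        grouper, start_date
</pre>
<p>For group <strong>1</strong>, the running total should begin with <strong>640</strong>, <strong>2,080</strong> and <strong>3,520</strong>, matching the figures in the list above.</p>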
<p>And here&#8217;s the final query:</p>
<pre class="brush: sql">
SELECT  cur_date,
        grouper,
        SUM(COALESCE(scnt, 0) - COALESCE(ecnt, 0)) OVER (PARTITION BY grouper ORDER BY cur_date) AS ranges
FROM    (
        SELECT  (
                SELECT  MIN(start_date)
                FROM    t_range
                ) + level - 1 AS cur_date
        FROM    dual
        CONNECT BY
                level &lt;=
                (
                SELECT  MAX(end_date)
                FROM    t_range
                ) -
                (
                SELECT  MIN(start_date)
                FROM    t_range
                ) + 1
        ) dates
CROSS JOIN
        (
        SELECT  DISTINCT grouper AS grouper
        FROM    t_range
        ) groups
LEFT JOIN
        (
        SELECT  grouper AS sgrp, start_date, COUNT(*) AS scnt
        FROM    t_range
        GROUP BY
                grouper, start_date
        ) starts
ON      sgrp = grouper
        AND start_date = cur_date
LEFT JOIN
        (
        SELECT  grouper AS egrp, end_date, COUNT(*) AS ecnt
        FROM    t_range
        GROUP BY
                grouper, end_date
        ) ends
ON      egrp = grouper
        AND end_date = cur_date - 1
ORDER BY
        grouper, cur_date
</pre>
<div class="terminal">
<table class="terminal">
<tr>
<th>CUR_DATE</th>
<th>GROUPER</th>
<th>RANGES</th>
</tr>
<tr>
<td class="date">02.07.2009 00:00:00</td>
<td class="double_precision">1</td>
<td class="double_precision">640</td>
</tr>
<tr>
<td class="date">03.07.2009 00:00:00</td>
<td class="double_precision">1</td>
<td class="double_precision">2080</td>
</tr>
<tr>
<td class="date">04.07.2009 00:00:00</td>
<td class="double_precision">1</td>
<td class="double_precision">3464</td>
</tr>
<tr class="break">
<td colspan="100"></td>
</tr>
<tr>
<td class="date">18.09.2009 00:00:00</td>
<td class="double_precision">10</td>
<td class="double_precision">426</td>
</tr>
<tr>
<td class="date">19.09.2009 00:00:00</td>
<td class="double_precision">10</td>
<td class="double_precision">139</td>
</tr>
<tr class="statusbar">
<td colspan="100">800 rows fetched in 0.0296s (1.0937s)</td>
</tr>
</table>
</div>
<p>The results are just like we expected.</p>
<p>Note the <code>JOIN</code> condition <code>end_date = cur_date - 1</code>. We subtract a day to make sure that we select events that ended strictly before the date. The events that ended on the date in question still contain it and should contribute to the count.</p>
<p>This query is very fast: <strong>1,000,000</strong> rows are processed in about <strong>1.1</strong> seconds.</p>
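<p>Any single row of the result can be cross-checked with a direct containment count. As a sketch, this counts the ranges of group <strong>1</strong> that contain Jul 4th, including those that end on that very day:</p>
<pre class="brush: sql">
SELECT  COUNT(*) AS ranges
FROM    t_range
WHERE   grouper = 1
        AND start_date &lt;= DATE &#039;2009-07-04&#039;
        -- a range ending on the date itself still contains it
        AND end_date &gt;= DATE &#039;2009-07-04&#039;
</pre>
<p>The count should agree with the <strong>3464</strong> reported for that date and group in the result above.</p>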
]]></content:encoded>
			<wfw:commentRss>http://explainextended.com/2009/09/09/oracle-generating-a-list-of-dates-and-counting-ranges-for-each-date/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

