<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Things SQL needs: determining range cardinality</title>
	<atom:link href="http://explainextended.com/2010/05/19/things-sql-needs-determining-range-cardinality/feed/" rel="self" type="application/rss+xml" />
	<link>http://explainextended.com/2010/05/19/things-sql-needs-determining-range-cardinality/</link>
	<description>How to create fast database queries</description>
	<lastBuildDate>Sun, 15 Apr 2012 12:14:02 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>By: Ian</title>
		<link>http://explainextended.com/2010/05/19/things-sql-needs-determining-range-cardinality/comment-page-1/#comment-513</link>
		<dc:creator>Ian</dc:creator>
		<pubDate>Sun, 02 Jan 2011 08:48:37 +0000</pubDate>
		<guid isPermaLink="false">http://explainextended.com/?p=4624#comment-513</guid>
		<description>This article seems to be attempting a similar goal, but in a different way. http://wiki.postgresql.org/wiki/Loose_indexscan</description>
		<content:encoded><![CDATA[<p>This article seems to be attempting a similar goal, but in a different way. <a href="http://wiki.postgresql.org/wiki/Loose_indexscan" rel="nofollow">http://wiki.postgresql.org/wiki/Loose_indexscan</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Quassnoi</title>
		<link>http://explainextended.com/2010/05/19/things-sql-needs-determining-range-cardinality/comment-page-1/#comment-437</link>
		<dc:creator>Quassnoi</dc:creator>
		<pubDate>Sun, 14 Nov 2010 17:34:21 +0000</pubDate>
		<guid isPermaLink="false">http://explainextended.com/?p=4624#comment-437</guid>
		<description>&lt;em&gt;I’m missing why it would need to do any record look-ups when all the data is contained in 2 indexes&lt;/em&gt;

Each index, as you suggested them, contains data on its own field only (one on &lt;code&gt;quantity&lt;/code&gt;, the other on &lt;code&gt;urgency&lt;/code&gt;). There is no information about &lt;code&gt;urgency&lt;/code&gt; in the &lt;code&gt;quantity&lt;/code&gt; index and vice versa, so it has to be looked up in the table.

Of course you could make both indexes covering, but this would still require you to scan 5k records (instead of just 25).</description>
		<content:encoded><![CDATA[<p><em>I’m missing why it would need to do any record look-ups when all the data is contained in 2 indexes</em></p>
<p>Each index, as you suggested them, contains data on its own field only (one on <code>quantity</code>, the other on <code>urgency</code>). There is no information about <code>urgency</code> in the <code>quantity</code> index and vice versa, so it has to be looked up in the table.</p>
<p>Of course you could make both indexes covering, but this would still require you to scan 5k records (instead of just 25).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ian</title>
		<link>http://explainextended.com/2010/05/19/things-sql-needs-determining-range-cardinality/comment-page-1/#comment-436</link>
		<dc:creator>Ian</dc:creator>
		<pubDate>Sun, 14 Nov 2010 16:45:33 +0000</pubDate>
		<guid isPermaLink="false">http://explainextended.com/?p=4624#comment-436</guid>
		<description>&quot;That’s 5000 records (from 1 to 5000)&quot;
&quot;That’s another 5000 records&quot;

I&#039;m missing why it would need to do any record look-ups when all the data is contained in 2 indexes. Table lookups would only happen once the RIDs had been calculated via index lookup ops.</description>
		<content:encoded><![CDATA[<p>&#8220;That’s 5000 records (from 1 to 5000)&#8221;<br />
&#8220;That’s another 5000 records&#8221;</p>
<p>I&#8217;m missing why it would need to do any record look-ups when all the data is contained in 2 indexes. Table lookups would only happen once the RIDs had been calculated via index lookup ops.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Quassnoi</title>
		<link>http://explainextended.com/2010/05/19/things-sql-needs-determining-range-cardinality/comment-page-1/#comment-435</link>
		<dc:creator>Quassnoi</dc:creator>
		<pubDate>Sun, 14 Nov 2010 16:13:43 +0000</pubDate>
		<guid isPermaLink="false">http://explainextended.com/?p=4624#comment-435</guid>
		<description>&lt;em&gt;So if we query &quot;&lt;= 4&quot;, then surely the query engine will walk only the top of the index until it finds 5, at which point it will quit, having found all the records that satisfy &quot;&lt;= 4&quot;&lt;/em&gt;

That&#039;s 5000 records (from 1 to 5000)

&lt;em&gt;Then it would either do a table scan **only on the relevant records** to filter urgency,&lt;/em&gt;

&lt;code&gt;Urgency&lt;/code&gt; is not a part of the index on &lt;code&gt;quantity&lt;/code&gt;. We&#039;ll have to do a table lookup for each of the 5000 records, each lookup being about five times as costly.

&lt;em&gt;Better, it would do a walk-through on the urgency index, find all records for &quot;&lt;= 4&quot;, and then perform an index intersect on the two index result sets, giving us the RIDs we&#039;re looking for&lt;/em&gt;

That&#039;s another 5000 records, plus the overhead of a join between the result sets, as opposed to 25 (twenty-five) index seeks in a composite index with an &lt;code&gt;IN&lt;/code&gt; rewrite.

&lt;em&gt;I don&#039;t see why it needs to scan 40k records.&lt;/em&gt;

If we substitute 5 with 20 (as in your original comment), we&#039;ll get 40k vs 400. If we leave 5 as is, we&#039;ll get 10k vs 25.</description>
		<content:encoded><![CDATA[<p><em>So if we query &#8220;&lt;= 4&#8221;, then surely the query engine will walk only the top of the index until it finds 5, at which point it will quit, having found all the records that satisfy &#8220;&lt;= 4&#8221;</em></p>
<p>That&#8217;s 5000 records (from 1 to 5000)</p>
<p><em>Then it would either do a table scan **only on the relevant records** to filter urgency,</em></p>
<p><code>Urgency</code> is not a part of the index on <code>quantity</code>. We&#8217;ll have to do a table lookup for each of the 5000 records, each lookup being about five times as costly.</p>
<p><em>Better, it would do a walk-through on the urgency index, find all records for &#8220;&lt;= 4&#8221;, and then perform an index intersect on the two index result sets, giving us the RIDs we&#8217;re looking for</em></p>
<p>That&#8217;s another 5000 records, plus the overhead of a join between the result sets, as opposed to 25 (twenty-five) index seeks in a composite index with an <code>IN</code> rewrite.</p>
<p><em>I don&#8217;t see why it needs to scan 40k records.</em></p>
<p>If we substitute 5 with 20 (as in your original comment), we&#8217;ll get 40k vs 400. If we leave 5 as is, we&#8217;ll get 10k vs 25.</p>
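<p>The arithmetic above can be sketched as follows. This is only a rough illustration, not a cost model: the helper names and the figure of 1000 rows per distinct value are assumptions inferred from the numbers quoted in this thread (5 values giving 5000 records, 20 values giving 20000).</p>

```python
# Assumption (inferred from the figures in this thread, not stated outright):
# every distinct value of quantity and of urgency occurs in 1000 rows,
# and both columns hold non-negative integers starting at 0.
ROWS_PER_VALUE = 1000


def intersect_cost(distinct_values: int) -> int:
    """Index entries read when each single-column index is range-scanned."""
    return 2 * distinct_values * ROWS_PER_VALUE


def in_rewrite_seeks(distinct_values: int) -> int:
    """Seeks into a composite index, one per (quantity, urgency) pair."""
    return distinct_values * distinct_values


for label, n in (("<= 4", 5), ("< 20", 20)):
    print(label, intersect_cost(n), "entries vs", in_rewrite_seeks(n), "seeks")
```

<p>Under those assumptions the two cases come out at 10000 vs 25 and 40000 vs 400, matching the figures quoted above.</p>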
]]></content:encoded>
	</item>
	<item>
		<title>By: Ian</title>
		<link>http://explainextended.com/2010/05/19/things-sql-needs-determining-range-cardinality/comment-page-1/#comment-434</link>
		<dc:creator>Ian</dc:creator>
		<pubDate>Sun, 14 Nov 2010 15:18:54 +0000</pubDate>
		<guid isPermaLink="false">http://explainextended.com/?p=4624#comment-434</guid>
		<description>@Quassnoi thanks for your reply. I mean IN() on int types. Ok, something here seems illogical to me (not from you). Taking your example, let&#039;s look at the quantity column. If it is indexed, then we know it&#039;s sorted. So if we query &quot;&lt;= 4&quot;, then surely the query engine will walk only the top of the index until it finds 5, at which point it will quit, having found all the records that satisfy &quot;&lt;= 4&quot;. Then it would either do a table scan **only on the relevant records** to filter urgency, OR, better, it would do a walk-through on the urgency index, find all records for &quot;&lt;= 4&quot;, and then perform an index intersect on the two index result sets, giving us the RIDs we&#039;re looking for. To me, this is logical. I don&#039;t see why it needs to scan 40k records.</description>
		<content:encoded><![CDATA[<p>@Quassnoi thanks for your reply. I mean IN() on int types. Ok, something here seems illogical to me (not from you). Taking your example, let&#8217;s look at the quantity column. If it is indexed, then we know it&#8217;s sorted. So if we query &#8220;&lt;= 4&#8221;, then surely the query engine will walk only the top of the index until it finds 5, at which point it will quit, having found all the records that satisfy &#8220;&lt;= 4&#8221;. Then it would either do a table scan **only on the relevant records** to filter urgency, OR, better, it would do a walk-through on the urgency index, find all records for &#8220;&lt;= 4&#8221;, and then perform an index intersect on the two index result sets, giving us the RIDs we&#8217;re looking for. To me, this is logical. I don&#8217;t see why it needs to scan 40k records.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Quassnoi</title>
		<link>http://explainextended.com/2010/05/19/things-sql-needs-determining-range-cardinality/comment-page-1/#comment-433</link>
		<dc:creator>Quassnoi</dc:creator>
		<pubDate>Sun, 14 Nov 2010 13:28:44 +0000</pubDate>
		<guid isPermaLink="false">http://explainextended.com/?p=4624#comment-433</guid>
		<description>@Ian:

&lt;em&gt;I thought that &lt; 20 would have done exactly what IN () is doing&lt;/em&gt;

&lt;code&gt;IN()&lt;/code&gt; on what column type?

If the column type is, say, &lt;code&gt;FLOAT&lt;/code&gt;, the smallest superset of the values that satisfy the condition is close to infinite in size (I know this is wrong in the mathematical sense, but you know what I mean). It&#039;s infeasible to convert the range into a set.

If the type is &lt;code&gt;UNSIGNED INT&lt;/code&gt; then of course it can and sometimes needs to be done.

No optimizer known to me is able to do it, if that was your question; that&#039;s why I wrote this article.

&lt;em&gt;On a related point, if those columns were individually indexed, as opposed to indexed as you have done with low cardinality, would we still be looking at the scenario you outline here?&lt;/em&gt;

The optimizers are capable of doing &lt;code&gt;INDEX_INTERSECT&lt;/code&gt;, that is, selecting the recordsets satisfying each condition separately and then joining them. This would require scanning 20000 + 20000 = 40000 records, plus the overhead of joining them, as opposed to just 400 records with a range rewrite.</description>
		<content:encoded><![CDATA[<p>@Ian:</p>
<p><em>I thought that &lt; 20 would have done exactly what IN () is doing</em></p>
<p><code>IN()</code> on what column type?</p>
<p>If the column type is, say, <code>FLOAT</code>, the smallest superset of the values that satisfy the condition is close to infinite in size (I know this is wrong in the mathematical sense, but you know what I mean). It&#8217;s infeasible to convert the range into a set.</p>
<p>If the type is <code>UNSIGNED INT</code> then of course it can and sometimes needs to be done.</p>
<p>No optimizer known to me is able to do it, if that was your question; that&#8217;s why I wrote this article.</p>
<p><em>On a related point, if those columns were individually indexed, as opposed to indexed as you have done with low cardinality, would we still be looking at the scenario you outline here?</em></p>
<p>The optimizers are capable of doing <code>INDEX_INTERSECT</code>, that is, selecting the recordsets satisfying each condition separately and then joining them. This would require scanning 20000 + 20000 = 40000 records, plus the overhead of joining them, as opposed to just 400 records with a range rewrite.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ian</title>
		<link>http://explainextended.com/2010/05/19/things-sql-needs-determining-range-cardinality/comment-page-1/#comment-432</link>
		<dc:creator>Ian</dc:creator>
		<pubDate>Sat, 13 Nov 2010 22:14:06 +0000</pubDate>
		<guid isPermaLink="false">http://explainextended.com/?p=4624#comment-432</guid>
		<description>@Quassnoi thank you for a deeply thought-provoking article. I personally find it shocking that an RDBMS cannot efficiently process ranges, as you have proven. I thought that &lt; 20 would have done exactly what IN () is doing. To me, it seems obvious that it should. Does SQL Server suffer this fate too?

On a related point, if those columns were individually indexed, as opposed to indexed as you have done with low cardinality, would we still be looking at the scenario you outline here?</description>
		<content:encoded><![CDATA[<p>@Quassnoi thank you for a deeply thought-provoking article. I personally find it shocking that an RDBMS cannot efficiently process ranges, as you have proven. I thought that &lt; 20 would have done exactly what IN () is doing. To me, it seems obvious that it should. Does SQL Server suffer this fate too?</p>
<p>On a related point, if those columns were individually indexed, as opposed to indexed as you have done with low cardinality, would we still be looking at the scenario you outline here?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rakesh</title>
		<link>http://explainextended.com/2010/05/19/things-sql-needs-determining-range-cardinality/comment-page-1/#comment-149</link>
		<dc:creator>Rakesh</dc:creator>
		<pubDate>Fri, 02 Jul 2010 07:37:06 +0000</pubDate>
		<guid isPermaLink="false">http://explainextended.com/?p=4624#comment-149</guid>
		<description>@author/admin:-
It would be great if you had a plugin that lists related articles at the end of each blog post.</description>
		<content:encoded><![CDATA[<p>@author/admin:-<br />
It would be great if you had a plugin that lists related articles at the end of each blog post.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rakesh</title>
		<link>http://explainextended.com/2010/05/19/things-sql-needs-determining-range-cardinality/comment-page-1/#comment-148</link>
		<dc:creator>Rakesh</dc:creator>
		<pubDate>Fri, 02 Jul 2010 07:35:57 +0000</pubDate>
		<guid isPermaLink="false">http://explainextended.com/?p=4624#comment-148</guid>
		<description>Thanks for helping us understand how indexing works; I understand it well now.

Thanks for the good work.

Just curious to know how these numbers work with &quot;joins&quot; (many tables).</description>
		<content:encoded><![CDATA[<p>Thanks for helping us understand how indexing works; I understand it well now.</p>
<p>Thanks for the good work.</p>
<p>Just curious to know how these numbers work with &#8220;joins&#8221; (many tables).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Quassnoi</title>
		<link>http://explainextended.com/2010/05/19/things-sql-needs-determining-range-cardinality/comment-page-1/#comment-142</link>
		<dc:creator>Quassnoi</dc:creator>
		<pubDate>Tue, 22 Jun 2010 07:15:17 +0000</pubDate>
		<guid isPermaLink="false">http://explainextended.com/?p=4624#comment-142</guid>
		<description>@Frode: thanks!</description>
		<content:encoded><![CDATA[<p>@Frode: thanks!</p>
]]></content:encoded>
	</item>
</channel>
</rss>

