Comments on: Things SQL needs: determining range cardinality

By: Ian

Ian — Sun, 02 Jan 2011 08:48:37 +0000

This article seems to be attempting a similar goal, but in a different way. http://wiki.postgresql.org/wiki/Loose_indexscan
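The loose index scan technique from that wiki page can be sketched in SQLite through Python's standard `sqlite3` module. This is a minimal sketch under stated assumptions. The `orders` table, its `quantity`/`urgency` columns, the index name and the sample data are all hypothetical. The recursive CTE is adapted from the PostgreSQL pattern rather than copied from it. Each recursion step asks for the smallest `quantity` greater than the previous one, which an index on `quantity` can answer with a single seek. As a result, the number of steps follows the number of distinct values rather than the number of rows.

```python
import sqlite3

# Hypothetical schema for illustration; not taken from the article.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, quantity INTEGER, urgency INTEGER)"
)
conn.execute("CREATE INDEX ix_orders_quantity ON orders (quantity)")
# 10 distinct quantities (0, 5, ..., 45), each repeated for 3 urgency levels.
conn.executemany(
    "INSERT INTO orders (quantity, urgency) VALUES (?, ?)",
    [(q, u) for q in range(0, 50, 5) for u in range(3)],
)

LOOSE_SCAN_SQL = """
WITH RECURSIVE t(val) AS (
    -- anchor: the smallest quantity in the table
    SELECT (SELECT quantity FROM orders ORDER BY quantity LIMIT 1)
    UNION ALL
    -- step: the next quantity strictly greater than the previous one
    SELECT (SELECT quantity FROM orders
            WHERE quantity > t.val
            ORDER BY quantity LIMIT 1)
    FROM t
    WHERE t.val IS NOT NULL
)
SELECT val FROM t WHERE val IS NOT NULL
"""


def distinct_quantities(connection: sqlite3.Connection) -> list:
    """Return the distinct quantity values by hopping from one to the next."""
    return [row[0] for row in connection.execute(LOOSE_SCAN_SQL)]


print(distinct_quantities(conn))  # [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
```

The recursion stops when the inner subquery finds no larger value and returns NULL. The outer `WHERE val IS NOT NULL` then drops that final sentinel row.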

By: Quassnoi

Quassnoi — Sun, 14 Nov 2010 17:34:21 +0000

“I’m missing why it would need to do any record look-ups when all the data is contained in 2 indexes.”

Each index, as you suggested them, contains data on its respective field only (one on quantity, another one on urgency). There is no information about urgency in the quantity index, and vice versa, so you have to look it up in the table. Of course, you could make both indexes covering, but this would still require you to scan 5k records (instead of just 25).

By: Ian

Ian — Sun, 14 Nov 2010 16:45:33 +0000

“That’s 5000 records (from 1 to 5000)”
“That’s another 5000 records”

I’m missing why it would need to do any record look-ups when all the data is contained in 2 indexes. Table lookups would only happen once the RIDs had been calculated via index lookup ops.

By: Quassnoi

Quassnoi — Sun, 14 Nov 2010 16:13:43 +0000

“So if we query '<= 4', then surely the query engine will walk only the top of the index until it finds 5, at which point it will quit, having found all the records that satisfy '<= 4'.”

That’s 5000 records (from 1 to 5000).

“Then it would either do a table scan **only on the relevant records** to filter urgency,”

Urgency is not a part of the index on quantity. We’ll have to do a table lookup for each of the 5000 records, and each of those lookups is about five times as costly.

“Better, it would do a walk-through on the urgency index, find all records for '<= 4', and then perform an index intersect on the two index result sets, giving us the RIDs we’re looking for.”

That’s another 5000 records, plus the overhead of a join between the result sets, as opposed to 25 (twenty-five) index seeks in a composite index with an IN rewrite.

“I don’t see why it needs to scan 40k records.”

If we substitute 5 with 20 (as in your original comment), we’ll get 40k vs. 400. If we leave 5 as is, we’ll get 10k vs. 25.
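The 10k vs. 25 and 40k vs. 400 figures can be reproduced with a rough record-count model. This is a sketch under assumptions, not an optimizer formula. It assumes a uniform distribution of 1000 rows per distinct value, inferred from "5 values, 5000 records". It leaves out both the join between the two result sets and the extra cost of table lookups. The function names are hypothetical.

```python
# Rough record-count model of the comparison above. The uniform
# distribution (1000 rows per distinct value, so 5 values -> 5000 records)
# is an assumption inferred from the thread, not an optimizer formula.
ROWS_PER_VALUE = 1000


def index_intersect_records(q_values: int, u_values: int,
                            rows_per_value: int = ROWS_PER_VALUE) -> int:
    """Records read when each single-column index range is scanned in full.

    The cost of joining the two result sets is deliberately left out.
    """
    return rows_per_value * q_values + rows_per_value * u_values


def in_rewrite_seeks(q_values: int, u_values: int) -> int:
    """Seeks into a composite (quantity, urgency) index: one per value pair."""
    return q_values * u_values


for n in (5, 20):
    print(f"{n} values: intersect reads {index_intersect_records(n, n)} records, "
          f"IN rewrite needs {in_rewrite_seeks(n, n)} seeks")
# 5 values: intersect reads 10000 records, IN rewrite needs 25 seeks
# 20 values: intersect reads 40000 records, IN rewrite needs 400 seeks
```

The asymmetry comes from the shape of the two formulas. The intersect cost is a sum of range sizes, each scaled by the rows per value. The rewrite cost is a product of the distinct value counts and is independent of how many rows share each value.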

By: Ian

Ian — Sun, 14 Nov 2010 15:18:54 +0000

@Quassnoi thanks for your reply. I mean IN() on int types.

OK, something here seems illogical to me (not on your part). Taking your example, let’s look at the quantity column. If it is indexed, then we know it’s sorted. So if we query '<= 4', then surely the query engine will walk only the top of the index until it finds 5, at which point it will quit, having found all the records that satisfy '<= 4'. Then it would either do a table scan **only on the relevant records** to filter urgency, OR, better, it would do a walk-through on the urgency index, find all records for '<= 4', and then perform an index intersect on the two index result sets, giving us the RIDs we’re looking for. To me, this is logical. I don’t see why it needs to scan 40k records.
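The plan described in this comment can be modelled in a few lines of Python: walk each single-column index up to the bound, then intersect the RIDs. This is a toy sketch, not how any particular engine implements it. The row layout, the value distributions and the `rids_up_to` helper are hypothetical, and a real engine works on B-tree pages rather than Python lists. It does show the point the discussion turns on. Each walk stops early, yet it still returns every RID that satisfies its own condition (2,500 per side here) before the intersection discards most of them.

```python
from bisect import bisect_right

# Toy in-memory model (hypothetical): a single-column index is a list of
# (key, rid) pairs sorted by key, the way a B-tree leaf level is ordered.
rows = [(rid, rid % 20, (rid * 7) % 20) for rid in range(10_000)]  # (rid, quantity, urgency)
quantity_index = sorted((q, rid) for rid, q, _ in rows)
urgency_index = sorted((u, rid) for rid, _, u in rows)


def rids_up_to(index, upper):
    """Walk the index from its start and stop at the first key > upper."""
    # (upper, inf) sorts after every (upper, rid) pair, so this is the stop point.
    stop = bisect_right(index, (upper, float("inf")))
    return {rid for _, rid in index[:stop]}


q_rids = rids_up_to(quantity_index, 4)   # quantity <= 4
u_rids = rids_up_to(urgency_index, 4)    # urgency <= 4
matching = q_rids & u_rids               # the index intersect step

print(len(q_rids), len(u_rids), len(matching))  # 2500 2500 1000
```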

By: Quassnoi

Quassnoi — Sun, 14 Nov 2010 13:28:44 +0000

@Ian:

“I thought that < 20 would have done exactly what IN () is doing.”

IN () on what column type? If the column type is, say, FLOAT, the smallest superset of the values that satisfy the condition is close to infinite (I know this is wrong in the mathematical sense, but you know what I mean). It’s infeasible to convert such a range into a set. If the type is UNSIGNED INT, then of course it can, and sometimes needs to, be done. No optimizer known to me is able to do it, if that was your question; that’s why I wrote this article.

“On a related point, if those columns were individually indexed, as opposed to indexed as you have done with low cardinality, would we still be looking at the scenario you outline here?”

Optimizers are capable of doing INDEX_INTERSECT, that is, selecting the recordsets satisfying each condition separately and then joining these recordsets. This would require scanning 20000 + 20000 = 40000 records plus the overhead of joining them, as opposed to just 400 records with a range rewrite.
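For an integer column with a known lower bound, the range-to-set conversion described here is mechanical. The following minimal sketch uses Python's `sqlite3` module. The `orders` table, its columns, the composite index and the `range_as_in_list` helper are hypothetical. SQLite has no UNSIGNED INT, so a CHECK constraint stands in for it. The sketch only demonstrates that the two queries return the same rows over that domain. Whether an engine turns the IN lists into one seek per (quantity, urgency) pair is up to its optimizer and is not shown here.

```python
import sqlite3

# Hypothetical table: non-negative integer columns, so quantity < 20 over
# this domain is exactly the finite set {0, 1, ..., 19}.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY,"
    " quantity INTEGER NOT NULL CHECK (quantity >= 0),"
    " urgency INTEGER NOT NULL CHECK (urgency >= 0))"
)
conn.execute("CREATE INDEX ix_orders_q_u ON orders (quantity, urgency)")
conn.executemany(
    "INSERT INTO orders (id, quantity, urgency) VALUES (?, ?, ?)",
    [(i, i % 100, (i // 100) % 100) for i in range(20_000)],
)


def range_as_in_list(upper_exclusive: int, lower: int = 0) -> list:
    """Expand `col < upper_exclusive` into the integer values it can match.

    Only valid for integer columns with a known lower bound (assumed 0 here).
    """
    return list(range(lower, upper_exclusive))


def placeholders(values) -> str:
    """Build '?, ?, ...' with one parameter marker per value."""
    return ", ".join("?" for _ in values)


range_rows = conn.execute(
    "SELECT id FROM orders WHERE quantity < ? AND urgency < ? ORDER BY id",
    (20, 20),
).fetchall()

q_vals = range_as_in_list(20)
u_vals = range_as_in_list(20)
in_sql = (
    f"SELECT id FROM orders WHERE quantity IN ({placeholders(q_vals)})"
    f" AND urgency IN ({placeholders(u_vals)}) ORDER BY id"
)
in_rows = conn.execute(in_sql, q_vals + u_vals).fetchall()

print(len(range_rows), in_rows == range_rows)  # 800 True
```

The same expansion would be wrong for a FLOAT column, or for an integer column that may hold negative values. In both cases the IN list would silently miss rows that the range predicate matches.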

By: Ian

Ian — Sat, 13 Nov 2010 22:14:06 +0000

@Quassnoi thank you for a deeply thought-provoking article. I personally find it shocking that an RDBMS cannot efficiently process ranges, as you have proven. I thought that < 20 would have done exactly what IN () is doing. To me, it seems obvious that it should. Does SQL Server suffer this fate too?

On a related point, if those columns were individually indexed, as opposed to indexed as you have done with low cardinality, would we still be looking at the scenario you outline here?

By: Rakesh

Rakesh — Fri, 02 Jul 2010 07:37:06 +0000

@author/admin:
It would be great if you had a plugin that shows a list of related articles at the end of each blog post.

By: Rakesh

Rakesh — Fri, 02 Jul 2010 07:35:57 +0000

Thanks for explaining how indexing works; I understand it well now.

Thanks for the good work.

I’m curious to know how these numbers work with joins (multiple tables).