EXPLAIN EXTENDED

How to create fast database queries

My latest article on SQL in general: Things SQL needs: determining range cardinality. You're welcome to read and comment on it.

Archive for the ‘Miscellaneous’ Category

Things SQL needs: determining range cardinality

with 7 comments

What is the problem with this query?

SELECT  *
FROM    orders
WHERE   quantity < = 4
        AND urgency <= 4


The problem is indexing strategy, of course. Which columns should we index?

If we index quantity, the optimizer will be able to use the index to filter on it. However, filtering on urgency will require scanning all records with quantity < 4 and applying the urgency filter to each record found.

Same with urgency. We can use range access on urgency using an index, but this will require filtering on quantity.

Why, create a composite index!, some will say.

Unfortunately, that won’t help much.

A composite B-Tree index maintains what is called a lexicographical order of the records. This means that an index on (quantity, urgency) will sort on quantity, and only if the quantities are equal, it will take the urgency into account.

The picture below shows how would the records be ordered in such an index:

As we can see, with a single index range scan (i. e. just following the arrows) we cannot select only the records within the dashed rectangle. There is no single index range that could be used to filter on both columns.

Even if we changed the field order in the index, it would just change the direction of the arrows connecting the records:
Read the rest of this entry »

Written by Quassnoi

May 19th, 2010 at 11:00 pm

Posted in Miscellaneous

Things SQL needs: MERGE JOIN that would seek

with 3 comments

One of the most known and least used join algorithms in SQL engines is MERGE JOIN.

This algorithm operates on two sorted recordsets, keeping two pointers that chase each other.

The Wikipedia entry above describes it quite well in terms of algorithms. I’ll just make an animated GIF to make it more clear:

This is quite a nice and elegant algorithm, which, unfortunately, has two major drawbacks:

  1. It needs the recordsets to be sorted
  2. Even with the recordsets sorted, it is no better than a HASH JOIN

The sorting part is essential for this algorithm and there is nothing that can be done with it: the recordsets should be sorted, period. Databases, however, often provide the records in the sorted order: from clustered tables, indexes, previously sorted and ordered subqueries, spool tables etc.

But even when the recordsets are already sorted, on equijoins the MERGE JOIN is hardly faster than a HASH JOIN.

Why?
Read the rest of this entry »

Written by Quassnoi

May 7th, 2010 at 11:00 pm

Posted in Miscellaneous

NoSQL

with 5 comments

I had a vision tonight.

Image by nataliej

A huge, dark, grim figure approached me, seized me with its long bony arms and made me see all the vanity of the world we are living in.

Bloated database engines, useless ACID requirements, meaningless joins are now in the past for me.

I decided to move to NoSQL.

Where do I begin?

Written by Quassnoi

April 1st, 2010 at 1:00 pm

Posted in Miscellaneous

Things SQL needs: sargability of monotonic functions

with 5 comments

Image by ndevil

I’m going to write a series of articles about the things SQL needs to work faster and more efficienly.

With these things implemented, most of the tricks I explain in my blog will become redundant and I’ll finally have some more time to spend with the family.

Ever wondered why a condition like this:

WHERE TRUNC(mydate) = TRUNC(SYSDATE)

, which searches for the current day’s records, is so elegant but so slow?

Of course this is because even if you create an index on mydate, this index cannot be used.

The expression in the left part of the equality is not a mydate. The database engine cannot find a way to use an index to search for it. It is said that this expression is not sargable.

Now, a little explanation about the indexes and sargability. If you are familiar with these, you can skip this chapter. But beware that this chapter is the only one illustrated, so skipping it will make the article too boring to read.

Ahem.

To locate a record in a B-Tree index, the keys of the index should be compared to the value being searched for.

Let’s consider this sample B-Tree index:


Read the rest of this entry »

Written by Quassnoi

February 19th, 2010 at 11:00 pm

Posted in Miscellaneous

Happy New Year!

with 4 comments

Oracle New Year tree

WITH    tree AS
        (
        SELECT  /*+ MATERIALIZE */
                '  ,,,"''' AS needles, 'Oo%$&' AS decorations
        FROM    dual
        ),
        branches AS
        (
        SELECT  level AS id, level - TRUNC((level - 2) / 6) * 2 AS s
        FROM    dual
        CONNECT BY
                level <= 24
        )
SELECT  RPAD(' ', 18, ' ') || '*' AS tree
FROM    dual
UNION ALL
SELECT  RPAD(' ', 18 - s, ' ') || '/' ||
        (
        SELECT  REPLACE(
                SYS_CONNECT_BY_PATH
                (
                CASE
                WHEN DBMS_RANDOM.value < 0.1 THEN
                        SUBSTR(decorations, TRUNC(DBMS_RANDOM.value * LENGTH(decorations)) + 1, 1)
                ELSE
                        SUBSTR(needles, TRUNC(DBMS_RANDOM.value * LENGTH(needles)) + 1, 1)
                END , '/'
                ),
                '/', ''
                )
        FROM    tree
        WHERE   CONNECT_BY_ISLEAF = 1
        CONNECT BY
                level <= s * 2 - 1
        )
        || '\'
FROM    branches
TREE
*
/,\
/,, \
/$," ,\
/, ',, \
/ ,'o'' ,\
/,, ,'",,,, \
/,',,$, '' ,,\
/,, ,'",,,, \
/",, ,,"' , ",\
/ $' ,, " ,' '\
/, , '"O,,'o"," "\
/,,,," , ',,',,',',\
/,',,"' ' ' ,,,,&"o,\
/,' ,,," ,',%" ,,,\
/,',,"' ' ' ,,,,&"o,\
/,$,,," ,,$'%",",,%,,''"\
/ ' ,, ," ,,& , ,, ,, \
/,",O' ,',, , ,$, "',",,,',\
/,," ",, ,,, ,',', " ,""',\
/ ,O ',,,,," o , '','', ,&,\
/,," ",, ,,, ,',', " ,""',\
/,"'',,""$"," ,$,%,' "" ,,,',",\
/, '" "O,,o' , ,'% $',,, ,""" , \
/, O' ,,, &,',,' , ',$, ,' ,,, ,,'\

SQL Server bottle of champagne

WITH    lines AS
        (
        SELECT  1 AS n
        UNION ALL
        SELECT  n + 1
        FROM    lines
        WHERE   n < 24
        ),
        bottle_parts (part, lft, body, rgt, w) AS
        (
        SELECT  'mouth', '.', '_', '.', 2
        UNION ALL
        SELECT  'neck', '|', ' ', '|', 2
        UNION ALL
        SELECT  'bell', '/', ' ', '\', NULL
        UNION ALL
        SELECT  'border', '/', '=', '\', NULL
        UNION ALL
        SELECT  'body', '|', ' ', '|', 7
        UNION ALL
        SELECT  'bottom', '(', '_', ')', 7
        ),
        bottle_lines AS
        (
        SELECT  CASE
                WHEN n = 1 THEN 'mouth'
                WHEN n BETWEEN 2 AND 8 THEN 'neck'
                WHEN n = 11 THEN 'border'
                WHEN n BETWEEN 9 AND 13 THEN 'bell'
                WHEN n BETWEEN 13 AND 23 THEN 'body'
                WHEN n = 24 THEN 'bottom'
                END AS part,
                n
        FROM    lines
        ),
        bottle_width AS
        (
        SELECT  bp.*, n, COALESCE(w, (n - 6)) AS width
        FROM    bottle_lines bl
        JOIN
                bottle_parts bp
        ON      bp.part = bl.part
        ),
        bottle AS
        (
        SELECT  n, width, REPLICATE(' ', 18 - width) + lft + REPLICATE(body, width * 2) + rgt AS line
        FROM    bottle_width
        ),
        bubbles (pos, m, bubble) AS
        (
        SELECT  CAST(ROUND(RAND() * 4, 0, 1) AS INTEGER), 1, ':' AS bubble
        UNION ALL
        SELECT  pos + ASCII(SUBSTRING(CAST(NEWID() AS VARCHAR(MAX)), 1, 1)) % 3 - 1, m + 1, SUBSTRING(':.oO', m / 3 + 1, 1)
        FROM    bubbles
        WHERE   m < 12
        ),
        limits AS
        (
        SELECT  *,
                18 +
                CASE
                WHEN pos < 1 - width THEN 1 - width
                WHEN pos > width - 1 THEN width - 1
                ELSE pos
                END AS rpos
        FROM    bottle bo
        LEFT HASH JOIN
                bubbles bu
        ON      bu.m = 24 - bo.n
        )
SELECT  SUBSTRING(line, 1, COALESCE(rpos, 1) - 1) + COALESCE(bubble, ' ') + SUBSTRING(line, COALESCE(rpos, 1) + 1, 200) AS bottle
FROM    limits
ORDER BY
        n
bottle
.____.
| |
| |
| |
| |
| |
| |
| |
/ \
/ \
/==========\
/ O \
/ O \
| O |
| o |
| o |
| o |
| . |
| . |
| . |
| : |
| : |
| : |
(______________)

PostgreSQL fireworks

WITH    centers AS
        (
        SELECT  angle,
                len,
                ROUND(len * SIN(2 * PI() * angle)) AS x,
                ROUND(len * COS(2 * PI() * angle)) AS y,
                ROUND(len * 0.3)::INTEGER + 1 AS trace
        FROM    (
                SELECT  RANDOM() AS angle,
                        8 * (1 - POWER(RANDOM(), 3)) AS len
                FROM    generate_series (1, 50) s
                ) q
        ),
        traces AS
        (
        SELECT  *,
                generate_series(1, trace) AS part
        FROM    centers
        ),
        parts AS
        (
        SELECT  CASE
                WHEN trace = part THEN
                        LEAST(len * 0.2, 2)::INTEGER
                ELSE
                        TRUNC(angle * 8 - 0.5)::INTEGER % 4 + 3
                END AS symbol,
                TRUNC(x + part * SIN(2 * PI() * angle)) AS x,
                TRUNC(y + part * COS(2 * PI() * angle)) AS y
        FROM    traces
        )
        SELECT  ARRAY_TO_STRING(
        ARRAY(
        SELECT  COALESCE(
                (
                SELECT  SUBSTR(E'.xX\\-/|', MIN(symbol) + 1, 1)
                FROM    parts
                WHERE   x = col - 14
                        AND y = row - 12
                ), ' ')
        FROM    generate_series(1, 25) col
        ), ''
        ) AS FIREWORKS
FROM    generate_series(1, 24) row
fireworks
X
| X
x| | X
x | | /
X\ \x | /x
X\ \ \ /
X \ \ x / /xx
\\ x |. ///
X– \ |
xxx– .
Xx– ..
xxx-/ –x
. \\ –X
. .
/ \
X/ | \ \x
/ |\ \\
// |x\ \ \x
x/ |\x \ X
X x\x X
X

MySQL postcard

SELECT  CONCAT(COALESCE(border, '|'), RPAD(COALESCE(postcard, ''), 61, COALESCE(filler, ' ')), COALESCE(border, '|')) AS postcard
FROM    (
        SELECT  id2 * 5 + id1 AS line
        FROM    (
                SELECT 1 AS id1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
                ) q1
        CROSS JOIN
                (
                SELECT 0 AS id2 UNION ALL SELECT 1
                ) q2
        ) dummy
LEFT JOIN
        (
        SELECT  3 AS line, CONCAT(' Dear ', USER(), ',') AS postcard
        UNION ALL
        SELECT  4, ' I wish you good luck in the New Year.'
        UNION ALL
        SELECT  5, ' May all your plans be optimal, all your queries be answered'
        UNION ALL
        SELECT  6, ' and all your estimations be correct.'
        UNION ALL
        SELECT  8, '               Yours sincerely,'
        UNION ALL
        SELECT  9, '                                                    Quassnoi'
        ) ln
ON      ln.line = dummy.line
LEFT JOIN
        (
        SELECT 1 AS line, '.' AS border, '_' AS filler
        UNION ALL
        SELECT 10 AS line, '.' AS border, '_' AS filler
        ) borders
ON      borders.line = dummy.line
ORDER BY
        dummy.line
postcard
._____________________________________________________________.
| |
| Dear dbuser@localhost, |
| I wish you good luck in the New Year. |
| May all your plans be optimal, all your queries be answered |
| and all your estimations be correct. |
| |
| Yours sincerely, |
| Quassnoi |
._____________________________________________________________.
Happy New Year!

Written by Quassnoi

December 31st, 2009 at 11:00 pm

Posted in Miscellaneous

What is entity-relationship model?

with 9 comments

Image by Samuel Mann

Image by Samuel Mann

A relational database, as we all know from the previous article, stores relations between integers, strings and other simple types in a very plain way: it just enumerates all values related.

This model is extremely flexible, since other relations can be easily derived from existing ones using mathematical formulae, and the relational database takes care of that.

However, database should reflect the real world in some way to be really of use. Since the relational databases store relations between mathematical abstractions and not real world things, we should make some kind of a mapping of ones to the others.

This is what entity-relationship model is for.

An entity-relationship model, as one can easily guess from its name, models relationships between entities.

But since we know that databases do essentially the same, how does it differ from the database model?

  • An entity-relationship model states which data and relations between them should be stored
  • A database model states how these relations are stored

In other words, ER model is design and database model is one of the ways to implement it. ER model is said to be above the database model in the waterfall developement.

Note that in both cases I use the word stored above. The model says nothing of data and relations between them that may or should exist, only of those that should be stored. Every human being participates in thousands if not millions relationships, but an ER model should state which of them are to be stored, and a relational model should decide how to store them. Of course it would be nice to store them all (and police and insurance would be just happy), but the technology does not allow it yet. Don’t know whether it is good or bad.

A typical entity-relationship diagram in Peter Chen’s notation looks like this:

An entity-relationship diagram

What does it mean?
Read the rest of this entry »

Written by Quassnoi

October 18th, 2009 at 11:00 pm

Posted in Miscellaneous

What is a relational database?

with 2 comments

My blog is about SQL, and SQL is about relational databases.

So, what’s SQL and what’s a relational database and what’s an entity-relationship model?

These terms are often misused and this leads to confusion.

In this article, I’d like to explain what a relational database is.

For starters, a little quote from Jack London’s The League of the Old Men:

An Indian healer

An Indian healer

Howkan began to read. Imber listened for a space, when a wonderment rose up in his face and he broke in abruptly.

That be my talk, Howkan. Yet from thy lips it comes when thy ears have not heard.

Howkan smirked with self-appreciation. His hair was parted in the middle. Nay, from the paper it comes, O Imber. Never have my ears heard. From the paper it comes, through my eyes, into my head, and out of my mouth to thee. Thus it comes.

Thus it comes? It be there in the paper? Imber’s voice sank in whisperful awe as he crackled the sheets ‘twixt thumb and finger and stared at the charactery scrawled thereon. It be a great medicine, Howkan, and thou art a worker of wonders.

It be nothing, it be nothing, the young man responded carelessly and pridefully. He read at hazard from the document: In that year, before the break of the ice, came an old man, and a boy who was lame of one foot. These also did I kill, and the old man made much noise—

It be true, Imber interrupted breathlessly. He made much noise and would not die for a long time. But how dost thou know, Howkan? The chief man of the white men told thee, mayhap? No one beheld me, and him alone have I told.

Howkan shook his head with impatience. Have I not told thee it be there in the paper, O fool?

Imber stared hard at the ink-scrawled surface. As the hunter looks upon the snow and says, Here but yesterday there passed a rabbit; and here by the willow scrub it stood and listened, and heard, and was afraid; and here it turned upon its trail; and here it went with great swiftness, leaping wide; and here, with greater swiftness and wider leapings, came a lynx; and here, where the claws cut deep into the snow, the lynx made a very great leap; and here it struck, with the rabbit under and rolling belly up; and here leads off the trail of the lynx alone, and there is no more rabbit,—as the hunter looks upon the markings of the snow and says thus and so and here, dost thou, too, look upon the paper and say thus and so and here be the things old Imber hath done?

Old Imber made a great analogy. Information is but markings on the snow. When you look at the snow, there are no rabbit or lynx there. But when an experienced hunter sees their traces he can go and find them.

However, the old Indian could only read the information while his nephew could also create it by writing the words on paper and reading them back. He had full control over the information he created.

And for information to be useful, it should be structured.

When a car dealer describes a car he really wants to sell he can spend half an hour talking of magnificent design, good extras and a decent price, but just forget to mention that, say, the warranty terms are not those you’d expect.

But when you fill a credit card application form, you answer a set of predefined questions. What is your name? Where do you live? How much do you earn? Do you have incurable cancer?

The bank needs all this information to make sure you will pay your debt (or at least they can get a decent price for your debt when it’s sold to a collection agency).

And they want you to hide no important information about yourself and to represent all information in such a way that it can be easily interpreted and make sure nothing they are interested in would be forgotten to answer or left behind.

They want the data to be structured. And by making you to fill a form they enforce the structure of the information you provide.

They force you to use their data model.

Read the rest of this entry »

Written by Quassnoi

August 23rd, 2009 at 11:00 pm

Posted in Miscellaneous

Presentation layer

Comments enabled. I *really* need your comment

A question frequently asked by many web application developers:

Where do I build a presentation layer for my data?

Do I do it on SQL side or on PHP / ASP /whatever side?

This is a very interesting question.

A famous Chinese proverb (which like the majority of famous Chinese proverbs is invented by a Western journalist) says A picture is worth a thousand words.

A picture is worth a thousand words

A picture is worth a thousand words.
Image by zedzap

How do you think, why it is? What’s so good in a picture?

Seems to be a strange question. A picture just tells more. If you look at a picture, you get an overall impression, which can be hardly expressed using words. Words just don’t pull the right strings in your mind, those that a picture does. And the words are harder to read, they are more numerous. And a dozen of other reasons.

However, the main reason seems to be this: you cannot give a complete description of a picture using words.

But can it be so that a word is worth a thousand pictures?

Sure it can. And any book fan who had seen the screen version of his favorite work can confirm it.

From Douglas Adams’s masterpiece The Hitchiker’s Guide to the Galaxy:

A hole had just appeared in the Galaxy. It was exactly a nothingth of a second long, a nothingth of an inch wide, and quite a lot of million light years from end to end.

Or this one:

Five wild Event Maelstroms swirled in vicious storms of unreason and spewed up a pavement. On the pavement lay Ford Prefect and Arthur Dent gulping like half-spent fish.

Adams knew how to arrange his words so that they draw a perfect, vivid picture in the reader’s mind.

The problem is that everyone gets his very own picture.

What do you imagine when reading of a storm of unreason? Can you show a nothingth of an inch wide on your fingers? Can you sketch some Vogon poetry to show how really bad it is?

Read the rest of this entry »

Written by Quassnoi

August 16th, 2009 at 11:00 pm

Posted in Miscellaneous

Double-thinking in SQL

with 10 comments

One of the first things a novice SQL developer learns about is called thinking in SQL, which is usually being opposed to procedural thinking

Let’s see what part of brain does this intellectual activity take and how to use it.

Two features distinguish SQL from other languages you learned as a 11-year old kid on your first PC, like BASIC or perl or maybe even C++ if you’re such a Wunderkind.

First, SQL is set-based. It does things with sets.

Every tool is designed to do things with something else. Like, you use a hammer to do things with nails, or use a screwdriver to do things with screws, or use an oven to do things with food.

Same with computer languages.

BASIC does things with variables. perl does things with scalars, arrays, hashes and file streams. Assembly does things with registers and memory.

You should not be confused by something like registers are just a special case of variables, or a hash is just a generalized container which exposes this and this method or something like that. No.

A hash is a hash, a variable is a variable and a register is a register.

Like, an egg is a food and rice is a food and it’s possible to cook some eggs in a rice cooker and vice versa, but they are just wrong tools to do that.

Akihabara district in Tokyo City. You can buy any type of gadget you can imagine, which is about 2% of what you will find there.

Akihabara district in Tokyo City. You can buy any type of gadget you can imagine, which is about 2% of what you will find there.

Prehistoric men had to make do with hammerstones and lithic blades (even to court their women), but now we have a whole district in Tokyo City for gadgets with USB type A, and another district for gadgets with USB type B.

So if you feel the urge to hash something and then make a good old array out of this, you don’t use assembly, but perl or PHP instead.

Same with SQL. SQL does things with sets.

It’s a tool that allows you to take a dozen or two of sets, mix them together, knead and wedge them then chop them apart and mix again, but the output you get is still a set and all inputs are sets.

Eeverything you do in SQL, you do it on sets. That’s why SQL is called a set-oriented language.

Ok, that was the first feature that distinguishes SQL from other languages. What’s the second one?

SQL is a declarative language. This means that you express what you want to do with sets, not how you want to do it.

This requires a little explanation.
Read the rest of this entry »

Written by Quassnoi

July 12th, 2009 at 11:00 pm

Posted in Miscellaneous

Schema junk

with one comment

Many tasks in SQL world can be efficiently performed using stored procedures.

All database management systems of the Big Four (that is Oracle, SQL Server, MySQL and PostgreSQL) have a concept of a stored procedure: a little server-managed program written in a procedural language that allows running SQL statements and returning results.

They are widely recommended for use, since they allow easy separation of business rules from data storage rules, fine-grained security, integrity control and all other things software analysts mumble for their living.

People widely accept this theory and if you ask anyone should I use the stored procedures?, the answer you will get will be a unanimous yes.

The problem is that people rarely use them in fact in real applications. Everybody knows they should but nobody really does.

More than that: despite the fact that stored procedures in fact can be more efficient than plain queries (even for one-time tasks), people often prefer the latter. Writing a procedure to obtain a resultset seems to be an overkill from the point of view of most developers, they rather prefer doing it with a plain SQL query.

Why so?

Let me tell you another boring story. Don’t worry, it won’t take long.
Read the rest of this entry »

Written by Quassnoi

July 5th, 2009 at 11:00 pm

Posted in Miscellaneous