Archive for the ‘Miscellaneous’ Category
Things SQL needs: determining range cardinality
What is the problem with this query?
SELECT *
FROM orders
WHERE quantity < = 4
AND urgency <= 4
The problem is indexing strategy, of course. Which columns should we index?
If we index quantity, the optimizer will be able to use the index to filter on it. However, filtering on urgency will require scanning all records with quantity < 4 and applying the urgency filter to each record found.
Same with urgency. We can use range access on urgency using an index, but this will require filtering on quantity.
Why, create a composite index!
, some will say.
Unfortunately, that won’t help much.
A composite B-Tree index maintains what is called a lexicographical order of the records. This means that an index on (quantity, urgency) will sort on quantity, and only if the quantities are equal, it will take the urgency into account.
The picture below shows how would the records be ordered in such an index:

As we can see, with a single index range scan (i. e. just following the arrows) we cannot select only the records within the dashed rectangle. There is no single index range that could be used to filter on both columns.
Even if we changed the field order in the index, it would just change the direction of the arrows connecting the records:
Read the rest of this entry »
Things SQL needs: MERGE JOIN that would seek
One of the most known and least used join algorithms in SQL engines is MERGE JOIN.
This algorithm operates on two sorted recordsets, keeping two pointers that chase each other.
The Wikipedia entry above describes it quite well in terms of algorithms. I’ll just make an animated GIF to make it more clear:

This is quite a nice and elegant algorithm, which, unfortunately, has two major drawbacks:
- It needs the recordsets to be sorted
- Even with the recordsets sorted, it is no better than a
HASH JOIN
The sorting part is essential for this algorithm and there is nothing that can be done with it: the recordsets should be sorted, period. Databases, however, often provide the records in the sorted order: from clustered tables, indexes, previously sorted and ordered subqueries, spool tables etc.
But even when the recordsets are already sorted, on equijoins the MERGE JOIN is hardly faster than a HASH JOIN.
NoSQL
I had a vision tonight.

Image by nataliej
A huge, dark, grim figure approached me, seized me with its long bony arms and made me see all the vanity of the world we are living in.
Bloated database engines, useless ACID requirements, meaningless joins are now in the past for me.
I decided to move to NoSQL.
Where do I begin?
Things SQL needs: sargability of monotonic functions

Image by ndevil
I’m going to write a series of articles about the things SQL needs to work faster and more efficienly.
With these things implemented, most of the tricks I explain in my blog will become redundant and I’ll finally have some more time to spend with the family.
Ever wondered why a condition like this:
WHERE TRUNC(mydate) = TRUNC(SYSDATE)
, which searches for the current day’s records, is so elegant but so slow?
Of course this is because even if you create an index on mydate, this index cannot be used.
The expression in the left part of the equality is not a mydate. The database engine cannot find a way to use an index to search for it. It is said that this expression is not sargable.
Now, a little explanation about the indexes and sargability. If you are familiar with these, you can skip this chapter. But beware that this chapter is the only one illustrated, so skipping it will make the article too boring to read.
Ahem.
To locate a record in a B-Tree index, the keys of the index should be compared to the value being searched for.
Let’s consider this sample B-Tree index:
Happy New Year!
Oracle New Year tree
WITH tree AS
(
SELECT /*+ MATERIALIZE */
' ,,,"''' AS needles, 'Oo%$&' AS decorations
FROM dual
),
branches AS
(
SELECT level AS id, level - TRUNC((level - 2) / 6) * 2 AS s
FROM dual
CONNECT BY
level <= 24
)
SELECT RPAD(' ', 18, ' ') || '*' AS tree
FROM dual
UNION ALL
SELECT RPAD(' ', 18 - s, ' ') || '/' ||
(
SELECT REPLACE(
SYS_CONNECT_BY_PATH
(
CASE
WHEN DBMS_RANDOM.value < 0.1 THEN
SUBSTR(decorations, TRUNC(DBMS_RANDOM.value * LENGTH(decorations)) + 1, 1)
ELSE
SUBSTR(needles, TRUNC(DBMS_RANDOM.value * LENGTH(needles)) + 1, 1)
END , '/'
),
'/', ''
)
FROM tree
WHERE CONNECT_BY_ISLEAF = 1
CONNECT BY
level <= s * 2 - 1
)
|| '\'
FROM branches
| TREE |
|---|
| * |
| /,\ |
| /,, \ |
| /$," ,\ |
| /, ',, \ |
| / ,'o'' ,\ |
| /,, ,'",,,, \ |
| /,',,$, '' ,,\ |
| /,, ,'",,,, \ |
| /",, ,,"' , ",\ |
| / $' ,, " ,' '\ |
| /, , '"O,,'o"," "\ |
| /,,,," , ',,',,',',\ |
| /,',,"' ' ' ,,,,&"o,\ |
| /,' ,,," ,',%" ,,,\ |
| /,',,"' ' ' ,,,,&"o,\ |
| /,$,,," ,,$'%",",,%,,''"\ |
| / ' ,, ," ,,& , ,, ,, \ |
| /,",O' ,',, , ,$, "',",,,',\ |
| /,," ",, ,,, ,',', " ,""',\ |
| / ,O ',,,,," o , '','', ,&,\ |
| /,," ",, ,,, ,',', " ,""',\ |
| /,"'',,""$"," ,$,%,' "" ,,,',",\ |
| /, '" "O,,o' , ,'% $',,, ,""" , \ |
| /, O' ,,, &,',,' , ',$, ,' ,,, ,,'\ |
SQL Server bottle of champagne
WITH lines AS
(
SELECT 1 AS n
UNION ALL
SELECT n + 1
FROM lines
WHERE n < 24
),
bottle_parts (part, lft, body, rgt, w) AS
(
SELECT 'mouth', '.', '_', '.', 2
UNION ALL
SELECT 'neck', '|', ' ', '|', 2
UNION ALL
SELECT 'bell', '/', ' ', '\', NULL
UNION ALL
SELECT 'border', '/', '=', '\', NULL
UNION ALL
SELECT 'body', '|', ' ', '|', 7
UNION ALL
SELECT 'bottom', '(', '_', ')', 7
),
bottle_lines AS
(
SELECT CASE
WHEN n = 1 THEN 'mouth'
WHEN n BETWEEN 2 AND 8 THEN 'neck'
WHEN n = 11 THEN 'border'
WHEN n BETWEEN 9 AND 13 THEN 'bell'
WHEN n BETWEEN 13 AND 23 THEN 'body'
WHEN n = 24 THEN 'bottom'
END AS part,
n
FROM lines
),
bottle_width AS
(
SELECT bp.*, n, COALESCE(w, (n - 6)) AS width
FROM bottle_lines bl
JOIN
bottle_parts bp
ON bp.part = bl.part
),
bottle AS
(
SELECT n, width, REPLICATE(' ', 18 - width) + lft + REPLICATE(body, width * 2) + rgt AS line
FROM bottle_width
),
bubbles (pos, m, bubble) AS
(
SELECT CAST(ROUND(RAND() * 4, 0, 1) AS INTEGER), 1, ':' AS bubble
UNION ALL
SELECT pos + ASCII(SUBSTRING(CAST(NEWID() AS VARCHAR(MAX)), 1, 1)) % 3 - 1, m + 1, SUBSTRING(':.oO', m / 3 + 1, 1)
FROM bubbles
WHERE m < 12
),
limits AS
(
SELECT *,
18 +
CASE
WHEN pos < 1 - width THEN 1 - width
WHEN pos > width - 1 THEN width - 1
ELSE pos
END AS rpos
FROM bottle bo
LEFT HASH JOIN
bubbles bu
ON bu.m = 24 - bo.n
)
SELECT SUBSTRING(line, 1, COALESCE(rpos, 1) - 1) + COALESCE(bubble, ' ') + SUBSTRING(line, COALESCE(rpos, 1) + 1, 200) AS bottle
FROM limits
ORDER BY
n
| bottle |
|---|
| .____. |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| / \ |
| / \ |
| /==========\ |
| / O \ |
| / O \ |
| | O | |
| | o | |
| | o | |
| | o | |
| | . | |
| | . | |
| | . | |
| | : | |
| | : | |
| | : | |
| (______________) |
PostgreSQL fireworks
WITH centers AS
(
SELECT angle,
len,
ROUND(len * SIN(2 * PI() * angle)) AS x,
ROUND(len * COS(2 * PI() * angle)) AS y,
ROUND(len * 0.3)::INTEGER + 1 AS trace
FROM (
SELECT RANDOM() AS angle,
8 * (1 - POWER(RANDOM(), 3)) AS len
FROM generate_series (1, 50) s
) q
),
traces AS
(
SELECT *,
generate_series(1, trace) AS part
FROM centers
),
parts AS
(
SELECT CASE
WHEN trace = part THEN
LEAST(len * 0.2, 2)::INTEGER
ELSE
TRUNC(angle * 8 - 0.5)::INTEGER % 4 + 3
END AS symbol,
TRUNC(x + part * SIN(2 * PI() * angle)) AS x,
TRUNC(y + part * COS(2 * PI() * angle)) AS y
FROM traces
)
SELECT ARRAY_TO_STRING(
ARRAY(
SELECT COALESCE(
(
SELECT SUBSTR(E'.xX\\-/|', MIN(symbol) + 1, 1)
FROM parts
WHERE x = col - 14
AND y = row - 12
), ' ')
FROM generate_series(1, 25) col
), ''
) AS FIREWORKS
FROM generate_series(1, 24) row
| fireworks |
|---|
| X |
| | X |
| x| | X |
| x | | / |
| X\ \x | /x |
| X\ \ \ / |
| X \ \ x / /xx |
| \\ x |. /// |
| X– \ | |
| xxx– . |
| Xx– .. |
| xxx-/ –x |
| . \\ –X |
| . . |
| / \ |
| X/ | \ \x |
| / |\ \\ |
| // |x\ \ \x |
| x/ |\x \ X |
| X x\x X |
| X |
MySQL postcard
SELECT CONCAT(COALESCE(border, '|'), RPAD(COALESCE(postcard, ''), 61, COALESCE(filler, ' ')), COALESCE(border, '|')) AS postcard
FROM (
SELECT id2 * 5 + id1 AS line
FROM (
SELECT 1 AS id1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
) q1
CROSS JOIN
(
SELECT 0 AS id2 UNION ALL SELECT 1
) q2
) dummy
LEFT JOIN
(
SELECT 3 AS line, CONCAT(' Dear ', USER(), ',') AS postcard
UNION ALL
SELECT 4, ' I wish you good luck in the New Year.'
UNION ALL
SELECT 5, ' May all your plans be optimal, all your queries be answered'
UNION ALL
SELECT 6, ' and all your estimations be correct.'
UNION ALL
SELECT 8, ' Yours sincerely,'
UNION ALL
SELECT 9, ' Quassnoi'
) ln
ON ln.line = dummy.line
LEFT JOIN
(
SELECT 1 AS line, '.' AS border, '_' AS filler
UNION ALL
SELECT 10 AS line, '.' AS border, '_' AS filler
) borders
ON borders.line = dummy.line
ORDER BY
dummy.line
| postcard |
|---|
| ._____________________________________________________________. |
| | | |
| | Dear dbuser@localhost, | |
| | I wish you good luck in the New Year. | |
| | May all your plans be optimal, all your queries be answered | |
| | and all your estimations be correct. | |
| | | |
| | Yours sincerely, | |
| | Quassnoi | |
| ._____________________________________________________________. |
What is entity-relationship model?

Image by Samuel Mann
A relational database, as we all know from the previous article, stores relations between integers, strings and other simple types in a very plain way: it just enumerates all values related.
This model is extremely flexible, since other relations can be easily derived from existing ones using mathematical formulae, and the relational database takes care of that.
However, database should reflect the real world in some way to be really of use. Since the relational databases store relations between mathematical abstractions and not real world things, we should make some kind of a mapping of ones to the others.
This is what entity-relationship model is for.
An entity-relationship model, as one can easily guess from its name, models relationships between entities.
But since we know that databases do essentially the same, how does it differ from the database model?
- An entity-relationship model states which data and relations between them should be stored
- A database model states how these relations are stored
In other words, ER model is design and database model is one of the ways to implement it. ER model is said to be above the database model in the waterfall developement.
Note that in both cases I use the word stored above. The model says nothing of data and relations between them that may or should exist, only of those that should be stored. Every human being participates in thousands if not millions relationships, but an ER model should state which of them are to be stored, and a relational model should decide how to store them. Of course it would be nice to store them all (and police and insurance would be just happy), but the technology does not allow it yet. Don’t know whether it is good or bad.
A typical entity-relationship diagram in Peter Chen’s notation looks like this:

What does it mean?
Read the rest of this entry »
What is a relational database?
My blog is about SQL, and SQL is about relational databases.
So, what’s SQL and what’s a relational database and what’s an entity-relationship model?
These terms are often misused and this leads to confusion.
In this article, I’d like to explain what a relational database is.
For starters, a little quote from Jack London’s The League of the Old Men
:

An Indian healer
Howkan began to read. Imber listened for a space, when a wonderment rose up in his face and he broke in abruptly.
That be my talk, Howkan. Yet from thy lips it comes when thy ears have not heard.Howkan smirked with self-appreciation. His hair was parted in the middle.
Nay, from the paper it comes, O Imber. Never have my ears heard. From the paper it comes, through my eyes, into my head, and out of my mouth to thee. Thus it comes.
Thus it comes? It be there in the paper?Imber’s voice sank in whisperful awe as he crackled the sheets ‘twixt thumb and finger and stared at the charactery scrawled thereon.It be a great medicine, Howkan, and thou art a worker of wonders.
It be nothing, it be nothing,the young man responded carelessly and pridefully. He read at hazard from the document:In that year, before the break of the ice, came an old man, and a boy who was lame of one foot. These also did I kill, and the old man made much noise—
It be true,Imber interrupted breathlessly.He made much noise and would not die for a long time. But how dost thou know, Howkan? The chief man of the white men told thee, mayhap? No one beheld me, and him alone have I told.Howkan shook his head with impatience.
Have I not told thee it be there in the paper, O fool?Imber stared hard at the ink-scrawled surface.
As the hunter looks upon the snow and says, Here but yesterday there passed a rabbit; and here by the willow scrub it stood and listened, and heard, and was afraid; and here it turned upon its trail; and here it went with great swiftness, leaping wide; and here, with greater swiftness and wider leapings, came a lynx; and here, where the claws cut deep into the snow, the lynx made a very great leap; and here it struck, with the rabbit under and rolling belly up; and here leads off the trail of the lynx alone, and there is no more rabbit,—as the hunter looks upon the markings of the snow and says thus and so and here, dost thou, too, look upon the paper and say thus and so and here be the things old Imber hath done?
Old Imber made a great analogy. Information is but markings on the snow. When you look at the snow, there are no rabbit or lynx there. But when an experienced hunter sees their traces he can go and find them.
However, the old Indian could only read the information while his nephew could also create it by writing the words on paper and reading them back. He had full control over the information he created.
And for information to be useful, it should be structured.
When a car dealer describes a car he really wants to sell he can spend half an hour talking of magnificent design, good extras and a decent price, but just forget
to mention that, say, the warranty terms are not those you’d expect.
But when you fill a credit card application form, you answer a set of predefined questions. What is your name? Where do you live? How much do you earn? Do you have incurable cancer?
The bank needs all this information to make sure you will pay your debt (or at least they can get a decent price for your debt when it’s sold to a collection agency).
And they want you to hide no important information about yourself and to represent all information in such a way that it can be easily interpreted and make sure nothing they are interested in would be forgotten
to answer or left behind.
They want the data to be structured. And by making you to fill a form they enforce the structure of the information you provide.
They force you to use their data model.
Presentation layer
Comments enabled. I *really* need your comment
A question frequently asked by many web application developers:
Where do I build a presentation layer for my data?
Do I do it on SQL side or on PHP / ASP /whatever side?
This is a very interesting question.
A famous Chinese proverb (which like the majority of famous Chinese proverbs is invented by a Western journalist) says A picture is worth a thousand words
.

A picture is worth a thousand words.
Image by zedzap
How do you think, why it is? What’s so good in a picture?
Seems to be a strange question. A picture just tells more. If you look at a picture, you get an overall impression, which can be hardly expressed using words. Words just don’t pull the right strings in your mind, those that a picture does. And the words are harder to read, they are more numerous. And a dozen of other reasons.
However, the main reason seems to be this: you cannot give a complete description of a picture using words
.
But can it be so that a word is worth a thousand pictures?
Sure it can. And any book fan who had seen the screen version of his favorite work can confirm it.
From Douglas Adams’s masterpiece The Hitchiker’s Guide to the Galaxy
:
A hole had just appeared in the Galaxy. It was exactly a nothingth of a second long, a nothingth of an inch wide, and quite a lot of million light years from end to end.
Or this one:
Five wild Event Maelstroms swirled in vicious storms of unreason and spewed up a pavement. On the pavement lay Ford Prefect and Arthur Dent gulping like half-spent fish.
Adams knew how to arrange his words so that they draw a perfect, vivid picture in the reader’s mind.
The problem is that everyone gets his very own picture.
What do you imagine when reading of a storm of unreason
? Can you show a nothingth of an inch wide
on your fingers? Can you sketch some Vogon poetry to show how really bad it is?
Double-thinking in SQL
One of the first things a novice SQL developer learns about is called thinking in SQL
, which is usually being opposed to procedural thinking
Let’s see what part of brain does this intellectual activity take and how to use it.
Two features distinguish SQL from other languages you learned as a 11-year old kid on your first PC, like BASIC or perl or maybe even C++ if you’re such a Wunderkind.
First, SQL is set-based. It does things with sets.
Every tool is designed to do things with something else. Like, you use a hammer to do things with nails, or use a screwdriver to do things with screws, or use an oven to do things with food.
Same with computer languages.
BASIC does things with variables. perl does things with scalars, arrays, hashes and file streams. Assembly does things with registers and memory.
You should not be confused by something like registers are just a special case of variables
, or a hash is just a generalized container which exposes this and this method
or something like that. No.
A hash is a hash, a variable is a variable and a register is a register.
Like, an egg is a food and rice is a food and it’s possible to cook some eggs in a rice cooker and vice versa, but they are just wrong tools to do that.

Akihabara district in Tokyo City. You can buy any type of gadget you can imagine, which is about 2% of what you will find there.
Prehistoric men had to make do with hammerstones and lithic blades (even to court their women), but now we have a whole district in Tokyo City for gadgets with USB type A, and another district for gadgets with USB type B.
So if you feel the urge to hash something and then make a good old array out of this, you don’t use assembly, but perl or PHP instead.
Same with SQL. SQL does things with sets.
It’s a tool that allows you to take a dozen or two of sets, mix them together, knead and wedge them then chop them apart and mix again, but the output you get is still a set and all inputs are sets.
Eeverything you do in SQL, you do it on sets. That’s why SQL is called a set-oriented language.
Ok, that was the first feature that distinguishes SQL from other languages. What’s the second one?
SQL is a declarative language. This means that you express what
you want to do with sets, not how
you want to do it.
This requires a little explanation.
Read the rest of this entry »
Schema junk
Many tasks in SQL world can be efficiently performed using stored procedures.
All database management systems of the Big Four
(that is Oracle, SQL Server, MySQL and PostgreSQL) have a concept of a stored procedure: a little server-managed program written in a procedural language that allows running SQL statements and returning results.
They are widely recommended for use, since they allow easy separation of business rules from data storage rules, fine-grained security, integrity control and all other things software analysts mumble for their living.
People widely accept this theory and if you ask anyone should I use the stored procedures?
, the answer you will get will be a unanimous yes.
The problem is that people rarely use them in fact in real applications. Everybody knows they should but nobody really does.
More than that: despite the fact that stored procedures in fact can be more efficient than plain queries (even for one-time tasks), people often prefer the latter. Writing a procedure to obtain a resultset seems to be an overkill
from the point of view of most developers, they rather prefer doing it with a plain SQL query.
Why so?
Let me tell you another boring story. Don’t worry, it won’t take long.
Read the rest of this entry »

Subscribe in a reader