Archive for August, 2009
PostgreSQL: selecting items that belong to all categories
Comments enabled. I *really* need your comment
From Stack Overflow:
I've got two tables in SQL, one with a project and one with categories that projects belong to, i.e. the JOIN would look roughly like:
Project Category Foo Apple Foo Banana Foo Carrot Bar Apple Bar Carrot Qux Apple Qux Banana What I want to do is allow filtering such that users can select any number of categories and results will be filtered to items that are members of all the selected categories.
For example, if a user selects categories Apple and Banana, projects Foo and Qux show up.
If a user select categories Apple, Banana, and Carrot then only the Foo project shows up.
A usual approach to this problem involves GROUP BY
and COUNT
. However, this can be made more efficient.
Let's create a sample table:
SQL Server: leading wildcard match using an index
Comments enabled. I *really* need your comment
From Stack Overflow:
I'm joining to a table dozens of different times, and every time, I join (or filter) based on the results of a
SUBSTRING
of one of the columns (it's a string, but left-padded with zeros, and I don't care about the last four digits).As a result, even though this column is indexed and my query would use the index, it does a table scan because the
SUBSTRING
itself isn't indexed, so SQL Server has to calculate it for every row before as it's joining.I'm looking for any ideas on how to speed up this process.
This problem is called leading wildcard match. Though it's possible to implement it, it usually requires indexing all possible combinations of substrings which usually bloats the index.
However, in this very case it's possible to create quite an efficient index. It is possible because the number of characters we need to trim from the right is fixed.
The main idea is to create a computed column which would store trimmed and reversed value of the string, index that column and use this index in search.
Let's create a sample table:
What is a relational database?
My blog is about SQL, and SQL is about relational databases.
So, what's SQL and what's a relational database and what's an entity-relationship model?
These terms are often misused and this leads to confusion.
In this article, I'd like to explain what a relational database is.
For starters, a little quote from Jack London's The League of the Old Men
:
Howkan began to read. Imber listened for a space, when a wonderment rose up in his face and he broke in abruptly.
That be my talk, Howkan. Yet from thy lips it comes when thy ears have not heard.Howkan smirked with self-appreciation. His hair was parted in the middle.
Nay, from the paper it comes, O Imber. Never have my ears heard. From the paper it comes, through my eyes, into my head, and out of my mouth to thee. Thus it comes.
Thus it comes? It be there in the paper?Imber's voice sank in whisperful awe as he crackled the sheets 'twixt thumb and finger and stared at the charactery scrawled thereon.It be a great medicine, Howkan, and thou art a worker of wonders.
It be nothing, it be nothing,the young man responded carelessly and pridefully. He read at hazard from the document:In that year, before the break of the ice, came an old man, and a boy who was lame of one foot. These also did I kill, and the old man made much noise—
It be true,Imber interrupted breathlessly.He made much noise and would not die for a long time. But how dost thou know, Howkan? The chief man of the white men told thee, mayhap? No one beheld me, and him alone have I told.Howkan shook his head with impatience.
Have I not told thee it be there in the paper, O fool?Imber stared hard at the ink-scrawled surface.
As the hunter looks upon the snow and says, Here but yesterday there passed a rabbit; and here by the willow scrub it stood and listened, and heard, and was afraid; and here it turned upon its trail; and here it went with great swiftness, leaping wide; and here, with greater swiftness and wider leapings, came a lynx; and here, where the claws cut deep into the snow, the lynx made a very great leap; and here it struck, with the rabbit under and rolling belly up; and here leads off the trail of the lynx alone, and there is no more rabbit,—as the hunter looks upon the markings of the snow and says thus and so and here, dost thou, too, look upon the paper and say thus and so and here be the things old Imber hath done?
Old Imber made a great analogy. Information is but markings on the snow. When you look at the snow, there are no rabbit or lynx there. But when an experienced hunter sees their traces he can go and find them.
However, the old Indian could only read the information while his nephew could also create it by writing the words on paper and reading them back. He had full control over the information he created.
And for information to be useful, it should be structured.
When a car dealer describes a car he really wants to sell he can spend half an hour talking of magnificent design, good extras and a decent price, but just forget
to mention that, say, the warranty terms are not those you'd expect.
But when you fill a credit card application form, you answer a set of predefined questions. What is your name? Where do you live? How much do you earn? Do you have incurable cancer?
The bank needs all this information to make sure you will pay your debt (or at least they can get a decent price for your debt when it's sold to a collection agency).
And they want you to hide no important information about yourself and to represent all information in such a way that it can be easily interpreted and make sure nothing they are interested in would be forgotten
to answer or left behind.
They want the data to be structured. And by making you to fill a form they enforce the structure of the information you provide.
They force you to use their data model.
SQL Server: finding date closest to a birthdate
Comments enabled. I *really* need your comment
From Stack Overflow:
The problem is with dynamically getting upcoming what we call
namedaysin Europe.Name day is a tradition in many countries in Europe and Latin America of celebrating on a particular day of the year associated with the one's given name.
Here's a sampe name day table (I store the date in this format:
1900-08-22
, but we really only need month and day):
First name Name day Bob 1900-04-22 Bob 1900-09-04 Frank 1900-01-02 The trick is that there might be multiple name days for each name, and someones name day is always the first one found after ones birthday.
So if Bob was born on August 5th, his name day would fall on September 4th, but if we were born after Sep 4th, his name day would be on April 22nd.
We need to find when a given person celebrates his/her name day based on the first name and birthdate.
What I need is an SQL Server query that will be able to get me name days for the people in my database.
We actually need to make the calendar to wrap around: to build a condition that would select 1904, Jan 1st after 1904, Dec 31th.
We can do it using a COALESCE
on the simple queries in OUTER APPLY
clause.
Let's create sample tables:
SQL Server: efficient DISTINCT on dates
Comments enabled. I *really* need your comment
From Stack Overflow:
I am wondering if there is a good-performing query to select distinct dates (ignoring times) from a table with a datetime field in SQL Server.
My problem isn't getting the server to actually do this (I've seen this question already, and we had something similar already in place using
DISTINCT
).The problem is whether there is any trick to get it done more quickly.
With the data we are using, our current query is returning ~80 distinct days for which there are ~40,000 rows of data (after filtering on another indexed column), there is an index on the date column, and the query always manages to take 5+ seconds. Which is too slow.
The problem with this query is that no distinct list of truncated dates is easily accessible.
That's why SQL Server has to fetch all the dates from the table (or from the index) and distinctify them somehow.
In Oracle we could probably create a function-based index that would index the truncated part of a DATETIME
column. Then it would just use jump over the distinct index keys.
Same is possible in MySQL (however, the latter does not support function-based indexes, so one has to update the date-only column manually).
SQL Server does support indexes on computed columns, however, its optimizer lacks ability to skip distinct records in an index.
That's why creating an index on a computed column doesn't help.
But since we are talking about the dates, it's possible to emulate this behavior to some extent using a recursive CTE that would generate a list of dates, and the checking each date.
Let's create a sample table:
MySQL: difference between sets
From Stack Overflow:
I have a table that holds data about items that existed at a certain time (regular snapshots taken).
Simple example:
Timestamp ID 1 A 1 B 2 A 2 B 2 C 3 A 3 D 4 D 4 E In this case, item
C
gets created sometime between snapshot 1 and 2, sometime between snapshot 2 and 3B
andC
disappear andD
gets created, etc.The table is reasonably large (millions of records) and for each timestamp there are about 50 records.
What's the most efficient way of selecting the item
id
s for items that disappear between two consecutive timestamps?So for the above example I would like to get the following:
Previous snapshot Current snapshot Removed 1 2 NULL 2 3 B, C 3 4 A If it doesn't make the query inefficient, can it be extended to automatically use the latest (i. e.
MAX
) timestamp and the previous one?
We basically need to do the following things here:
- Split the table into sets grouped by timestamp
- Compare each set with the one of previous timestamp
- Find the values missing in the current set and concatenate them
This is possible to do using only the standard ANSI SQL operators, however, this will be inefficient in MySQL
.
Let's create a sample table and see how to work around this:
Passing parameters in MySQL: IN list vs. temporary table
When you need to pass a list of parameters into a MySQL, there are at least two method to do this:
- Issue a dynamical query which would fill an
IN
list with constant values - Fill a temporary table with these values then use it in a
JOIN
The second method is considered better for many reasons (you can easily reuse the values, it's more injection-proof etc.)
Leaving all these issues aside, let's compare these methods performance-wise.
We will create a table of 50,000 records, pass a list of 500 parameters (using both methods) and see which is faster:
Hierarchical queries in MySQL: VARCHAR keys
Comments enabled. I *really* need your comment
Answering questions asked on the site.
Basit asks:
I'm trying to implement a hierarchical query in MySQL using solutions you posted in your article.
How can I make the tree menu if the
id
is uniqueVARCHAR
, like the ones used on youtube?
We will need to fix the query and the function just a little:
- The function should return a
VARCHAR
- The
@start_with
and other variables should be initialized withVARCHAR
s - The initial condition for
_id
should be a minimalVARCHAR
possible, i. e. an empty string
Let's create a sample table and check how it works:
CREATE TABLE filler ( id INT NOT NULL PRIMARY KEY AUTO_INCREMENT ) ENGINE=Memory; CREATE TABLE t_key ( id INT NOT NULL PRIMARY KEY, string_id VARCHAR(32) NOT NULL ) ENGINE=Memory; CREATE TABLE t_hierarchy ( id VARCHAR(32) NOT NULL PRIMARY KEY, parent VARCHAR(32) NOT NULL, data VARCHAR(50) NOT NULL, KEY ix_hierarchy_parent (parent) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; DELIMITER $$ CREATE PROCEDURE prc_filler(cnt INT) BEGIN DECLARE _cnt INT; SET _cnt = 1; WHILE _cnt <= cnt DO INSERT INTO filler SELECT _cnt; SET _cnt = _cnt + 1; END WHILE; END $$ DELIMITER ; START TRANSACTION; CALL prc_filler(20); COMMIT; INSERT INTO t_key SELECT id, ( SELECT GROUP_CONCAT(CHAR(65 + FLOOR(RAND(20090817) * 25)) SEPARATOR '') FROM ( SELECT NULL FROM filler LIMIT 8 ) q ) FROM filler; INSERT INTO t_hierarchy SELECT ki.string_id, COALESCE(kp.string_id, '_'), CONCAT('Row ', f.id) FROM filler f JOIN t_key ki ON ki.id = f.id LEFT JOIN t_key kp ON kp.id = f.id DIV 3;
Here's the function:
CREATE FUNCTION hierarchy_connect_by_parent_eq_prior_id(value VARCHAR(32)) RETURNS VARCHAR(32) NOT DETERMINISTIC READS SQL DATA BEGIN DECLARE _id VARCHAR(32); DECLARE _parent VARCHAR(32); DECLARE CONTINUE HANDLER FOR NOT FOUND SET @id = NULL; SET _parent = @id; SET _id = ''; IF @id IS NULL THEN RETURN NULL; END IF; LOOP SELECT MIN(id) INTO @id FROM t_hierarchy WHERE parent = _parent AND id > _id; IF @id IS NOT NULL OR _parent = @start_with THEN SET @level = @level + 1; RETURN @id; END IF; SET @level := @level - 1; SELECT id, parent INTO _id, _parent FROM t_hierarchy WHERE id = _parent; END LOOP; END $$
And here's the query:
SELECT CONCAT(REPEAT(' ', level - 1), CAST(hi.id AS CHAR)) AS treeitem, parent, level FROM ( SELECT hierarchy_connect_by_parent_eq_prior_id(id) AS id, @level AS level FROM ( SELECT @start_with := '_', @id := @start_with, @level := 0 ) vars, t_hierarchy WHERE @id IS NOT NULL ) ho JOIN t_hierarchy hi ON hi.id = ho.id
treeitem | parent | level | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FDBVDDGT | _ | 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ENCYPEWC | FDBVDDGT | 2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
AJXOXCQA | ENCYPEWC | 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
CLYLKREW | ENCYPEWC | 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
YFCUVVRX | ENCYPEWC | 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
JVDDGUJJ | FDBVDDGT | 2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ELQWNYGL | JVDDGUJJ | 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
KTRBGCQA | JVDDGUJJ | 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SCGYDUQU | JVDDGUJJ | 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
VEHXSYRL | FDBVDDGT | 2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
BIKCFVOH | VEHXSYRL | 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
HAGFGSUW | VEHXSYRL | 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TCBYTARM | VEHXSYRL | 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
GVPJESCI | _ | 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
HCNHAEWW | GVPJESCI | 2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
KNIDSGBM | GVPJESCI | 2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SJPBIMNF | GVPJESCI | 2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
JETHGIVJ | SJPBIMNF | 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
OBPXSXJF | SJPBIMNF | 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
VPOXBLBW | SJPBIMNF | 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
20 rows fetched in 0.0006s (0.0180s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | <derived2> | ALL | 20 | 100.00 | |||||
1 | PRIMARY | hi | eq_ref | PRIMARY | PRIMARY | 98 | ho.id | 1 | 100.00 | |
2 | DERIVED | <derived3> | system | 1 | 100.00 | |||||
2 | DERIVED | t_hierarchy | index | PRIMARY | 98 | 20 | 100.00 | Using where; Using index | ||
3 | DERIVED | No tables used |
Works just fine.
Presentation layer
Comments enabled. I *really* need your comment
A question frequently asked by many web application developers:
Where do I build a presentation layer for my data?
Do I do it on SQL side or on PHP / ASP /whatever side?
This is a very interesting question.
A famous Chinese proverb (which like the majority of famous Chinese proverbs is invented by a Western journalist) says A picture is worth a thousand words
.
How do you think, why it is? What's so good in a picture?
Seems to be a strange question. A picture just tells more. If you look at a picture, you get an overall impression, which can be hardly expressed using words. Words just don't pull the right strings in your mind, those that a picture does. And the words are harder to read, they are more numerous. And a dozen of other reasons.
However, the main reason seems to be this: you cannot give a complete description of a picture using words
.
But can it be so that a word is worth a thousand pictures?
Sure it can. And any book fan who had seen the screen version of his favorite work can confirm it.
From Douglas Adams's masterpiece The Hitchiker's Guide to the Galaxy
:
A hole had just appeared in the Galaxy. It was exactly a nothingth of a second long, a nothingth of an inch wide, and quite a lot of million light years from end to end.
Or this one:
Five wild Event Maelstroms swirled in vicious storms of unreason and spewed up a pavement. On the pavement lay Ford Prefect and Arthur Dent gulping like half-spent fish.
Adams knew how to arrange his words so that they draw a perfect, vivid picture in the reader's mind.
The problem is that everyone gets his very own picture.
What do you imagine when reading of a storm of unreason
? Can you show a nothingth of an inch wide
on your fingers? Can you sketch some Vogon poetry to show how really bad it is?
Oracle: updating rows with partitioned incremental value
Comments enabled. I *really* need your comment
From Stack Overflow:
I have a table with these example data:
id idx data 1 1 A 1 3 B 2 1 C 2 1 D 2 5 E I want:
id idx data 1 1 A 1 2 B 2 1 C 2 3 D 2 4 E Is there a way to renumber the
index
perid
using a single SQLUPDATE
?Note that the order of items should be preserved (i. e. the item E should still be after the items C and D but the order of C and D doesn't really matter).
The goal is to be able to create a primary key over
id
andindex
.
If not for the id
, the query would be very simple:
UPDATE t_update SET index = rownum
However, the query above does not take id
into account.
To update the table using partitioned row numbers, we need to employ the window function ROW_NUMBER()
.
However, it's usable neither in the UPDATE
query itself, nor in an inline view, and attempt to use it produces an error:
ORA-30483: window functions are not allowed here
To work around the, we should rewrite the UPDATE
query as MERGE
.
MERGE
requires two rowsets (source and target) and they need to be joined on something.
We will use the table itself as a source They only field we can join on here is ROWID
pseudocolumn.
Here's the query to do this:
MERGE INTO t_update u USING ( SELECT rowid AS rid, ROW_NUMBER() OVER (PARTITION BY id ORDER BY idx) AS rn FROM t_update ) ON (u.rowid = rid) WHEN MATCHED THEN UPDATE SET idx = rn
, and here's the result:
SELECT * FROM t_update
ID | IDX | DATA |
---|---|---|
1 | 1 | A |
1 | 2 | B |
2 | 1 | C |
2 | 2 | D |
2 | 3 | E |
Works fine.