EXPLAIN EXTENDED

How to create fast database queries

Author Archive

Article-aware title filtering

Comments enabled. I *really* need your comment

From Stack Overflow:

I need to make an alphabetical listing of movie titles, so I need to show only items that begin with a chosen letter. To make this slightly more complicated, many titles start with "the" or "a", which needs to be ignored.

How would the mysql query look to achieve such a task?

Let's create sample tables and see:
Read the rest of this entry »

Written by Quassnoi

March 23rd, 2009 at 11:00 pm

Posted in MySQL

Selecting non-unique rows

with one comment

Sometimes we need to select all rows for a table that have duplicate values in some of the columns. Like, we want to select all user comments for all posts commented by more than one user.

If there are two or more comments for a post, we select all comments for this post; if there is only one comment, we select none.

Let's create the sample tables to illustrate our needs:
Read the rest of this entry »

Written by Quassnoi

March 22nd, 2009 at 11:00 pm

Posted in MySQL

Hierarchical queries in MySQL: finding loops

with 4 comments

Today, we will check our structure for loops.

Loops in hierarchical queries occur when a row contains itself in its ancestry chain.

In general, it's a sign of a database logic flaw and should be avioded, but sometimes it's deliberately used.

The simplest and most easily detectable case is having id equal to parent (meaning that the row is both its father and child). This may be tested by using a simple comparison condition.

But a loop of course can be more complex: a row can be its own grandfather, grand-grandfather etc.

To detect such loops, we will improve our functions a little.
Read the rest of this entry »

Written by Quassnoi

March 21st, 2009 at 11:00 pm

Posted in MySQL

Hierarchical queries in MySQL: finding leaves

Comments enabled. I *really* need your comment

In this article, we will find the leaves of the hierarchy tree.

A row is a leaf in a hierarchy tree if it has no children.

In Oracle, there is a speical pseudocolumn that tests if a given row is a leaf in a CONNECT BY query. Surprisingly, it's called CONNECT_BY_ISLEAF.
Read the rest of this entry »

Written by Quassnoi

March 20th, 2009 at 11:00 pm

Posted in MySQL

Hierarchical queries in MySQL: adding ancestry chains.

with 4 comments

Today, we will improve the query a little more.

If we develop the catalog with categories, subcategories etc, it's useful to have breadcrumb links somewhere in the header of the page.

In Oracle, there is a special function called SYS_CONNECT_BY_PATH(row_expression, delimiter), that returns the ancestry chain of a given node.
Read the rest of this entry »

Written by Quassnoi

March 19th, 2009 at 11:00 pm

Posted in MySQL

Hierarchical queries in MySQL: adding level

with 4 comments

Today I will add some more features to these queries.

We have almost 100,000 rows in the table, 8 hierarchy levels, and our query is written so that it always selects all rows.

Now let's try to present our data in a Yahoo-style catalog. We need only top 2 levels: parent categories and their subcategories.
Read the rest of this entry »

Written by Quassnoi

March 18th, 2009 at 11:00 pm

Posted in MySQL

Hierarchical queries in MySQL

with 52 comments

There is no need in explaining how convenient hierarchical queries are.

A quick reminder: hierarchical data is a parent-child relationship contained in one table.

A typical task is to return values from the table in the following way:

  1. Resultset should be sorted like a tree, that is lexical sort by ancestry chains
  2. Depth level should be returned along with each row

It may sound confusing, but it's very simple in fact, like shown on this Oracle query:
Read the rest of this entry »

Written by Quassnoi

March 17th, 2009 at 11:00 pm

Posted in MySQL

Counting page views

Comments enabled. I *really* need your comment

From Stack Overflow:

I have tables called pages, page_views, page_items and page_votes.

The latter three tables contain a page_id foreign key in order to keep a record of each individual view, item and vote that belongs to a page.

When I query a page, I also want to retrieve COUNT(*) from page_views, COUNT(*) from page_items and SUM(vote) from page_votes.

I have pasted a query below. It retrieves the total number of views.

SELECT  Page.*, COUNT(*) AS views
FROM    pages AS Page 
INNER JOIN
        page_views AS PageView
ON      Page.id = PageView.page_id 
GROUP BY
        Page.id 
ORDER BY
        views DESC   
LIMIT 10 OFFSET 0

I've made various attempts to add items and votes to it, but the result is either a syntax error or views/items/votes returned as an identical and wrong number, probably due to the way I am joining.

How can I add items and votes to this query?

On a first thought, it's simple:
Read the rest of this entry »

Written by Quassnoi

March 16th, 2009 at 11:00 pm

Posted in MySQL

Aggregated hours

Comments enabled. I *really* need your comment

From Stack Overflow:

I would like to know the sum of all web services completions per hour of day.

Obviously, this can be easily done with SUM() and GROUP BY:

SELECT hour, SUM(calls) FROM sample s GROUP BY hour;  
hour SUM(calls)
0 634
1 642
2 633
3 624
4 420
5 479
6 428
7 424
8 473
9 434
10 485
11 567
12 526
13 513
14 555
15 679
16 624
17 796
18 752
19 843
20 827
21 774
22 647
23 533

My problem is that in old sets, the web service calls in the hours [00-11] were already summed up.

The simple statement as listed above would therefore lead to

SELECT hour, SUM(calls) FROM sample s GROUP BY hour;  
hour SUM(calls)
0 6234
12 526
13 513
14 555
15 679
16 624
17 796
18 752
19 843
20 827
21 774
22 647
23 533

This is an undesirable result. To make the old sets [00,12,...,23] comparable to the new sets [00,01,...,23], I would like to have one statement that averages the value of [00] and distributes it over the missing hours, e.g.:

hour SUM(calls)
0 6243/12
1 6243/12
12 526
23 533

I can easily do this using temporary tables or views, but i don't know how to accomplish this without them.

In this rowset, we have a row that needs to be split into 12 another rows.
Read the rest of this entry »

Written by Quassnoi

March 15th, 2009 at 11:00 pm

Posted in MySQL

Deleting duplicates

Comments enabled. I *really* need your comment

Microsoft Knowledge Base has an article KB139444 on how to delete duplicate rows from a table that has no primary key.

Though it works, it's quite an overkill to delete duplicate rows with grouping, temporary tables, deleting and reinserting.

With SQL Server 2005 and above, there is much more elegant solution.

Let's create a table and fill it with duplicate records:

CREATE TABLE t_duplicate (id INT NOT NULL, value VARCHAR(50) NOT NULL)
GO

SET NOCOUNT ON

BEGIN TRANSACTION

DECLARE @i INT
SET @i = 1
WHILE @i < 5000
BEGIN
        INSERT
        INTO    t_duplicate
        VALUES  (@i / 1000 + 1, 'Value ' + CAST(@i AS VARCHAR))
        SET @i = @i + 1
END

COMMIT

SELECT  *
FROM    t_duplicate
&#91;/sourcecode&#93;

<table class="terminal">
<tr><th>id</th><th>value</th></tr>
<tr><td>1</td><td>Value 1</td></tr>
<tr><td>1</td><td>Value 2</td></tr>
<tr><td>1</td><td>Value 3</td></tr>
<tr class="break"><td colspan="100"></td></tr>
<tr><td>5</td><td>Value 4997</td></tr>
<tr><td>5</td><td>Value 4998</td></tr>
<tr><td>5</td><td>Value 4999</td></tr>
</table>

And now let's delete the duplicates:


WITH    q AS
        (
        SELECT  d.*,
                ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rn
        FROM    t_duplicate d
        ) 
DELETE
FROM    q
WHERE   rn > 1

SELECT  *
FROM    t_duplicate
id value
1 Value 1
2 Value 1000
3 Value 2000
4 Value 3000
5 Value 4000

Done in a single query.

Written by Quassnoi

March 14th, 2009 at 11:00 pm

Posted in SQL Server