EXPLAIN EXTENDED

How to create fast database queries

Archive for March, 2009

Hierarchical queries in MySQL: finding loops

with 4 comments

Today, we will check our structure for loops.

Loops in hierarchical queries occur when a row contains itself in its ancestry chain.

In general, it's a sign of a database logic flaw and should be avioded, but sometimes it's deliberately used.

The simplest and most easily detectable case is having id equal to parent (meaning that the row is both its father and child). This may be tested by using a simple comparison condition.

But a loop of course can be more complex: a row can be its own grandfather, grand-grandfather etc.

To detect such loops, we will improve our functions a little.
Read the rest of this entry »

Written by Quassnoi

March 21st, 2009 at 11:00 pm

Posted in MySQL

Hierarchical queries in MySQL: finding leaves

Comments enabled. I *really* need your comment

In this article, we will find the leaves of the hierarchy tree.

A row is a leaf in a hierarchy tree if it has no children.

In Oracle, there is a speical pseudocolumn that tests if a given row is a leaf in a CONNECT BY query. Surprisingly, it's called CONNECT_BY_ISLEAF.
Read the rest of this entry »

Written by Quassnoi

March 20th, 2009 at 11:00 pm

Posted in MySQL

Hierarchical queries in MySQL: adding ancestry chains.

with 4 comments

Today, we will improve the query a little more.

If we develop the catalog with categories, subcategories etc, it's useful to have breadcrumb links somewhere in the header of the page.

In Oracle, there is a special function called SYS_CONNECT_BY_PATH(row_expression, delimiter), that returns the ancestry chain of a given node.
Read the rest of this entry »

Written by Quassnoi

March 19th, 2009 at 11:00 pm

Posted in MySQL

Hierarchical queries in MySQL: adding level

with 4 comments

Today I will add some more features to these queries.

We have almost 100,000 rows in the table, 8 hierarchy levels, and our query is written so that it always selects all rows.

Now let's try to present our data in a Yahoo-style catalog. We need only top 2 levels: parent categories and their subcategories.
Read the rest of this entry »

Written by Quassnoi

March 18th, 2009 at 11:00 pm

Posted in MySQL

Hierarchical queries in MySQL

with 52 comments

There is no need in explaining how convenient hierarchical queries are.

A quick reminder: hierarchical data is a parent-child relationship contained in one table.

A typical task is to return values from the table in the following way:

  1. Resultset should be sorted like a tree, that is lexical sort by ancestry chains
  2. Depth level should be returned along with each row

It may sound confusing, but it's very simple in fact, like shown on this Oracle query:
Read the rest of this entry »

Written by Quassnoi

March 17th, 2009 at 11:00 pm

Posted in MySQL

Counting page views

Comments enabled. I *really* need your comment

From Stack Overflow:

I have tables called pages, page_views, page_items and page_votes.

The latter three tables contain a page_id foreign key in order to keep a record of each individual view, item and vote that belongs to a page.

When I query a page, I also want to retrieve COUNT(*) from page_views, COUNT(*) from page_items and SUM(vote) from page_votes.

I have pasted a query below. It retrieves the total number of views.

SELECT  Page.*, COUNT(*) AS views
FROM    pages AS Page 
INNER JOIN
        page_views AS PageView
ON      Page.id = PageView.page_id 
GROUP BY
        Page.id 
ORDER BY
        views DESC   
LIMIT 10 OFFSET 0

I've made various attempts to add items and votes to it, but the result is either a syntax error or views/items/votes returned as an identical and wrong number, probably due to the way I am joining.

How can I add items and votes to this query?

On a first thought, it's simple:
Read the rest of this entry »

Written by Quassnoi

March 16th, 2009 at 11:00 pm

Posted in MySQL

Aggregated hours

Comments enabled. I *really* need your comment

From Stack Overflow:

I would like to know the sum of all web services completions per hour of day.

Obviously, this can be easily done with SUM() and GROUP BY:

SELECT hour, SUM(calls) FROM sample s GROUP BY hour;  
hour SUM(calls)
0 634
1 642
2 633
3 624
4 420
5 479
6 428
7 424
8 473
9 434
10 485
11 567
12 526
13 513
14 555
15 679
16 624
17 796
18 752
19 843
20 827
21 774
22 647
23 533

My problem is that in old sets, the web service calls in the hours [00-11] were already summed up.

The simple statement as listed above would therefore lead to

SELECT hour, SUM(calls) FROM sample s GROUP BY hour;  
hour SUM(calls)
0 6234
12 526
13 513
14 555
15 679
16 624
17 796
18 752
19 843
20 827
21 774
22 647
23 533

This is an undesirable result. To make the old sets [00,12,...,23] comparable to the new sets [00,01,...,23], I would like to have one statement that averages the value of [00] and distributes it over the missing hours, e.g.:

hour SUM(calls)
0 6243/12
1 6243/12
12 526
23 533

I can easily do this using temporary tables or views, but i don't know how to accomplish this without them.

In this rowset, we have a row that needs to be split into 12 another rows.
Read the rest of this entry »

Written by Quassnoi

March 15th, 2009 at 11:00 pm

Posted in MySQL

Deleting duplicates

Comments enabled. I *really* need your comment

Microsoft Knowledge Base has an article KB139444 on how to delete duplicate rows from a table that has no primary key.

Though it works, it's quite an overkill to delete duplicate rows with grouping, temporary tables, deleting and reinserting.

With SQL Server 2005 and above, there is much more elegant solution.

Let's create a table and fill it with duplicate records:

CREATE TABLE t_duplicate (id INT NOT NULL, value VARCHAR(50) NOT NULL)
GO

SET NOCOUNT ON

BEGIN TRANSACTION

DECLARE @i INT
SET @i = 1
WHILE @i < 5000
BEGIN
        INSERT
        INTO    t_duplicate
        VALUES  (@i / 1000 + 1, 'Value ' + CAST(@i AS VARCHAR))
        SET @i = @i + 1
END

COMMIT

SELECT  *
FROM    t_duplicate
&#91;/sourcecode&#93;

<table class="terminal">
<tr><th>id</th><th>value</th></tr>
<tr><td>1</td><td>Value 1</td></tr>
<tr><td>1</td><td>Value 2</td></tr>
<tr><td>1</td><td>Value 3</td></tr>
<tr class="break"><td colspan="100"></td></tr>
<tr><td>5</td><td>Value 4997</td></tr>
<tr><td>5</td><td>Value 4998</td></tr>
<tr><td>5</td><td>Value 4999</td></tr>
</table>

And now let's delete the duplicates:


WITH    q AS
        (
        SELECT  d.*,
                ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rn
        FROM    t_duplicate d
        ) 
DELETE
FROM    q
WHERE   rn > 1

SELECT  *
FROM    t_duplicate
id value
1 Value 1
2 Value 1000
3 Value 2000
4 Value 3000
5 Value 4000

Done in a single query.

Written by Quassnoi

March 14th, 2009 at 11:00 pm

Posted in SQL Server

Finding incomplete orders

Comments enabled. I *really* need your comment

Imagine we are keeping an online shop and want to find the customers that don't have complete orders.

We'll make the structure of the orders a little bit complex:

  • Each customer may have a number of baskets
  • Each basket will have a list of positions in it
  • Each position has a number of discounts
  • An order is considered complete, when all entites are present: there is at least one basket, all baskets should have at least one position, and each position should have at least one discount

We will keep the data in four tables as following:
Read the rest of this entry »

Written by Quassnoi

March 13th, 2009 at 11:00 pm

Posted in MySQL

Analytic functions: optimizing LAG, LEAD, FIRST_VALUE, LAST_VALUE

Comments enabled. I *really* need your comment

In the previous article I wrote about optimized emulation of the analytic functions in MySQL.

Now, let's try to optimize LAG, LEAD, FIRST_VALUE and LAST_VALUE.

Imagine we have a table that keeps actions of a PC in an online game. This table has the following design:

  1. Action id
  2. PC id
  3. Current PC level
  4. Current PC score
  5. Action data

For each action, the current level and current score of the PC are keeped. The table, of course, is designed in such a bad way just to illustrate our task :)

Now, for first 2 player characters, we need to know first 2 actions performed on first 2 levels; how much score did these actions yield; and how much score left to reach the next level.

With analytic functions, it would be done the following way:
Read the rest of this entry »

Written by Quassnoi

March 12th, 2009 at 11:00 pm

Posted in MySQL