Archive for March, 2009
Hierarchical queries in MySQL: finding loops
This is a series of articles on hierarchical queries in MySQL:
- Hierarchical queries in MySQL
- Hierarchical queries in MySQL: adding level
- Hierarchical queries in MySQL: adding ancestry chains.
- Hierarchical queries in MySQL: finding leaves
- Hierarchical queries in MySQL: finding loops
See also:
Today, we will check our structure for loops.
Loops in hierarchical queries occur when a row contains itself in its ancestry chain.
In general, it's a sign of a database logic flaw and should be avioded, but sometimes it's deliberately used.
The simplest and most easily detectable case is having id
equal to parent
(meaning that the row is both its father and child). This may be tested by using a simple comparison condition.
But a loop of course can be more complex: a row can be its own grandfather, grand-grandfather etc.
To detect such loops, we will improve our functions a little.
Read the rest of this entry »
Hierarchical queries in MySQL: finding leaves
Comments enabled. I *really* need your comment
This is a series of articles on hierarchical queries in MySQL:
- Hierarchical queries in MySQL
- Hierarchical queries in MySQL: adding level
- Hierarchical queries in MySQL: adding ancestry chains.
- Hierarchical queries in MySQL: finding leaves
- Hierarchical queries in MySQL: finding loops
See also:
In this article, we will find the leaves of the hierarchy tree.
A row is a leaf in a hierarchy tree if it has no children.
In Oracle, there is a speical pseudocolumn that tests if a given row is a leaf in a CONNECT BY
query. Surprisingly, it's called CONNECT_BY_ISLEAF
.
Read the rest of this entry »
Hierarchical queries in MySQL: adding ancestry chains.
This is a series of articles on hierarchical queries in MySQL:
- Hierarchical queries in MySQL
- Hierarchical queries in MySQL: adding level
- Hierarchical queries in MySQL: adding ancestry chains.
- Hierarchical queries in MySQL: finding leaves
- Hierarchical queries in MySQL: finding loops
See also:
Today, we will improve the query a little more.
If we develop the catalog with categories, subcategories etc, it's useful to have breadcrumb links somewhere in the header of the page.
In Oracle, there is a special function called SYS_CONNECT_BY_PATH(row_expression, delimiter)
, that returns the ancestry chain of a given node.
Read the rest of this entry »
Hierarchical queries in MySQL: adding level
This is a series of articles on hierarchical queries in MySQL:
- Hierarchical queries in MySQL
- Hierarchical queries in MySQL: adding level
- Hierarchical queries in MySQL: adding ancestry chains.
- Hierarchical queries in MySQL: finding leaves
- Hierarchical queries in MySQL: finding loops
See also:
Today I will add some more features to these queries.
We have almost 100,000 rows in the table, 8 hierarchy levels, and our query is written so that it always selects all rows.
Now let's try to present our data in a Yahoo-style catalog. We need only top 2 levels: parent categories and their subcategories.
Read the rest of this entry »
Hierarchical queries in MySQL
This is a series of articles on hierarchical queries in MySQL:
- Hierarchical queries in MySQL
- Hierarchical queries in MySQL: adding level
- Hierarchical queries in MySQL: adding ancestry chains.
- Hierarchical queries in MySQL: finding leaves
- Hierarchical queries in MySQL: finding loops
See also:
There is no need in explaining how convenient hierarchical queries are.
A quick reminder: hierarchical data is a parent-child relationship contained in one table.
A typical task is to return values from the table in the following way:
- Resultset should be sorted like a tree, that is lexical sort by ancestry chains
- Depth level should be returned along with each row
It may sound confusing, but it's very simple in fact, like shown on this Oracle query:
Read the rest of this entry »
Counting page views
Comments enabled. I *really* need your comment
From Stack Overflow:
I have tables called
pages
,page_views
,page_items
andpage_votes
.The latter three tables contain a
page_id
foreign key in order to keep a record of each individual view, item and vote that belongs to a page.When I query a page, I also want to retrieve
COUNT(*)
frompage_views
,COUNT(*)
frompage_items
andSUM(vote)
frompage_votes
.I have pasted a query below. It retrieves the total number of views.
SELECT Page.*, COUNT(*) AS views FROM pages AS Page INNER JOIN page_views AS PageView ON Page.id = PageView.page_id GROUP BY Page.id ORDER BY views DESC LIMIT 10 OFFSET 0I've made various attempts to add items and votes to it, but the result is either a syntax error or views/items/votes returned as an identical and
wrongnumber, probably due to the way I am joining.How can I add items and votes to this query?
On a first thought, it's simple:
Read the rest of this entry »
Aggregated hours
Comments enabled. I *really* need your comment
From Stack Overflow:
I would like to know the sum of all web services completions per hour of day.
Obviously, this can be easily done with
SUM()
andGROUP BY
:SELECT hour, SUM(calls) FROM sample s GROUP BY hour;
hour SUM(calls) 0 634 1 642 2 633 3 624 4 420 5 479 6 428 7 424 8 473 9 434 10 485 11 567 12 526 13 513 14 555 15 679 16 624 17 796 18 752 19 843 20 827 21 774 22 647 23 533 My problem is that in old sets, the web service calls in the hours
[00-11]
were already summed up.The simple statement as listed above would therefore lead to
SELECT hour, SUM(calls) FROM sample s GROUP BY hour;
hour SUM(calls) 0 6234 12 526 13 513 14 555 15 679 16 624 17 796 18 752 19 843 20 827 21 774 22 647 23 533 This is an undesirable result. To make the old sets
[00,12,...,23]
comparable to the new sets[00,01,...,23]
, I would like to have one statement that averages the value of[00]
and distributes it over the missing hours, e.g.:
hour SUM(calls) 0 6243/12 1 6243/12 12 526 23 533 I can easily do this using temporary tables or views, but i don't know how to accomplish this without them.
In this rowset, we have a row that needs to be split into 12 another rows.
Read the rest of this entry »
Deleting duplicates
Comments enabled. I *really* need your comment
Microsoft Knowledge Base has an article KB139444 on how to delete duplicate rows from a table that has no primary key.
Though it works, it's quite an overkill to delete duplicate rows with grouping, temporary tables, deleting and reinserting.
With SQL Server 2005 and above, there is much more elegant solution.
Let's create a table and fill it with duplicate records:
CREATE TABLE t_duplicate (id INT NOT NULL, value VARCHAR(50) NOT NULL) GO SET NOCOUNT ON BEGIN TRANSACTION DECLARE @i INT SET @i = 1 WHILE @i < 5000 BEGIN INSERT INTO t_duplicate VALUES (@i / 1000 + 1, 'Value ' + CAST(@i AS VARCHAR)) SET @i = @i + 1 END COMMIT SELECT * FROM t_duplicate [/sourcecode] <table class="terminal"> <tr><th>id</th><th>value</th></tr> <tr><td>1</td><td>Value 1</td></tr> <tr><td>1</td><td>Value 2</td></tr> <tr><td>1</td><td>Value 3</td></tr> <tr class="break"><td colspan="100"></td></tr> <tr><td>5</td><td>Value 4997</td></tr> <tr><td>5</td><td>Value 4998</td></tr> <tr><td>5</td><td>Value 4999</td></tr> </table> And now let's delete the duplicates: WITH q AS ( SELECT d.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rn FROM t_duplicate d ) DELETE FROM q WHERE rn > 1 SELECT * FROM t_duplicate
id | value |
---|---|
1 | Value 1 |
2 | Value 1000 |
3 | Value 2000 |
4 | Value 3000 |
5 | Value 4000 |
Done in a single query.
Finding incomplete orders
Comments enabled. I *really* need your comment
Imagine we are keeping an online shop and want to find the customers that don't have complete orders.
We'll make the structure of the orders a little bit complex:
- Each customer may have a number of baskets
- Each basket will have a list of positions in it
- Each position has a number of discounts
- An order is considered complete, when all entites are present: there is at least one basket, all baskets should have at least one position, and each position should have at least one discount
We will keep the data in four tables as following:
Read the rest of this entry »
Analytic functions: optimizing LAG, LEAD, FIRST_VALUE, LAST_VALUE
Comments enabled. I *really* need your comment
In the previous article I wrote about optimized emulation of the analytic functions in MySQL.
Now, let's try to optimize LAG
, LEAD
, FIRST_VALUE
and LAST_VALUE
.
Imagine we have a table that keeps actions of a PC in an online game. This table has the following design:
- Action id
- PC id
- Current PC level
- Current PC score
- Action data
For each action, the current level and current score of the PC are keeped. The table, of course, is designed in such a bad way just to illustrate our task :)
Now, for first 2 player characters, we need to know first 2 actions performed on first 2 levels; how much score did these actions yield; and how much score left to reach the next level.
With analytic functions, it would be done the following way:
Read the rest of this entry »