Quassnoi, Author at EXPLAIN EXTENDED - Page 27 of 28

Author Archive

Deleting duplicates


The Microsoft Knowledge Base has an article, KB139444, on how to delete duplicate rows from a table that has no primary key.

Though it works, grouping, temporary tables, deleting and reinserting are overkill for deleting duplicate rows.

With SQL Server 2005 and above, there is a much more elegant solution.

Let's create a table and fill it with duplicate records:

CREATE TABLE t_duplicate (id INT NOT NULL, value VARCHAR(50) NOT NULL)
GO

SET NOCOUNT ON

BEGIN TRANSACTION

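-- Insert rows 1 .. 4999. Integer division maps them to ids 1 .. 5,
-- so each id is shared by up to 1000 rows.
DECLARE @i INT
SET @i = 1
WHILE @i < 5000
BEGIN
        INSERT
        INTO    t_duplicate
        VALUES  (@i / 1000 + 1, 'Value ' + CAST(@i AS VARCHAR))
        SET @i = @i + 1
END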

COMMIT

SELECT  *
FROM    t_duplicate

id	value
1	Value 1
1	Value 2
1	Value 3
…	…
5	Value 4997
5	Value 4998
5	Value 4999

And now let's delete the duplicates:


WITH    q AS
        (
        SELECT  d.*,
                ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rn
        FROM    t_duplicate d
        ) 
DELETE
FROM    q
WHERE   rn > 1

SELECT  *
FROM    t_duplicate

id	value
1	Value 1
2	Value 1000
3	Value 2000
4	Value 3000
5	Value 4000

Done in a single query.

Written by Quassnoi

March 14th, 2009 at 11:00 pm

Posted in SQL Server

Finding incomplete orders


Imagine we are running an online shop and want to find the customers who don't have complete orders.

We'll make the structure of the orders a little bit complex:

Each customer may have a number of baskets
Each basket will have a list of positions in it
Each position has a number of discounts
An order is considered complete when all entities are present: there is at least one basket, every basket has at least one position, and every position has at least one discount

We will keep the data in four tables as follows:
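
The completeness rule above can be sketched as a single query. This is only an illustration under stated assumptions, not the article's own solution. The table and column names (`customers`, `baskets`, `positions`, `discounts`) are hypothetical, and the sketch runs on SQLite through Python's standard `sqlite3` module rather than on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one table per entity, each pointing at its parent.
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE baskets   (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
    CREATE TABLE positions (id INTEGER PRIMARY KEY, basket_id   INTEGER NOT NULL);
    CREATE TABLE discounts (id INTEGER PRIMARY KEY, position_id INTEGER NOT NULL);

    INSERT INTO customers VALUES (1), (2), (3), (4);
    -- Customer 1: complete. Customer 2: no baskets.
    -- Customer 3: an empty basket. Customer 4: a position without a discount.
    INSERT INTO baskets   VALUES (10, 1), (30, 3), (40, 4);
    INSERT INTO positions VALUES (100, 10), (400, 40);
    INSERT INTO discounts VALUES (1000, 100);
""")

# A customer is incomplete if there is no basket at all, or if any
# basket -> position -> discount chain breaks (the LEFT JOINs leave d.id NULL).
incomplete = [row[0] for row in conn.execute("""
    SELECT c.id
    FROM   customers c
    WHERE  NOT EXISTS (SELECT 1 FROM baskets b WHERE b.customer_id = c.id)
       OR  EXISTS (
               SELECT 1
               FROM   baskets b
               LEFT JOIN positions p ON p.basket_id = b.id
               LEFT JOIN discounts d ON d.position_id = p.id
               WHERE  b.customer_id = c.id
                 AND  d.id IS NULL
           )
    ORDER BY c.id
""")]
print(incomplete)
```

With this sample data, customers 2, 3 and 4 are reported and customer 1 is not.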
Read the rest of this entry »

Written by Quassnoi

March 13th, 2009 at 11:00 pm

Posted in MySQL

Analytic functions: optimizing LAG, LEAD, FIRST_VALUE, LAST_VALUE


In the previous article I wrote about optimized emulation of the analytic functions in MySQL.

Now, let's try to optimize LAG, LEAD, FIRST_VALUE and LAST_VALUE.

Imagine we have a table that keeps actions of a PC in an online game. This table has the following design:

Action id
PC id
Current PC level
Current PC score
Action data

For each action, the current level and current score of the PC are kept. The table, of course, is designed in such a bad way just to illustrate our task :)

Now, for the first 2 player characters, we need to know the first 2 actions performed on each of the first 2 levels, how much score these actions yielded, and how much score was left to reach the next level.

With analytic functions, it would be done the following way:
Read the rest of this entry »

Written by Quassnoi

March 12th, 2009 at 11:00 pm

Posted in MySQL

Analytic functions: optimizing SUM and ROW_NUMBER


In the previous articles I wrote about emulating numerous analytic functions in MySQL.

Using the methods described there, it's possible to emulate almost all analytic functions present in Oracle and SQL Server.

Here are these methods in a nutshell:

Select all table rows ordered by PARTITION BY columns, then by ORDER BY columns of the analytic function
Track the grouping sets by using session variables initialized in the first subquery
If the analytic function needs some precalculations to be evaluated (such as the row count of the grouping set or the sum of its values), join the table with the precalculated aggregates
Use state session variables to calculate the analytic function and store intermediate values between rows
Initialize state session variables whenever the grouping set changes
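
The core of this method can be sketched outside the database. The following is a plain Python analog, not MySQL code: ordinary variables stand in for the session variables, and the sample rows and the names `grp`, `ord` and `val` are made up for illustration. It computes ROW_NUMBER and a running SUM per grouping set. The join with precalculated aggregates is left out because neither function needs it.

```python
from operator import itemgetter

# Hypothetical sample rows: grp is the PARTITION BY column,
# ord is the ORDER BY column, val is the value being summed.
rows = [
    {"grp": "b", "ord": 2, "val": 5},
    {"grp": "a", "ord": 1, "val": 10},
    {"grp": "a", "ord": 2, "val": 20},
    {"grp": "b", "ord": 1, "val": 7},
]

# Order by the PARTITION BY column, then by the ORDER BY column.
rows.sort(key=itemgetter("grp", "ord"))

# These variables play the role of the state session variables.
current_grp = None
row_number = 0
running_sum = 0

result = []
for row in rows:
    # The grouping set changed: reinitialize the state.
    if row["grp"] != current_grp:
        current_grp = row["grp"]
        row_number = 0
        running_sum = 0
    # Update the state and emit the intermediate values for this row.
    row_number += 1
    running_sum += row["val"]
    result.append((row["grp"], row["ord"], row_number, running_sum))

for line in result:
    print(line)
```

Every row is visited once, in order, which is also why this approach does all the work even when only a few rows are wanted.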

This may sound confusing, but if you take a look at the examples from the previous articles, it will become clear as a bell.

These methods work, and work well, if you need to select all rows from the tables.

But what if you need to implement some filtering? Do we really need to count millions of rows if we need first three? Do we really need to inspect all rows to find a maximum if we have an index?

Of course not.

Analytic functions can be optimized just like any other queries.
Read the rest of this entry »

Written by Quassnoi

March 11th, 2009 at 11:00 pm

Posted in MySQL

Analytic functions: FIRST_VALUE, LAST_VALUE, LEAD, LAG

with 5 comments

In the previous articles I wrote about emulating some of the analytic functions in MySQL.

Today, I'll write about four more useful functions: FIRST_VALUE, LAST_VALUE, LEAD and LAG.

These functions also do not have aggregate analogs.

FIRST_VALUE(column) returns the value of column from the first row of the grouping set.

LAST_VALUE(column) returns the value of column from the last row of the grouping set.
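
For reference, here is how the two functions behave where they are natively available. This is a minimal sketch on SQLite (version 3.25 or later) through Python's `sqlite3` module, not the MySQL emulation the article goes on to build. The table `t_scores` and its columns are hypothetical. Note the explicit frame: in SQLite, a window with ORDER BY and no frame clause ends at the current row, so without it LAST_VALUE would return the current row's value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.executescript("""
    -- Hypothetical table: grp is the grouping set, ord defines the row order.
    CREATE TABLE t_scores (grp INTEGER, ord INTEGER, val TEXT);
    INSERT INTO t_scores VALUES
        (1, 1, 'a'), (1, 2, 'b'), (1, 3, 'c'),
        (2, 1, 'x'), (2, 2, 'y');
""")

rows = conn.execute("""
    SELECT grp, ord, val,
           FIRST_VALUE(val) OVER w AS first_val,
           LAST_VALUE(val)  OVER w AS last_val
    FROM   t_scores
    WINDOW w AS (
        PARTITION BY grp ORDER BY ord
        -- Make the frame cover the whole grouping set.
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    )
    ORDER BY grp, ord
""").fetchall()

for row in rows:
    print(row)
```

Every row of grouping set 1 gets 'a' and 'c', and every row of grouping set 2 gets 'x' and 'y'.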

This can be illustrated by the following query:
Read the rest of this entry »

Written by Quassnoi

March 10th, 2009 at 11:00 pm

Posted in MySQL

Analytic functions: NTILE


In the previous article we dealt with analytic functions SUM, AVG and ROW_NUMBER().

Now we will try to emulate NTILE.

NTILE(N) is a special function that has no aggregate analog. It divides each grouping set of rows into N subranges, based on ORDER BY clause, and returns the subrange number for each row.
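
To make the definition concrete, here is a small sketch of native NTILE on SQLite (3.25 or later) through Python's `sqlite3` module. It shows the function being emulated, not the MySQL emulation itself. The table `t_items` and its data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.executescript("""
    -- Hypothetical table: five rows in grouping set 1, two rows in grouping set 2.
    CREATE TABLE t_items (grp INTEGER, ord INTEGER);
    INSERT INTO t_items VALUES
        (1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
        (2, 1), (2, 2);
""")

rows = conn.execute("""
    SELECT grp, ord,
           NTILE(2) OVER (PARTITION BY grp ORDER BY ord) AS tile
    FROM   t_items
    ORDER BY grp, ord
""").fetchall()

for row in rows:
    print(row)
```

When a grouping set does not divide evenly, the larger subranges come first: the five rows of grouping set 1 split into a subrange of three rows followed by a subrange of two.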
Read the rest of this entry »

Written by Quassnoi

March 9th, 2009 at 11:00 pm

Posted in MySQL

Analytic functions: SUM, AVG, ROW_NUMBER

with 5 comments

In one of the previous articles I wrote about emulating some of the analytic functions in MySQL.

Now, I'd like to cover this question more extensively.

A quick reminder: an analytic function is a function that behaves like an aggregate function with one exception: an aggregate function returns a single final row for each aggregated set, while an analytic function returns the intermediate results too.

An analytic function can be made out of almost any aggregate function by adding the keyword OVER to it with two additional clauses: PARTITION BY and ORDER BY.

PARTITION BY is the analog of GROUP BY. ORDER BY defines the order in which the intermediate rows will be evaluated.
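
The difference between the two kinds of functions can be sketched side by side. This example uses SQLite (3.25 or later) through Python's `sqlite3` module as a stand-in for a database with native analytic functions. The table `t_values` and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.executescript("""
    CREATE TABLE t_values (grp INTEGER, ord INTEGER, val INTEGER);
    INSERT INTO t_values VALUES
        (1, 1, 10), (1, 2, 20), (1, 3, 30),
        (2, 1, 5),  (2, 2, 15);
""")

# Aggregate function: one row per group, holding only the final result.
aggregated = conn.execute("""
    SELECT grp, SUM(val)
    FROM   t_values
    GROUP BY grp
    ORDER BY grp
""").fetchall()

# Analytic function: every row is kept, each with its intermediate result.
analytic = conn.execute("""
    SELECT grp, ord,
           SUM(val)     OVER (PARTITION BY grp ORDER BY ord) AS running_sum,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY ord) AS rn
    FROM   t_values
    ORDER BY grp, ord
""").fetchall()

print(aggregated)
for row in analytic:
    print(row)
```

The last analytic row of each group carries the same sum that the aggregate query returns for that group.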

The behaviour of analytic functions can probably be best illustrated with an example:
Read the rest of this entry »

Written by Quassnoi

March 8th, 2009 at 11:00 pm

Posted in MySQL

Selecting friends

with 2 comments

If you are building Yet Another Great Social Network Service to beat MySpace, you'll certainly need to keep a list of friends there, so that Alice can communicate in private with Bob, both of them can show pictures to Chris, Eve cannot eavesdrop on them, and all of them can do the things these people are supposed to do.

On most networks, friendship is an irreflexive symmetric binary relation:

Symmetric means that if Alice is a friend of Bob, then Bob is a friend of Alice too.
Irreflexive means that Alice is never a friend to herself.

As it's a many-to-many relation, we sure need a separate table for it.

But how will we keep it? Should we keep the relation in the table as is (i.e. two separate rows for Alice/Bob and Bob/Alice), or keep just one row and reconstruct the relation using the set operators?
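
The second option can be sketched as follows. This is an illustration only, run on SQLite through Python's `sqlite3` module rather than on MySQL. The table `t_friend`, its columns and the helper `friends_of` are hypothetical names. Each friendship is stored once with the smaller name first. The CHECK constraint enforces both that ordering and irreflexivity, and a UNION ALL restores the symmetry at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per friendship, always stored with user_a < user_b.
    -- The CHECK also rules out a user being a friend of herself.
    CREATE TABLE t_friend (
        user_a TEXT NOT NULL,
        user_b TEXT NOT NULL,
        PRIMARY KEY (user_a, user_b),
        CHECK (user_a < user_b)
    );
    INSERT INTO t_friend VALUES
        ('Alice', 'Bob'), ('Alice', 'Chris'), ('Bob', 'Chris');
""")

def friends_of(name):
    """Return the friends of `name`, looking at both sides of each stored pair."""
    query = """
        SELECT user_b AS friend FROM t_friend WHERE user_a = ?
        UNION ALL
        SELECT user_a FROM t_friend WHERE user_b = ?
        ORDER BY friend
    """
    return [row[0] for row in conn.execute(query, (name, name))]

print(friends_of("Bob"))
print(friends_of("Eve"))

# Irreflexivity: a self-friendship violates the CHECK constraint.
try:
    conn.execute("INSERT INTO t_friend VALUES ('Alice', 'Alice')")
    self_friendship_rejected = False
except sqlite3.IntegrityError:
    self_friendship_rejected = True
print(self_friendship_rejected)
```

Bob's friends come from both halves of the union: Alice from a row where Bob is `user_b`, Chris from a row where Bob is `user_a`. Storing both directions instead would make the lookup a single-column search at the cost of twice as many rows.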

Let's check.
Read the rest of this entry »

Written by Quassnoi

March 7th, 2009 at 11:00 pm

Posted in MySQL

Advanced row sampling

with one comment

Yesterday I wrote an article on how to emulate the analytic function ROW_NUMBER(), which is present in SQL Server and Oracle but absent in MySQL.

Today, we will try to optimize this query.
Read the rest of this entry »

Written by Quassnoi

March 6th, 2009 at 9:00 pm

Posted in MySQL

Row sampling

with 2 comments

Sometimes we need to get a sample row from a table satisfying a certain condition: for example, the first row for each month.

SQL Server and Oracle supply the analytic function ROW_NUMBER() for this purpose.

Let's create a simple table to illustrate our needs and see how we query it.
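
As a rough illustration of the ROW_NUMBER() approach, here is a sketch that picks the first row for each month. It runs on SQLite (3.25 or later) through Python's `sqlite3` module rather than on SQL Server or Oracle. The table `t_event`, its columns and the sample dates are hypothetical, and `strftime` is SQLite's way of extracting the month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.executescript("""
    CREATE TABLE t_event (id INTEGER PRIMARY KEY, happened TEXT NOT NULL);
    INSERT INTO t_event VALUES
        (1, '2009-01-15'), (2, '2009-01-03'), (3, '2009-02-20'),
        (4, '2009-02-11'), (5, '2009-03-05');
""")

# Number the rows inside each month by date, then keep only row number 1.
rows = conn.execute("""
    SELECT id, happened
    FROM   (
           SELECT id, happened,
                  ROW_NUMBER() OVER (
                      PARTITION BY strftime('%Y-%m', happened)
                      ORDER BY happened
                  ) AS rn
           FROM   t_event
           )
    WHERE  rn = 1
    ORDER BY happened
""").fetchall()

for row in rows:
    print(row)
```

One row per month survives the filter: the earliest event of January, February and March.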
Read the rest of this entry »

Written by Quassnoi

March 5th, 2009 at 9:00 pm

Posted in MySQL
