## Archive for March, 2009

## Analytic functions: optimizing SUM and ROW_NUMBER

Comments enabled. I *really* need your comment

In the previous articles I wrote about emulating numerous analytic function in **MySQL**.

Using methods described above, it's possible to emulate almost all analytic functions present in **Oracle** and **SQL Server**.

Here are these methods in a nutshell:

- Select all table rows ordered by
`PARTITION BY`

columns, then by`ORDER BY`

columns of the analytic function - Track the grouing sets by using session variables initialized in the first subquery
- If the analytic function needs some precalculations to be evaluated (like, count of the rows in the grouping set, sum of the values etc), join the table with the precalculated aggregates
- Use state session variables to calculate the analytic function and store intermediate values between rows
- Initialize state session variables whenever the grouping set changes

This may sound confusing, but if you take a look on the examples from the previous articles, it will become clear as a bell.

This methods work and work well, if you need to select all rows from the tables.

But what if you need to implement some filtering? Do we really need to count millions of rows if we need first three? Do we really need to inspect all rows to find a maximum if we have an index?

Of course, no.

Analytic functions can be optimized as well as any other queries.

Read the rest of this entry »

## Analytic functions: FIRST_VALUE, LAST_VALUE, LEAD, LAG

In the previous articles I wrote about emulating some of the analytic functions in **MySQL**.

Today, I'll write about four more userful functions: `FIRST_VALUE`

, `LAST_VALUE`

, `LEAD`

and `LAG`

.

These functions also do not have aggregate analogs.

`FIRST VALUE(column)`

returns the value of column from the first row of the grouping set.

`LAST_VALUE(column)`

returns the value of column from the last row of the grouping set.

This can be illustrated by the following query:

Read the rest of this entry »

## Analytic functions: NTILE

Comments enabled. I *really* need your comment

In the previous article we dealt with analytic functions `SUM`

, `AVG`

and `ROW_NUMBER()`

.

Now we will try to emulate `NTILE`

.

`NTILE(N)`

is a special function that has no aggregate analog. It divides each grouping set of rows into `N`

subranges, based on `ORDER BY`

clause, and returns the subrange number for each row.

Read the rest of this entry »

## Analytic functions: SUM, AVG, ROW_NUMBER

In one of the previous articles I wrote about emulating some of analytic functions in **MySQL**.

Now, I'd like to cover this question more extensively.

A quick reminder: an **analytic function** is a function that behaves like an **aggregate function** with one exception: **aggregate function** returns one last row for each aggregated set, while an **analytic function** returns intermediate results too.

An **analytic function** can be made out of almost all **aggregate functions** by adding keyword `OVER`

to them with two additional clauses: `PARTITION BY`

and `ORDER BY`

.

`PARTITION BY`

is analog of `GROUP BY`

. `ORDER BY`

defines order in which the intermediate rows will be evaluated.

The behaviour of analytic functions can probably be best illustrated with an example:

Read the rest of this entry »

## Selecting friends

If you are building a Yet Another Great Social Network Service to beat MySpace, you'll certainly need to keep a list of friends there, so that **Alice** may communicate in private with **Bob**, and they both can show pictures to **Chris**, and **Eve** cannot eavesdrop on them and the rest of them can do all these kinds of things these people are supposed to do.

On most networks, **friendship** is an irreflexive symmetric binary relation:

- Symmetric means that if
**Alice**is a friend of**Bob**, then**Bob**is a friend of**Alice**too. - Irreflexive means that
**Alice**is never a friend to herself.

As it's a many-to-many relation, we sure need a separate table for it.

But how will we keep it? Should we keep the relation in the table as is (i. e. two separate rows for **Alice/Bob** and **Bob/Alice**), or keep just one row and reconstruct the relation using the set operators?

Let's check.

Read the rest of this entry »

## Advanced row sampling

Yesterday I wrote an article on how to emulate analytiс function `ROW_NUMBER()`

that is present in **SQL Server** and **Oracle**, but absent in **MySQL**.

Today, we will try to optimize this query.

Read the rest of this entry »

## Row sampling

Sometimes we need to get a sample row from a table satisfying a certain condition. Like, get a first row for each month.

`MS SQL`

and `Oracle`

supply analytical function `ROW_NUMBER()`

for this purpose.

Let's create a simple table to illustrate our needs and see how do we query it.

Read the rest of this entry »

## Aggregate concatenation

Comments enabled. I *really* need your comment

Aggregate concatenation functions help creating a concatenated list out of a recordset. Useful for reports, hierarchical trees, etc.

**MySQL** supplies `GROUP_CONCAT`

for this purpose. `SYS_CONNECT_BY PATH`

and `FOR XML`

can be used in **Oracle** and **MS SQL**.

In **PostgreSQL**, we cannot use these tricks, but we can create our own aggregate function. And this function will also accept two more extremely useful parameters: `DELIMITER`

and `IS_DISTINCT`

.

Read the rest of this entry »

## First common ancestor

From **Stack Overflow**:

Let's say that we have we have a table with the classic

`manager_id`

recursive relationship:

`Users (user_id int, manager_id int)`

(refers to`user_id`

)If you randomly select

2rows in the table, or2nodes, how do you find the lowest level common ancestor? My platform isSQL Server 2005 (Transact-SQL), but anyANSIcompliantSQLwill also work...

Very nice question.

This may be useful to check against any kind of common ancestry in a tree structure: classes, folders, etc.

Read the rest of this entry »

## Zen update

Comments enabled. I *really* need your comment

Yesterday I wrote an article about selecting random rows efficiently.

But today on **Stack Overflow**:

Hi

I wish to attach a column to my table which will be a random number from a sequential list = to the number of rows.

So, if my table had

999rows, then the numbers1to999would be assigned randomly and uniquely.Now, I figured that I could add a dummy

`TempRandomColumn=Rand()`

, sort by that and add the numbers sequentially using`PHP`

. But that means999`MySQL`

statements.Is there a way to do this using a single

`MySQL`

statement?Thanks for any pointers.

Well, it's just that simple:

and performing an update:

SET @r := 0; UPDATE t_zen SET zen_order = (@r := @r + 1) ORDER BY RAND(20030302)

`ORDER BY RAND()`

certainly has some beauty in it.