EXPLAIN EXTENDED

How to create fast database queries

My latest article on SQL in general: Happy New Year!. You're welcome to read and comment on it.

Analytic functions: NTILE

In the previous article we dealt with analytic functions SUM, AVG and ROW_NUMBER().

Now we will try to emulate NTILE.

NTILE(N) is a special function that has no aggregate analog. It divides each grouping set of rows into N subranges, based on ORDER BY clause, and returns the subrange number for each row.

If we query for NTILE(5) OVER (PARTITION BY grouper ORDER BY id), it will return 1 for the 20% of rows with lowest id, 2 for the next 20% of rows, etc.

Easy to see that we need a COUNT for each grouper to use in our calculations. Unlike SUM, AVG and ROW_NUMBER(), the value of this function does not depend only on value of the previous queries.

That’s why we’ll need an extra join:

SELECT  ao.*, FLOOR((@r * @_n) / cnt) + 1 AS `NTILE(5) OVER (PARTITION BY grouper ORDER BY id)`
FROM    (
        SELECT  @_n := 5,
                @_grouper := 'N'
        ) vars,
        (
        SELECT  a.*, cnt
        FROM    (
                SELECT  grouper, COUNT(*) AS cnt
                FROM    t_aggregator
                GROUP BY
                        grouper
                ) cnt,  t_aggregator a
        WHERE   a.grouper = cnt.grouper
        ORDER BY
                grouper, id
        ) ao
WHERE   CASE WHEN @_grouper <> grouper THEN @r := -1 ELSE 0 END IS NOT NULL
        AND (@r := @r + 1) IS NOT NULL
        AND (@_grouper := grouper) IS NOT NULL

As you can see, we select the COUNT(*) of rows for each grouper and join the results of the query with t_aggregator, so along with each row of t_aggregator we have the total count of rows for the given grouper.

In the SELECT clause, we just count the percentile our current row belongs to, multiply it by N and add a 1 (as NTILE results are 1-based).

Written by Quassnoi

March 9th, 2009 at 11:00 pm

Posted in MySQL

Comments are closed.