How to create fast database queries

Analytic functions: NTILE

Comments enabled. I *really* need your comment

In the previous article we dealt with analytic functions SUM, AVG and ROW_NUMBER().

Now we will try to emulate NTILE.

NTILE(N) is a special function that has no aggregate analog. It divides each grouping set of rows into N subranges, based on ORDER BY clause, and returns the subrange number for each row.

If we query for NTILE(5) OVER (PARTITION BY grouper ORDER BY id), it will return 1 for the 20% of rows with lowest id, 2 for the next 20% of rows, etc.

Easy to see that we need a COUNT for each grouper to use in our calculations. Unlike SUM, AVG and ROW_NUMBER(), the value of this function does not depend only on value of the previous queries.

That's why we'll need an extra join:

SELECT  ao.*, FLOOR((@r * @_n) / cnt) + 1 AS `NTILE(5) OVER (PARTITION BY grouper ORDER BY id)`
FROM    (
SELECT  @_n := 5,
@_grouper := 'N'
) vars,
SELECT  a.*, cnt
FROM    (
SELECT  grouper, COUNT(*) AS cnt
FROM    t_aggregator
) cnt,  t_aggregator a
WHERE   a.grouper = cnt.grouper
grouper, id
) ao
WHERE   CASE WHEN @_grouper <> grouper THEN @r := -1 ELSE 0 END IS NOT NULL
AND (@r := @r + 1) IS NOT NULL
AND (@_grouper := grouper) IS NOT NULL

As you can see, we select the COUNT(*) of rows for each grouper and join the results of the query with t_aggregator, so along with each row of t_aggregator we have the total count of rows for the given grouper.

In the SELECT clause, we just count the percentile our current row belongs to, multiply it by N and add a 1 (as NTILE results are 1-based).

Written by Quassnoi

March 9th, 2009 at 11:00 pm

Posted in MySQL

Leave a Reply