Analytic functions: NTILE
Comments enabled. I *really* need your comment
In the previous article we dealt with analytic functions SUM, AVG and ROW_NUMBER().
Now we will try to emulate NTILE.
NTILE(N) is a special function that has no aggregate analog. It divides each grouping set of rows into N subranges, based on ORDER BY clause, and returns the subrange number for each row.
If we query for NTILE(5) OVER (PARTITION BY grouper ORDER BY id), it will return 1 for the 20% of rows with lowest id, 2 for the next 20% of rows, etc.
Easy to see that we need a COUNT for each grouper to use in our calculations. Unlike SUM, AVG and ROW_NUMBER(), the value of this function does not depend only on value of the previous queries.
That's why we'll need an extra join:
SELECT ao.*, FLOOR((@r * @_n) / cnt) + 1 AS `NTILE(5) OVER (PARTITION BY grouper ORDER BY id)` FROM ( SELECT @_n := 5, @_grouper := 'N' ) vars, ( SELECT a.*, cnt FROM ( SELECT grouper, COUNT(*) AS cnt FROM t_aggregator GROUP BY grouper ) cnt, t_aggregator a WHERE a.grouper = cnt.grouper ORDER BY grouper, id ) ao WHERE CASE WHEN @_grouper <> grouper THEN @r := -1 ELSE 0 END IS NOT NULL AND (@r := @r + 1) IS NOT NULL AND (@_grouper := grouper) IS NOT NULL
As you can see, we select the COUNT(*) of rows for each grouper and join the results of the query with t_aggregator, so along with each row of t_aggregator we have the total count of rows for the given grouper.
In the SELECT clause, we just count the percentile our current row belongs to, multiply it by N and add a 1 (as NTILE results are 1-based).
Subscribe in a reader