EXPLAIN EXTENDED

How to create fast database queries

Archive for May 17th, 2009

Ranging random values

Comments enabled. I *really* need your comment

From Stack Overflow:

Given a set of random numeric values in an SQL Server database, how do I generate a limited list of ranges where each range contains at least one value?

The ranges should not overlap and ideally have a similar amount of values in them.

This is a good job for SQL Server's analytic function NTILE.

This function, given a parameter N, assigns a rank from 1 to N to each value, just as in quadrile (4), quintile (5), percentile (100) etc.

Lower 1/N of values get a rank of 1, from 1/N to 2/N get a rank of 2, etc.

The value, though, are integer and random, and we can get gaps between the adjacent ranges.

To avoid this, we need to extend the ranges a little, so each range gets a common bound with both upper and lower neighbor. This is best done by using the middle of the gap as a common range boundary.

Say, NTILE assigned rank 1 to values from 100 to 200, rank 2 to values from 210 to 300, and rank 3 to values from 320 to 400.

Then we will use 205 as a common bound for the first two ranges, and 310 as a commond bound for the next two.

To do this, for each value selected, we will get also the previous one and the next one.

It's obvious that the value that comes before the first value in any group will be the last value in the previous group, and vice versa: the value next to the last value in a group will be the first value in the next group.

Knowing the previous and the next value for each group, we can use them to calculate the median and use it as the range boundary.

Let's create a sample table and see how it works:
Read the rest of this entry »

Written by Quassnoi

May 17th, 2009 at 11:00 pm

Posted in SQL Server