EXPLAIN EXTENDED

How to create fast database queries

Rounding numbers preserving their sum

From Stack Overflow:

I'm looking for a way to round every value contained in a column without changing the total sum (which is known to be an integer, or very close to an integer because of float number precision).

Rounding every value to the nearest integer won't work (ex : (1.5, 1.5) will be rounded to either (1, 1) or (2, 2))

It's quite easy to do this using several requests (e. g. storing the original sum, rounding, computing the new sum, and updating as many rows as needed to go back to the original sum), but this is not a very elegant solution.

Is there a way to do this using a single SQL request?

I'm using SQL Server 2008, so solutions taking advantage of this specific vendor are welcome.

I'm looking for a request minimizing the differences between the old values and the new ones. In other words, a value should never be rounded up if a greater value has been rounded down, and vice-versa

We have two restrictions here:

1. The sum of the rounded numbers should be equal to that of the unrounded ones.
2. When rounding, no number with a greater fractional part can be rounded down if there is a number with a smaller fractional part that had been rounded up.

We cannot round (1.4, 1.4, 1.2) to (1, 1, 2), since 1.4 would be rounded down while 1.2 (with smaller fractional part) is rounded up. However, we can round them to (1, 2, 1) or (1, 2, 1), since nothing prevents us from rounding numbers with equal fractional part differently.

Given this, we should build a query that would take a set of numbers and round them taking these two restrictions into account.

It is easily seen than to satisfy the second restriction a certain rounding threshold should exist: all numbers with the fractional part greater than this threshold are rounded up, those with the fractional part smaller than then threshold are rounded down, those with the fractional part equal to the threshold can be rounded both ways.

However, calculating this threshold is, first, quite a difficult task and, second, we still don't know what to do with the numbers having the fractional part equal to the threshold.

Let's create a sample tables and show this approach:

This script creates a table with 10 random values in it.

The values are multiples of 0.1 with random fractional parts (from 5.0 to 5.4) and their sum is exactly 52:

```SELECT  value
FROM    [20090921_round].t_round
```
value
5.4
5.1
5.2
5.2
5.3
5.0
5.1
5.4
5.1
5.2

To round a value we should apply a rounding function to it which either truncates the fractional part (like `FLOOR`) or rounds up to the closest greater integer (like `CEILING`).

Now, what will we have if we apply `FLOOR` to all values?

```SELECT  SUM(FLOOR(value))
FROM    [20090921_round].t_round
```
50

We see that truncating the fractional part makes the sum to be less by 2. That means that all fractional parts added make 2.

Now, what will be have if we round everything up?

```SELECT  SUM(CEILING(value))
FROM    [20090921_round].t_round
```
59

This makes 59.

Note that we have 10 values in our table and the difference between 59 and 50 is only 9.

This is because there is no difference between the `FLOOR` and `CEILING` for the integer numbers (and only for them). If the number is not integer, the difference will be exactly 1.

And it's easy to see that indeed there is exactly 1 integer number in the table. All remaining 9 values are fractional and therefore contribute to the difference.

So we see that there are 9 numbers that affect the final sum being rounded differently. It is also easy to see that each value contribution to the difference amounts exactly to 1.

Now, the difference between `SUM(FLOOR(value))` and `SUM(value)` is 2.

So if we take 2 of the fractional values and `CEILING` them, then take another 7 and `FLOOR` them, we will get exactly the original sum.

But which values should we floor and which should we ceil?

Of course we should order the values by their fractional parts in descending order (so that those with greater fractional parts would go first) and `CEILING` the top ones first. That will implicitly make the ceiling threshold I mentioned earlier and `CEILING` will be only applied only to the values exceeding the threshold (or, probably, equal to that).

To pick the right values we should utilize analytical function `ROW_NUMBER()`:

First, we calculate the difference between `SUM(value)` and `SUM(FLOOR(VALUE))`. This gives us the number of values we should ceil rather than floor to get the correct sum. Let's denote this difference by `N`. Then we take first `N` values using `ROW_NUMBER()` and apply `CEILING` to them, while applying `FLOOR` to other.

Here's how do we do it in SQL Server:

```SELECT  value,
CASE
WHEN ROW_NUMBER() OVER (ORDER BY value - FLOOR(value) DESC) &lt;= SUM(value) OVER() - SUM(FLOOR(value)) OVER()
THEN
CEILING(value)
ELSE
FLOOR(value)
END AS rvalue
FROM    [20090921_round].t_round
```
value rvalue
5.4 6
5.4 6
5.3 5
5.2 5
5.2 5
5.2 5
5.1 5
5.1 5
5.1 5
5.0 5

We see that `CEILING` is only applied to the first two values with greatest fractional parts. All other values are rounded down to the nearest smaller integer.

Now, let's check the sum of the values:

```SELECT  SUM(rvalue) AS rsum
FROM    (
SELECT  CASE
WHEN ROW_NUMBER() OVER (ORDER BY value - FLOOR(value) DESC) &lt;= SUM(value) OVER() - SUM(FLOOR(value)) OVER()
THEN
CEILING(value)
ELSE
FLOOR(value)
END AS rvalue
FROM    [20090921_round].t_round
) q
```
rsum
52

Just as in the original table.

Written by Quassnoi

September 21st, 2009 at 11:00 pm

Posted in SQL Server