## Rounding numbers preserving their sum

Comments enabled. I *really* need your comment

From **Stack Overflow**:

I'm looking for a way to round every value contained in a column without changing the total sum (which is known to be an integer, or very close to an integer because of float number precision).

Rounding every value to the nearest integer won't work (ex :

(1.5, 1.5)will be rounded to either(1, 1)or(2, 2))It's quite easy to do this using several requests (e. g. storing the original sum, rounding, computing the new sum, and updating as many rows as needed to go back to the original sum), but this is not a very elegant solution.

Is there a way to do this using a single

SQLrequest?I'm using

SQL Server 2008, so solutions taking advantage of this specific vendor are welcome.I'm looking for a request minimizing the differences between the old values and the new ones. In other words, a value should never be rounded up if a greater value has been rounded down, and vice-versa

We have two restrictions here:

- The sum of the rounded numbers should be equal to that of the unrounded ones.
- When rounding, no number with a greater fractional part can be rounded down if there is a number with a smaller fractional part that had been rounded up.

We cannot round **(1.4, 1.4, 1.2)** to **(1, 1, 2)**, since **1.4** would be rounded down while **1.2** (with smaller fractional part) is rounded up. However, we can round them to **(1, 2, 1)** or **(1, 2, 1)**, since nothing prevents us from rounding numbers with *equal* fractional part differently.

Given this, we should build a query that would take a set of numbers and round them taking these two restrictions into account.

It is easily seen than to satisfy the second restriction a certain rounding threshold should exist: all numbers with the fractional part greater than this threshold are rounded up, those with the fractional part smaller than then threshold are rounded down, those with the fractional part equal to the threshold can be rounded both ways.

However, calculating this threshold is, first, quite a difficult task and, second, we still don't know what to do with the numbers having the fractional part equal to the threshold.

Instead, we use another approach.

Let's create a sample tables and show this approach:

This script creates a table with **10** random values in it.

The values are multiples of **0.1** with random fractional parts (from **5.0** to **5.4**) and their sum is exactly **52**:

SELECT value FROM [20090921_round].t_round

value |
---|

5.4 |

5.1 |

5.2 |

5.2 |

5.3 |

5.0 |

5.1 |

5.4 |

5.1 |

5.2 |

To round a value we should apply a rounding function to it which either truncates the fractional part (like `FLOOR`

) or rounds up to the closest greater integer (like `CEILING`

).

Now, what will we have if we apply `FLOOR`

to *all* values?

SELECT SUM(FLOOR(value)) FROM [20090921_round].t_round

50 |

We see that truncating the fractional part makes the sum to be less by **2**. That means that all fractional parts added make **2**.

Now, what will be have if we round everything up?

SELECT SUM(CEILING(value)) FROM [20090921_round].t_round

59 |

This makes **59**.

Note that we have **10** values in our table and the difference between **59** and **50** is only **9**.

This is because there is no difference between the `FLOOR`

and `CEILING`

for the integer numbers (and only for them). If the number is not integer, the difference will be exactly **1**.

And it's easy to see that indeed there is exactly **1** integer number in the table. All remaining **9** values are fractional and therefore contribute to the difference.

So we see that there are **9** numbers that affect the final sum being rounded differently. It is also easy to see that each value contribution to the difference amounts exactly to **1**.

Now, the difference between `SUM(FLOOR(value))`

and `SUM(value)`

is **2**.

So if we take **2** of the fractional values and `CEILING`

them, then take another **7** and `FLOOR`

them, we will get exactly the original sum.

But which values should we floor and which should we ceil?

Of course we should order the values by their fractional parts in descending order (so that those with greater fractional parts would go first) and `CEILING`

the top ones first. That will implicitly make the ceiling threshold I mentioned earlier and `CEILING`

will be only applied only to the values exceeding the threshold (or, probably, equal to that).

To pick the right values we should utilize analytical function `ROW_NUMBER()`

:

First, we calculate the difference between `SUM(value)`

and `SUM(FLOOR(VALUE))`

. This gives us the number of values we should ceil rather than floor to get the correct sum. Let's denote this difference by `N`

. Then we take first `N`

values using `ROW_NUMBER()`

and apply `CEILING`

to them, while applying `FLOOR`

to other.

Here's how do we do it in **SQL Server**:

SELECT value, CASE WHEN ROW_NUMBER() OVER (ORDER BY value - FLOOR(value) DESC) <= SUM(value) OVER() - SUM(FLOOR(value)) OVER() THEN CEILING(value) ELSE FLOOR(value) END AS rvalue FROM [20090921_round].t_round

value | rvalue |
---|---|

5.4 | 6 |

5.4 | 6 |

5.3 | 5 |

5.2 | 5 |

5.2 | 5 |

5.2 | 5 |

5.1 | 5 |

5.1 | 5 |

5.1 | 5 |

5.0 | 5 |

We see that `CEILING`

is only applied to the first two values with greatest fractional parts. All other values are rounded down to the nearest smaller integer.

Now, let's check the sum of the values:

SELECT SUM(rvalue) AS rsum FROM ( SELECT CASE WHEN ROW_NUMBER() OVER (ORDER BY value - FLOOR(value) DESC) <= SUM(value) OVER() - SUM(FLOOR(value)) OVER() THEN CEILING(value) ELSE FLOOR(value) END AS rvalue FROM [20090921_round].t_round ) q

rsum |
---|

52 |

Just as in the original table.