From Stack Overflow:
I'm looking for a way to round every value contained in a column without changing the total sum (which is known to be an integer, or very close to an integer because of float number precision).
Rounding every value to the nearest integer won't work (ex : (1.5, 1.5) will be rounded to either (1, 1) or (2, 2))
It's quite easy to do this using several requests (e. g. storing the original sum, rounding, computing the new sum, and updating as many rows as needed to go back to the original sum), but this is not a very elegant solution.
Is there a way to do this using a single SQL request?
I'm using SQL Server 2008, so solutions taking advantage of this specific vendor are welcome.
I'm looking for a request minimizing the differences between the old values and the new ones. In other words, a value should never be rounded up if a greater value has been rounded down, and vice-versa
We have two restrictions here:
- The sum of the rounded numbers should be equal to that of the unrounded ones.
- When rounding, no number with a greater fractional part can be rounded down if there is a number with a smaller fractional part that had been rounded up.
We cannot round (1.4, 1.4, 1.2) to (1, 1, 2), since 1.4 would be rounded down while 1.2 (with smaller fractional part) is rounded up. However, we can round them to (1, 2, 1) or (1, 2, 1), since nothing prevents us from rounding numbers with equal fractional part differently.
Given this, we should build a query that would take a set of numbers and round them taking these two restrictions into account.
It is easily seen than to satisfy the second restriction a certain rounding threshold should exist: all numbers with the fractional part greater than this threshold are rounded up, those with the fractional part smaller than then threshold are rounded down, those with the fractional part equal to the threshold can be rounded both ways.
However, calculating this threshold is, first, quite a difficult task and, second, we still don't know what to do with the numbers having the fractional part equal to the threshold.
Instead, we use another approach.
Let's create a sample tables and show this approach:
CREATE SCHEMA [20090921_round] CREATE TABLE [20090921_round].t_round ( value DECIMAL(10, 1) NOT NULL ) GO BEGIN TRANSACTION SELECT RAND(0.20090921) DECLARE @cnt INT SET @cnt = 1 WHILE @cnt <= 9 BEGIN INSERT INTO [20090921_round].t_round (value) VALUES (RAND() * 0.4 + 5) SET @cnt = @cnt + 1 END; INSERT INTO [20090921_round].t_round SELECT ROUND(52 - SUM(value), 1) FROM [20090921_round].t_round COMMIT
This script creates a table with 10 random values in it.
The values are multiples of 0.1 with random fractional parts (from 5.0 to 5.4) and their sum is exactly 52:
SELECT value FROM [20090921_round].t_round
To round a value we should apply a rounding function to it which either truncates the fractional part (like
FLOOR) or rounds up to the closest greater integer (like
Now, what will we have if we apply
FLOOR to all values?
SELECT SUM(FLOOR(value)) FROM [20090921_round].t_round
We see that truncating the fractional part makes the sum to be less by 2. That means that all fractional parts added make 2.
Now, what will be have if we round everything up?
SELECT SUM(CEILING(value)) FROM [20090921_round].t_round
This makes 59.
Note that we have 10 values in our table and the difference between 59 and 50 is only 9.
This is because there is no difference between the
CEILING for the integer numbers (and only for them). If the number is not integer, the difference will be exactly 1.
And it's easy to see that indeed there is exactly 1 integer number in the table. All remaining 9 values are fractional and therefore contribute to the difference.
So we see that there are 9 numbers that affect the final sum being rounded differently. It is also easy to see that each value contribution to the difference amounts exactly to 1.
Now, the difference between
SUM(value) is 2.
So if we take 2 of the fractional values and
CEILING them, then take another 7 and
FLOOR them, we will get exactly the original sum.
But which values should we floor and which should we ceil?
Of course we should order the values by their fractional parts in descending order (so that those with greater fractional parts would go first) and
CEILING the top ones first. That will implicitly make the ceiling threshold I mentioned earlier and
CEILING will be only applied only to the values exceeding the threshold (or, probably, equal to that).
To pick the right values we should utilize analytical function
First, we calculate the difference between
SUM(FLOOR(VALUE)). This gives us the number of values we should ceil rather than floor to get the correct sum. Let's denote this difference by
N. Then we take first
N values using
ROW_NUMBER() and apply
CEILING to them, while applying
FLOOR to other.
Here's how do we do it in SQL Server:
SELECT value, CASE WHEN ROW_NUMBER() OVER (ORDER BY value - FLOOR(value) DESC) <= SUM(value) OVER() - SUM(FLOOR(value)) OVER() THEN CEILING(value) ELSE FLOOR(value) END AS rvalue FROM [20090921_round].t_round
We see that
CEILING is only applied to the first two values with greatest fractional parts. All other values are rounded down to the nearest smaller integer.
Now, let's check the sum of the values:
SELECT SUM(rvalue) AS rsum FROM ( SELECT CASE WHEN ROW_NUMBER() OVER (ORDER BY value - FLOOR(value) DESC) <= SUM(value) OVER() - SUM(FLOOR(value)) OVER() THEN CEILING(value) ELSE FLOOR(value) END AS rvalue FROM [20090921_round].t_round ) q
Just as in the original table.