## Archive for October 13th, 2009

## Running root mean square: SQL Server

Comments enabled. I *really* need your comment

From **Stack Overflow**:

Lets say I have the following in a

SQL Servertable:

Value 1 3 2 6 3 I need to get the differences of each of these numbers (in order), then square them, sum them, divide by number of values and take the square root.

This value is called root mean square. It is widely used in statistics.

However, we should calculate it for the differences between the values, not for the values themselves. In **Oracle**, the differences between rows could be calculated easilty using `LAG`

, an analytical function that returns a previous row. Unfortunately, **SQL Server** lacks this function.

It could be emulated using a subquery or `CROSS APPLY`

along with `ORDER BY`

and `TOP`

, but this makes the query less elegant and less efficient for large tables (since using `TOP`

in subqueries and `CROSS APPLY`

forces `NESTED LOOPS`

).

It is better to assign the row number to each row (using `ROW_NUMBER`

) and then just make an equijoin between two consecutive rows. This looks much better and the optimizer is free to choose any join method, including `Hash Join`

which is most efficient when large percent of the table rows participates in a join, or a `Merge Join`

which will be also efficient, given that `ROW_NUMBER()`

tends to output the ordered resultsets and **SQL Server** can utilize this fact.

Here's the query:

WITH nums AS ( SELECT value, ROW_NUMBER() OVER (ORDER BY orderer) AS rn FROM source ) SELECT SQRT(AVG(POWER(np.value - nn.value, 2))) FROM nums np JOIN nums nn ON nn.rn = np.rn + 1

And let's try it on the sample data:

WITH source (orderer, value) AS ( SELECT 10, 1 UNION ALL SELECT 20, 3 UNION ALL SELECT 30, 2 UNION ALL SELECT 40, 6 UNION ALL SELECT 50, 3 ), nums AS ( SELECT value, ROW_NUMBER() OVER (ORDER BY orderer) AS rn FROM source ) SELECT SQRT(AVG(POWER(np.value - nn.value, 2))) FROM nums np JOIN nums nn ON nn.rn = np.rn + 1

2.6457513110645907 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

1 row fetched in 0.0001s (0.0007s) |

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.

|--Compute Scalar(DEFINE:([Expr1027]=sqrt(CONVERT_IMPLICIT(float(53),[Expr1026],0)))) |--Compute Scalar(DEFINE:([Expr1026]=CASE WHEN [Expr1038]=(0) THEN NULL ELSE [Expr1039]/CONVERT_IMPLICIT(int,[Expr1038],0) END)) |--Stream Aggregate(DEFINE:([Expr1038]=COUNT_BIG(power([Union1011]-[Union1024],(2.000000000000000e+000))), [Expr1039]=SUM(power([Union1011]-[Union1024],(2.000000000000000e+000))))) |--Merge Join(Inner Join, MANY-TO-MANY MERGE:([Expr1025])=([Expr1033]), RESIDUAL:([Expr1025]=([Expr1012]+(1)))) |--Sequence Project(DEFINE:([Expr1025]=row_number)) | |--Compute Scalar(DEFINE:([Expr1035]=(1))) | |--Segment | |--Sort(ORDER BY:([Union1023] ASC)) | |--Constant Scan(VALUES:(((10),(1)),((20),(3)),((30),(2)),((40),(6)),((50),(3)))) |--Compute Scalar(DEFINE:([Expr1033]=[Expr1012]+(1))) |--Sequence Project(DEFINE:([Expr1012]=row_number)) |--Compute Scalar(DEFINE:([Expr1037]=(1))) |--Segment |--Sort(ORDER BY:([Union1010] ASC)) |--Constant Scan(VALUES:(((10),(1)),((20),(3)),((30),(2)),((40),(6)),((50),(3))))

The query works, uses the `Merge Join`

and gives a correct result.