EXPLAIN EXTENDED

How to create fast database queries

Archive for October 13th, 2009

Running root mean square: SQL Server

Comments enabled. I *really* need your comment

From Stack Overflow:

Lets say I have the following in a SQL Server table:

Value
1
3
2
6
3

I need to get the differences of each of these numbers (in order), then square them, sum them, divide by number of values and take the square root.

This value is called root mean square. It is widely used in statistics.

However, we should calculate it for the differences between the values, not for the values themselves. In Oracle, the differences between rows could be calculated easilty using LAG, an analytical function that returns a previous row. Unfortunately, SQL Server lacks this function.

It could be emulated using a subquery or CROSS APPLY along with ORDER BY and TOP, but this makes the query less elegant and less efficient for large tables (since using TOP in subqueries and CROSS APPLY forces NESTED LOOPS).

It is better to assign the row number to each row (using ROW_NUMBER) and then just make an equijoin between two consecutive rows. This looks much better and the optimizer is free to choose any join method, including Hash Join which is most efficient when large percent of the table rows participates in a join, or a Merge Join which will be also efficient, given that ROW_NUMBER() tends to output the ordered resultsets and SQL Server can utilize this fact.

Here's the query:

WITH    nums AS
        (
        SELECT  value, ROW_NUMBER() OVER (ORDER BY orderer) AS rn
        FROM    source
        ) 
SELECT  SQRT(AVG(POWER(np.value - nn.value, 2)))
FROM    nums np
JOIN    nums nn
ON      nn.rn = np.rn + 1

And let's try it on the sample data:

WITH    source (orderer, value) AS
        (
        SELECT  10, 1
        UNION ALL
        SELECT  20, 3
        UNION ALL
        SELECT  30, 2
        UNION ALL
        SELECT  40, 6
        UNION ALL
        SELECT  50, 3
        ),
        nums AS
        (
        SELECT  value, ROW_NUMBER() OVER (ORDER BY orderer) AS rn
        FROM    source
        ) 
SELECT  SQRT(AVG(POWER(np.value - nn.value, 2)))
FROM    nums np
JOIN    nums nn
ON      nn.rn = np.rn + 1

2.6457513110645907
1 row fetched in 0.0001s (0.0007s)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms. 
  |--Compute Scalar(DEFINE:([Expr1027]=sqrt(CONVERT_IMPLICIT(float(53),[Expr1026],0))))
       |--Compute Scalar(DEFINE:([Expr1026]=CASE WHEN [Expr1038]=(0) THEN NULL ELSE [Expr1039]/CONVERT_IMPLICIT(int,[Expr1038],0) END))
            |--Stream Aggregate(DEFINE:([Expr1038]=COUNT_BIG(power([Union1011]-[Union1024],(2.000000000000000e+000))), [Expr1039]=SUM(power([Union1011]-[Union1024],(2.000000000000000e+000)))))
                 |--Merge Join(Inner Join, MANY-TO-MANY MERGE:([Expr1025])=([Expr1033]), RESIDUAL:([Expr1025]=([Expr1012]+(1))))
                      |--Sequence Project(DEFINE:([Expr1025]=row_number))
                      |    |--Compute Scalar(DEFINE:([Expr1035]=(1)))
                      |         |--Segment
                      |              |--Sort(ORDER BY:([Union1023] ASC))
                      |                   |--Constant Scan(VALUES:(((10),(1)),((20),(3)),((30),(2)),((40),(6)),((50),(3))))
                      |--Compute Scalar(DEFINE:([Expr1033]=[Expr1012]+(1)))
                           |--Sequence Project(DEFINE:([Expr1012]=row_number))
                                |--Compute Scalar(DEFINE:([Expr1037]=(1)))
                                     |--Segment
                                          |--Sort(ORDER BY:([Union1010] ASC))
                                               |--Constant Scan(VALUES:(((10),(1)),((20),(3)),((30),(2)),((40),(6)),((50),(3))))

The query works, uses the Merge Join and gives a correct result.

Written by Quassnoi

October 13th, 2009 at 11:00 pm

Posted in SQL Server