EXPLAIN EXTENDED

How to create fast database queries

My latest article on SQL in general: Things SQL needs: determining range cardinality. You're welcome to read and comment on it.

Oracle: nested SUM

From Stack Overflow:

Suppose I have this table

A B C D

Datatypes are not important.

I want to do this:

SELECT  a AS a1, b AS b1, c AS c1,
        (
        SELECT  SUM(d)
        FROM    source
        WHERE   a = a1
                AND b = b1
        ) AS total
FROM    source
GROUP BY
        a, b, c

, but I can’t find a way (SQL Developer keeps complaining with FROM clause not found.)

Is there a way? Is it possible?

This is of course possible if we just alias the query and prepend the alias to the field:

SELECT  a, b, c,
        (
        SELECT  SUM(d)
        FROM    source si
        WHERE   si.a = so.a
                AND si.b = so.b
        CONNECT BY
                16 >= level
        )
FROM    source so
GROUP BY
        a, b, c

This works well on this sample set of data:

WITH    source AS
        (
        SELECT  FLOOR(MOD(level - 1, 8) / 4) + 1 AS a,
                FLOOR(MOD(level - 1, 4) / 2) + 1 AS b,
                MOD(level - 1, 2) + 1 AS c,
                level AS d
        FROM    dual
        CONNECT BY
                16 >= level
        )
SELECT  a, b, c,
        (
        SELECT  SUM(d)
        FROM    source si
        WHERE   si.a = so.a
                AND si.b = so.b
        )
FROM    source so
GROUP BY
        a, b, c

 
View query details

, but it needs to reevaluate the subquery for each group.

In Oracle, there is a better way to do this query: nesting an aggregate SUM inside an analytical SUM.

What this query does is in fact the following:

  • Calculate the groupwise SUM(d) for each (a, b)
  • Retrieve the distinct values of a, b, c
  • For each (a, b, c), return the SUM(d) for the corresponding (a, b)

This means that total will be the same for each (a, b), thous the values of c may differ.

However, the SUM is associative, that is sum of groupwise sums is the same as the sum of separate values.

To calculate the SUM(d) for a given (a, b) we can calculate the partial sums for (a, b, c) and add them together. Then we need return this value for each (a, b) (which can be multiple).

To return an aggregate value along with each of the records that contribute to it, the analytical functions are used (those with OVER clause).

It is widely known that analytical functions, as well as aggregate functions, cannot be nested.

However, one can nest an aggregate function within an analytical one. This is possible because aggregate functions work over the source-level values before GROUP BY, while analytical ones work with SELECT-level expressions (after the GROUP BY)

Here’s the query to do this:

WITH    source AS
        (
        SELECT  FLOOR(MOD(level - 1, 8) / 4) + 1 AS a,
                FLOOR(MOD(level - 1, 4) / 2) + 1 AS b,
                MOD(level - 1, 2) + 1 AS c,
                level AS d
        FROM    dual
        CONNECT BY
                level <= 16
        )
SELECT  a, b, c, SUM(SUM(d)) OVER (PARTITION BY a, b)
FROM    source
GROUP BY
        a, b, c

View query details

This query returns the same results.

The innermost SUMs are calculated (a, b, c)-wise (like in GROUP BY), while the outermost ones are calculated (a, b)-wise (like in PARTITION BY).

Since ordering by (a, b, c) implies ordering by (a, b), both these groupings are performed with a single SORT.

Written by Quassnoi

October 8th, 2009 at 11:00 pm

Posted in Oracle

Comments are closed.