EXPLAIN EXTENDED

How to create fast database queries

Cumulative values

Comments enabled. I *really* need your comment

From Stack Overflow:

There is a table that stores signal statuses that come from different devices.

  • SS1 and SS2 signals are inserted to table in random times
  • If either of SS1 and SS2 signal statuses is up, then the resulting signal should be up
  • If both SS1 and SS2 signal statuses are down, then resulting signal should be down

I want to prepare a query that shows the result signal status changes according to SS1 and SS2 signals

Each record deals with only one signal type here: either SS1 or SS2. To obtain the signal statuses we should query the cumulative values of previous records.

If a record describes a change in SS2, we should query for the most recent change to SS1 that had been recorded so far to obtain the SS1's current status.

In systems other than Oracle, the previous value of a signal could be easily queried using subselects with TOP / LIMIT clauses. But Oracle does not support correlated queries nested more than one level deep, and limiting a subquery result to a single record (which is required by a subquery) requires it (ORDER BY should be nested). This makes constructing such a subquery in Oracle quite a pain.

However, in Oracle, these things can be queries using analytics functions much more efficiently.

Let's create a sample table:

Table creation details

This table contains 1,000,000 records filled with random states of random signals once per second.

Each record describes a state of a certain signal at a given moment of time. To find out the state of another signal at that moment of time we need to know the state held by the latest record for that signal.

To find out that state we can employ analytical function, LAST_VALUE.

With a default RANGE (that is BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), this function just gives the value of the expression held by the current row, and, therefore, is quite useless: it gives the same result as a plain expression would give, without any functions.

If col is unique and defines the order of the rows, then LAST_VALUE(expression) OVER (ORDER BY col) is just a quite expensive synonym for expression.

However, this function's behavior can be changed by adding IGNORE NULLS clause. This clause makes the function return the last value so far which is not a NULL.

Now, returning the cumulative value of any signal becomes quite simple. We should just just make two expressions which substitute NULL instead of the signal status for the wrong signals. The LAST_VALUE (IGNORE NULLS) over these expressions will persist until rewritten by the new states of their corresponding signals.

Let's check it:

SELECT  *
FROM    (
        SELECT  /*+ FIRST_ROWS */
                s.*,
                DECODE(signal, 'SS1', status, NULL) AS exp1,
                DECODE(signal, 'SS2', status, NULL) AS exp2,
                LAST_VALUE(DECODE(signal, 'SS1', status, NULL) IGNORE NULLS) OVER (ORDER BY ts, id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ss1,
                LAST_VALUE(DECODE(signal, 'SS2', status, NULL) IGNORE NULLS) OVER (ORDER BY ts, id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ss2
        FROM    t_signal s
        ORDER BY
                ts, id
        ) s2
WHERE   rownum <= 15
ID SIGNAL STATUS TS EXP1 EXP2 SS1 SS2
1000000 SS2 1 08.01.2010 10:13:20 1 1
999999 SS2 0 08.01.2010 10:13:21 0 0
999998 SS1 0 08.01.2010 10:13:22 0 0 0
999997 SS2 1 08.01.2010 10:13:23 1 0 1
999996 SS1 1 08.01.2010 10:13:24 1 1 1
999995 SS2 1 08.01.2010 10:13:25 1 1 1
999994 SS1 0 08.01.2010 10:13:26 0 0 1
999993 SS1 1 08.01.2010 10:13:27 1 1 1
999992 SS1 0 08.01.2010 10:13:28 0 0 1
999991 SS2 1 08.01.2010 10:13:29 1 0 1
999990 SS1 0 08.01.2010 10:13:30 0 0 1
999989 SS2 0 08.01.2010 10:13:31 0 0 0
999988 SS1 0 08.01.2010 10:13:32 0 0 0
999987 SS2 0 08.01.2010 10:13:33 0 0 0
999986 SS1 0 08.01.2010 10:13:34 0 0 0
15 rows fetched in 0.0012s (0.0007s)
SELECT STATEMENT 
 COUNT STOPKEY
  VIEW 
   WINDOW NOSORT
    TABLE ACCESS BY INDEX ROWID, 20100120_signal.T_SIGNAL
     INDEX FULL SCAN, 20100120_signal.IX_SIGNAL_TS_ID

In the resultset above, EXP1 and EXP2 show the changes in the signal states of SS1 and SS2, with NULL's if the current record does not describe a change in the appropriate signals.

LAST_VALUE (IGNORE NULLS) over these expressions show their cumulative values. These values can be used to calculate the resulting signal state.

Note that using ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW made Oracle to use WINDOW NOSORT and this query is instant, since the LAST_VALUE can be buffered.

To calculate the resulting signal, we should just return the GREATEST of ss1 and ss2 (returned by LAST_VALUE):

SELECT  SUM(GREATEST(ss1, ss2))
FROM    (
        SELECT  s.*,
                DECODE(signal, 'SS1', status, NULL) AS exp1,
                DECODE(signal, 'SS2', status, NULL) AS exp2,
                LAST_VALUE(DECODE(signal, 'SS1', status, NULL) IGNORE NULLS) OVER (ORDER BY ts, id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ss1,
                LAST_VALUE(DECODE(signal, 'SS2', status, NULL) IGNORE NULLS) OVER (ORDER BY ts, id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ss2
        FROM    t_signal s
        ORDER BY
                ts, id
        ) s2
SUM(GREATEST(SS1,SS2))
750450
1 row fetched in 0.0003s (3.0781s)
SELECT STATEMENT 
 SORT AGGREGATE
  VIEW 
   WINDOW SORT
    TABLE ACCESS FULL, 20100120_signal.T_SIGNAL

For the sake of brevity, we select the SUM of the resulting signal states which gives us the number of records with resulting signal up. As it should be, this number roughly amounts to 75% of the total number of records.

The query uses a single sort (which in case of the whole table is faster than traversing the index), buffers the results and completes in 3 seconds which is almost as fast as a plain query with ORDER BY over the same dataset would complete.

Written by Quassnoi

January 20th, 2010 at 11:00 pm

Posted in Oracle

Leave a Reply