EXPLAIN EXTENDED

How to create fast database queries

Counting concurrent sessions

Comments enabled. I *really* need your comment

Answering questions asked on the site.

Steve asks:

I am trying to query a log table for a web service application and determine how many concurrent sessions are in progress at each moment a transaction is executed, based on a start date and an elapsed time for each command executed through the web service. (These metrics are logged after the fact, I'm trying to write daily performance reporting for the site).

Here's a simplified view of my base table design:

CREATE TABLE CONNECTIONS
(
USERID VARCHAR2(30),
HANDLE VARCHAR2(40),
PROCESS_DATE DATE,
COMMAND NUMBER(6,0),
COUNT NUMBER(10,0),
ELAPSED_TIME NUMBER(10,0),
NUM_OF_RECS NUMBER(10,0),
COMMAND_TIMESTAMP TIMESTAMP (6)
)

The question is: at there moment each transaction started, how many other transactions were active?

At each given moment, there is some number of active transaction. A transaction is active if the transaction begins before that moment and ends after it. This means that the moment should fall between command_timestamp and command_timestamp + elapsed_time / 86400000.

Database B-Tree indexes are not very good in queries that involve searching for a constant between two columns, so a self-join on the condition described above would be possible but not very efficient.

But these is a more simple solution.

Whenever a transaction starts, it increments the count of the open transactions. Whenever the transaction ends, it decrements it.

So we just can build a table of events: starts and ends of the transactions, ordered chronologically. Each start would be denoted with a +1, and each end with a -1. Then we should just calculate the number of the transactions open so far and subtract the number of the transactions closed.

This can be easily done merely by calculating the partial sum of these +1's and -1's, which is an easy task for Oracle's analytic functions.

Let's create a sample table. I'll put only the relevant columns there and add a stuffing column that would emulate actual payload, to measure performance:

Table creation details

For the sake of brevity, elapsed_time is expressed in days, not in milliseconds.

The table is indexed with two indexes: one on command_timestamp and another one on command_timestamp + elapsed_time.

We will calculate the average of the concurrent queries for the transactions started on Jan 1st, 2010.

To do this, we will first need to select all transactions that overlap this date. This includes all transactions whose command_timestamp or command_timestamp + elapsed_time is within this date. Both these conditions are sargable, but not at the same time. An OR clause here would be inefficient, since no single index can be used to filter on both conditions.

To work around this, we will just split the query in two parts. The first part will select all transactions that started on this date, the second part will select all transactions that ended on this date but started earlier. These two sets do not intersect, and their sum gives all transaction we are interested at. So we can just merge these two sets using UNION ALL:

WITH    current_sessions AS
(
SELECT  *
FROM    t_session
WHERE   command_timestamp >= TO_DATE('01.01.2010', 'dd.mm.yyyy')
AND command_timestamp < TO_DATE('01.01.2010', 'dd.mm.yyyy') + 1
        UNION ALL
        SELECT  *
        FROM    t_session
        WHERE   command_timestamp + elapsed_time >= TO_DATE('01.01.2010', 'dd.mm.yyyy')
AND command_timestamp + elapsed_time < TO_DATE('01.01.2010', 'dd.mm.yyyy') + 1
                AND command_timestamp < TO_DATE('01.01.2010', 'dd.mm.yyyy')
        )
SELECT  COUNT(*), SUM(LENGTH(stuffing))
FROM    current_sessions

COUNT(*) SUM(LENGTH(STUFFING))
1444 288800
1 row fetched in 0.0001s (0.0024s)
SELECT STATEMENT 
 SORT AGGREGATE
  VIEW 
   UNION-ALL PARTITION
    TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION
     INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_COMMANDTIMESTAMP
    TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION
     INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_END

Each part of the query used its own index and no extra effort is needed to get rid of the duplicates, so the query completes in 2 ms.

Now, we should need to calculate the number of concurrent transactions.

To do this, we will duplicate the recordset we got on the previous step, adding an extra field, event.

The first copy, with event = 1 will hold the beginnings of the transactions; the second copy with event = -1 will hold the ends. This will give us the equal number of the records with the opposite signs, so ultimately they will add up to a zero (since all transactions get into the table only after they complete). Each transaction will therefore be split into two records.

These records will then be sorted by event_date (which corresponds to command_timestamp or command_timestamp + elapsed_time respectively, depending on the set the record belongs to).

Then, the partial sum will be calculated for each record, using Oracle's SUM() OVER () analytic function.

Since the events are ordered by their date, the value of the partial sum will hold the number of transactions open (since their opening record had already been selected and added a +1 to the sum), but not yet closed (since their closing record had not yet been selected). We don't know which transactions exactly contributed to the sum, but we are not interested in this information. All we know (and all we need to know) is the difference between the numbers of open and closed transactions.

On this step we have the partial sum for each record, but we need to get rid of the extra records. We are interested in the number of concurrent transactions at the moments each transaction began, so we will just filter the resultset so that it only selects the records with event = 1, since they correspond to the beginning of the transactions.

Finally, we just need to subtract 1 from the partial sums. This is because the transaction records contribute to the partial sums too, and we need to count the number of concurrent transactions, not the total transactions.

And, finally, here's the query:

WITH    current_sessions AS
(
SELECT  *
FROM    t_session
WHERE   command_timestamp >= TO_DATE('01.01.2010', 'dd.mm.yyyy')
AND command_timestamp < TO_DATE('01.01.2010', 'dd.mm.yyyy') + 1
        UNION ALL
        SELECT  *
        FROM    t_session
        WHERE   command_timestamp + elapsed_time >= TO_DATE('01.01.2010', 'dd.mm.yyyy')
AND command_timestamp + elapsed_time < TO_DATE('01.01.2010', 'dd.mm.yyyy') + 1
                AND command_timestamp < TO_DATE('01.01.2010', 'dd.mm.yyyy')
        )
SELECT  AVG(concurrent), SUM(LENGTH(stuffing))
FROM    (
        SELECT  q.*,
                SUM(event) OVER (ORDER BY event_date, event DESC) - 1 AS concurrent
        FROM    (
                SELECT  1 AS event,
                        command_timestamp AS event_date,
                        cb.*
                FROM    current_sessions cb
                UNION ALL
                SELECT  -1 AS event,
                        command_timestamp + elapsed_time AS event_date,
                        ce.*
                FROM    current_sessions ce
                ) q
        )
WHERE   event = 1

AVG(CONCURRENT) SUM(LENGTH(STUFFING))
1,51662049861495844875346260387811634349 288800
1 row fetched in 0.0001s (0.0691s)
SELECT STATEMENT 
 TEMP TABLE TRANSFORMATION 
  LOAD AS SELECT 
   UNION-ALL 
    TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION
     INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_COMMANDTIMESTAMP
    TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION
     INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_END
  SORT AGGREGATE
   VIEW 
    WINDOW SORT
     VIEW 
      UNION-ALL 
       VIEW 
        TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D6836_1825AC8
       VIEW 
        TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D6836_1825AC8

The query completes in only 69 ms.

Hope that helps.


I'm always glad to answer the questions regarding database queries.

Ask me a question

Written by Quassnoi

January 25th, 2010 at 11:00 pm

Posted in Oracle

Leave a Reply