Answering questions asked on the site.
I am trying to query a log table for a web service application and determine how many concurrent sessions are in progress at each moment a transaction is executed, based on a start date and an elapsed time for each command executed through the web service. (These metrics are logged after the fact, I'm trying to write daily performance reporting for the site).
Here's a simplified view of my base table design:CREATE TABLE CONNECTIONS ( USERID VARCHAR2(30), HANDLE VARCHAR2(40), PROCESS_DATE DATE, COMMAND NUMBER(6,0), COUNT NUMBER(10,0), ELAPSED_TIME NUMBER(10,0), NUM_OF_RECS NUMBER(10,0), COMMAND_TIMESTAMP TIMESTAMP (6) )
The question is: at there moment each transaction started, how many other transactions were active?
At each given moment, there is some number of active transaction. A transaction is active if the transaction begins before that moment and ends after it. This means that the moment should fall between
command_timestamp + elapsed_time / 86400000.
Database B-Tree indexes are not very good in queries that involve searching for a constant between two columns, so a self-join on the condition described above would be possible but not very efficient.
But these is a more simple solution.
Whenever a transaction starts, it increments the count of the open transactions. Whenever the transaction ends, it decrements it.
So we just can build a table of
events: starts and ends of the transactions, ordered chronologically. Each
start would be denoted with a +1, and each
end with a -1. Then we should just calculate the number of the transactions open so far and subtract the number of the transactions closed.
This can be easily done merely by calculating the partial sum of these +1's and -1's, which is an easy task for Oracle's analytic functions.
Let's create a sample table. I'll put only the relevant columns there and add a stuffing column that would emulate actual payload, to measure performance:
Table creation details
BEGIN DBMS_RANDOM.seed(20100125); END; / CREATE TABLE t_session ( id NOT NULL PRIMARY KEY, command_timestamp NOT NULL, elapsed_time NOT NULL, stuffing NOT NULL ) AS SELECT level, CAST(TO_DATE('25.01.2010', 'dd.mm.yyyy') - level / 1440 - DBMS_RANDOM.value / 2880 AS TIMESTAMP(6)), CAST(DBMS_RANDOM.value / 360 AS NUMBER(7, 5)), CAST(RPAD('*', 200, '*') AS VARCHAR2(200)) FROM dual CONNECT BY level <= 1000000 / CREATE INDEX ix_session_commandtimestamp ON t_session (command_timestamp) / CREATE INDEX ix_session_end ON t_session (command_timestamp + elapsed_time) /
For the sake of brevity,
elapsed_time is expressed in days, not in milliseconds.
The table is indexed with two indexes: one on
command_timestamp and another one on
command_timestamp + elapsed_time.
We will calculate the average of the concurrent queries for the transactions started on Jan 1st, 2010.
To do this, we will first need to select all transactions that overlap this date. This includes all transactions whose
command_timestamp + elapsed_time is within this date. Both these conditions are sargable, but not at the same time. An
OR clause here would be inefficient, since no single index can be used to filter on both conditions.
To work around this, we will just split the query in two parts. The first part will select all transactions that started on this date, the second part will select all transactions that ended on this date but started earlier. These two sets do not intersect, and their sum gives all transaction we are interested at. So we can just merge these two sets using
WITH current_sessions AS ( SELECT * FROM t_session WHERE command_timestamp >= TO_DATE('01.01.2010', 'dd.mm.yyyy') AND command_timestamp < TO_DATE('01.01.2010', 'dd.mm.yyyy') + 1 UNION ALL SELECT * FROM t_session WHERE command_timestamp + elapsed_time >= TO_DATE('01.01.2010', 'dd.mm.yyyy') AND command_timestamp + elapsed_time < TO_DATE('01.01.2010', 'dd.mm.yyyy') + 1 AND command_timestamp < TO_DATE('01.01.2010', 'dd.mm.yyyy') ) SELECT COUNT(*), SUM(LENGTH(stuffing)) FROM current_sessions
|1 row fetched in 0.0001s (0.0024s)|
SELECT STATEMENT SORT AGGREGATE VIEW UNION-ALL PARTITION TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_COMMANDTIMESTAMP TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_END
Each part of the query used its own index and no extra effort is needed to get rid of the duplicates, so the query completes in 2 ms.
Now, we should need to calculate the number of concurrent transactions.
To do this, we will duplicate the recordset we got on the previous step, adding an extra field,
The first copy, with
event = 1 will hold the beginnings of the transactions; the second copy with
event = -1 will hold the ends. This will give us the equal number of the records with the opposite signs, so ultimately they will add up to a zero (since all transactions get into the table only after they complete). Each transaction will therefore be split into two records.
These records will then be sorted by
event_date (which corresponds to
command_timestamp + elapsed_time respectively, depending on the set the record belongs to).
Then, the partial sum will be calculated for each record, using Oracle's
SUM() OVER () analytic function.
Since the events are ordered by their date, the value of the partial sum will hold the number of transactions open (since their opening record had already been selected and added a +1 to the sum), but not yet closed (since their closing record had not yet been selected). We don't know which transactions exactly contributed to the sum, but we are not interested in this information. All we know (and all we need to know) is the difference between the numbers of open and closed transactions.
On this step we have the partial sum for each record, but we need to get rid of the extra records. We are interested in the number of concurrent transactions at the moments each transaction began, so we will just filter the resultset so that it only selects the records with
event = 1, since they correspond to the beginning of the transactions.
Finally, we just need to subtract 1 from the partial sums. This is because the transaction records contribute to the partial sums too, and we need to count the number of concurrent transactions, not the total transactions.
And, finally, here's the query:
WITH current_sessions AS ( SELECT * FROM t_session WHERE command_timestamp >= TO_DATE('01.01.2010', 'dd.mm.yyyy') AND command_timestamp < TO_DATE('01.01.2010', 'dd.mm.yyyy') + 1 UNION ALL SELECT * FROM t_session WHERE command_timestamp + elapsed_time >= TO_DATE('01.01.2010', 'dd.mm.yyyy') AND command_timestamp + elapsed_time < TO_DATE('01.01.2010', 'dd.mm.yyyy') + 1 AND command_timestamp < TO_DATE('01.01.2010', 'dd.mm.yyyy') ) SELECT AVG(concurrent), SUM(LENGTH(stuffing)) FROM ( SELECT q.*, SUM(event) OVER (ORDER BY event_date, event DESC) - 1 AS concurrent FROM ( SELECT 1 AS event, command_timestamp AS event_date, cb.* FROM current_sessions cb UNION ALL SELECT -1 AS event, command_timestamp + elapsed_time AS event_date, ce.* FROM current_sessions ce ) q ) WHERE event = 1
|1 row fetched in 0.0001s (0.0691s)|
SELECT STATEMENT TEMP TABLE TRANSFORMATION LOAD AS SELECT UNION-ALL TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_COMMANDTIMESTAMP TABLE ACCESS BY INDEX ROWID, 20100125_concurrent.T_SESSION INDEX RANGE SCAN, 20100125_concurrent.IX_SESSION_END SORT AGGREGATE VIEW WINDOW SORT VIEW UNION-ALL VIEW TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D6836_1825AC8 VIEW TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D6836_1825AC8
The query completes in only 69 ms.
Hope that helps.
I'm always glad to answer the questions regarding database queries.