Archive for the ‘Oracle’ Category
Happy New Year: drawing clock in SQL
Some say SQL is not good at graphics.
Well, they have some point. Database engines lack scanner drivers, there is no easy way to do sepia, and magic wand, let's be honest, is just poor.
However, you can make some New Year paintings with SQL.
Let's make a New Year clock showing 12 o'clock in Oracle.
Counting concurrent sessions
Comments enabled. I *really* need your comment
Answering questions asked on the site.
Steve asks:
I am trying to query a log table for a web service application and determine how many concurrent sessions are in progress at each moment a transaction is executed, based on a start date and an elapsed time for each command executed through the web service. (These metrics are logged after the fact, I'm trying to write daily performance reporting for the site).
Here's a simplified view of my base table design:
CREATE TABLE CONNECTIONS ( USERID VARCHAR2(30), HANDLE VARCHAR2(40), PROCESS_DATE DATE, COMMAND NUMBER(6,0), COUNT NUMBER(10,0), ELAPSED_TIME NUMBER(10,0), NUM_OF_RECS NUMBER(10,0), COMMAND_TIMESTAMP TIMESTAMP (6) )
The question is: at there moment each transaction started, how many other transactions were active?
At each given moment, there is some number of active transaction. A transaction is active if the transaction begins before that moment and ends after it. This means that the moment should fall between command_timestamp
and command_timestamp + elapsed_time / 86400000
.
Database B-Tree indexes are not very good in queries that involve searching for a constant between two columns, so a self-join on the condition described above would be possible but not very efficient.
But these is a more simple solution.
Whenever a transaction starts, it increments the count of the open transactions. Whenever the transaction ends, it decrements it.
So we just can build a table of events
: starts and ends of the transactions, ordered chronologically. Each start
would be denoted with a +1, and each end
with a -1. Then we should just calculate the number of the transactions open so far and subtract the number of the transactions closed.
This can be easily done merely by calculating the partial sum of these +1's and -1's, which is an easy task for Oracle's analytic functions.
Let's create a sample table. I'll put only the relevant columns there and add a stuffing column that would emulate actual payload, to measure performance:
Read the rest of this entry »
Cumulative values
Comments enabled. I *really* need your comment
From Stack Overflow:
There is a table that stores signal statuses that come from different devices.
- SS1 and SS2 signals are inserted to table in random times
- If either of SS1 and SS2 signal statuses is up, then the resulting signal should be up
- If both SS1 and SS2 signal statuses are down, then resulting signal should be down
I want to prepare a query that shows the result signal status changes according to SS1 and SS2 signals
Each record deals with only one signal type here: either SS1 or SS2. To obtain the signal statuses we should query the cumulative values of previous records.
If a record describes a change in SS2, we should query for the most recent change to SS1 that had been recorded so far to obtain the SS1's current status.
In systems other than Oracle, the previous value of a signal could be easily queried using subselects with TOP
/ LIMIT
clauses. But Oracle does not support correlated queries nested more than one level deep, and limiting a subquery result to a single record (which is required by a subquery) requires it (ORDER BY should be nested). This makes constructing such a subquery in Oracle quite a pain.
However, in Oracle, these things can be queries using analytics functions much more efficiently.
Let's create a sample table:
Read the rest of this entry »
Oracle: joining timestamps and time intervals
From Stack Overflow:
I am making an inner join of two tables.
I have many time intervals represented by
intervals
and i want to get measure data frommeasures
only within those intervals. The intervals do not overlap.Here are the table layouts:
intervals entry_time exit_time
measures time measure There are 1295 records in
intervals
and about a million records inmeasures
. The intervals do not overlap.The result I want to get is a table with in the first column the measure, then the time the measure has been done, the begin/end time of the considered interval (it would be repeated for row with a time within the considered range)
How can I make this faster?
Straight query
The query to get the results would look like this:
SELECT measures.measure as measure, measures.time as time, intervals.entry_time as entry_time, intervals.exit_time as exit_time FROM intervals JOIN measures ON measures.time BETWEEN intervals.entry_time AND intervals.exit_time ORDER BY time ASC
This looks quite simple. However, choosing the execution plan can be somewhat tricky task for this query.
The table layout gives us no hint on how many rows will be returned from the join. If all intervals begin before the time of the first measure and end after the time of the last measure, then every combination of rows will be returned from both tables and the resulting recordset will in fact be a cartesian product of two tables. If the intervals are short and sparse, few or even no rows can be returned from measures.
However, we know that the intervals do not overlap. This means that each measure may belong to at most one interval and the number of records in the final resultset will be no more than the number of records in measures
.
Since the condition we join here is a pair of inequalities, only two methods can be used by the engine to perform the join, that is NESTED LOOPS
and MERGE JOIN
. HASH JOIN
which is most efficient on large tables requires an equijoin and cannot be used for this query.
MERGE JOIN
sorts both resultsets on the columns it joins on and gradually advances the internal pointer on both sorted row sources so that the values of the columns being joined on in both tables always match. In case of an equijoin, the engine returns only the records holding the current value of the pointer; in case of a join on inequality condition, the engine returns all records greater (or less) than the current value from the corresponding table.
MERGE JOIN
, however, can only satisfy a single inequality condition while we here have two of them on two different columns. The engine should split this condition into the pair of inequalities: the first one will be used by the join; the second one will be just filtered. The query essentially turns into this one:
SELECT measures.measure as measure, measures.time as time, intervals.entry_time as entry_time, intervals.exit_time as exit_time FROM intervals JOIN measures ON measures.time >= intervals.entry_time WHERE measures.time <= intervals.exit_time ORDER BY time ASC
Here, the MERGE JOIN
will be using the predicate in the ON
clause and the filtering will use the predicate in the WHERE
clause. The predicates are symmetrical and can easily be swapped.
However, the join will have to return all records to the filtering code. And with an inequality condition like the one we see above, there will be lots of records. If we take a normal situation: the interval bounds are more or less in accordance with the dates of the first and last measure and the intervals are distributed evenly, then the MERGE JOIN
will return (1265 × 1,000,000) / 2 ≈ 600,000,000
records, each to be filtered on the next step.
From performance's point of view, this is hardly different from a cartesian join: in fact, it needs to process only as few as a half of the rows. These rows, however, need to be sorted beforehand (or taken from the index which is slow to traverse in its sort order), so this can actually even be slower than a cartesian join which does not require the rows to be in any certain order and can use an INDEX FAST FULL SCAN
or a TABLE SCAN
.
This means the only efficient way to run this query is using NESTED LOOPS
. To benefit from this method, we should make an index on measures.time
and convince Oracle to use it.
Let's create a sample table:
Read the rest of this entry »
Oracle: Selecting records holding group-wise maximum
It was quite a time since I posted my last article, and I have a bunch of unanswered questions that people asked me.
Sorry for not posting and answering for so long, had a few urgent things to do, and hope these breaks won't be so long in the future :)
Now, continuing the series on selecting records holding group-wise maximums:
How do I select the whole records, grouped on
grouper
and holding a group-wise maximum (or minimum) on other column?
Finally, Oracle.
We will create the very same table as we did for the previous systems:
Read the rest of this entry »
Oracle: nested SUM
From Stack Overflow:
Suppose I have this table
A B C D Datatypes are not important.
I want to do this:
SELECT a AS a1, b AS b1, c AS c1, ( SELECT SUM(d) FROM source WHERE a = a1 AND b = b1 ) AS total FROM source GROUP BY a, b, c, but I can't find a way (SQL Developer keeps complaining with
.)FROM
clause not foundIs there a way? Is it possible?
This is of course possible if we just alias the query and prepend the alias to the field:
SELECT a, b, c, ( SELECT SUM(d) FROM source si WHERE si.a = so.a AND si.b = so.b CONNECT BY 16 >= level ) FROM source so GROUP BY a, b, c
This works well on this sample set of data:
WITH source AS ( SELECT FLOOR(MOD(level - 1, 8) / 4) + 1 AS a, FLOOR(MOD(level - 1, 4) / 2) + 1 AS b, MOD(level - 1, 2) + 1 AS c, level AS d FROM dual CONNECT BY 16 >= level ) SELECT a, b, c, ( SELECT SUM(d) FROM source si WHERE si.a = so.a AND si.b = so.b ) FROM source so GROUP BY a, b, c
, but it needs to reevaluate the subquery for each group.
In Oracle, there is a better way to do this query: nesting an aggregate SUM
inside an analytical SUM
.
Read the rest of this entry »
IN vs. JOIN vs. EXISTS: Oracle
Answering questions asked on the site.
Jason asks:
I have a co-worker who swears that Oracle
IN
queries are slow and refuses to use them. For example:SELECT foo FROM bar WHERE bar.stuff IN ( SELECT stuff FROM asdf )Typically, this kind of performance advice strikes me as overgeneralized and my instinct is to ignore it. But I figure I'll give him the benefit of the doubt and ask about it.
So, in general is an
IN
query very expensive? I'm having trouble putting together many non-trivial queries to run onEXPLAIN PLAN
.
Does the IN
predicate always have inferior efficiency compared to it's counterparts, EXISTS
and JOIN
?
Let's check.
First of all, there are at least three ways to check each row in a table against a list of (possible duplicate) values and return each row at most once:
IN
predicate:SELECT foo FROM bar WHERE bar.stuff IN ( SELECT stuff FROM asdf )
EXISTS
predicate:SELECT foo FROM bar WHERE EXISTS ( SELECT NULL FROM asdf WHERE asdf.stuff = bar.stuff )
JOIN
/DISTINCT
:SELECT b.foo FROM ( SELECT DISTINCT stuff FROM asdf ) a JOIN bar b ON b.stuff = a.stuff
All these queries are semantically the same.
Common wisdom advices against using IN
predicates because it just doesn't look a right thing to do.
This is because less experienced SQL developers tend to translate the SQL statements into pseudocode just like they see it, which in this case looks something like this:
foreach ($bar as $bar_record) { foreach ($asdf as $asdf_record) { if ($bar_record->stuff == $asdf_record->stuff) output ($bar_record); break; } }
This just radiates inefficiency, since the inner rowset should be iterated for each row returned.
EXISTS
looks more nice since it at least gives a hint of possibility to use an index. However, it still looks like the same nested loops.
Finally, the JOIN
looks the most promising here since everybody knows joins are optimized (though few know how exactly). But an IN
? No thanks, they say, it will execute once for every row, it's slow, it's bad, it's inefficient!
People, come on. Oracle developers have thought it over ages ago and guess what: they turned out to be smart enough to implement an efficient algorithm for an IN
construct.
Now, let's make two sample tables and see how it's done:
Read the rest of this entry »
Adjacency list vs. nested sets: Oracle
Comments enabled. I *really* need your comment
Continuing the series:
What is better to store hierarchical data: nested sets model or adjacency list (parent-child) model?
For detailed explanations of the terms, see the first article in the series:
Today is Oracle time.
Adjacent sets require recursion, and Oracle has a native way to implement recursion in SQL statements. To do this it uses a special clause, CONNECT BY
.
Unlike other systems, Oracle always supported recursion in SQL statements since version 2 (this is the name the first version of Oracle was released under).
This was done because by that time there already were several hierarchical databases on the market which are quite good for storing hierarchical data. If not for this clause, transition from a hierarchical DBMS to the relational DBMS would be too hard.
Oracle's way to query self-relations is somewhat different from recursive CTE's that other systems use:
- Recursive CTEs can use an arbitrary set in recursive steps. The recursion base and recursion stack are not visible in further steps. The recursive operation is a set operation (usually a
JOIN
) on the recursion parameter (a set) and the result of the previous operation: CONNECT BY
queries can only use the value of a single row in recursive steps.The recursion base and recursion stack are visible, the values outside the recursion stack are not.
The recursive operation is a set operation on each row of the current row and the base rowset.
Unlike recursive CTE,
CONNECT BY
operates row-wise, that is each row of the result spans its own recursion branch with its own stack:
The main difference it that the CONNECT BY
operation cannot produce anything that was not in the original rowset, while a recursive CTE can.
However, CONNECT BY
has numerous benefits too: it's very fast, allows to track the hierarchy path easily (using a built-in funtion named SYS_CONNECT_BY_PATH
) and implicitly sorts the recordset in tree order (with additional clause ORDER SIBLINGS BY
to sort the siblings).
This method, of course, allows traversing adjacency lists easily.
To compare efficiency of adjacency lists and that of nested sets, let's create a sample table:
Read the rest of this entry »
Building alphabetical index
Comments enabled. I *really* need your comment
Answering questions asked on the site.
Cora asks:
I want to build an alphabetical index over the articles in my database.
I need to show 5 entries in the index, like this:
A-D
,E-F
etc., so that each entry contains roughly equal number of articles.How do I do it in a single query?
This is in Oracle
This is a good task to demonstrate Oracle's analytic abilities.
To do this, we will create a sample table:
Read the rest of this entry »
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle
Which method is best to select values present in one table but missing in another one?
This:
SELECT l.* FROM t_left l LEFT JOIN t_right r ON r.value = l.value WHERE r.value IS NULL, this:
SELECT l.* FROM t_left l WHERE l.value NOT IN ( SELECT value FROM t_right r )or this:
SELECT l.* FROM t_left l WHERE NOT EXISTS ( SELECT NULL FROM t_right r WHERE r.value = l.value )
Today, we will see how Oracle copes with these queries.
And to do this, we, of course, should create sample tables: