EXPLAIN EXTENDED

How to create fast database queries

Converting currencies

Comments enabled. I *really* need your comment

When developing a financial application, one needs to deal with currencies and convert from one currency to another.

There may be different scenarios, but for accounting purposes the currency rates set and published by appropriate official regulator once a day are used most often.

A typical task looks like this: given a set of transactions in one currency, get an appropriate amount in another currency.

Let's create sample tables and see how it may be done:

CREATE TABLE t_transaction (
        xac_id INT NOT NULL,
        xac_currency INT NOT NULL,
        xac_amount NUMBER(10, 2) NOT NULL,
        xac_date DATE NOT NULL
)
/
CREATE TABLE t_rate (
        rte_currency INT NOT NULL,
        rte_date DATE NOT NULL,
        rte_rate NUMBER(10, 4) NOT NULL
)
/
ALTER TABLE t_transaction
ADD CONSTRAINT pk_transaction_id
PRIMARY KEY (xac_id)
/
CREATE INDEX ix_transaction_date ON t_transaction (xac_date)
/
ALTER TABLE t_rate
ADD CONSTRAINT pk_rate_currency_date
PRIMARY KEY (rte_currency, rte_date)
/
BEGIN
        DBMS_RANDOM.seed(20090518);
END;
/
INSERT
INTO    t_transaction (xac_id, xac_currency, xac_amount, xac_date)
SELECT  level,
        DBMS_RANDOM.value() + 1,
        FLOOR(DBMS_RANDOM.value() * 1000000 + 1) / 100,
        TO_DATE('27.05.2009', 'dd.mm.yyyy') - (level / 1440)
FROM    dual
CONNECT BY
        level <= 1000000
/
INSERT  ALL
INTO    t_rate (rte_currency, rte_date, rte_rate)
VALUES  (
        1,
        TO_DATE('27.05.2009', 'dd.mm.yyyy') - lvl,
        25 + CAST((DBMS_RANDOM.value() - 0.5) * 30000 AS INT) / 10000
        )
INTO    t_rate (rte_currency, rte_date, rte_rate)
VALUES  (
        2,
        TO_DATE('27.05.2009', 'dd.mm.yyyy') - lvl,
        30 + CAST((DBMS_RANDOM.value() - 0.5) * 30000 AS INT) / 10000
        )
SELECT  level AS lvl
FROM    dual
WHERE   MOD(level, 7) NOT IN (3, 4)
CONNECT BY
        level <= 1000
/
COMMIT
/
BEGIN
        DBMS_STATS.gather_schema_stats('"20090527_rates"');
END;
/
&#91;/sourcecode&#93;

There are <strong>1,000,000</strong> transactions in <strong>2</strong> currencies, one transaction per minute, with amounts randomly distributed from <strong>0.01</strong> to <strong>9999.99</strong>:


SELECT  *
FROM    t_transaction
WHERE   rownum <= 10

&#91;/sourcecode&#93;

<div class="terminal">
<table class="terminal">
<tr>
<th>XAC_ID</th><th>XAC_CURRENCY</th><th>XAC_AMOUNT</th><th>XAC_DATE</th></tr>
<tr><td class="decimal">1</td><td class="decimal">2</td><td class="decimal">5527,24</td><td class="date">26.05.2009 23:59:00</td></tr>
<tr><td class="decimal">2</td><td class="decimal">1</td><td class="decimal">7066,16</td><td class="date">26.05.2009 23:58:00</td></tr>
<tr><td class="decimal">3</td><td class="decimal">2</td><td class="decimal">6849,27</td><td class="date">26.05.2009 23:57:00</td></tr>
<tr><td class="decimal">4</td><td class="decimal">1</td><td class="decimal">5654,12</td><td class="date">26.05.2009 23:56:00</td></tr>
<tr><td class="decimal">5</td><td class="decimal">2</td><td class="decimal">6748,84</td><td class="date">26.05.2009 23:55:00</td></tr>
<tr><td class="decimal">6</td><td class="decimal">2</td><td class="decimal">5184,32</td><td class="date">26.05.2009 23:54:00</td></tr>
<tr><td class="decimal">7</td><td class="decimal">1</td><td class="decimal">6832,27</td><td class="date">26.05.2009 23:53:00</td></tr>
<tr><td class="decimal">8</td><td class="decimal">2</td><td class="decimal">1572,45</td><td class="date">26.05.2009 23:52:00</td></tr>
<tr><td class="decimal">9</td><td class="decimal">2</td><td class="decimal">2287,1</td><td class="date">26.05.2009 23:51:00</td></tr>
<tr><td class="decimal">10</td><td class="decimal">1</td><td class="decimal">2916,78</td><td class="date">26.05.2009 23:50:00</td></tr>
</table>
</div>

For each currency, we have one rate per day, <em>except for <strong>Saturdays</strong> and <strong>Sundays</strong></em> (that's how most central banks work):


SELECT  *
FROM    (
        SELECT  *
        FROM    t_rate
        ORDER BY
                rte_date DESC
        )
WHERE   rownum <= 15

&#91;/sourcecode&#93;

<div class="terminal">
<table class="terminal">
<tr>
<th>RTE_CURRENCY</th><th>RTE_DATE</th><th>RTE_RATE</th></tr>
<tr><td class="decimal">1</td><td class="date">26.05.2009 00:00:00</td><td class="decimal">23,9699</td></tr>
<tr><td class="decimal">2</td><td class="date">26.05.2009 00:00:00</td><td class="decimal">29,0941</td></tr>
<tr><td class="decimal">1</td><td class="date">25.05.2009 00:00:00</td><td class="decimal">24,6972</td></tr>
<tr><td class="decimal">2</td><td class="date">25.05.2009 00:00:00</td><td class="decimal">29,6727</td></tr>
<tr><td class="decimal">1</td><td class="date">22.05.2009 00:00:00</td><td class="decimal">25,0568</td></tr>
<tr><td class="decimal">2</td><td class="date">22.05.2009 00:00:00</td><td class="decimal">30,5586</td></tr>
<tr><td class="decimal">1</td><td class="date">21.05.2009 00:00:00</td><td class="decimal">24,0338</td></tr>
<tr><td class="decimal">2</td><td class="date">21.05.2009 00:00:00</td><td class="decimal">29,2726</td></tr>
<tr><td class="decimal">1</td><td class="date">20.05.2009 00:00:00</td><td class="decimal">26,4241</td></tr>
<tr><td class="decimal">2</td><td class="date">20.05.2009 00:00:00</td><td class="decimal">31,1259</td></tr>
<tr><td class="decimal">1</td><td class="date">19.05.2009 00:00:00</td><td class="decimal">24,3832</td></tr>
<tr><td class="decimal">2</td><td class="date">19.05.2009 00:00:00</td><td class="decimal">31,2603</td></tr>
<tr><td class="decimal">1</td><td class="date">18.05.2009 00:00:00</td><td class="decimal">25,891</td></tr>
<tr><td class="decimal">2</td><td class="date">18.05.2009 00:00:00</td><td class="decimal">31,1934</td></tr>
<tr><td class="decimal">1</td><td class="date">15.05.2009 00:00:00</td><td class="decimal">25,5071</td></tr>
</table>
</div>

Now, we need to find the sum of these transactions, converted to the main currency.

To convert to the main currency, we should multiply to <em>effective rate</em> for a given date.

An <em>effective rate</em> is the last rate published up to the given date.

A typical query to do this looks like this:


SELECT  TO_CHAR(SUM(xac_amount * rte_rate), 'FM999G999G999G999G999G999D999999')
FROM    t_transaction x
JOIN    t_rate r
ON      (rte_currency, rte_date) IN
        (
        SELECT  xac_currency, MAX(rte_date)
        FROM    t_rate
        WHERE   rte_currency = xac_currency
                AND rte_date <= xac_date
        )

&#91;/sourcecode&#93;

<div class="terminal">
<table class="terminal">
<tr>
<th>TO_CHAR(SUM(XAC_AMOUNT*RTE_RATE),'FM999G999G999G999G999G999D999999')</th></tr>
<tr><td class="varchar2">137 716 183 880,132757</td></tr>
<tr class="statusbar"><td colspan="100">1 row fetched in 0.0001s (9.7030s)</td></tr>
</table>
</div>

<pre>
SELECT STATEMENT 
 SORT AGGREGATE
  NESTED LOOPS 
   TABLE ACCESS FULL, 20090527_rates.T_TRANSACTION
   TABLE ACCESS BY INDEX ROWID, 20090527_rates.T_RATE
    INDEX UNIQUE SCAN, 20090527_rates.PK_RATE_CURRENCY_DATE
     SORT AGGREGATE
      FIRST ROW 
       INDEX RANGE SCAN (MIN/MAX), 20090527_rates.PK_RATE_CURRENCY_DATE
</pre>

This works, and works quite fast (less that <strong>10</strong> seconds), which is OK for <strong>1,000,000</strong> rows.

But can we improve it any more somehow?

If we look into the query plan we will see that the <code>NESTED LOOPS</code> are used to join <code>t_transaction</code> and <code>t_rate</code>.

This join method is quite inefficient for large datasets, but it's the only possible method, since we need to find the effective rate value for each transaction.

Neither <code>HASH JOIN</code> nor <code>MERGE JOIN</code> can be used here, since we have a <q>less or equal</q> <code>JOIN</code> condition.

To use <code>HASH JOIN</code> which would be most efficient here, we need to build a rowset that would be joinable with an equality condition with each row from <code>t_transaction</code>.

Can we build such a resultset?

We only have to use inequality condition here because there are gaps in the rate table for the days the rates were not published.

If we had a complete set of days, without any gaps, we could just truncate each transaction's date and use it in a join.

With <strong>Oracle</strong>, it's possible to build such a set.

Here are the steps to do that:

<ol>
	<li>Build a set of days, starting from the minimal one and ending with a maximal one of those found in <code>t_rate</code>. This can be done with a hierarchical query over <code>dual</code></li>
	<li>Build a set of currencies. There are only <strong>2</strong> of them, we can do it with a plain <code>UNION ALL</code></li>
	<li>Make a cartesian product of these sets so that we have every day and every currency for the range covered by <code>t_rate</code></li>
	<li>Join this with <code>t_rate</code>, finding effective rate for each day and each currency</li>
	<li>Group all transactions by truncated date and find their sums inside each date</li>
	<li>Join the resulting sets and find the total sum</li>
</ol>

Here is the resulting query:


WITH v_rate AS
        (
        SELECT  cur_id AS eff_currency, dte_date AS eff_date, rte_rate AS eff_rate
        FROM    (
                SELECT  cur_id, dte_date,
                        (
                        SELECT  MAX(rte_date)
                        FROM    t_rate ri
                        WHERE   rte_currency = cur_id
                                AND rte_date <= dte_date
                        ) AS rte_effdate
                FROM    (
                        SELECT  (
                                SELECT  MAX(rte_date)
                                FROM    t_rate
                                ) - level + 1 AS dte_date
                        FROM    dual
                        CONNECT BY
                                level <=
                                (
                                SELECT  MAX(rte_date) - MIN(rte_date)
                                FROM    t_rate
                                )
                        ) v_date,
                        (
                        SELECT  1 AS cur_id
                        FROM    dual
                        UNION ALL
                        SELECT  2 AS cur_id
                        FROM    dual
                        ) v_currency
                ) v_eff
        LEFT JOIN
                t_rate
        ON      rte_currency = cur_id
                AND rte_date = rte_effdate
        )
SELECT  TO_CHAR(SUM(xac_amount * eff_rate), 'FM999G999G999G999G999G999D999999')
FROM    (
        SELECT  xac_currency, TRUNC(xac_date) AS xac_date, SUM(xac_amount) AS xac_amount, COUNT(*) AS cnt
        FROM    t_transaction x
        GROUP BY
                xac_currency, TRUNC(xac_date)
        )
JOIN    v_rate
ON      eff_currency = xac_currency
        AND eff_date = xac_date
&#91;/sourcecode&#93;

<div class="terminal">
<table class="terminal">
<tr>
<th>TO_CHAR(SUM(XAC_AMOUNT*EFF_RATE),'FM999G999G999G999G999G999D999999')</th></tr>
<tr><td class="varchar2">137 716 183 880,132757</td></tr>
<tr class="statusbar"><td colspan="100">1 row fetched in 0.0001s (1.6563s)</td></tr>
</table>
</div>

<pre>
SELECT STATEMENT 
 SORT AGGREGATE
  HASH JOIN 
   NESTED LOOPS OUTER
    VIEW 
     MERGE JOIN CARTESIAN
      VIEW 
       CONNECT BY WITHOUT FILTERING
        FAST DUAL 
      BUFFER SORT
       VIEW 
        UNION-ALL 
         FAST DUAL 
         FAST DUAL 
    TABLE ACCESS BY INDEX ROWID, 20090527_rates.T_RATE
     INDEX UNIQUE SCAN, 20090527_rates.PK_RATE_CURRENCY_DATE
   VIEW 
    HASH GROUP BY
     TABLE ACCESS FULL, 20090527_rates.T_TRANSACTION
</pre>

Now it's only <strong>1.65</strong> seconds, or <strong>6</strong> times as fast.

This query is quite complex, so I think this will need some explanation.


                        SELECT  (
                                SELECT  MAX(rte_date)
                                FROM   t_rate
                                ) - level + 1 AS dte_date
                        FROM    dual
                        CONNECT BY
                                level <=
                                (
                                SELECT  MAX(rte_date) - MIN(rte_date)
                                FROM    t_rate
                                )
                        ) v_date
&#91;/sourcecode&#93;

This is a plain hierarchical query over <code>dual</code>, which is widely used in <strong>Oracle</strong> to generate a series of numbers.

This returns all possible dates from <code>MAX(rte_date)</code> to <code>MIN(rte_date)</code>.


                        SELECT  (
                                SELECT  MAX(rte_date)
                                FROM    t_rate
                                ) - level + 1 AS dte_date
                        FROM    dual
                        CONNECT BY
                                level <=
                                (
                                SELECT  MAX(rte_date) - MIN(rte_date)
                                FROM    t_rate
                                )
                        ) v_date,
                        (
                        SELECT  1 AS cur_id
                        FROM    dual
                        UNION ALL
                        SELECT  2 AS cur_id
                        FROM    dual
                        ) v_currency
&#91;/sourcecode&#93;

The previous query is <code>JOIN</code>'ed with a set of currencies (<strong>1</strong> and <strong>2</strong>) to produce a cartesian product. The resultset has every date for every currency.


                SELECT  cur_id, dte_date,
                        (
                        SELECT  MAX(rte_date)
                        FROM    t_rate ri
                        WHERE   rte_currency = cur_id
                                AND rte_date <= dte_date
                        ) AS rte_effdate
                FROM    (
                        SELECT  (
                                SELECT  MAX(rte_date)
                                FROM    t_rate
                                ) - level + 1 AS dte_date
                        FROM    dual
                        CONNECT BY
                                level <=
                                (
                                SELECT  MAX(rte_date) - MIN(rte_date)
                                FROM    t_rate
                                )
                        ) v_date,
                        (
                        SELECT  1 AS cur_id
                        FROM    dual
                        UNION ALL
                        SELECT  2 AS cur_id
                        FROM    dual
                        ) v_currency
                ) v_eff
&#91;/sourcecode&#93;

We select dates and currencies for the previous set and one more field: the last date before the given when the rate was published. Since <code>(rte_currency, rte_date)</code> pair is unique for each rate, we can use it in a <code>JOIN</code>.


WITH v_rate AS
        (
        SELECT  cur_id AS eff_currency, dte_date AS eff_date, rte_rate AS eff_rate
        FROM    (
                SELECT  cur_id, dte_date,
                        (
                        SELECT  MAX(rte_date)
                        FROM    t_rate ri
                        WHERE   rte_currency = cur_id
                                AND rte_date <= dte_date
                        ) AS rte_effdate
                FROM    (
                        SELECT  (
                                SELECT  MAX(rte_date)
                                FROM    t_rate
                                ) - level + 1 AS dte_date
                        FROM    dual
                        CONNECT BY
                                level <=
                                (
                                SELECT  MAX(rte_date) - MIN(rte_date)
                                FROM    t_rate
                                )
                        ) v_date,
                        (
                        SELECT  1 AS cur_id
                        FROM    dual
                        UNION ALL
                        SELECT  2 AS cur_id
                        FROM    dual
                        ) v_currency
                ) v_eff
        LEFT JOIN
                t_rate
        ON      rte_currency = cur_id
                AND rte_date = rte_effdate
        )
&#91;/sourcecode&#93;

And here's the <code>JOIN</code>. The <strong>CTE</strong> above produces effective rate for every possible <code>(currency, date)</code> pair of those covered by <code>t_rate</code>.


        SELECT  xac_currency, TRUNC(xac_date) AS xac_date, SUM(xac_amount) AS xac_amount, COUNT(*) AS cnt
        FROM    t_transaction x
        GROUP BY
                xac_currency, TRUNC(xac_date)

This query selects SUM's of all transactions grouped by currency and date. Note that we have to use TRUNC(xac_date) here: unlike rates, the time part is set and matters for the transactions.

Finally, we just join these queries:

WITH v_rate AS
(
SELECT cur_id AS eff_currency, dte_date AS eff_date, rte_rate AS eff_rate
FROM (
SELECT cur_id, dte_date,
(
SELECT MAX(rte_date)
FROM t_rate ri
WHERE rte_currency = cur_id
AND rte_date <= dte_date ) AS rte_effdate FROM ( SELECT ( SELECT MAX(rte_date) FROM t_rate ) - level + 1 AS dte_date FROM dual CONNECT BY level <= ( SELECT MAX(rte_date) - MIN(rte_date) FROM t_rate ) ) v_date, ( SELECT 1 AS cur_id FROM dual UNION ALL SELECT 2 AS cur_id FROM dual ) v_currency ) v_eff LEFT JOIN t_rate ON rte_currency = cur_id AND rte_date = rte_effdate ) SELECT TO_CHAR(SUM(xac_amount * eff_rate), 'FM999G999G999G999G999G999D999999') FROM ( SELECT xac_currency, TRUNC(xac_date) AS xac_date, SUM(xac_amount) AS xac_amount, COUNT(*) AS cnt FROM t_transaction x GROUP BY xac_currency, TRUNC(xac_date) ) JOIN v_rate ON eff_currency = xac_currency AND eff_date = xac_date [/sourcecode] Since the NESTED LOOPS are used to build only a relatively small resultset, and the larger one is built with a HASH JOIN, overall query performance is much higher.

Written by Quassnoi

May 27th, 2009 at 11:00 pm

Posted in Oracle

Leave a Reply