Converting currencies
When developing a financial application, one often needs to deal with multiple currencies and convert amounts from one currency to another.
There are different possible scenarios, but for accounting purposes the rates set and published once a day by the appropriate official regulator are used most often.
A typical task looks like this: given a set of transactions in one currency, calculate the corresponding amounts in another currency.
Let's create sample tables and see how it may be done:
[sourcecode]
CREATE TABLE t_transaction (
        xac_id INT NOT NULL,
        xac_currency INT NOT NULL,
        xac_amount NUMBER(10, 2) NOT NULL,
        xac_date DATE NOT NULL
)
/
CREATE TABLE t_rate (
        rte_currency INT NOT NULL,
        rte_date DATE NOT NULL,
        rte_rate NUMBER(10, 4) NOT NULL
)
/
ALTER TABLE t_transaction
ADD CONSTRAINT pk_transaction_id PRIMARY KEY (xac_id)
/
CREATE INDEX ix_transaction_date ON t_transaction (xac_date)
/
ALTER TABLE t_rate
ADD CONSTRAINT pk_rate_currency_date PRIMARY KEY (rte_currency, rte_date)
/
BEGIN
        DBMS_RANDOM.seed(20090518);
END;
/
INSERT INTO t_transaction (xac_id, xac_currency, xac_amount, xac_date)
SELECT  level,
        DBMS_RANDOM.value() + 1,
        FLOOR(DBMS_RANDOM.value() * 1000000 + 1) / 100,
        TO_DATE('27.05.2009', 'dd.mm.yyyy') - (level / 1440)
FROM    dual
CONNECT BY
        level <= 1000000
/
INSERT ALL
INTO    t_rate (rte_currency, rte_date, rte_rate)
VALUES  (
        1,
        TO_DATE('27.05.2009', 'dd.mm.yyyy') - lvl,
        25 + CAST((DBMS_RANDOM.value() - 0.5) * 30000 AS INT) / 10000
        )
INTO    t_rate (rte_currency, rte_date, rte_rate)
VALUES  (
        2,
        TO_DATE('27.05.2009', 'dd.mm.yyyy') - lvl,
        30 + CAST((DBMS_RANDOM.value() - 0.5) * 30000 AS INT) / 10000
        )
SELECT  level AS lvl
FROM    dual
WHERE   MOD(level, 7) NOT IN (3, 4)
CONNECT BY
        level <= 1000
/
COMMIT
/
BEGIN
        DBMS_STATS.gather_schema_stats('"20090527_rates"');
END;
/
[/sourcecode]

There are <strong>1,000,000</strong> transactions in <strong>2</strong> currencies, one transaction per minute, with amounts randomly distributed from <strong>0.01</strong> to <strong>9999.99</strong>:

[sourcecode]
SELECT  *
FROM    t_transaction
WHERE   rownum <= 10
[/sourcecode]

<div class="terminal">
<table class="terminal">
<tr><th>XAC_ID</th><th>XAC_CURRENCY</th><th>XAC_AMOUNT</th><th>XAC_DATE</th></tr>
<tr><td class="decimal">1</td><td class="decimal">2</td><td class="decimal">5527,24</td><td class="date">26.05.2009 23:59:00</td></tr>
<tr><td class="decimal">2</td><td class="decimal">1</td><td class="decimal">7066,16</td><td class="date">26.05.2009 23:58:00</td></tr>
<tr><td class="decimal">3</td><td class="decimal">2</td><td class="decimal">6849,27</td><td class="date">26.05.2009 23:57:00</td></tr>
<tr><td class="decimal">4</td><td class="decimal">1</td><td class="decimal">5654,12</td><td class="date">26.05.2009 23:56:00</td></tr>
<tr><td class="decimal">5</td><td class="decimal">2</td><td class="decimal">6748,84</td><td class="date">26.05.2009 23:55:00</td></tr>
<tr><td class="decimal">6</td><td class="decimal">2</td><td class="decimal">5184,32</td><td class="date">26.05.2009 23:54:00</td></tr>
<tr><td class="decimal">7</td><td class="decimal">1</td><td class="decimal">6832,27</td><td class="date">26.05.2009 23:53:00</td></tr>
<tr><td class="decimal">8</td><td class="decimal">2</td><td class="decimal">1572,45</td><td class="date">26.05.2009 23:52:00</td></tr>
<tr><td class="decimal">9</td><td class="decimal">2</td><td class="decimal">2287,1</td><td class="date">26.05.2009 23:51:00</td></tr>
<tr><td class="decimal">10</td><td class="decimal">1</td><td class="decimal">2916,78</td><td class="date">26.05.2009 23:50:00</td></tr>
</table>
</div>

For each currency, we have one rate per day, <em>except for <strong>Saturdays</strong> and <strong>Sundays</strong></em> (that's how most central banks work):

[sourcecode]
SELECT  *
FROM    (
        SELECT  *
        FROM    t_rate
        ORDER BY
                rte_date DESC
        )
WHERE   rownum <= 15
[/sourcecode]

<div class="terminal">
<table class="terminal">
<tr><th>RTE_CURRENCY</th><th>RTE_DATE</th><th>RTE_RATE</th></tr>
<tr><td class="decimal">1</td><td class="date">26.05.2009 00:00:00</td><td class="decimal">23,9699</td></tr>
<tr><td class="decimal">2</td><td class="date">26.05.2009 00:00:00</td><td class="decimal">29,0941</td></tr>
<tr><td class="decimal">1</td><td class="date">25.05.2009 00:00:00</td><td class="decimal">24,6972</td></tr>
<tr><td class="decimal">2</td><td class="date">25.05.2009 00:00:00</td><td class="decimal">29,6727</td></tr>
<tr><td class="decimal">1</td><td class="date">22.05.2009 00:00:00</td><td class="decimal">25,0568</td></tr>
<tr><td class="decimal">2</td><td class="date">22.05.2009 00:00:00</td><td class="decimal">30,5586</td></tr>
<tr><td class="decimal">1</td><td class="date">21.05.2009 00:00:00</td><td class="decimal">24,0338</td></tr>
<tr><td class="decimal">2</td><td class="date">21.05.2009 00:00:00</td><td class="decimal">29,2726</td></tr>
<tr><td class="decimal">1</td><td class="date">20.05.2009 00:00:00</td><td class="decimal">26,4241</td></tr>
<tr><td class="decimal">2</td><td class="date">20.05.2009 00:00:00</td><td class="decimal">31,1259</td></tr>
<tr><td class="decimal">1</td><td class="date">19.05.2009 00:00:00</td><td class="decimal">24,3832</td></tr>
<tr><td class="decimal">2</td><td class="date">19.05.2009 00:00:00</td><td class="decimal">31,2603</td></tr>
<tr><td class="decimal">1</td><td class="date">18.05.2009 00:00:00</td><td class="decimal">25,891</td></tr>
<tr><td class="decimal">2</td><td class="date">18.05.2009 00:00:00</td><td class="decimal">31,1934</td></tr>
<tr><td class="decimal">1</td><td class="date">15.05.2009 00:00:00</td><td class="decimal">25,5071</td></tr>
</table>
</div>

Now we need to find the sum of these transactions converted to the main currency. To convert an amount to the main currency, we multiply it by the <em>effective rate</em> for the transaction date. The <em>effective rate</em> is the last rate published on or before that date.
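For instance (a minimal standalone sketch based on the sample data above, with a hand-picked date), the effective rate for currency <strong>1</strong> on Sunday, <strong>24.05.2009</strong>, is the rate published on Friday, <strong>22.05.2009</strong>, and could be looked up like this:

[sourcecode]
-- A standalone illustration, not part of the main query:
-- the latest rate published on or before the given date is the effective one.
SELECT  rte_rate
FROM    t_rate
WHERE   rte_currency = 1
        AND rte_date =
        (
        SELECT  MAX(rte_date)
        FROM    t_rate
        WHERE   rte_currency = 1
                AND rte_date <= TO_DATE('24.05.2009', 'dd.mm.yyyy')
        )
[/sourcecode]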
class="decimal">2</td><td class="date">26.05.2009 00:00:00</td><td class="decimal">29,0941</td></tr> <tr><td class="decimal">1</td><td class="date">25.05.2009 00:00:00</td><td class="decimal">24,6972</td></tr> <tr><td class="decimal">2</td><td class="date">25.05.2009 00:00:00</td><td class="decimal">29,6727</td></tr> <tr><td class="decimal">1</td><td class="date">22.05.2009 00:00:00</td><td class="decimal">25,0568</td></tr> <tr><td class="decimal">2</td><td class="date">22.05.2009 00:00:00</td><td class="decimal">30,5586</td></tr> <tr><td class="decimal">1</td><td class="date">21.05.2009 00:00:00</td><td class="decimal">24,0338</td></tr> <tr><td class="decimal">2</td><td class="date">21.05.2009 00:00:00</td><td class="decimal">29,2726</td></tr> <tr><td class="decimal">1</td><td class="date">20.05.2009 00:00:00</td><td class="decimal">26,4241</td></tr> <tr><td class="decimal">2</td><td class="date">20.05.2009 00:00:00</td><td class="decimal">31,1259</td></tr> <tr><td class="decimal">1</td><td class="date">19.05.2009 00:00:00</td><td class="decimal">24,3832</td></tr> <tr><td class="decimal">2</td><td class="date">19.05.2009 00:00:00</td><td class="decimal">31,2603</td></tr> <tr><td class="decimal">1</td><td class="date">18.05.2009 00:00:00</td><td class="decimal">25,891</td></tr> <tr><td class="decimal">2</td><td class="date">18.05.2009 00:00:00</td><td class="decimal">31,1934</td></tr> <tr><td class="decimal">1</td><td class="date">15.05.2009 00:00:00</td><td class="decimal">25,5071</td></tr> </table> </div> Now, we need to find the sum of these transactions, converted to the main currency. To convert to the main currency, we should multiply to <em>effective rate</em> for a given date. An <em>effective rate</em> is the last rate published up to the given date. A typical query to do this looks like this: SELECT TO_CHAR(SUM(xac_amount * rte_rate), 'FM999G999G999G999G999G999D999999') FROM t_transaction x JOIN t_rate r ON (rte_currency, rte_date) IN ( SELECT xac_currency, MAX(rte_date) FROM t_rate WHERE rte_currency = xac_currency AND rte_date <= xac_date ) [/sourcecode] <div class="terminal"> <table class="terminal"> <tr> <th>TO_CHAR(SUM(XAC_AMOUNT*RTE_RATE),'FM999G999G999G999G999G999D999999')</th></tr> <tr><td class="varchar2">137 716 183 880,132757</td></tr> <tr class="statusbar"><td colspan="100">1 row fetched in 0.0001s (9.7030s)</td></tr> </table> </div> <pre> SELECT STATEMENT SORT AGGREGATE NESTED LOOPS TABLE ACCESS FULL, 20090527_rates.T_TRANSACTION TABLE ACCESS BY INDEX ROWID, 20090527_rates.T_RATE INDEX UNIQUE SCAN, 20090527_rates.PK_RATE_CURRENCY_DATE SORT AGGREGATE FIRST ROW INDEX RANGE SCAN (MIN/MAX), 20090527_rates.PK_RATE_CURRENCY_DATE </pre> This works, and works quite fast (less that <strong>10</strong> seconds), which is OK for <strong>1,000,000</strong> rows. But can we improve it any more somehow? If we look into the query plan we will see that the <code>NESTED LOOPS</code> are used to join <code>t_transaction</code> and <code>t_rate</code>. This join method is quite inefficient for large datasets, but it's the only possible method, since we need to find the effective rate value for each transaction. Neither <code>HASH JOIN</code> nor <code>MERGE JOIN</code> can be used here, since we have a <q>less or equal</q> <code>JOIN</code> condition. To use <code>HASH JOIN</code> which would be most efficient here, we need to build a rowset that would be joinable with an equality condition with each row from <code>t_transaction</code>. Can we build such a resultset? 
We have to use an inequality condition here only because there are gaps in the rate table for the days when no rates were published. If we had a complete set of days, without any gaps, we could just truncate each transaction's date and use it in the join. With <strong>Oracle</strong>, it's possible to build such a set. Here are the steps to do that:

<ol>
<li>Build a set of days, starting from the minimal one and ending with the maximal one found in <code>t_rate</code>. This can be done with a hierarchical query over <code>dual</code></li>
<li>Build a set of currencies. Since there are only <strong>2</strong> of them, a plain <code>UNION ALL</code> will do</li>
<li>Make a cartesian product of these sets, so that we have every day and every currency for the range covered by <code>t_rate</code></li>
<li>Join this with <code>t_rate</code>, finding the effective rate for each day and each currency</li>
<li>Group all transactions by truncated date and find the sums within each date</li>
<li>Join the resulting sets and find the total sum</li>
</ol>

Here is the resulting query:

[sourcecode]
WITH v_rate AS
(
SELECT  cur_id AS eff_currency, dte_date AS eff_date, rte_rate AS eff_rate
FROM    (
        SELECT  cur_id, dte_date,
                (
                SELECT  MAX(rte_date)
                FROM    t_rate ri
                WHERE   rte_currency = cur_id
                        AND rte_date <= dte_date
                ) AS rte_effdate
        FROM    (
                SELECT  (
                        SELECT  MAX(rte_date)
                        FROM    t_rate
                        ) - level + 1 AS dte_date
                FROM    dual
                CONNECT BY
                        level <=
                        (
                        SELECT  MAX(rte_date) - MIN(rte_date)
                        FROM    t_rate
                        )
                ) v_date,
                (
                SELECT  1 AS cur_id
                FROM    dual
                UNION ALL
                SELECT  2 AS cur_id
                FROM    dual
                ) v_currency
        ) v_eff
LEFT JOIN
        t_rate
ON      rte_currency = cur_id
        AND rte_date = rte_effdate
)
SELECT  TO_CHAR(SUM(xac_amount * eff_rate), 'FM999G999G999G999G999G999D999999')
FROM    (
        SELECT  xac_currency, TRUNC(xac_date) AS xac_date, SUM(xac_amount) AS xac_amount, COUNT(*) AS cnt
        FROM    t_transaction x
        GROUP BY
                xac_currency, TRUNC(xac_date)
        )
JOIN    v_rate
ON      eff_currency = xac_currency
        AND eff_date = xac_date
[/sourcecode]

<div class="terminal">
<table class="terminal">
<tr><th>TO_CHAR(SUM(XAC_AMOUNT*EFF_RATE),'FM999G999G999G999G999G999D999999')</th></tr>
<tr><td class="varchar2">137 716 183 880,132757</td></tr>
<tr class="statusbar"><td colspan="100">1 row fetched in 0.0001s (1.6563s)</td></tr>
</table>
</div>

<pre>
SELECT STATEMENT
 SORT AGGREGATE
  HASH JOIN
   NESTED LOOPS OUTER
    VIEW
     MERGE JOIN CARTESIAN
      VIEW
       CONNECT BY WITHOUT FILTERING
        FAST DUAL
      BUFFER SORT
       VIEW
        UNION-ALL
         FAST DUAL
         FAST DUAL
    TABLE ACCESS BY INDEX ROWID, 20090527_rates.T_RATE
     INDEX UNIQUE SCAN, 20090527_rates.PK_RATE_CURRENCY_DATE
   VIEW
    HASH GROUP BY
     TABLE ACCESS FULL, 20090527_rates.T_TRANSACTION
</pre>

Now it takes only <strong>1.65</strong> seconds, or almost <strong>6</strong> times as fast. This query is quite complex, so it needs some explanation.

[sourcecode]
SELECT  (
        SELECT  MAX(rte_date)
        FROM    t_rate
        ) - level + 1 AS dte_date
FROM    dual
CONNECT BY
        level <=
        (
        SELECT  MAX(rte_date) - MIN(rte_date)
        FROM    t_rate
        )
) v_date
[/sourcecode]

This is a plain hierarchical query over <code>dual</code>, which is widely used in <strong>Oracle</strong> to generate a series of numbers. Here it returns all possible dates from <code>MAX(rte_date)</code> down to <code>MIN(rte_date)</code>.
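For reference (a tiny standalone sketch, not part of the main query), the same trick generates a plain series of numbers:

[sourcecode]
-- CONNECT BY over dual produces the numbers 1 through 10, one per row
SELECT  level AS n
FROM    dual
CONNECT BY
        level <= 10
[/sourcecode]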
[sourcecode]
SELECT  (
        SELECT  MAX(rte_date)
        FROM    t_rate
        ) - level + 1 AS dte_date
FROM    dual
CONNECT BY
        level <=
        (
        SELECT  MAX(rte_date) - MIN(rte_date)
        FROM    t_rate
        )
) v_date,
(
SELECT  1 AS cur_id
FROM    dual
UNION ALL
SELECT  2 AS cur_id
FROM    dual
) v_currency
[/sourcecode]

The previous query is <code>JOIN</code>ed with a set of currencies (<strong>1</strong> and <strong>2</strong>) to produce a cartesian product: the resultset now contains every date for every currency.

[sourcecode]
SELECT  cur_id, dte_date,
        (
        SELECT  MAX(rte_date)
        FROM    t_rate ri
        WHERE   rte_currency = cur_id
                AND rte_date <= dte_date
        ) AS rte_effdate
FROM    (
        SELECT  (
                SELECT  MAX(rte_date)
                FROM    t_rate
                ) - level + 1 AS dte_date
        FROM    dual
        CONNECT BY
                level <=
                (
                SELECT  MAX(rte_date) - MIN(rte_date)
                FROM    t_rate
                )
        ) v_date,
        (
        SELECT  1 AS cur_id
        FROM    dual
        UNION ALL
        SELECT  2 AS cur_id
        FROM    dual
        ) v_currency
) v_eff
[/sourcecode]

We select the dates and currencies from the previous set along with one more field: the last date, on or before the given one, on which a rate was published. Since the <code>(rte_currency, rte_date)</code> pair is unique for each rate, we can use it in a <code>JOIN</code>.

[sourcecode]
WITH v_rate AS
(
SELECT  cur_id AS eff_currency, dte_date AS eff_date, rte_rate AS eff_rate
FROM    (
        SELECT  cur_id, dte_date,
                (
                SELECT  MAX(rte_date)
                FROM    t_rate ri
                WHERE   rte_currency = cur_id
                        AND rte_date <= dte_date
                ) AS rte_effdate
        FROM    (
                SELECT  (
                        SELECT  MAX(rte_date)
                        FROM    t_rate
                        ) - level + 1 AS dte_date
                FROM    dual
                CONNECT BY
                        level <=
                        (
                        SELECT  MAX(rte_date) - MIN(rte_date)
                        FROM    t_rate
                        )
                ) v_date,
                (
                SELECT  1 AS cur_id
                FROM    dual
                UNION ALL
                SELECT  2 AS cur_id
                FROM    dual
                ) v_currency
        ) v_eff
LEFT JOIN
        t_rate
ON      rte_currency = cur_id
        AND rte_date = rte_effdate
)
[/sourcecode]

And here is the <code>JOIN</code>. The <strong>CTE</strong> above produces the effective rate for every <code>(currency, date)</code> pair covered by <code>t_rate</code>.

[sourcecode]
SELECT  xac_currency, TRUNC(xac_date) AS xac_date, SUM(xac_amount) AS xac_amount, COUNT(*) AS cnt
FROM    t_transaction x
GROUP BY
        xac_currency, TRUNC(xac_date)
[/sourcecode]
This query selects the <code>SUM</code>s of all transactions grouped by currency and date. Note that we have to use <code>TRUNC(xac_date)</code> here: unlike the rates, the transactions have their time part set, and it matters.
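For example (a minimal sketch using the timestamp of the first sample transaction), <code>TRUNC</code> cuts the time part off, so a transaction made at <strong>23:59:00</strong> is matched with the rate of its calendar day:

[sourcecode]
-- TRUNC(date) resets the time part to midnight:
-- 26.05.2009 23:59:00 becomes 26.05.2009 00:00:00
SELECT  TO_CHAR(
                TRUNC(TO_DATE('26.05.2009 23:59:00', 'dd.mm.yyyy hh24:mi:ss')),
                'dd.mm.yyyy hh24:mi:ss'
        ) AS transaction_day
FROM    dual
[/sourcecode]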
Finally, we just join these queries:
[sourcecode]
WITH v_rate AS
(
SELECT  cur_id AS eff_currency, dte_date AS eff_date, rte_rate AS eff_rate
FROM    (
        SELECT  cur_id, dte_date,
                (
                SELECT  MAX(rte_date)
                FROM    t_rate ri
                WHERE   rte_currency = cur_id
                        AND rte_date <= dte_date
                ) AS rte_effdate
        FROM    (
                SELECT  (
                        SELECT  MAX(rte_date)
                        FROM    t_rate
                        ) - level + 1 AS dte_date
                FROM    dual
                CONNECT BY
                        level <=
                        (
                        SELECT  MAX(rte_date) - MIN(rte_date)
                        FROM    t_rate
                        )
                ) v_date,
                (
                SELECT  1 AS cur_id
                FROM    dual
                UNION ALL
                SELECT  2 AS cur_id
                FROM    dual
                ) v_currency
        ) v_eff
LEFT JOIN
        t_rate
ON      rte_currency = cur_id
        AND rte_date = rte_effdate
)
SELECT  TO_CHAR(SUM(xac_amount * eff_rate), 'FM999G999G999G999G999G999D999999')
FROM    (
        SELECT  xac_currency, TRUNC(xac_date) AS xac_date, SUM(xac_amount) AS xac_amount, COUNT(*) AS cnt
        FROM    t_transaction x
        GROUP BY
                xac_currency, TRUNC(xac_date)
        )
JOIN    v_rate
ON      eff_currency = xac_currency
        AND eff_date = xac_date
[/sourcecode]
Since the <code>NESTED LOOPS</code> are used only to build a relatively small resultset, and the larger one is built with a <code>HASH JOIN</code>, the overall query performance is much higher.
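As a side note (a minimal sketch, not part of the solution above), you can check which join methods the optimizer picks on your own system by displaying the plan with <code>DBMS_XPLAN</code>, here shown for the transaction aggregation used above:

[sourcecode]
-- Explain a statement, then show the plan of the last explained statement
EXPLAIN PLAN FOR
SELECT  xac_currency, TRUNC(xac_date) AS xac_date, SUM(xac_amount) AS xac_amount
FROM    t_transaction
GROUP BY
        xac_currency, TRUNC(xac_date)
/
SELECT  *
FROM    TABLE(DBMS_XPLAN.DISPLAY())
/
[/sourcecode]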