Archive for the ‘Oracle’ Category
Oracle: generating a list of dates and counting ranges for each date
From Stack Overflow:
I have a table with data such as below:
Group Start Date End Date A 2001.01.01 2001.01.03 A 2001.01.01 2001.01.02 A 2001.01.03 2001.01.03 B 2001.01.01 2001.01.01 I am looking to produce a view that gives a count for each day
We have a list of ranges here, and for each date we should count the number of ranges that contain this date.
To make this query, we will employ one simple fact: the number of ranges containing a given date is the number of ranges started on or before this date minus the number of ranges that ended before this date.
This can easily be calculated using window functions.
Let's create a sample table:
Oracle: updating rows with partitioned incremental value
Comments enabled. I *really* need your comment
From Stack Overflow:
I have a table with these example data:
id idx data 1 1 A 1 3 B 2 1 C 2 1 D 2 5 E I want:
id idx data 1 1 A 1 2 B 2 1 C 2 3 D 2 4 E Is there a way to renumber the
index
perid
using a single SQLUPDATE
?Note that the order of items should be preserved (i. e. the item E should still be after the items C and D but the order of C and D doesn't really matter).
The goal is to be able to create a primary key over
id
andindex
.
If not for the id
, the query would be very simple:
UPDATE t_update SET index = rownum
However, the query above does not take id
into account.
To update the table using partitioned row numbers, we need to employ the window function ROW_NUMBER()
.
However, it's usable neither in the UPDATE
query itself, nor in an inline view, and attempt to use it produces an error:
ORA-30483: window functions are not allowed here
To work around the, we should rewrite the UPDATE
query as MERGE
.
MERGE
requires two rowsets (source and target) and they need to be joined on something.
We will use the table itself as a source They only field we can join on here is ROWID
pseudocolumn.
Here's the query to do this:
MERGE INTO t_update u USING ( SELECT rowid AS rid, ROW_NUMBER() OVER (PARTITION BY id ORDER BY idx) AS rn FROM t_update ) ON (u.rowid = rid) WHEN MATCHED THEN UPDATE SET idx = rn
, and here's the result:
SELECT * FROM t_update
ID | IDX | DATA |
---|---|---|
1 | 1 | A |
1 | 2 | B |
2 | 1 | C |
2 | 2 | D |
2 | 3 | E |
Works fine.
Fallback language names: Oracle
Comments enabled. I *really* need your comment
This is a series of articles on efficient querying for a localized name, using a default (fallback) language if there is no localized name:
- Fallback language names: Oracle
- Fallback language names: SQL Server
- Fallback language names: PostgreSQL
- Fallback language names: MySQL
From Stack Overflow:
I have table
item
and another tablelanguage
which contains names for theitems
in different languages:
item language data How do I select a French name for an item if it exists, or a fallback English name if there is no French one?
This is quite a common database design (most often used for localization), so I'd like to cover this question for different databases.
This design is not limited to languages and localization: it may be used for any data with default value that can be overriden. Prices, discounts, options — many kinds of things.
Today we will see how do it in Oracle.
We basically have three options here:
- Use
COALESCE
on twoSELECT
list subqueries - Use
COALESCE
on the results of twoLEFT JOINS
- Use the combination of methods above: a
LEFT JOIN
for French names and a subquery for English ones
Let's create sample tables and see what is more efficient.
Read the rest of this entry »
Oracle: OR on multiple EXISTS clauses
Comments enabled. I *really* need your comment
From Stack Overflow:
I have two queries, and I want to understand which is better in terms of performance and memory:
SELECT DISTINCT a.no, a.id1, a.id2 FROM tbl_b b, tbl_a a , tbl_c c, tbl_d d WHERE ( b.id1 = a.id1 AND a.id1 = c.id1 AND UPPER(c.flag) = 'Y' AND c.id1 = d.id1 ) OR ( b.id2 = a.id2 AND a.id2 = c.id2 AND UPPER(c.flag) = 'Y' AND c.id2 = d.id2 ) AND d.id3 = 10and
SELECT DISTINCT a.no, a.id1, a.id2 FROM tbl_a a WHERE EXISTS ( SELECT a.id1, a.id2 FROM tbl_c c WHERE (a.id1 = c.id1 OR a.id2 = c.id2) AND UPPER(c.flag) = 'Y' ) AND EXISTS ( SELECT a.id1, a.id2 FROM tbl_b b WHERE b.id1 = a.id1 OR b.id2 = a.id2 ) AND EXISTS ( SELECT a.id1, a.id2 FROM tbl_d d WHERE (a.id1 = d.id1 or a.id2 = d.id2) AND d.id3 = 10 )The tables
tbl_b
andtbl_d
are very large tables containing 500,000 to millions of rows, while tabletbl_a
is relatively small.My requirement is to pick up only those records from table
tbl_a
, whoseid
(eitherid1
orid2
) is available intbl_b
,tbl_c
, andtbl_d
tables, satisfying certain other conditions as well.Which is best performance-wise?
We can see that both these queries contain an OR
condition, a nightmare for most optimizers.
The first query uses a join on all four tables, concatenating the results and making a distinct set out of them.
The second query check each row in tbl_a
, making sure that the corresponding records exists in other tables in one or another way.
These queries are not identical: the first query will select the rows from tbl_a
matching all tables on same id (either three matches on id1
or three matches on id2
), while the second query returns rows matching on any id
This is, if we have a row matching tbl_b
and tbl_c
on id1
and tbl_d
on id2
, this row will be returned by the second query but not the first.
Both these queries will perform poorly on large tables. However, we can improve them.
Let's create the tables, fill them with sample data and make the improvements:
Read the rest of this entry »
Overlapping numbers
Comments enabled. I *really* need your comment
Answering questions asked on the site.
Kat asks:
I want to determine if I have any overlapping ranges of numbers in a table for specific
id
's.For example, the table has the following:
ID START_NUMBER END_NUMBER 1 3035112000 3035112010 2 3035112011 3035112020 3 3035112003 3035112005 4 3032925000 3032925010 1 3035113000 3035113050 2 3035113000 3035113050 3 3035113051 3035113052 4 3032925500 3032925550 I only want to pull those that have overlapping number ranges for
id
's 1, 2, and 3 (but not 4).This is in Oracle.
We need to write a query to pull out all id
's of rows that overlap at least one other row in the table.
Let's create a pair of sample tables:
Read the rest of this entry »
Overlapping ranges: Oracle
Comments enabled. I *really* need your comment
Continuing the theme on overlapping ranges:
From Stack Overflow:
I have an event table that specifies a date range with
start_date
andend_date
fields.I have another date range, specified in code, that defines the current week as
week_start
andweek_end
.I'd like to query all events for the week.
Today, let's see it in Oracle.
A quick reminder: we have two options to write a condition for such a query.
A more simple one:
start_date < @week_end AND end_date > @week_start
is, uh, more simple, but less efficient in SQL Server and MySQL.
A more complex one:
(start_date > @week_start AND start_date < @week_end)
OR
(@week_start BETWEEN start_date AND end_date)
is more index friendly.
Let's create a sample table and see how Oracle copes with these conditions:
Read the rest of this entry »
NVL vs. COALESCE
From Stack Overflow:
Are there non obvious differences between
NVL
andCOALESCE
in Oracle?The obvious differences are that
COALESCE
will return the first non-NULL
item in its parameter list whereasNVL
only takes two parameters and returns the first if it is notNULL
, otherwise it returns the second.It seems that
NVL
may just be abase caseversion ofCOALESCE
.Am I missing something?
In case of two agruments of same type, these functions are basically the same: they both return first non-NULL
values.
However, they are implemented differently:
NVL
always evaluates both argumentsCOALESCE
doesn't evaluate anything after it finds the first non-NULL
argument
This behavior can be utilized.
Here's a real world example from a restaurant management system I developed about 10 years ago.
A bill that is presented to a customer consists of the items he or she ordered.
Majority of the dishes ordered are plain menu items: something that has its price printed in the menu.
However, there are some special things which are not included to the menu: a customer may want to order something exclusive and pay for it.
The manager should estimate the cost of ingredients, time and effort to make this dish, then calculate the price and name it: everything in 30 seconds.
These things are of course come overpriced, but the customers that make such orders don't seem to care much about money.
In the bill that appears as special order
. When a manager enters it into the computer, she should also enter the price she had calculated.
These prices then get back to the server and are stored in a table (not the one where regular menu prices are stored).
At the end of month, a report to the month's receipts should be generated.
Here are the tables containing the data relevant to the reports. They of course are simplified, but the principle remains the same.
Read the rest of this entry »
Ordering NULLs
Comments enabled. I *really* need your comment
From Stack Overflow:
In our product we have a generic search engine, and trying to optimize the search performance.
A lot of the tables used in the queries allow
NULL
values.Should we redesign our table to disallow
NULL
values for optimization or not?
NULL
values surely affect some optimizer decisions in Oracle.
It's widely known that Oracle won't include a column into index if all key values are NULL
.
This means that if we don't apply a filter which would imply non-NULL
condition on the column (like, any comparison operator in the WHERE
clause, or a comparison operation in the ON
condition of a JOIN
clause), the index is not guaranteed to return all values, and a FULL SCAN
will be required on the table.
It is less evident that ORDER BY
queries can be affected too.
Let's create a sample table and see how:
Read the rest of this entry »
Flattening timespans: Oracle
Comments enabled. I *really* need your comment
In my previous article:
I described a function that merges intersecting timespanss into one contiguous range in SQL Server.
This query turned out to be a good illustration for window functions, so now we'll try to repeat this query in Oracle which is superb in terms of window functions support.
A quick reminder of the problem, taken from Stack Overflow:
I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan.
To make things a bit clearer, take a look at the sample data for 03.06.2009:
The following timespans are overlapping or contiunous and need to merge into one timespan:
date start stop 2009.06.03 05:54:48:000 10:00:13:000 2009.06.03 09:26:45:000 09:59:40:000 The resulting timespan would be from 05:54:48 to 10:00:13.
Since there's a gap between 10:00:13 and 10:12:50, we also have the following timespans:
date start stop 2009.06.03 10:12:50:000 10:27:25:000 2009.06.03 10:13:12:000 11:14:56:000 2009.06.03 10:27:25:000 10:27:31:000 2009.06.03 10:27:39:000 13:53:38:000 2009.06.03 11:14:56:000 11:15:03:000 2009.06.03 11:15:30:000 14:02:14:000 2009.06.03 13:53:38:000 13:53:43:000 2009.06.03 14:02:14:000 14:02:31:000 which result in one merged timespan from 10:12:50 to 14:02:31, since they're overlapping or adjacent.
Any solution, be it SQL or not, is appreciated.
As you may remember, we had to use the cursors in the SQL Server decision, since SQL Server lacks a way to calculate a running maximum.
But Oracle does have such a way, so we don't need cursors and functions here, and everything can be done in a single query.
The principle remains the same and is just changed a little to fit into one query:
- Assign a
ROW_NUMBER()
to each row (since the rows lack aPRIMARY KEY
, we will need it later to use as one). - Order the rows by
start
and calculate running maximum for the stop. - Pick all rows that have a
start
greater than the running maximum of the previous row. Having such a timespan means that all timespans evaluated before had started earlier than the given one (since they are ordered bystart
) and also had ended earlier than the given one have started (since its start exceeds the running maximum). Therefore, we have a gap. - Pick the
ROW_NUMBER()
of each gap row along with theROW_NUMBER()
of the record that comes right before the gap row next to the given one. - Use these
ROW_NUMBER()
's to find out thestart
and thestop
of the corresponding records. These will be the timespan bounds.
Let's create a sample table and compose the query:
Read the rest of this entry »
Longest common prefix: Oracle
Comments enabled. I *really* need your comment
Today, the third article of the series on how to strip all strings in a set of their longest common prefix and concatenate the results:
- Longest common prefix: SQL Server
- Longest common prefix: PostgreSQL
- Longest common prefix: Oracle
- Longest common prefix: MySQL
This article will describe how to do it in Oracle.
A quick reminder of the problem (taken from Stack Overflow):
I have some data:
id ref 1 3536757616 1 3536757617 1 3536757618 2 3536757628 2 3536757629 2 3536757630 and want to get the result like this:
id result 1 3536757616/7/8 2 3536757629/28/30 Essentially, the data should be aggregated on
id
, and theref
's should be concatenated together and separated by a/
(slash), but with longest common prefix removed.
As I already wrote earlier, this solution needs two aggregates: first one to calculate the length of the longest common prefix and the second one to concatenate the strings.
Oracle allows creating custom aggregates, but it's usually a pain to implement them. That's why we will use Oracle's MODEL
clause to solve this task.
Read the rest of this entry »