EXPLAIN EXTENDED

How to create fast database queries

Archive for the ‘Oracle’ Category

Oracle: generating a list of dates and counting ranges for each date

with 10 comments

From Stack Overflow:

I have a table with data such as below:

Group Start Date End Date
A 2001.01.01 2001.01.03
A 2001.01.01 2001.01.02
A 2001.01.03 2001.01.03
B 2001.01.01 2001.01.01

I am looking to produce a view that gives a count for each day

We have a list of ranges here, and for each date we should count the number of ranges that contain this date.

To make this query, we will employ one simple fact: the number of ranges containing a given date is the number of ranges started on or before this date minus the number of ranges that ended before this date.

This can easily be calculated using window functions.

Let's create a sample table:

Read the rest of this entry »

Written by Quassnoi

September 9th, 2009 at 11:00 pm

Posted in Oracle

Oracle: updating rows with partitioned incremental value

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a table with these example data:

id idx data
1 1 A
1 3 B
2 1 C
2 1 D
2 5 E

I want:

id idx data
1 1 A
1 2 B
2 1 C
2 3 D
2 4 E

Is there a way to renumber the index per id using a single SQL UPDATE?

Note that the order of items should be preserved (i. e. the item E should still be after the items C and D but the order of C and D doesn't really matter).

The goal is to be able to create a primary key over id and index.

If not for the id, the query would be very simple:

UPDATE  t_update
SET       index = rownum

However, the query above does not take id into account.

To update the table using partitioned row numbers, we need to employ the window function ROW_NUMBER().

However, it's usable neither in the UPDATE query itself, nor in an inline view, and attempt to use it produces an error:

ORA-30483: window  functions are not allowed here

To work around the, we should rewrite the UPDATE query as MERGE.

MERGE requires two rowsets (source and target) and they need to be joined on something.

We will use the table itself as a source They only field we can join on here is ROWID pseudocolumn.

Here's the query to do this:

MERGE
INTO    t_update u
USING   (
SELECT  rowid AS rid,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY idx) AS rn
FROM    t_update
)
ON      (u.rowid = rid)
WHEN MATCHED THEN
UPDATE
SET     idx = rn

, and here's the result:

SELECT  *
FROM    t_update

ID IDX DATA
1 1 A
1 2 B
2 1 C
2 2 D
2 3 E

Works fine.

Written by Quassnoi

August 12th, 2009 at 11:00 pm

Posted in Oracle

Fallback language names: Oracle

Comments enabled. I *really* need your comment

This is a series of articles on efficient querying for a localized name, using a default (fallback) language if there is no localized name:

From Stack Overflow:

I have table item and another table language which contains names for the items in different languages:

item language data

How do I select a French name for an item if it exists, or a fallback English name if there is no French one?

This is quite a common database design (most often used for localization), so I'd like to cover this question for different databases.

This design is not limited to languages and localization: it may be used for any data with default value that can be overriden. Prices, discounts, options — many kinds of things.

Today we will see how do it in Oracle.

We basically have three options here:

  1. Use COALESCE on two SELECT list subqueries
  2. Use COALESCE on the results of two LEFT JOINS
  3. Use the combination of methods above: a LEFT JOIN for French names and a subquery for English ones

Let's create sample tables and see what is more efficient.
Read the rest of this entry »

Written by Quassnoi

August 6th, 2009 at 11:00 pm

Posted in Oracle

Oracle: OR on multiple EXISTS clauses

Comments enabled. I *really* need your comment

From Stack Overflow:

I have two queries, and I want to understand which is better in terms of performance and memory:

SELECT  DISTINCT
        a.no,
        a.id1,
        a.id2
FROM    tbl_b b,
        tbl_a a ,
        tbl_c c,
        tbl_d d
WHERE   (
        b.id1 = a.id1
        AND a.id1 = c.id1
        AND UPPER(c.flag) = 'Y'
        AND c.id1 = d.id1
        )
        OR
        (
        b.id2 = a.id2
        AND a.id2 = c.id2
        AND UPPER(c.flag) = 'Y'
        AND c.id2 = d.id2
        )
        AND d.id3 = 10

and

SELECT  DISTINCT
        a.no,
        a.id1,
        a.id2
FROM    tbl_a a
WHERE   EXISTS
        (
        SELECT  a.id1, a.id2
        FROM    tbl_c c
        WHERE   (a.id1 = c.id1 OR a.id2 = c.id2)
                AND UPPER(c.flag) = 'Y'
        )
        AND EXISTS
        (
        SELECT  a.id1, a.id2
        FROM    tbl_b b
        WHERE   b.id1 = a.id1 OR b.id2 = a.id2
        )
        AND EXISTS
        (
        SELECT  a.id1, a.id2
        FROM    tbl_d d
        WHERE   (a.id1 = d.id1 or a.id2 = d.id2)
                AND d.id3 = 10
        )

The tables tbl_b and tbl_d are very large tables containing 500,000 to millions of rows, while table tbl_a is relatively small.

My requirement is to pick up only those records from table tbl_a, whose id (either id1 or id2) is available in tbl_b, tbl_c, and tbl_d tables, satisfying certain other conditions as well.

Which is best performance-wise?

We can see that both these queries contain an OR condition, a nightmare for most optimizers.

The first query uses a join on all four tables, concatenating the results and making a distinct set out of them.

The second query check each row in tbl_a, making sure that the corresponding records exists in other tables in one or another way.

These queries are not identical: the first query will select the rows from tbl_a matching all tables on same id (either three matches on id1 or three matches on id2), while the second query returns rows matching on any id

This is, if we have a row matching tbl_b and tbl_c on id1 and tbl_d on id2, this row will be returned by the second query but not the first.

Both these queries will perform poorly on large tables. However, we can improve them.

Let's create the tables, fill them with sample data and make the improvements:
Read the rest of this entry »

Written by Quassnoi

July 15th, 2009 at 11:00 pm

Posted in Oracle

Overlapping numbers

Comments enabled. I *really* need your comment

Answering questions asked on the site.

Kat asks:

I want to determine if I have any overlapping ranges of numbers in a table for specific id's.

For example, the table has the following:

ID START_NUMBER END_NUMBER
1 3035112000 3035112010
2 3035112011 3035112020
3 3035112003 3035112005
4 3032925000 3032925010
1 3035113000 3035113050
2 3035113000 3035113050
3 3035113051 3035113052
4 3032925500 3032925550

I only want to pull those that have overlapping number ranges for id's 1, 2, and 3 (but not 4).

This is in Oracle.

We need to write a query to pull out all id's of rows that overlap at least one other row in the table.

Let's create a pair of sample tables:
Read the rest of this entry »

Written by Quassnoi

July 6th, 2009 at 11:00 pm

Posted in Oracle

Overlapping ranges: Oracle

Comments enabled. I *really* need your comment

Continuing the theme on overlapping ranges:

From Stack Overflow:

I have an event table that specifies a date range with start_date and end_date fields.

I have another date range, specified in code, that defines the current week as week_start and week_end.

I'd like to query all events for the week.

Today, let's see it in Oracle.

A quick reminder: we have two options to write a condition for such a query.

A more simple one:

start_date < @week_end AND end_date > @week_start

is, uh, more simple, but less efficient in SQL Server and MySQL.

A more complex one:


(start_date > @week_start AND start_date < @week_end)
OR
(@week_start BETWEEN start_date AND end_date)

is more index friendly.

Let's create a sample table and see how Oracle copes with these conditions:
Read the rest of this entry »

Written by Quassnoi

July 2nd, 2009 at 11:00 pm

Posted in Oracle

NVL vs. COALESCE

with 2 comments

From Stack Overflow:

Are there non obvious differences between NVL and COALESCE in Oracle?

The obvious differences are that COALESCE will return the first non-NULL item in its parameter list whereas NVL only takes two parameters and returns the first if it is not NULL, otherwise it returns the second.

It seems that NVL may just be a base case version of COALESCE.

Am I missing something?

In case of two agruments of same type, these functions are basically the same: they both return first non-NULL values.

However, they are implemented differently:

  • NVL always evaluates both arguments
  • COALESCE doesn't evaluate anything after it finds the first non-NULL argument

This behavior can be utilized.

Here's a real world example from a restaurant management system I developed about 10 years ago.

A bill that is presented to a customer consists of the items he or she ordered.

Majority of the dishes ordered are plain menu items: something that has its price printed in the menu.

However, there are some special things which are not included to the menu: a customer may want to order something exclusive and pay for it.

The manager should estimate the cost of ingredients, time and effort to make this dish, then calculate the price and name it: everything in 30 seconds.

These things are of course come overpriced, but the customers that make such orders don't seem to care much about money.

In the bill that appears as special order. When a manager enters it into the computer, she should also enter the price she had calculated.

These prices then get back to the server and are stored in a table (not the one where regular menu prices are stored).

At the end of month, a report to the month's receipts should be generated.

Here are the tables containing the data relevant to the reports. They of course are simplified, but the principle remains the same.
Read the rest of this entry »

Written by Quassnoi

June 20th, 2009 at 11:00 pm

Posted in Oracle

Ordering NULLs

Comments enabled. I *really* need your comment

From Stack Overflow:

In our product we have a generic search engine, and trying to optimize the search performance.

A lot of the tables used in the queries allow NULL values.

Should we redesign our table to disallow NULL values for optimization or not?

NULL values surely affect some optimizer decisions in Oracle.

It's widely known that Oracle won't include a column into index if all key values are NULL.

This means that if we don't apply a filter which would imply non-NULL condition on the column (like, any comparison operator in the WHERE clause, or a comparison operation in the ON condition of a JOIN clause), the index is not guaranteed to return all values, and a FULL SCAN will be required on the table.

It is less evident that ORDER BY queries can be affected too.

Let's create a sample table and see how:
Read the rest of this entry »

Written by Quassnoi

June 19th, 2009 at 11:00 pm

Posted in Oracle

Flattening timespans: Oracle

Comments enabled. I *really* need your comment

In my previous article:

I described a function that merges intersecting timespanss into one contiguous range in SQL Server.

This query turned out to be a good illustration for window functions, so now we'll try to repeat this query in Oracle which is superb in terms of window functions support.

A quick reminder of the problem, taken from Stack Overflow:

I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan.

To make things a bit clearer, take a look at the sample data for 03.06.2009:

The following timespans are overlapping or contiunous and need to merge into one timespan:

date start stop
2009.06.03 05:54:48:000 10:00:13:000
2009.06.03 09:26:45:000 09:59:40:000

The resulting timespan would be from 05:54:48 to 10:00:13.

Since there's a gap between 10:00:13 and 10:12:50, we also have the following timespans:

date start stop
2009.06.03 10:12:50:000 10:27:25:000
2009.06.03 10:13:12:000 11:14:56:000
2009.06.03 10:27:25:000 10:27:31:000
2009.06.03 10:27:39:000 13:53:38:000
2009.06.03 11:14:56:000 11:15:03:000
2009.06.03 11:15:30:000 14:02:14:000
2009.06.03 13:53:38:000 13:53:43:000
2009.06.03 14:02:14:000 14:02:31:000

which result in one merged timespan from 10:12:50 to 14:02:31, since they're overlapping or adjacent.

Any solution, be it SQL or not, is appreciated.

As you may remember, we had to use the cursors in the SQL Server decision, since SQL Server lacks a way to calculate a running maximum.

But Oracle does have such a way, so we don't need cursors and functions here, and everything can be done in a single query.

The principle remains the same and is just changed a little to fit into one query:

  1. Assign a ROW_NUMBER() to each row (since the rows lack a PRIMARY KEY, we will need it later to use as one).
  2. Order the rows by start and calculate running maximum for the stop.
  3. Pick all rows that have a start greater than the running maximum of the previous row. Having such a timespan means that all timespans evaluated before had started earlier than the given one (since they are ordered by start) and also had ended earlier than the given one have started (since its start exceeds the running maximum). Therefore, we have a gap.
  4. Pick the ROW_NUMBER() of each gap row along with the ROW_NUMBER() of the record that comes right before the gap row next to the given one.
  5. Use these ROW_NUMBER()'s to find out the start and the stop of the corresponding records. These will be the timespan bounds.

Let's create a sample table and compose the query:
Read the rest of this entry »

Written by Quassnoi

June 12th, 2009 at 11:00 pm

Posted in Oracle

Longest common prefix: Oracle

Comments enabled. I *really* need your comment

Today, the third article of the series on how to strip all strings in a set of their longest common prefix and concatenate the results:

This article will describe how to do it in Oracle.

A quick reminder of the problem (taken from Stack Overflow):

I have some data:

id ref
1 3536757616
1 3536757617
1 3536757618
2 3536757628
2 3536757629
2 3536757630

and want to get the result like this:

id result
1 3536757616/7/8
2 3536757629/28/30

Essentially, the data should be aggregated on id, and the ref's should be concatenated together and separated by a / (slash), but with longest common prefix removed.

As I already wrote earlier, this solution needs two aggregates: first one to calculate the length of the longest common prefix and the second one to concatenate the strings.

Oracle allows creating custom aggregates, but it's usually a pain to implement them. That's why we will use Oracle's MODEL clause to solve this task.
Read the rest of this entry »

Written by Quassnoi

June 6th, 2009 at 11:00 pm

Posted in Oracle