EXPLAIN EXTENDED

How to create fast database queries

Archive for June 12th, 2009

Flattening timespans: Oracle

Comments enabled. I *really* need your comment

In my previous article:

I described a function that merges intersecting timespanss into one contiguous range in SQL Server.

This query turned out to be a good illustration for window functions, so now we'll try to repeat this query in Oracle which is superb in terms of window functions support.

A quick reminder of the problem, taken from Stack Overflow:

I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan.

To make things a bit clearer, take a look at the sample data for 03.06.2009:

The following timespans are overlapping or contiunous and need to merge into one timespan:

date start stop
2009.06.03 05:54:48:000 10:00:13:000
2009.06.03 09:26:45:000 09:59:40:000

The resulting timespan would be from 05:54:48 to 10:00:13.

Since there's a gap between 10:00:13 and 10:12:50, we also have the following timespans:

date start stop
2009.06.03 10:12:50:000 10:27:25:000
2009.06.03 10:13:12:000 11:14:56:000
2009.06.03 10:27:25:000 10:27:31:000
2009.06.03 10:27:39:000 13:53:38:000
2009.06.03 11:14:56:000 11:15:03:000
2009.06.03 11:15:30:000 14:02:14:000
2009.06.03 13:53:38:000 13:53:43:000
2009.06.03 14:02:14:000 14:02:31:000

which result in one merged timespan from 10:12:50 to 14:02:31, since they're overlapping or adjacent.

Any solution, be it SQL or not, is appreciated.

As you may remember, we had to use the cursors in the SQL Server decision, since SQL Server lacks a way to calculate a running maximum.

But Oracle does have such a way, so we don't need cursors and functions here, and everything can be done in a single query.

The principle remains the same and is just changed a little to fit into one query:

  1. Assign a ROW_NUMBER() to each row (since the rows lack a PRIMARY KEY, we will need it later to use as one).
  2. Order the rows by start and calculate running maximum for the stop.
  3. Pick all rows that have a start greater than the running maximum of the previous row. Having such a timespan means that all timespans evaluated before had started earlier than the given one (since they are ordered by start) and also had ended earlier than the given one have started (since its start exceeds the running maximum). Therefore, we have a gap.
  4. Pick the ROW_NUMBER() of each gap row along with the ROW_NUMBER() of the record that comes right before the gap row next to the given one.
  5. Use these ROW_NUMBER()'s to find out the start and the stop of the corresponding records. These will be the timespan bounds.

Let's create a sample table and compose the query:
Read the rest of this entry »

Written by Quassnoi

June 12th, 2009 at 11:00 pm

Posted in Oracle