EXPLAIN EXTENDED

How to create fast database queries

Checking event dates

Comments enabled. I *really* need your comment

From Stack Overflow:

Suppose the following table structure in Oracle:

CREATE TABLE event (
id INTEGER,
start_date DATE,
end_date DATE
)

Is there a way to query all of the events that fall on a particular day of the week?

For example, I would like to find a query that would find every event that falls on a Monday.

Figuring out if the start_date or end_date falls on a Monday is easy, but I'm not sure how to find it out for the dates between.

This is one of the range predicates which are very unfriendly to plain B-Tree indexes.

But even if there would be a range friendly index (like R-Tree), that would hardly be an improvement. Monday's make up 14.3% of all days, that means that an index if there were any, would have very low selectivity even for one-day intervals.

And if the majority of intervals last for more than one day, the selectivity of the condition yet decreases: 86% of 6-day intervals have a Monday inside.

Given the drawbacks of index scanning and joining on ROWID, we can say that a FULL TABLE SCAN will be a nice access path for this query, and we just need to represent it as an SQL condition (without bothering for its sargability)

We could check that a Monday is between end_date's day-of-week number and the range length's offset from this number:

SELECT  *
FROM    "20090507_dates".event
WHERE   6 BETWEEN MOD(start_date - TO_DATE(1, 'J'), 7) AND MOD(start_date - TO_DATE(1, 'J'), 7) + end_date - start_date

This query converts each ranges into a pair of zero-based, Tuesday-based day of week offsets, and returns all records which have day 6 (a Monday) inside the range.

Note that we don't use Oracle's TO_DATE('D') function here: starting day of week depends on NLS_TERRITORY which only leads to confusion.

Now, this query works but looks quite ugly. And if we will check for more complex conditions, it will become even uglier.

What if we need to find all ranges that contain a Friday, 13th? Or a second week's Thursday? The conditions will become unreadable and unmaintainable.

Can we do it in some more elegant way?

What if we just iterate over the days of the range and check each day for the condition? This should be much more simple than inventing the boundaries.

Let's create a sample table and try it:

CREATE TABLE event (
id NUMBER(10) NOT NULL,
start_date DATE NOT NULL,
end_date DATE NOT NULL
)
/

BEGIN
DBMS_RANDOM.seed(20090507);
END;
/

INSERT
INTO    event (id, start_date, end_date)
SELECT  id, start_date, start_date + len
FROM    (
SELECT  /*+ NO_MERGE */
level AS id,
TO_DATE('07.05.2009', 'dd.mm.yyyy') - FLOOR(DBMS_RANDOM.value(6, 3000)) AS start_date,
FLOOR(DBMS_RANDOM.value(1, 6)) AS len
FROM    dual
CONNECT BY
level <= 10000
        )
/
COMMIT
/

BEGIN
        DBMS_STATS.gather_schema_stats('"20090507_dates"');
END;
/

This table has 10,000 event ranges with random start_date's and random lengths from 1 to 5 days.

Let's try our initial query (for the sake of brevity, we will just COUNT(*) the events)

SELECT  COUNT(*)
FROM    event
WHERE   6 BETWEEN MOD(start_date - TO_DATE(1, 'J'), 7) AND MOD(start_date - TO_DATE(1, 'J'), 7) + end_date - start_date
COUNT(*)
5724
1 row fetched in 0.0001s (0.1018s)
SELECT STATEMENT 
 SORT AGGREGATE
  TABLE ACCESS FULL, 20090507_dates.EVENT

This query returned 5724 events.

Now, let's build a subquery that iterates over each day in the range and returns as soon as it finds a matching Monday:

SELECT  COUNT(*)
FROM    event
WHERE   EXISTS
(
SELECT  1
FROM    dual
WHERE   MOD((start_date + level - 1) - TO_DATE(1, 'J'), 7) = 6
CONNECT BY
level <= end_date - start_date + 1
        )

COUNT(*)
5724
1 row fetched in 0.0001s (0.1129s)
SELECT STATEMENT 
 SORT AGGREGATE
  FILTER 
   TABLE ACCESS FULL, 20090507_dates.EVENT
   FILTER 
    CONNECT BY WITHOUT FILTERING
     FAST DUAL 

Exactly same results.

The hierarchical subquery generates a rowset having as many rows as there are days in the range.

For each row in the rowset, we can get the corresponding day in the range: it will be defined by expression start_date + level - 1. We can check this date for any condition we like.

MOD((start_date + level - 1) - TO_DATE(1, 'J'), 7) = 6, for instance, checks that a date is a Monday.

Oracle optimizes EXISTS so that it returns as soon as the first value is found.

But since dual is accessed with FAST DUAL access method which implies no table I/O, the loops have very low performance impact anyway.

Now, let's try to complicate the query. Let's find all ranges that have a March, 13th which has fallen on a Friday:

SELECT  *
FROM    event
WHERE   EXISTS
(
SELECT  1
FROM    dual
WHERE   MOD((start_date + level - 1) - TO_DATE(1, 'J'), 7) = 3
AND TO_CHAR((start_date + level - 1), 'DD') = 13
AND TO_CHAR((start_date + level - 1), 'MM') = 9
CONNECT BY
level <= end_date - start_date + 1
        )

ID START_DATE END_DATE
769 11.09.2002 00:00:00 15.09.2002 00:00:00
810 13.09.2002 00:00:00 16.09.2002 00:00:00
3967 12.09.2002 00:00:00 14.09.2002 00:00:00
4383 09.09.2002 00:00:00 13.09.2002 00:00:00
4867 12.09.2002 00:00:00 17.09.2002 00:00:00
4903 13.09.2002 00:00:00 18.09.2002 00:00:00
5262 12.09.2002 00:00:00 14.09.2002 00:00:00
5300 12.09.2002 00:00:00 13.09.2002 00:00:00
5552 12.09.2002 00:00:00 13.09.2002 00:00:00
5974 11.09.2002 00:00:00 13.09.2002 00:00:00
5981 13.09.2002 00:00:00 15.09.2002 00:00:00
6809 10.09.2002 00:00:00 13.09.2002 00:00:00
6891 10.09.2002 00:00:00 14.09.2002 00:00:00
7979 12.09.2002 00:00:00 16.09.2002 00:00:00
8054 11.09.2002 00:00:00 14.09.2002 00:00:00
9298 11.09.2002 00:00:00 16.09.2002 00:00:00
9529 13.09.2002 00:00:00 15.09.2002 00:00:00
9545 12.09.2002 00:00:00 13.09.2002 00:00:00
9858 10.09.2002 00:00:00 15.09.2002 00:00:00
19 rows fetched in 0.0007s (0.1988s)
SELECT STATEMENT 
 FILTER 
  TABLE ACCESS FULL, 20090507_dates.EVENT
  FILTER 
   CONNECT BY WITHOUT FILTERING
    FAST DUAL 

We added two more simple conditions into the WHERE clause of the subquery, and the query works fine.

That would hardly be feasible if we were trying to construct the expression using only the range limits: this condition is way too complex.

Written by Quassnoi

May 7th, 2009 at 11:00 pm

Posted in Oracle

Leave a Reply