Checking event dates
Comments enabled. I *really* need your comment
From Stack Overflow:
Suppose the following table structure in Oracle:
CREATE TABLE event ( id INTEGER, start_date DATE, end_date DATE )Is there a way to query all of the events that fall on a particular day of the week?
For example, I would like to find a query that would find every event that falls on a Monday.
Figuring out if the
start_date
orend_date
falls on a Monday is easy, but I'm not sure how to find it out for the dates between.
This is one of the range predicates which are very unfriendly to plain B-Tree
indexes.
But even if there would be a range friendly index (like R-Tree
), that would hardly be an improvement. Monday's make up 14.3% of all days, that means that an index if there were any, would have very low selectivity even for one-day intervals.
And if the majority of intervals last for more than one day, the selectivity of the condition yet decreases: 86% of 6-day intervals have a Monday inside.
Given the drawbacks of index scanning and joining on ROWID, we can say that a FULL TABLE SCAN
will be a nice access path for this query, and we just need to represent it as an SQL condition (without bothering for its sargability)
We could check that a Monday is between end_date
's day-of-week number and the range length's offset from this number:
SELECT * FROM "20090507_dates".event WHERE 6 BETWEEN MOD(start_date - TO_DATE(1, 'J'), 7) AND MOD(start_date - TO_DATE(1, 'J'), 7) + end_date - start_date
This query converts each ranges into a pair of zero-based, Tuesday-based day of week offsets, and returns all records which have day 6 (a Monday) inside the range.
Note that we don't use Oracle's TO_DATE('D')
function here: starting day of week depends on NLS_TERRITORY
which only leads to confusion.
Now, this query works but looks quite ugly. And if we will check for more complex conditions, it will become even uglier.
What if we need to find all ranges that contain a Friday, 13th? Or a second week's Thursday? The conditions will become unreadable and unmaintainable.
Can we do it in some more elegant way?
What if we just iterate over the days of the range and check each day for the condition? This should be much more simple than inventing the boundaries.
Let's create a sample table and try it:
CREATE TABLE event ( id NUMBER(10) NOT NULL, start_date DATE NOT NULL, end_date DATE NOT NULL ) / BEGIN DBMS_RANDOM.seed(20090507); END; / INSERT INTO event (id, start_date, end_date) SELECT id, start_date, start_date + len FROM ( SELECT /*+ NO_MERGE */ level AS id, TO_DATE('07.05.2009', 'dd.mm.yyyy') - FLOOR(DBMS_RANDOM.value(6, 3000)) AS start_date, FLOOR(DBMS_RANDOM.value(1, 6)) AS len FROM dual CONNECT BY level <= 10000 ) / COMMIT / BEGIN DBMS_STATS.gather_schema_stats('"20090507_dates"'); END; /
This table has 10,000 event ranges with random start_date
's and random lengths from 1 to 5 days.
Let's try our initial query (for the sake of brevity, we will just COUNT(*)
the events)
SELECT COUNT(*) FROM event WHERE 6 BETWEEN MOD(start_date - TO_DATE(1, 'J'), 7) AND MOD(start_date - TO_DATE(1, 'J'), 7) + end_date - start_date
COUNT(*) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5724 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0001s (0.1018s) |
SELECT STATEMENT SORT AGGREGATE TABLE ACCESS FULL, 20090507_dates.EVENT
This query returned 5724 events.
Now, let's build a subquery that iterates over each day in the range and returns as soon as it finds a matching Monday:
SELECT COUNT(*) FROM event WHERE EXISTS ( SELECT 1 FROM dual WHERE MOD((start_date + level - 1) - TO_DATE(1, 'J'), 7) = 6 CONNECT BY level <= end_date - start_date + 1 )
COUNT(*) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5724 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0001s (0.1129s) |
SELECT STATEMENT SORT AGGREGATE FILTER TABLE ACCESS FULL, 20090507_dates.EVENT FILTER CONNECT BY WITHOUT FILTERING FAST DUAL
Exactly same results.
The hierarchical subquery generates a rowset having as many rows as there are days in the range.
For each row in the rowset, we can get the corresponding day in the range: it will be defined by expression start_date + level - 1
. We can check this date for any condition we like.
MOD((start_date + level - 1) - TO_DATE(1, 'J'), 7) = 6
, for instance, checks that a date is a Monday.
Oracle optimizes EXISTS so that it returns as soon as the first value is found.
But since dual
is accessed with FAST DUAL
access method which implies no table I/O
, the loops have very low performance impact anyway.
Now, let's try to complicate the query. Let's find all ranges that have a March, 13th which has fallen on a Friday:
SELECT * FROM event WHERE EXISTS ( SELECT 1 FROM dual WHERE MOD((start_date + level - 1) - TO_DATE(1, 'J'), 7) = 3 AND TO_CHAR((start_date + level - 1), 'DD') = 13 AND TO_CHAR((start_date + level - 1), 'MM') = 9 CONNECT BY level <= end_date - start_date + 1 )
ID | START_DATE | END_DATE | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
769 | 11.09.2002 00:00:00 | 15.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
810 | 13.09.2002 00:00:00 | 16.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3967 | 12.09.2002 00:00:00 | 14.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4383 | 09.09.2002 00:00:00 | 13.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4867 | 12.09.2002 00:00:00 | 17.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4903 | 13.09.2002 00:00:00 | 18.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5262 | 12.09.2002 00:00:00 | 14.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5300 | 12.09.2002 00:00:00 | 13.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5552 | 12.09.2002 00:00:00 | 13.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5974 | 11.09.2002 00:00:00 | 13.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5981 | 13.09.2002 00:00:00 | 15.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6809 | 10.09.2002 00:00:00 | 13.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6891 | 10.09.2002 00:00:00 | 14.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7979 | 12.09.2002 00:00:00 | 16.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8054 | 11.09.2002 00:00:00 | 14.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9298 | 11.09.2002 00:00:00 | 16.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9529 | 13.09.2002 00:00:00 | 15.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9545 | 12.09.2002 00:00:00 | 13.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9858 | 10.09.2002 00:00:00 | 15.09.2002 00:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
19 rows fetched in 0.0007s (0.1988s) |
SELECT STATEMENT FILTER TABLE ACCESS FULL, 20090507_dates.EVENT FILTER CONNECT BY WITHOUT FILTERING FAST DUAL
We added two more simple conditions into the WHERE
clause of the subquery, and the query works fine.
That would hardly be feasible if we were trying to construct the expression using only the range limits: this condition is way too complex.