Archive for May 7th, 2009
Checking event dates
From Stack Overflow:
Suppose the following table structure in Oracle:
CREATE TABLE event (
        id INTEGER,
        start_date DATE,
        end_date DATE
)
Is there a way to query all of the events that fall on a particular day of the week?
For example, I would like to find a query that would find every event that falls on a Monday.
Figuring out if the start_date or end_date falls on a Monday is easy, but I'm not sure how to find it out for the dates between.
This is one of the range predicates that are very unfriendly to plain B-Tree indexes.
But even if there were a range-friendly index (like an R-Tree), it would hardly be an improvement. Mondays make up 14.3% of all days, so such an index would have very low selectivity even for one-day intervals.
And if the majority of intervals last for more than one day, the condition becomes even less selective: 86% of 6-day intervals contain a Monday.
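The percentages above can be checked with a few lines of arithmetic. The Python sketch below is only an illustration outside the database; it assumes that interval start days are spread evenly over the week, and the function name share_with_monday is made up for this example.

```python
from fractions import Fraction


def share_with_monday(length_days: int) -> Fraction:
    """Fraction of the 7 possible start weekdays for which an interval of
    `length_days` consecutive days contains a Monday.

    Assumption: every start weekday is equally likely.
    """
    hits = 0
    for start in range(7):  # 0 = Monday ... 6 = Sunday
        weekdays = {(start + i) % 7 for i in range(length_days)}
        if 0 in weekdays:
            hits += 1
    return Fraction(hits, 7)


for n in (1, 6, 7):
    # 1/7 is about 14.3%, 6/7 is about 85.7%, 7/7 is 100%
    print(n, share_with_monday(n))
```

A one-day interval hits a Monday in 1 case out of 7, a six-day interval in 6 cases out of 7, and anything of seven days or longer always does.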
Given the drawbacks of index scanning and joining on ROWID, a FULL TABLE SCAN is a reasonable access path for this query, so we just need to express the predicate as an SQL condition without worrying about its sargability.
We could check that a Monday is between start_date's day-of-week number and that number plus the length of the range:
SELECT  *
FROM    "20090507_dates".event
WHERE   6 BETWEEN MOD(start_date - TO_DATE(1, 'J'), 7)
        AND MOD(start_date - TO_DATE(1, 'J'), 7) + end_date - start_date
This query converts each range into a pair of zero-based day-of-week offsets counted from Tuesday, and returns all records whose range covers offset 6 (a Monday).
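To see why the condition is enough, here is a rough Python model of the same arithmetic, compared against a day-by-day check. It is a sketch, not the query itself: it assumes whole dates (no time portion) with start <= end, and all function names are invented for the illustration.

```python
from datetime import date, timedelta

MONDAY_OFFSET = 6  # Monday in a zero-based week that starts on Tuesday


def tuesday_based_offset(d: date) -> int:
    # date.weekday() is 0 for Monday; shift it so that Tuesday becomes 0.
    return (d.weekday() - 1) % 7


def contains_monday(start: date, end: date) -> bool:
    """Mirror of the BETWEEN condition (whole dates, start <= end)."""
    low = tuesday_based_offset(start)
    high = low + (end - start).days
    return low <= MONDAY_OFFSET <= high


def contains_monday_brute(start: date, end: date) -> bool:
    """Reference check: look at every day of the range."""
    return any(
        (start + timedelta(days=i)).weekday() == 0
        for i in range((end - start).days + 1)
    )


# Thursday 2009-05-07 to Sunday 2009-05-10: no Monday inside
print(contains_monday(date(2009, 5, 7), date(2009, 5, 10)))
# Thursday 2009-05-07 to Monday 2009-05-11: a Monday inside
print(contains_monday(date(2009, 5, 7), date(2009, 5, 11)))
```

Since Monday is the last offset of a Tuesday-based week, the lower bound never exceeds 6, so the test reduces to whether the range reaches offset 6 or goes beyond it.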
Note that we don't use Oracle's TO_CHAR(date, 'D') here: the day that starts the week depends on NLS_TERRITORY, which only leads to confusion.
Now, this query works but looks quite ugly. And if we need to check for more complex conditions, it will become even uglier.
What if we need to find all ranges that contain a Friday the 13th? Or a second week's Thursday? The conditions will become unreadable and unmaintainable.
Can we do it in some more elegant way?
What if we just iterate over the days of the range and check each day for the condition? This should be much simpler than inventing the boundaries.
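The idea of iterating over the days can be sketched outside the database first. The Python below is an illustration only, not the SQL solution: the helper names are hypothetical, ranges are assumed to be whole dates with start <= end, and "a second week's Thursday" is read here as the second Thursday of a month, which is an assumption.

```python
from datetime import date, timedelta
from typing import Callable, Iterator


def days_in_range(start: date, end: date) -> Iterator[date]:
    """Yield every day from start to end, both inclusive."""
    for i in range((end - start).days + 1):
        yield start + timedelta(days=i)


def range_matches(start: date, end: date,
                  predicate: Callable[[date], bool]) -> bool:
    """True if at least one day of the range satisfies the predicate."""
    return any(predicate(d) for d in days_in_range(start, end))


def is_friday_13th(d: date) -> bool:
    return d.weekday() == 4 and d.day == 13  # weekday() 4 is Friday


def is_second_thursday(d: date) -> bool:
    # Assumed reading: the second Thursday of a month falls on days 8..14.
    return d.weekday() == 3 and 8 <= d.day <= 14  # weekday() 3 is Thursday


# 2009-02-13 was a Friday, so this range matches
print(range_matches(date(2009, 2, 10), date(2009, 2, 15), is_friday_13th))
```

Each new condition becomes a small per-day predicate, and the range logic stays the same, at the cost of examining every day of every range.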
Let's create a sample table and try it: