Overlapping ranges: PostgreSQL
Comments enabled. I *really* need your comment
Continuing the theme on overlapping ranges:
From Stack Overflow:
I have an event table that specifies a date range with
start_date
andend_date
fields.I have another date range, specified in code, that defines the current week as
week_start
andweek_end
.I'd like to query all events for the week.
Finally, PostgreSQL.
A quick reminder: we have two options to write a condition for such a query.
A more simple one:
start_date < @week_end AND end_date > @week_start
is, uh, more simple, but less efficient in SQL Server and MySQL.
A more complex one:
(start_date > @week_start AND start_date < @week_end)
OR
(@week_start BETWEEN start_date AND end_date)
is more index friendly and is more performant in SQL Server, Oracle and MySQL (with a slight schema change to enable SPATIAL index support)
Now, let's create a sample table in PostgreSQL:
CREATE TABLE t_event ( id INT NOT NULL PRIMARY KEY, evt_start TIMESTAMP NOT NULL, evt_end TIMESTAMP NOT NULL, evt_value VARCHAR(400) NOT NULL ); CREATE INDEX ix_event_start ON t_event (evt_start); CREATE INDEX ix_event_end_start ON t_event (evt_end, evt_start); SELECT setseed(20090703); INSERT INTO t_event SELECT s, '2009-07-03'::DATE - (s || ' MINUTE')::INTERVAL, '2009-07-03'::DATE - (s || ' MINUTE')::INTERVAL + (FLOOR(RANDOM() * 35) + 1 || ' DAY')::INTERVAL, RPAD('Value ' || s, 400, ' ') FROM generate_series (1, 1000000) s; ANALYZE t_event;
PostgreSQL also supports ANSI OVERLAPS
operator, and it's even documented (unlike Oracle
's one).
Let's use it:
SELECT SUM(LENGTH(evt_value)) FROM t_event WHERE (evt_start, evt_end) OVERLAPS ('2009-06-21'::TIMESTAMP, '2009-06-21'::TIMESTAMP + '7 DAY'::INTERVAL)
sum | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
14444400 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0001s (1.1719s) |
Aggregate (cost=72157.44..72157.46 rows=1 width=404) -> Seq Scan on t_event (cost=0.00..71324.10 rows=333336 width=404) Filter: "overlaps"(evt_start, evt_end, '2009-06-21 00:00:00'::timestamp without time zone, '2009-06-28 00:00:00'::timestamp without time zone)
As in Oracle, it's not optimizable, and uses Seq Scan
. Hence, it takes more than 1.17
second to complete the query.
Now, the simple condition:
SELECT SUM(LENGTH(evt_value)) FROM t_event WHERE evt_end >= '2009-05-31'::TIMESTAMP AND evt_start <= '2009-05-31'::TIMESTAMP + '7 DAYS'::INTERVAL
sum | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
14450800 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0001s (0.1374s) |
Aggregate (cost=65260.10..65260.11 rows=1 width=404) -> Bitmap Heap Scan on t_event (cost=2172.84..65083.55 rows=70620 width=404) Recheck Cond: ((evt_end >= '2009-05-31 00:00:00'::timestamp without time zone) AND (evt_start <= '2009-06-07 00:00:00'::timestamp without time zone)) -> Bitmap Index Scan on ix_event_end_start (cost=0.00..2155.18 rows=70620 width=0) Index Cond: ((evt_end >= '2009-05-31 00:00:00'::timestamp without time zone) AND (evt_start <= '2009-06-07 00:00:00'::timestamp without time zone))
This one is much nicer: only 0.14 seconds.
This query uses the composite index on (evt_end, evt_start)
to build a bitmap: a compact in-memory representation of matching values which corresponds each tid
(tuple identifier) into exactly one bit in memory.
The bitmap then is scanned sequentially and whenever a 1 bit is found, the tuple is fetched.
This is more efficient than fetching tuples in a nested loop, since sequential access is faster than random.
However, the index is ranged only by evt_end
.
Now let's try the complex condition:
SELECT SUM(LENGTH(evt_value)) FROM t_event WHERE evt_start > '2009-05-31'::TIMESTAMP AND evt_start <= '2009-05-31'::TIMESTAMP + '7 DAYS'::INTERVAL OR ('2009-05-31'::TIMESTAMP BETWEEN evt_start AND evt_end)
sum | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
14450800 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 row fetched in 0.0001s (0.1254s) |
Aggregate (cost=66192.63..66192.65 rows=1 width=404) -> Bitmap Heap Scan on t_event (cost=2458.15..65996.52 rows=78445 width=404) Recheck Cond: (((evt_start > '2009-05-31 00:00:00'::timestamp without time zone) AND (evt_start <= '2009-06-07 00:00:00'::timestamp without time zone)) OR (('2009-05-31 00:00:00'::timestamp without time zone <= evt_end) AND ('2009-05-31 00:00:00'::timestamp without time zone >= evt_start))) -> BitmapOr (cost=2458.15..2458.15 rows=79084 width=0) -> Bitmap Index Scan on ix_event_start (cost=0.00..263.75 rows=9131 width=0) Index Cond: ((evt_start > '2009-05-31 00:00:00'::timestamp without time zone) AND (evt_start <= '2009-06-07 00:00:00'::timestamp without time zone)) -> Bitmap Index Scan on ix_event_end_start (cost=0.00..2155.18 rows=69953 width=0) Index Cond: (('2009-05-31 00:00:00'::timestamp without time zone <= evt_end) AND ('2009-05-31 00:00:00'::timestamp without time zone >= evt_start))
This uses the same technique with two passes: the first pass fills the bitmap with the values that correspond to the first condition (served by an index on evt_start
), the second one fills the same bitmap with the values from the second condition, OR
'ing the bits if necessary.
This is even more efficient, since computers nowadays are especially good in performing OR
operations on preallocated memory.