Oracle: generating a list of dates and counting ranges for each date
From Stack Overflow:
I have a table with data such as below:
Group Start Date End Date A 2001.01.01 2001.01.03 A 2001.01.01 2001.01.02 A 2001.01.03 2001.01.03 B 2001.01.01 2001.01.01 I am looking to produce a view that gives a count for each day
We have a list of ranges here, and for each date we should count the number of ranges that contain this date.
To make this query, we will employ one simple fact: the number of ranges containing a given date is the number of ranges started on or before this date minus the number of ranges that ended before this date.
This can easily be calculated using window functions.
Let’s create a sample table:
This table contains 1,000,000 rows in 10 groups.
Ranges start every 10 minutes (144 per day) and last from 1 to 11 days. Range lengths are random.
To generate the count of ranges in each group that contain each date, we need to do the following:
- Generate a list of dates, from minimal to maximal. This can be easily done using a
CONNECT BYquery ondualtable:SELECT cur_date FROM ( SELECT ( SELECT MIN(start_date) FROM t_range ) + level - 1 AS cur_date FROM dual CONNECT BY level <= ( SELECT MAX(end_date) FROM t_range ) - ( SELECT MIN(start_date) FROM t_range ) + 1 ) datesCUR_DATE 02.07.2009 00:00:00 03.07.2009 00:00:00 04.07.2009 00:00:00 18.09.2009 00:00:00 19.09.2009 00:00:00 80 rows fetched in 0.0017s (0.0011s) Since
MINandMAXare instant on indexed fields, this query is instant too. -
Build a rowset that would contain a number of ranges starting on each given date. We can use a simple
GROUP BYto do this, since missing dates will be handled by aLEFT JOINwith the list of generated dates that we built on the previous step:SELECT grouper AS sgrp, start_date, COUNT(*) AS scnt FROM t_range GROUP BY grouper, start_date ORDER BY grouper, start_dateSGRP START_DATE SCNT 1 02.07.2009 00:00:00 640 1 03.07.2009 00:00:00 1440 1 04.07.2009 00:00:00 1440 10 08.09.2009 00:00:00 1440 10 09.09.2009 00:00:00 1440 700 rows fetched in 0.0247s (0.3159s) We see that within group 1, 640 ranges started on 02.07.2009, 1,400 ranges started on 03.07.2009 etc.
-
Do the same with
end_dates:SELECT grouper AS egrp, end_date, COUNT(*) AS ecnt FROM t_range GROUP BY grouper, end_date ORDER BY grouper, end_dateEGRP END_DATE ECNT 1 03.07.2009 00:00:00 56 1 04.07.2009 00:00:00 197 1 05.07.2009 00:00:00 359 10 18.09.2009 00:00:00 287 10 19.09.2009 00:00:00 139 790 rows fetched in 0.0278s (0.6250s) Within group 1, 56 ranges ended on 03.07.2009, 197 ones ended on 04.07.2009 etc.
-
Finally, we should calculate the number of ranges within each group containing each date:
-
Before and on Jul 2nd, 640 ranges started. 0 ranges ended before that date. This means that this day is contained by 640 ranges.
-
Before and on Jul 3rd, 2,080 ranges started. This includes 1,400 ranges that started on that date and 640 ranges that started before. No ranges ended before this date too. The date is contained by 2,080 ranges.
-
Before and on Jul 4th, 3,520 ranges started. This includes 1,400 ranges that started on that date and 2,080 ranges that started before that date. 56 ranges ended before that date. This date is therefore contained by 3,462 ranges.
Et cetera. Now we see that to calculate the number of ranges contained by any given date we should
OUTER JOINthe resultsets containing the counts, and calculate partial sums of these counts. Difference between these sums will be the result we’re after. -
And here’s the final query:
SELECT cur_date,
grouper,
SUM(COALESCE(scnt, 0) - COALESCE(ecnt, 0)) OVER (PARTITION BY grouper ORDER BY cur_date) AS ranges
FROM (
SELECT (
SELECT MIN(start_date)
FROM t_range
) + level - 1 AS cur_date
FROM dual
CONNECT BY
level <=
(
SELECT MAX(end_date)
FROM t_range
) -
(
SELECT MIN(start_date)
FROM t_range
) + 1
) dates
CROSS JOIN
(
SELECT DISTINCT grouper AS grouper
FROM t_range
) groups
LEFT JOIN
(
SELECT grouper AS sgrp, start_date, COUNT(*) AS scnt
FROM t_range
GROUP BY
grouper, start_date
) starts
ON sgrp = grouper
AND start_date = cur_date
LEFT JOIN
(
SELECT grouper AS egrp, end_date, COUNT(*) AS ecnt
FROM t_range
GROUP BY
grouper, end_date
) ends
ON egrp = grouper
AND end_date = cur_date - 1
ORDER BY
grouper, cur_date
| CUR_DATE | GROUPER | RANGES | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 02.07.2009 00:00:00 | 1 | 640 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 03.07.2009 00:00:00 | 1 | 2080 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 04.07.2009 00:00:00 | 1 | 3464 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 18.09.2009 00:00:00 | 10 | 426 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 19.09.2009 00:00:00 | 10 | 139 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 800 rows fetched in 0.0296s (1.0937s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The results are just like we expected.
Note the JOIN condition end_date = cur_date - 1. We subtract a day to make sure that we select events that ended strictly before the date. The events that ended on the date in question still contain it and should contribute into the count.
This query is very fast (1,000,000 rows are processed in just a trifle more than a second).
Subscribe in a reader