Oracle: generating a list of dates and counting ranges for each date
From Stack Overflow:
I have a table with data such as below:
Group Start Date End Date A 2001.01.01 2001.01.03 A 2001.01.01 2001.01.02 A 2001.01.03 2001.01.03 B 2001.01.01 2001.01.01 I am looking to produce a view that gives a count for each day
We have a list of ranges here, and for each date we should count the number of ranges that contain this date.
To make this query, we will employ one simple fact: the number of ranges containing a given date is the number of ranges started on or before this date minus the number of ranges that ended before this date.
This can easily be calculated using window functions.
Let's create a sample table:
BEGIN DBMS_RANDOM.seed(20090909); END; / CREATE TABLE t_range ( id NOT NULL PRIMARY KEY, grouper NOT NULL, start_date NOT NULL, end_date NOT NULL ) AS SELECT level, MOD((level - 1), 10) + 1, TO_DATE('09.09.2009', 'dd.mm.yyyy') - TRUNC((level - 1) / 14400), TO_DATE('09.09.2009', 'dd.mm.yyyy') - TRUNC((level - 1) / 14400) + TRUNC(DBMS_RANDOM.value * 10) + 1 FROM dual CONNECT BY level <= 1000000 / CREATE INDEX ix_range_grouper_startdate ON t_range (grouper, start_date) / CREATE INDEX ix_range_grouper_enddate ON t_range (grouper, end_date) / [/sourcecode] </div> This table contains <strong>1,000,000</strong> rows in <strong>10</strong> groups. Ranges start every <strong>10</strong> minutes (<strong>144</strong> per day) and last from <strong>1</strong> to <strong>11</strong> days. Range lengths are random. To generate the count of ranges in each group that contain each date, we need to do the following: <ol> <li>Generate a list of dates, from minimal to maximal. This can be easily done using a <code>CONNECT BY</code> query on <code>dual</code> table: SELECT cur_date FROM ( SELECT ( SELECT MIN(start_date) FROM t_range ) + level - 1 AS cur_date FROM dual CONNECT BY level <= ( SELECT MAX(end_date) FROM t_range ) - ( SELECT MIN(start_date) FROM t_range ) + 1 ) dates [/sourcecode] <div class="terminal"> <table class="terminal"> <tr> <th>CUR_DATE</th></tr> <tr><td class="date">02.07.2009 00:00:00</td></tr> <tr><td class="date">03.07.2009 00:00:00</td></tr> <tr><td class="date">04.07.2009 00:00:00</td></tr> <tr class="break"><td colspan="100"></td></tr> <tr><td class="date">18.09.2009 00:00:00</td></tr> <tr><td class="date">19.09.2009 00:00:00</td></tr> <tr class="statusbar"><td colspan="100">80 rows fetched in 0.0017s (0.0011s)</td></tr> </table> </div> Since <code>MIN</code> and <code>MAX</code> are instant on indexed fields, this query is instant too.</li> <li> Build a rowset that would contain a number of ranges starting on each given date. We can use a simple <code>GROUP BY</code> to do this, since missing dates will be handled by a <code>LEFT JOIN</code> with the list of generated dates that we built on the previous step: SELECT grouper AS sgrp, start_date, COUNT(*) AS scnt FROM t_range GROUP BY grouper, start_date ORDER BY grouper, start_date
SGRP | START_DATE | SCNT | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 02.07.2009 00:00:00 | 640 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 | 03.07.2009 00:00:00 | 1440 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 | 04.07.2009 00:00:00 | 1440 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 | 08.09.2009 00:00:00 | 1440 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 | 09.09.2009 00:00:00 | 1440 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
700 rows fetched in 0.0247s (0.3159s) |
We see that within group 1, 640 ranges started on 02.07.2009, 1,400 ranges started on 03.07.2009 etc.
end_dates
:
SELECT grouper AS egrp, end_date, COUNT(*) AS ecnt FROM t_range GROUP BY grouper, end_date ORDER BY grouper, end_date
EGRP | END_DATE | ECNT | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 03.07.2009 00:00:00 | 56 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 | 04.07.2009 00:00:00 | 197 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 | 05.07.2009 00:00:00 | 359 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 | 18.09.2009 00:00:00 | 287 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 | 19.09.2009 00:00:00 | 139 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
790 rows fetched in 0.0278s (0.6250s) |
Within group 1, 56 ranges ended on 03.07.2009, 197 ones ended on 04.07.2009 etc.
- Before and on Jul 2nd, 640 ranges started. 0 ranges ended before that date. This means that this day is contained by 640 ranges.
- Before and on Jul 3rd, 2,080 ranges started. This includes 1,400 ranges that started on that date and 640 ranges that started before. No ranges ended before this date too. The date is contained by 2,080 ranges.
- Before and on Jul 4th, 3,520 ranges started. This includes 1,400 ranges that started on that date and 2,080 ranges that started before that date. 56 ranges ended before that date. This date is therefore contained by 3,462 ranges.
Et cetera. Now we see that to calculate the number of ranges contained by any given date we should OUTER JOIN
the resultsets containing the counts, and calculate partial sums of these counts. Difference between these sums will be the result we're after.
And here's the final query:
SELECT cur_date,
grouper,
SUM(COALESCE(scnt, 0) - COALESCE(ecnt, 0)) OVER (PARTITION BY grouper ORDER BY cur_date) AS ranges
FROM (
SELECT (
SELECT MIN(start_date)
FROM t_range
) + level - 1 AS cur_date
FROM dual
CONNECT BY
level <=
(
SELECT MAX(end_date)
FROM t_range
) -
(
SELECT MIN(start_date)
FROM t_range
) + 1
) dates
CROSS JOIN
(
SELECT DISTINCT grouper AS grouper
FROM t_range
) groups
LEFT JOIN
(
SELECT grouper AS sgrp, start_date, COUNT(*) AS scnt
FROM t_range
GROUP BY
grouper, start_date
) starts
ON sgrp = grouper
AND start_date = cur_date
LEFT JOIN
(
SELECT grouper AS egrp, end_date, COUNT(*) AS ecnt
FROM t_range
GROUP BY
grouper, end_date
) ends
ON egrp = grouper
AND end_date = cur_date - 1
ORDER BY
grouper, cur_date
[/sourcecode]
CUR_DATE | GROUPER | RANGES | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
02.07.2009 00:00:00 | 1 | 640 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
03.07.2009 00:00:00 | 1 | 2080 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
04.07.2009 00:00:00 | 1 | 3464 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
18.09.2009 00:00:00 | 10 | 426 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
19.09.2009 00:00:00 | 10 | 139 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
800 rows fetched in 0.0296s (1.0937s) |
The results are just like we expected.
Note the JOIN
condition end_date = cur_date - 1
. We subtract a day to make sure that we select events that ended strictly before the date. The events that ended on the date in question still contain it and should contribute into the count.
This query is very fast (1,000,000 rows are processed in just a trifle more than a second).
Written by Quassnoi
September 9th, 2009 at 11:00 pm
Posted in Oracle
10 Responses to 'Oracle: generating a list of dates and counting ranges for each date'
Subscribe to comments with RSS
Leave a Reply
Subscribe
Subscribe by email
Contacts
Should I?
Yes. Feel free to ask questions and write me. An interesting question is a pleasure to answer, and I really enjoy receiving feedback
Recent articles
- Happy New Year: Diffusion Model image generator in about 700 lines of pure SQL
- Happy New Year: GPT in 500 lines of SQL
- Happy New Year: solving the Rubik’s Cube in SQL
- A good first word for Wordle
- Happy New Year: quantum computer emulator in SQL
Calendar
Archives
- December 2024
- December 2023
- December 2022
- January 2022
- December 2021
- December 2020
- December 2019
- December 2018
- December 2017
- December 2016
- December 2015
- December 2014
- July 2014
- December 2013
- October 2013
- September 2013
- August 2013
- July 2013
- June 2013
- May 2013
- March 2013
- January 2013
- December 2012
- December 2011
- June 2011
- April 2011
- March 2011
- February 2011
- December 2010
- November 2010
- October 2010
- September 2010
- August 2010
- July 2010
- June 2010
- May 2010
- April 2010
- March 2010
- February 2010
- January 2010
- December 2009
- November 2009
- October 2009
- September 2009
- August 2009
- July 2009
- June 2009
- May 2009
- April 2009
- March 2009
- February 2009
Thanks for this comprehensive article. However, when I test your SQL with
INWORK 01/FEB/13 02/FEB/13
INWORK 01/FEB/13 02/FEB/13
RELEASE 04/FEB/13 06/FEB/13
REVIEW 02/FEB/13 04/FEB/13
I get
01/FEB/13 INWORK 1
02/FEB/13 INWORK 1
03/FEB/13 INWORK -1
04/FEB/13 INWORK -1
05/FEB/13 INWORK -1
06/FEB/13 INWORK -1
01/FEB/13 RELEASE 0
02/FEB/13 RELEASE 0
03/FEB/13 RELEASE 0
04/FEB/13 RELEASE 1
05/FEB/13 RELEASE 1
06/FEB/13 RELEASE 1
01/FEB/13 REVIEW 0
02/FEB/13 REVIEW 1
03/FEB/13 REVIEW 1
04/FEB/13 REVIEW 1
05/FEB/13 REVIEW 0
06/FEB/13 REVIEW 0
Which does not seem correct ie : how can there be negative counts.
Any suggestions on what I am doing wrong?
Richard
20 Feb 13 at 06:20
@Richard: please post your sample data and the query on http://sqlfiddle.com and post here the link.
Quassnoi
20 Feb 13 at 12:06
Quassnoi – Thanks for the assistance. I have posted the code at http://sqlfiddle.com/#!4/e6bf9/1/0
As background, the data represents the approval work flow of two objects. One object that goes from INWORK to REVIEW to RELEASE during the dates between 01FEB13 to 08FEB13; the second object is INWORK from 01FEB13 to 08FEB13.
The output I require is, for end (23:59:59) of each day, the number of objects in the INWORK state, the number in REVIEW state and the number in RELEASE state.
The problem I have is that the output for 02FEB13, on summing the counts has a total of 3. It should be 2. Similarly for 04FEB13, the total is 3, not 2.
If I add the HH:MI:SS values to further descretize the intervals, the output is more incorrect. see http://sqlfiddle.com/#!4/34aaa/1/0
Any suggestions on how to ensure the counts equal 2 for each day I count?
Richard
20 Feb 13 at 13:26
@Richard: I don’t see the number 3 in the output at all here: http://sqlfiddle.com/#!4/e6bf9/1/0
You have 2 inwork tasks between Feb 1 and Feb 2; and 1 inwork task for all other dates. The other counts also seem legit to me.
Quassnoi
20 Feb 13 at 13:34
Thanks for your response – the total for 02FEB13 is 2 for INWORK, and 1 for REVIEW, total of 3, not 2. This is the problem :-)
Richard
20 Feb 13 at 13:56
@Richard: if you need the totals not regarding the task state, just omit the whole “grouper” stuff. See here: http://sqlfiddle.com/#!4/d1ef5/5
Please note that you have overlapping dates for some of your states so those states will be counted twice. You model does not allow distinguishing the tasks since you don’t keep the task id in your table.
Quassnoi
20 Feb 13 at 14:00
As you can see from my second SQL example, if I include hh:mm:ss then there is not overlap of the work flow increments.
Also, the next part of this problem is that I use the output as SQL for a stacked bar chart in iReport, thus the grouper field is used as the series in the stacked bar chart. So the total of the bar charts should be 2. Note also that I am reporting at 23:59:59 of each day. Thus for the 02 FEB13 I should not count the INWORK task that is completed.
Richard
20 Feb 13 at 14:12
@Richard: as I said, there is no way to distinguish the tasks. Add task id to your model.
Quassnoi
20 Feb 13 at 14:49
Quassnoi – as per the sample @ http://sqlfiddle.com/#!4/d1ef5/5 the inclusion of a unique identifier for each work flow does not solve the problem. I still see 3 counts for 02FB13 and 04FEB13. There should only be 2.
Richard
21 Feb 13 at 01:20
Hi @quassnoi I found this article similar with my problem,
can you please also help me here https://stackoverflow.com/questions/59319542/count-data-by-loop-calendar-sql-oracle
thank you
Dede
13 Dec 19 at 13:28