MySQL: grouping continuous ranges
From Stack Overflow:
Given the following table:
ID State Date 12 1 2009-07-16 10:00 45 2 2009-07-16 13:00 67 2 2009-07-16 14:40 77 1 2009-07-16 15:00 89 1 2009-07-16 15:30 99 1 2009-07-16 16:00 , how can I group by the field
state
, while still maintaining the borders between the state changes?I need
MIN(id)
,MIN(date)
andCOUNT(*)
for each group, and this is expected:
ID State Date Count 12 1 2009-07-16 10:00 1 45 2 2009-07-16 13:00 2 77 1 2009-07-16 15:00 3
Though a pure set-based solution exists, in MySQL, this can be easily done using session variables, which is more efficient.
Let's create a sample table:
CREATE TABLE filler ( id INT NOT NULL PRIMARY KEY AUTO_INCREMENT ) ENGINE=Memory; CREATE TABLE t_range ( id INT NOT NULL PRIMARY KEY, state INT NOT NULL, date DATETIME NOT NULL, value VARCHAR(100) NOT NULL, KEY ix_range_date_state (date, state) ) ENGINE=InnoDB DEFAULT CHARSET=UTF8; DELIMITER $$ CREATE PROCEDURE prc_filler(cnt INT) BEGIN DECLARE _cnt INT; SET _cnt = 1; WHILE _cnt <= cnt DO INSERT INTO filler SELECT _cnt; SET _cnt = _cnt + 1; END WHILE; END $$ DELIMITER ; START TRANSACTION; CALL prc_filler(10000); COMMIT; INSERT INTO t_range SELECT id, (id - 1) DIV 1000 + 1, '2009-07-24' - INTERVAL id MINUTE, RPAD(CONCAT('Value ', id), 100, ' ') FROM filler;
This table has 10,000 rows with 10 consecutive ranges, having states 1
and 2
.
Rows from 1 to 1,000 have state = 1
, rows from 1,001 to 2,000 have state = 2
, next thousand rows have state = 1
again etc.
Dates come in descending order, that is higher id
s have lower dates.
We can use a pure set-based approach, that is select the id
of the first row, ordered by date DESC
then state DESC
, which differs from the state of the current row.
This value is actually the last id
of the previous continuous range, and it will be the same for all rows in the current range.
Then we can just group on this value:
SELECT MIN(id) AS id, MIN(date) AS date, MIN(state) AS state, COUNT(*) cnt FROM ( SELECT id, date, state, ( SELECT id FROM t_range ri WHERE (ri.date, ri.state, ri.id) < (ro.date, ro.state, ro.id) AND ri.state <> ro.state ORDER BY date DESC, state DESC LIMIT 1 ) AS prev FROM t_range ro ) q GROUP BY prev
id | date | state | cnt | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9001 | 2009-07-17 01:20:00 | 2 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 | 2009-07-23 07:20:00 | 1 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1001 | 2009-07-22 14:40:00 | 2 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2001 | 2009-07-21 22:00:00 | 1 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3001 | 2009-07-21 05:20:00 | 2 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4001 | 2009-07-20 12:40:00 | 1 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5001 | 2009-07-19 20:00:00 | 2 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6001 | 2009-07-19 03:20:00 | 1 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7001 | 2009-07-18 10:40:00 | 2 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8001 | 2009-07-17 18:00:00 | 1 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 rows fetched in 0.0000s (62.7648s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | <derived2> | ALL | 10000 | 100.00 | Using temporary; Using filesort | ||||
2 | DERIVED | ro | index | ix_range_date_state | 12 | 9900 | 100.00 | Using index | ||
3 | DEPENDENT SUBQUERY | ri | index | ix_range_date_state | 12 | 1 | 990000.00 | Using where; Using index |
This runs for 61 seconds, not efficient.
It's another classical case of Schlemiel the painter's Algorithm: the further it goes from the beginning of the range, the more records it takes to the beginning of the range in a subquery.
We better keep track of the groups in a session variable.
To do this, we need to:
- Initialize two session variables: a counter and a previous state holder
- Select all results ordered by
date
thenstate
- Inrement the session variable whenever the state changes (previous state holder is not equal to the current state)
- Update the state holder
- Group by counter
And here's the query to do this:
SELECT MIN(id) AS id, MIN(date) AS date, MIN(state) AS state, COUNT(*) cnt FROM ( SELECT @r := @r + (@state != state OR @state IS NULL) AS gn, @state := state AS sn, s.id, s.date, s.state FROM ( SELECT @r := 0, @state := NULL ) vars, t_range s ORDER BY date, state ) q GROUP BY gn
id | date | state | cnt | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9001 | 2009-07-17 01:20:00 | 2 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8001 | 2009-07-17 18:00:00 | 1 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7001 | 2009-07-18 10:40:00 | 2 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6001 | 2009-07-19 03:20:00 | 1 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5001 | 2009-07-19 20:00:00 | 2 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4001 | 2009-07-20 12:40:00 | 1 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3001 | 2009-07-21 05:20:00 | 2 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2001 | 2009-07-21 22:00:00 | 1 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1001 | 2009-07-22 14:40:00 | 2 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 | 2009-07-23 07:20:00 | 1 | 1000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 rows fetched in 0.0005s (0.0335s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | <derived2> | ALL | 10000 | 100.00 | Using temporary; Using filesort | ||||
2 | DERIVED | <derived3> | system | 1 | 100.00 | |||||
2 | DERIVED | s | index | ix_range_date_state | 12 | 9900 | 100.00 | Using index | ||
3 | DERIVED | No tables used |
This is only 30 ms, next to instant.
can you please tell me “how to fetch continuous date from a table in mysql with between dates”
WK
24 Jun 14 at 10:46
@WK: could you please be more specific?
Quassnoi
25 Jun 14 at 10:47
Yes sure.
I have a booking table that contain the following columns:
property_id
date
status
and in this table I have the records of the a month for august like
12(property_id) 01-08-2014(date) available (status)
12(property_id) 02-08-2014(date) available (status)
12(property_id) 03-08-2014(date) available (status)
12(property_id) 04-08-2014(date) un-available (status)
12(property_id) 05-08-2014(date) available (status)
and so on with status available or un-available.
Now I want to fetch continuous date from this table with the availability & three continuous date
Please reply
WK
25 Jun 14 at 10:55
Thanks for this!! I modified it some to meet my needs. I was looking for contiguous membership periods with start and end dates. I also added an account type to the group by to show different types of accounts. Basically a state of covered ( 1 ) or not covered ( 0 ) by account type. I had to change the min(id) to just id and add it to the group by for multiple people.
create table contig_test
(
id int,
mon int,
state int,
account_type int
);
insert into contig_test( id, mon, state, account_type ) values ( 4444 , 1 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4444 , 2 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4444 , 3 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4444 , 4 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4444 , 5 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4444 , 6 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4444 , 7 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4444 , 8 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4444 , 9 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4444 , 10 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4444 , 11 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4444 , 12 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4445 , 1 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4445 , 2 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4445 , 3 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4445 , 4 , 0 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4445 , 5 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4445 , 6 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4445 , 7 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4445 , 8 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4445 , 9 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4445 , 10 , 0 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4445 , 11 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4445 , 12 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4446 , 1 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4446 , 2 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4446 , 3 , 1 , 30 );
insert into contig_test( id, mon, state, account_type ) values ( 4446 , 4 , 0 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4446 , 5 , 1 , 30 );
insert into contig_test( id, mon, state, account_type ) values ( 4446 , 6 , 1 , 30 );
insert into contig_test( id, mon, state, account_type ) values ( 4446 , 7 , 1 , 30 );
insert into contig_test( id, mon, state, account_type ) values ( 4446 , 8 , 1 , 30 );
insert into contig_test( id, mon, state, account_type ) values ( 4446 , 9 , 1 , 30 );
insert into contig_test( id, mon, state, account_type ) values ( 4446 , 10 , 0 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4446 , 11 , 1 , 15 );
insert into contig_test( id, mon, state, account_type ) values ( 4446 , 12 , 1 , 15 );
SELECT id ,
MIN(mon) AS startdate ,
max(mon) as enddate,
MIN(state) AS state,
account_type,
COUNT(*) cnt
FROM
(
SELECT @r := @r + (@state != state OR @state IS NULL) AS gn,
@state := state AS sn,
s.id,
s.mon,
s.state,
s.account_type
FROM
(
SELECT @r := 0,
@state := NULL
) vars, contig_test s
ORDER BY id, mon, state
) q
where state = 1
GROUP BY id, account_type, gn
order by id, startdate;
Richard
16 Jul 15 at 17:23