EXPLAIN EXTENDED

How to create fast database queries

MySQL: grouping continuous ranges

with 4 comments

From Stack Overflow:

Given the following table:

ID State Date
12 1 2009-07-16 10:00
45 2 2009-07-16 13:00
67 2 2009-07-16 14:40
77 1 2009-07-16 15:00
89 1 2009-07-16 15:30
99 1 2009-07-16 16:00

, how can I group by the field state, while still maintaining the borders between the state changes?

I need MIN(id), MIN(date) and COUNT(*) for each group, and this is expected:

ID State Date Count
12 1 2009-07-16 10:00 1
45 2 2009-07-16 13:00 2
77 1 2009-07-16 15:00 3

Though a pure set-based solution exists, in MySQL, this can be easily done using session variables, which is more efficient.

Let's create a sample table:

CREATE TABLE filler (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

CREATE TABLE t_range (
id INT NOT NULL PRIMARY KEY,
state INT NOT NULL,
date DATETIME NOT NULL,
value VARCHAR(100) NOT NULL,
KEY ix_range_date_state (date, state)
) ENGINE=InnoDB DEFAULT CHARSET=UTF8;

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
DECLARE _cnt INT;
SET _cnt = 1;
WHILE _cnt <= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

START TRANSACTION;
CALL prc_filler(10000);
COMMIT;

INSERT
INTO    t_range
SELECT  id,
        (id - 1) DIV 1000 + 1,
        '2009-07-24' - INTERVAL id MINUTE,
        RPAD(CONCAT('Value ', id), 100, ' ')
FROM    filler;

This table has 10,000 rows with 10 consecutive ranges, having states 1 and 2.

Rows from 1 to 1,000 have state = 1, rows from 1,001 to 2,000 have state = 2, next thousand rows have state = 1 again etc.

Dates come in descending order, that is higher ids have lower dates.

We can use a pure set-based approach, that is select the id of the first row, ordered by date DESC then state DESC, which differs from the state of the current row.

This value is actually the last id of the previous continuous range, and it will be the same for all rows in the current range.

Then we can just group on this value:

SELECT  MIN(id) AS id, MIN(date) AS date, MIN(state) AS state, COUNT(*) cnt
FROM    (
SELECT  id, date, state,
(
SELECT  id
FROM    t_range ri
WHERE   (ri.date, ri.state, ri.id) < (ro.date, ro.state, ro.id)
                        AND ri.state <> ro.state
ORDER BY
date DESC, state DESC
LIMIT 1
) AS prev
FROM    t_range ro
) q
GROUP BY
prev
id date state cnt
9001 2009-07-17 01:20:00 2 1000
1 2009-07-23 07:20:00 1 1000
1001 2009-07-22 14:40:00 2 1000
2001 2009-07-21 22:00:00 1 1000
3001 2009-07-21 05:20:00 2 1000
4001 2009-07-20 12:40:00 1 1000
5001 2009-07-19 20:00:00 2 1000
6001 2009-07-19 03:20:00 1 1000
7001 2009-07-18 10:40:00 2 1000
8001 2009-07-17 18:00:00 1 1000
10 rows fetched in 0.0000s (62.7648s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL 10000 100.00 Using temporary; Using filesort
2 DERIVED ro index ix_range_date_state 12 9900 100.00 Using index
3 DEPENDENT SUBQUERY ri index ix_range_date_state 12 1 990000.00 Using where; Using index

This runs for 61 seconds, not efficient.

It's another classical case of Schlemiel the painter's Algorithm: the further it goes from the beginning of the range, the more records it takes to the beginning of the range in a subquery.

We better keep track of the groups in a session variable.

To do this, we need to:

  • Initialize two session variables: a counter and a previous state holder
  • Select all results ordered by date then state
  • Inrement the session variable whenever the state changes (previous state holder is not equal to the current state)
  • Update the state holder
  • Group by counter

And here's the query to do this:

SELECT  MIN(id) AS id, MIN(date) AS date, MIN(state) AS state, COUNT(*) cnt
FROM    (
SELECT  @r := @r + (@state != state OR @state IS NULL) AS gn,
@state := state AS sn,
s.id, s.date, s.state
FROM    (
SELECT  @r := 0,
@state := NULL
) vars,
t_range s
ORDER BY
date, state
) q
GROUP BY
gn

id date state cnt
9001 2009-07-17 01:20:00 2 1000
8001 2009-07-17 18:00:00 1 1000
7001 2009-07-18 10:40:00 2 1000
6001 2009-07-19 03:20:00 1 1000
5001 2009-07-19 20:00:00 2 1000
4001 2009-07-20 12:40:00 1 1000
3001 2009-07-21 05:20:00 2 1000
2001 2009-07-21 22:00:00 1 1000
1001 2009-07-22 14:40:00 2 1000
1 2009-07-23 07:20:00 1 1000
10 rows fetched in 0.0005s (0.0335s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL 10000 100.00 Using temporary; Using filesort
2 DERIVED <derived3> system 1 100.00
2 DERIVED s index ix_range_date_state 12 9900 100.00 Using index
3 DERIVED No tables used

This is only 30 ms, next to instant.

Written by Quassnoi

July 24th, 2009 at 11:00 pm

Posted in MySQL

4 Responses to 'MySQL: grouping continuous ranges'

Subscribe to comments with RSS

  1. can you please tell me “how to fetch continuous date from a table in mysql with between dates”

    WK

    24 Jun 14 at 10:46

  2. @WK: could you please be more specific?

    Quassnoi

    25 Jun 14 at 10:47

  3. Yes sure.

    I have a booking table that contain the following columns:

    property_id
    date
    status

    and in this table I have the records of the a month for august like

    12(property_id) 01-08-2014(date) available (status)
    12(property_id) 02-08-2014(date) available (status)
    12(property_id) 03-08-2014(date) available (status)
    12(property_id) 04-08-2014(date) un-available (status)

    12(property_id) 05-08-2014(date) available (status)

    and so on with status available or un-available.

    Now I want to fetch continuous date from this table with the availability & three continuous date

    Please reply

    WK

    25 Jun 14 at 10:55

  4. Thanks for this!! I modified it some to meet my needs. I was looking for contiguous membership periods with start and end dates. I also added an account type to the group by to show different types of accounts. Basically a state of covered ( 1 ) or not covered ( 0 ) by account type. I had to change the min(id) to just id and add it to the group by for multiple people.


    create table contig_test
    (
    id int,
    mon int,
    state int,
    account_type int
    );

    insert into contig_test( id, mon, state, account_type ) values ( 4444 , 1 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4444 , 2 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4444 , 3 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4444 , 4 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4444 , 5 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4444 , 6 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4444 , 7 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4444 , 8 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4444 , 9 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4444 , 10 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4444 , 11 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4444 , 12 , 1 , 15 );

    insert into contig_test( id, mon, state, account_type ) values ( 4445 , 1 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4445 , 2 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4445 , 3 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4445 , 4 , 0 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4445 , 5 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4445 , 6 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4445 , 7 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4445 , 8 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4445 , 9 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4445 , 10 , 0 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4445 , 11 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4445 , 12 , 1 , 15 );

    insert into contig_test( id, mon, state, account_type ) values ( 4446 , 1 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4446 , 2 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4446 , 3 , 1 , 30 );
    insert into contig_test( id, mon, state, account_type ) values ( 4446 , 4 , 0 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4446 , 5 , 1 , 30 );
    insert into contig_test( id, mon, state, account_type ) values ( 4446 , 6 , 1 , 30 );
    insert into contig_test( id, mon, state, account_type ) values ( 4446 , 7 , 1 , 30 );
    insert into contig_test( id, mon, state, account_type ) values ( 4446 , 8 , 1 , 30 );
    insert into contig_test( id, mon, state, account_type ) values ( 4446 , 9 , 1 , 30 );
    insert into contig_test( id, mon, state, account_type ) values ( 4446 , 10 , 0 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4446 , 11 , 1 , 15 );
    insert into contig_test( id, mon, state, account_type ) values ( 4446 , 12 , 1 , 15 );

    SELECT id ,
    MIN(mon) AS startdate ,
    max(mon) as enddate,
    MIN(state) AS state,
    account_type,
    COUNT(*) cnt
    FROM
    (
    SELECT @r := @r + (@state != state OR @state IS NULL) AS gn,
    @state := state AS sn,
    s.id,
    s.mon,
    s.state,
    s.account_type
    FROM
    (
    SELECT @r := 0,
    @state := NULL
    ) vars, contig_test s
    ORDER BY id, mon, state
    ) q
    where state = 1
    GROUP BY id, account_type, gn
    order by id, startdate;

    Richard

    16 Jul 15 at 17:23

Leave a Reply