EXPLAIN EXTENDED

How to create fast database queries

Aggregated hours

Comments enabled. I *really* need your comment

From Stack Overflow:

I would like to know the sum of all web services completions per hour of day.

Obviously, this can be easily done with SUM() and GROUP BY:

SELECT hour, SUM(calls) FROM sample s GROUP BY hour;  
hour SUM(calls)
0 634
1 642
2 633
3 624
4 420
5 479
6 428
7 424
8 473
9 434
10 485
11 567
12 526
13 513
14 555
15 679
16 624
17 796
18 752
19 843
20 827
21 774
22 647
23 533

My problem is that in old sets, the web service calls in the hours [00-11] were already summed up.

The simple statement as listed above would therefore lead to

SELECT hour, SUM(calls) FROM sample s GROUP BY hour;  
hour SUM(calls)
0 6234
12 526
13 513
14 555
15 679
16 624
17 796
18 752
19 843
20 827
21 774
22 647
23 533

This is an undesirable result. To make the old sets [00,12,...,23] comparable to the new sets [00,01,...,23], I would like to have one statement that averages the value of [00] and distributes it over the missing hours, e.g.:

hour SUM(calls)
0 6243/12
1 6243/12
12 526
23 533

I can easily do this using temporary tables or views, but i don't know how to accomplish this without them.

In this rowset, we have a row that needs to be split into 12 another rows.

Nature of SQL requires us to make a join for this to be possible.

The resulting set of two relations joined together may contain at most n × m rows. Some of these rows may be filtered out by JOIN … ON or WHERE conditions.

There is only one row that needs to be split, that's why we need a rowset with at least 12 rows in it to join with that row.

Unfortunately, MySQL does not supply a built-in function to produce a dummy rowset. That's why we'll need to construct it from 12 single SELECT statements combined together with UNION ALL.

We'll need to make the following in our query:

  1. Make a dummy rowset within an uncorrelated subquery
  2. Join sample with this rowset so that row 0 is joined with all rows the rowset (producing 12 copies of the row), and other rows are left as is. This is most easily performed with an OUTER JOIN
  3. Sum the appropriate values. Calls from 0 to 11 should be averaged, while calls from 12 to 23 should be left as is. As it's an OUTER JOIN, the rows 12 to 23 will have NULLs in columns corresponding to the dummy rowset. We will test these columns for nullness to tell the two kinds of rows apart
SELECT  COALESCE(morning.hour, sample.hour),
        SUM(CASE WHEN morning.hour IS NULL THEN calls ELSE calls / 12 END) AS calls
FROM    sample
LEFT JOIN
        (
        SELECT 0 AS hour
        UNION ALL
        SELECT 1
        UNION ALL
        SELECT 2
        UNION ALL
        SELECT 3
        UNION ALL
        SELECT 4
        UNION ALL
        SELECT 5
        UNION ALL
        SELECT 6
        UNION ALL
        SELECT 7
        UNION ALL
        SELECT 8
        UNION ALL
        SELECT 9
        UNION ALL
        SELECT 10
        UNION ALL
        SELECT 11
        ) AS morning
ON      sample.hour = 0
GROUP BY
        1
COALESCE(morning.hour, sample.hour) calls
0 519.5000
1 519.5000
2 519.5000
3 519.5000
4 519.5000
5 519.5000
6 519.5000
7 519.5000
8 519.5000
9 519.5000
10 519.5000
11 519.5000
12 526.0000
13 513.0000
14 555.0000
15 679.0000
16 624.0000
17 796.0000
18 752.0000
19 843.0000
20 827.0000
21 774.0000
22 647.0000
23 533.0000

Here we are, and no temp tables.

Written by Quassnoi

March 15th, 2009 at 11:00 pm

Posted in MySQL

Leave a Reply