Archive for February 18th, 2011
Things SQL needs: SERIES()
Recently I had to deal with several scenarios which required processing and aggregating continuous series of data.
I believe this could be best illustrated with an example:
id | source | value |
---|---|---|
1 | 1 | 10 |
2 | 1 | 20 |
3 | 2 | 15 |
4 | 2 | 25 |
5 | 1 | 45 |
6 | 3 | 50 |
7 | 3 | 35 |
8 | 1 | 40 |
9 | 1 | 10 |
The records are ordered by id
, and within this order there are continuous series of records which share the same value of source
. In the table above, the series are separated by thick lines.
We want to calculate some aggregates across each of the series: MIN
, MAX
, SUM
, AVG
, whatever:
source | min | max | sum | avg |
---|---|---|---|---|
1 | 10 | 20 | 30 | 15.00 |
2 | 15 | 25 | 40 | 20.00 |
1 | 45 | 45 | 45 | 45.00 |
3 | 35 | 50 | 85 | 42.50 |
1 | 10 | 40 | 50 | 25.00 |
This can be used for different things. I used that for:
- Reading sensors from a moving elevator (thus tracking its position)
- Recording user's activity on a site
- Tracking the primary node in a server cluster
, but almost any seasoned database developer can recall a need for such a query.
As you can see, the values of source
are repeating so a mere GROUP BY
won't work here.
In the systems supporting window functions there is a workaround for that: