Archive for February 18th, 2011
Things SQL needs: SERIES()
Recently I had to deal with several scenarios which required processing and aggregating continuous series of data.

I believe this could be best illustrated with an example:
| id | source | value |
|---|---|---|
| 1 | 1 | 10 |
| 2 | 1 | 20 |
| 3 | 2 | 15 |
| 4 | 2 | 25 |
| 5 | 1 | 45 |
| 6 | 3 | 50 |
| 7 | 3 | 35 |
| 8 | 1 | 40 |
| 9 | 1 | 10 |
The records are ordered by id, and within this order there are continuous series of records which share the same value of source. In the table above, the series are separated by thick lines.
We want to calculate some aggregates across each of the series: MIN, MAX, SUM, AVG, whatever:
| source | min | max | sum | avg |
|---|---|---|---|---|
| 1 | 10 | 20 | 30 | 15.00 |
| 2 | 15 | 25 | 40 | 20.00 |
| 1 | 45 | 45 | 45 | 45.00 |
| 3 | 35 | 50 | 85 | 42.50 |
| 1 | 10 | 40 | 50 | 25.00 |
This can be used for different things. I used that for:
- Reading sensors from a moving elevator (thus tracking its position)
- Recording user's activity on a site
- Tracking the primary node in a server cluster
, but almost any seasoned database developer can recall a need for such a query.
As you can see, the values of source are repeating so a mere GROUP BY won't work here.
In the systems supporting window functions there is a workaround for that:
Subscribe in a reader