Analytic functions: FIRST_VALUE, LAST_VALUE, LEAD, LAG
In the previous articles I wrote about emulating some of the analytic functions in MySQL.
Today, I'll write about four more userful functions: FIRST_VALUE
, LAST_VALUE
, LEAD
and LAG
.
These functions also do not have aggregate analogs.
FIRST VALUE(column)
returns the value of column from the first row of the grouping set.
LAST_VALUE(column)
returns the value of column from the last row of the grouping set.
This can be illustrated by the following query:
SELECT m.*, FIRST_ROW(month) OVER (PARTITION BY season ORDER BY id), LAST_ROW(month) OVER (PARTITION BY season ORDER BY id) FROM t_month m
id | season | month | FIRST_VALUE(month) OVER (PARTITION BY season ORDER BY id) | FIRST_VALUE(month) OVER (PARTITION BY season ORDER BY id) |
---|---|---|---|---|
9 | Fall | September | September | November |
10 | Fall | October | September | November |
11 | Fall | November | September | November |
3 | Spring | March | March | May |
4 | Spring | April | March | May |
5 | Spring | May | March | May |
6 | Summer | June | June | August |
7 | Summer | July | June | August |
8 | Summer | August | June | August |
1 | Winter | January | January | December |
2 | Winter | February | January | December |
12 | Winter | December | January | December |
This query selects the first and the last month of each row's season.
In this example above, January is the first month and December is the last month of winter, as January, 2009 comes earlier than December, 2009.
In MySQL, we can emulate FIRST_ROW()
with the following construction:
SELECT *, @r AS `FIRST_VALUE(month) OVER (PARTITION BY season ORDER BY id)` FROM ( SELECT m.* FROM ( SELECT @_season = NULL ) vars, t_month m ORDER BY season, id ) mo WHERE (CASE WHEN @_season IS NULL OR @_season <> season THEN @r := month ELSE month END IS NOT NULL) AND (@_season := season) IS NOT NULL
id | season | month | FIRST_VALUE(month) OVER (PARTITION BY season ORDER BY id) |
---|---|---|---|
9 | Fall | September | September |
10 | Fall | October | September |
11 | Fall | November | September |
3 | Spring | March | March |
4 | Spring | April | March |
5 | Spring | May | March |
6 | Summer | June | June |
7 | Summer | July | June |
8 | Summer | August | June |
1 | Winter | January | January |
2 | Winter | February | January |
12 | Winter | December | January |
In this case, we just assign the value of month
to @r
whenever the season changes.
What about LAST_ROW()
?
By analogy, we'll just assign a…
Wait. When we enter the new set, we don't yet have something to assign! We'll have it only when the set ends, but we already need to output the intermediate results.
We'd better change the ordering in the ORDER BY
clause so that id
's get selected in the descending order. In this case, FIRST_ROW()
will become LAST_ROW()
and vise versa:
SELECT *, @r AS `LAST_VALUE(month) OVER (PARTITION BY season ORDER BY id)` FROM ( SELECT m.* FROM ( SELECT @_season = NULL ) vars, t_month m ORDER BY season, id DESC ) mo WHERE (CASE WHEN @_season IS NULL OR @_season <> season THEN @r := month ELSE month END IS NOT NULL) AND (@_season := season) IS NOT NULL
id | season | month | LAST_VALUE(month) OVER (PARTITION BY season ORDER BY id) |
---|---|---|---|
11 | Fall | November | November |
10 | Fall | October | November |
9 | Fall | September | November |
5 | Spring | May | May |
4 | Spring | April | May |
3 | Spring | March | May |
8 | Summer | August | August |
7 | Summer | July | August |
6 | Summer | June | August |
12 | Winter | December | December |
2 | Winter | February | December |
1 | Winter | January | December |
Using the same principle, we can also emulate LEAD
and LAG
.
LAG(column)
returns the value of column
from the previous row in the grouping set, or NULL
if there is no previous row.
LEAD(column)
returns the value of column
from the next row in the grouping set, or NULL
if there is no next row.
SELECT mo.id, mo.season, @r AS `LAG(month) OVER (PARTITION BY season ORDER BY id)`, (@r := month) AS month FROM ( SELECT m.* FROM ( SELECT @_season = NULL, @s := NULL ) vars, t_month m ORDER BY season, id ) mo WHERE (CASE WHEN @_season IS NULL OR @_season <> season THEN @r := NULL ELSE NULL END IS NULL) AND (@_season := season) IS NOT NULL
id | season | LAG(month) OVER (PARTITION BY season ORDER BY id) | month |
---|---|---|---|
9 | Fall | September | |
10 | Fall | September | October |
11 | Fall | October | November |
3 | Spring | March | |
4 | Spring | March | April |
5 | Spring | April | May |
6 | Summer | June | |
7 | Summer | June | July |
8 | Summer | July | August |
1 | Winter | January | |
2 | Winter | January | February |
12 | Winter | February | December |
SELECT mo.id, mo.season, @r AS `LEAD(month) OVER (PARTITION BY season ORDER BY id)`, (@r := month) AS month FROM ( SELECT m.* FROM ( SELECT @_season = NULL, @s := NULL ) vars, t_month m ORDER BY season, id DESC ) mo WHERE (CASE WHEN @_season IS NULL OR @_season <> season THEN @r := NULL ELSE NULL END IS NULL) AND (@_season := season) IS NOT NULL
id | season | LEAD(month) OVER (PARTITION BY season ORDER BY id) | month |
---|---|---|---|
11 | Fall | November | |
10 | Fall | November | October |
9 | Fall | October | September |
5 | Spring | May | |
4 | Spring | May | April |
3 | Spring | April | March |
8 | Summer | August | |
7 | Summer | August | July |
6 | Summer | July | June |
12 | Winter | December | |
2 | Winter | December | February |
1 | Winter | February | January |
In this query, we assign a new value of month to @r
in the SELECT clause, and we need to change the order or columns a little so that the new value is assigned only after the old value is selected.
If it is not acceptable for some reason, we'll need to wrap this query into a subquery, but I won't do it in this example for the sake of brevity.
To be continued.
Hello,
your SQL-knowledge is impressing.
In the first table class=”terminal” the fifth th should not be “FIRST_VALUE(month) …”, but “LAST_VALUE(month) …”.
Best whishes from Germany
explainextended.qej@gishpuppy.com
14 Mar 13 at 14:24
hey,
Thanks for your solution.
And I found there is a bug of your SQL that implement LAG and LEAD in MySQL of 5.6 version: the lag value is not a null value in the first line of the non-first group.
My solution is :
SELECT mo.id,
mo.season,
mo.month,
@s AS ·LAG(month) OVER(PARTITION BY season ORDER BY id)·,
(case when @partition_by_column = season or @_season is null then @s := month else @s := null end) tmp_value,
(@partition_by_column := season) partition_column
FROM (SELECT m.*
FROM (SELECT @partition_by_column = NULL, @s := NULL) vars, t_month m
ORDER BY season, id) mo
Jacky Zhang
9 Jan 15 at 09:28
SUPER SOLUTION
NAVEEN
30 May 15 at 11:55
Hello,
What does the @s accomplish in the lead and lag examples?
Hilda
3 May 17 at 23:37
What is the lead function query of the same mentioned by Jacky
Pankaj Kumar
21 Jan 20 at 14:10