EXPLAIN EXTENDED

How to create fast database queries

Analytic functions: FIRST_VALUE, LAST_VALUE, LEAD, LAG

with 5 comments

In the previous articles I wrote about emulating some of the analytic functions in MySQL.

Today, I'll write about four more userful functions: FIRST_VALUE, LAST_VALUE, LEAD and LAG.

These functions also do not have aggregate analogs.

FIRST VALUE(column) returns the value of column from the first row of the grouping set.

LAST_VALUE(column) returns the value of column from the last row of the grouping set.

This can be illustrated by the following query:

SELECT  m.*,
FIRST_ROW(month) OVER (PARTITION BY season ORDER BY id),
LAST_ROW(month) OVER (PARTITION BY season ORDER BY id)
FROM    t_month m
id season month FIRST_VALUE(month) OVER (PARTITION BY season ORDER BY id) FIRST_VALUE(month) OVER (PARTITION BY season ORDER BY id)
9 Fall September September November
10 Fall October September November
11 Fall November September November
3 Spring March March May
4 Spring April March May
5 Spring May March May
6 Summer June June August
7 Summer July June August
8 Summer August June August
1 Winter January January December
2 Winter February January December
12 Winter December January December

This query selects the first and the last month of each row's season.

In this example above, January is the first month and December is the last month of winter, as January, 2009 comes earlier than December, 2009.

In MySQL, we can emulate FIRST_ROW() with the following construction:

SELECT  *,
@r AS `FIRST_VALUE(month) OVER (PARTITION BY season ORDER BY id)`
FROM    (
SELECT  m.*
FROM    (
SELECT  @_season = NULL
) vars,
t_month m
ORDER BY
season, id
) mo
WHERE   (CASE WHEN @_season IS NULL OR @_season <> season THEN @r := month ELSE month END IS NOT NULL)
AND (@_season := season) IS NOT NULL
id season month FIRST_VALUE(month) OVER (PARTITION BY season ORDER BY id)
9 Fall September September
10 Fall October September
11 Fall November September
3 Spring March March
4 Spring April March
5 Spring May March
6 Summer June June
7 Summer July June
8 Summer August June
1 Winter January January
2 Winter February January
12 Winter December January

In this case, we just assign the value of month to @r whenever the season changes.

What about LAST_ROW()?

By analogy, we'll just assign a…

Wait. When we enter the new set, we don't yet have something to assign! We'll have it only when the set ends, but we already need to output the intermediate results.

We'd better change the ordering in the ORDER BY clause so that id's get selected in the descending order. In this case, FIRST_ROW() will become LAST_ROW() and vise versa:

SELECT  *,
@r AS `LAST_VALUE(month) OVER (PARTITION BY season ORDER BY id)`
FROM    (
SELECT  m.*
FROM    (
SELECT  @_season = NULL
) vars,
t_month m
ORDER BY
season, id DESC
) mo
WHERE   (CASE WHEN @_season IS NULL OR @_season <> season THEN @r := month ELSE month END IS NOT NULL)
AND (@_season := season) IS NOT NULL
id season month LAST_VALUE(month) OVER (PARTITION BY season ORDER BY id)
11 Fall November November
10 Fall October November
9 Fall September November
5 Spring May May
4 Spring April May
3 Spring March May
8 Summer August August
7 Summer July August
6 Summer June August
12 Winter December December
2 Winter February December
1 Winter January December

Using the same principle, we can also emulate LEAD and LAG.

LAG(column) returns the value of column from the previous row in the grouping set, or NULL if there is no previous row.
LEAD(column) returns the value of column from the next row in the grouping set, or NULL if there is no next row.

SELECT  mo.id, mo.season,
@r AS `LAG(month) OVER (PARTITION BY season ORDER BY id)`,
(@r := month) AS month
FROM    (
SELECT  m.*
FROM    (
SELECT  @_season = NULL,
@s := NULL
) vars,
t_month m
ORDER BY
season, id
) mo
WHERE  (CASE WHEN @_season IS NULL OR @_season <> season THEN @r := NULL ELSE NULL END IS NULL)
AND (@_season := season) IS NOT NULL
id season LAG(month) OVER (PARTITION BY season ORDER BY id) month
9 Fall September
10 Fall September October
11 Fall October November
3 Spring March
4 Spring March April
5 Spring April May
6 Summer June
7 Summer June July
8 Summer July August
1 Winter January
2 Winter January February
12 Winter February December
SELECT  mo.id, mo.season,
@r AS `LEAD(month) OVER (PARTITION BY season ORDER BY id)`,
(@r := month) AS month
FROM    (
SELECT  m.*
FROM    (
SELECT  @_season = NULL,
@s := NULL
) vars,
t_month m
ORDER BY
season, id DESC
) mo
WHERE  (CASE WHEN @_season IS NULL OR @_season <> season THEN @r := NULL ELSE NULL END IS NULL)
AND (@_season := season) IS NOT NULL
id season LEAD(month) OVER (PARTITION BY season ORDER BY id) month
11 Fall November
10 Fall November October
9 Fall October September
5 Spring May
4 Spring May April
3 Spring April March
8 Summer August
7 Summer August July
6 Summer July June
12 Winter December
2 Winter December February
1 Winter February January

In this query, we assign a new value of month to @r in the SELECT clause, and we need to change the order or columns a little so that the new value is assigned only after the old value is selected.

If it is not acceptable for some reason, we'll need to wrap this query into a subquery, but I won't do it in this example for the sake of brevity.

To be continued.

Written by Quassnoi

March 10th, 2009 at 11:00 pm

Posted in MySQL

5 Responses to 'Analytic functions: FIRST_VALUE, LAST_VALUE, LEAD, LAG'

Subscribe to comments with RSS

  1. Hello,

    your SQL-knowledge is impressing.
    In the first table class=”terminal” the fifth th should not be “FIRST_VALUE(month) …”, but “LAST_VALUE(month) …”.

    Best whishes from Germany

    explainextended.qej@gishpuppy.com

    14 Mar 13 at 14:24

  2. hey,
    Thanks for your solution.
    And I found there is a bug of your SQL that implement LAG and LEAD in MySQL of 5.6 version: the lag value is not a null value in the first line of the non-first group.
    My solution is :
    SELECT mo.id,
    mo.season,
    mo.month,
    @s AS ·LAG(month) OVER(PARTITION BY season ORDER BY id)·,
    (case when @partition_by_column = season or @_season is null then @s := month else @s := null end) tmp_value,
    (@partition_by_column := season) partition_column
    FROM (SELECT m.*
    FROM (SELECT @partition_by_column = NULL, @s := NULL) vars, t_month m
    ORDER BY season, id) mo

    Jacky Zhang

    9 Jan 15 at 09:28

  3. SUPER SOLUTION

    NAVEEN

    30 May 15 at 11:55

  4. Hello,

    What does the @s accomplish in the lead and lag examples?

    Hilda

    3 May 17 at 23:37

  5. What is the lead function query of the same mentioned by Jacky

    Pankaj Kumar

    21 Jan 20 at 14:10

Leave a Reply