Sometimes we need to get a sample row from a table satisfying a certain condition. Like, get a first row for each month.
MS SQL and
Oracle supply analytical function
ROW_NUMBER() for this purpose.
Let's create a simple table to illustrate our needs and see how do we query it.
CREATE TABLE t_limiter ( id int(10) unsigned NOT NULL, grouper int(10) unsigned NOT NULL, value varchar(45) NOT NULL, PRIMARY KEY (id), KEY ix_limiter_grouper_id (grouper,id) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
SELECT * FROM ( SELECT l.*, ROW_NUMBER() OVER (PARTITION BY grouper ORDER BY id DESC) AS rn FROM t_limiter l ) lo WHERE rn = 1
MySQL of course lacks this function, but it can be easily emulated using:
SELECT * FROM ( SELECT grouper, MIN(id) AS mid FROM t_limiter GROUP BY grouper ) dl, t_limiter l WHERE l.grouper = dl.grouper AND l.id = mid
That's fine, we get exactly one row for each
grouper. But what is we need 15 rows for each month?
With analytical functions, it's still simple:
SELECT * FROM ( SELECT l.*, ROW_NUMBER() OVER (PARTITION BY grouper ORDER BY id DESC) AS rn FROM t_limiter l ) lo WHERE rn <= 15
MySQL we can do, umm…
Grouping won't help us in this case, as it always returns 1 row for each
grouper, and we need 15 rows for a join. Same with correlated subqueries: they return 1 row.
In this case, we can emulate analytical functions using tricks with session variables.
Let's populate the table and see how the query works.
We will create 999,999 records with
grouper = id % 32, which will give us
grouper's from 0 to 31, and 1 more record with
grouper = 32. This final record will have
id of 1,000,000.
DELIMITER $$ CREATE PROCEDURE prc_fill_limiter(_cnt INT, _modulus INT) BEGIN DECLARE i INT; SET i = 1; WHILE i < _cnt DO INSERT INTO t_limiter (id, grouper, value) VALUES (i, i % _modulus, CONCAT('Value ', i)); SET i = i + 1; END WHILE; INSERT INTO t_limiter (id, grouper, value) VALUES (i, _modulus, CONCAT('Value ', i)); END; $$ DELIMITER ; START TRANSACTION; CALL prc_fill_limiter (1000000, 32); COMMIT;
First, we need to select everything from the table, ordered first by
grouper then by
id (like analytics functions do). Session variables expressions are then being evaluated in the
WHERE clause of the query.
The internal counter
@r, being initially set to the number or rows we want to select for each
grouper, is decremented on each row, and the when it falls under zero, the rows start being filtered out.
grouper value changes (thus indicating that the new
PARTITION started), the value of
@r is reset back to limit.
SELECT l.* FROM ( SELECT @lim := 15, @cg := -1 ) vars, t_limiter l WHERE CASE WHEN @cg <> grouper THEN @r := @lim ELSE 1 END > 0 AND (@r := @r - 1) >= 0 AND (@cg := grouper) IS NOT NULL ORDER BY grouper, id
Being run, the query returns the following:
|481 rows fetched in 0,0044s (3,4241s)|
As you can see, the query above works and completes in
Θ(n) time. When you have lots of groups and just a few items in each group, that's exactly what you need.
To be continued.