From Stack Overflow:
I need to get 5 random records from a table plus a further record based on data from the users preferences.
How can I combine the two statements whilst ensuring that no results are repeated (i. e. the favourite is not present in the 5 random records)?
We should select the 6th record so that it would satisfy two conditions: first, it should be preferred by user, second, it should not be among the 5 random records selected earlier.
The problem with random records it that they are, um, random. We usually cannot reproduce 5 random records to apply further conditions on them.
That’s where the CTEs would come handy, but there’s a little problem.
A CTE, as Microsoft understands them, is just an alias for an online view.
Therefore, each time a CTE is used in a query, it may (or may not) be reevaluated.
It’s not a problem for a plain CTEs which are built out of static columns.
However, this is totally a problem when we try to
UNION two queries using an undeterministic CTE (like, ordered by
NEWID()), as we would do in this case.
Let’s create a sample table and see what happens:
CREATE SCHEMA [20090728_random] CREATE TABLE records ( id INT NOT NULL PRIMARY KEY, preferred INT NOT NULL, value VARCHAR(20) NOT NULL ) GO BEGIN TRANSACTION DECLARE @cnt INT SET @cnt = 1 WHILE @cnt <= 100000 BEGIN INSERT INTO [20090728_random].records VALUES ( @cnt, CASE @cnt % 1000 WHEN 0 THEN 1 ELSE 0 END, 'Record ' + CAST(@cnt AS VARCHAR) ) SET @cnt = @cnt + 1 END COMMIT
There are 100,000 rows in the table, with 100 of them marked as
Let’s see what’s happening when we try to select
TOP 5 * ORDER BY RANDOM() in a CTE and then
UNION ALL two CTEs:
WITH rows AS ( SELECT TOP 5 r.*, 1 AS rn FROM [20090728_random].records r ORDER BY NEWID() ) SELECT * FROM rows UNION ALL SELECT * FROM rows
|10 rows fetched in 0.0005s (0.1386s)|
We have two completely different sets. This is because the CTE was reevaluated in the second query.
This means we will not be able to use the CTE to check the 6th record. The CTE will be reevaluated, and the check will fail.
We need to build our query this way:
- Build a CTE called
randomsto select 5 random records
- Build a CTE called
prefsto select all preferred records
FULL OUTER JOINthese CTEs on
TOP 6results, ordering by
JOINthe table using
This will select each record exactly once from each of the CTEs, even it belongs to both of them.
randoms.id DESC makes sure that 5 random records will be selected first.
The 6th record will not be among the 5 random ones (or it would satisfy the FULL OUTER JOIN condition and would be selected earlier), and it’s guaranteed to have
preferred = 1 (there may be at most 5 records not satisfying this condition, and they already had been selected).
And here’s the query:
WITH randoms AS ( SELECT TOP 5 r.* FROM [20090728_random].records r ORDER BY NEWID() ), prefs AS ( SELECT id FROM [20090728_random].records WHERE preferred = 1 ) SELECT TOP 6 r.* FROM randoms FULL OUTER JOIN prefs ON randoms.id = prefs.id JOIN [20090728_random].records r ON r.id = COALESCE (randoms.id, prefs.id) ORDER BY randoms.id DESC
|6 rows fetched in 0.0004s (0.1766s)|
First 5 records are random, the 6th one is preferred.