SQL Server: random records avoiding CTE reevaluation
Comments enabled. I *really* need your comment
From Stack Overflow:
I need to get 5 random records from a table plus a further record based on data from the users preferences.
How can I combine the two statements whilst ensuring that no results are repeated (i. e. the favourite is not present in the 5 random records)?
We should select the 6th record so that it would satisfy two conditions: first, it should be preferred by user, second, it should not be among the 5 random records selected earlier.
The problem with random records it that they are, um, random. We usually cannot reproduce 5 random records to apply further conditions on them.
That's where the CTEs would come handy, but there's a little problem.
A CTE, as Microsoft understands them, is just an alias for an online view.
Therefore, each time a CTE is used in a query, it may (or may not) be reevaluated.
It's not a problem for a plain CTEs which are built out of static columns.
However, this is totally a problem when we try to UNION
two queries using an undeterministic CTE (like, ordered by NEWID()
), as we would do in this case.
Let's create a sample table and see what happens:
CREATE SCHEMA [20090728_random] CREATE TABLE records ( id INT NOT NULL PRIMARY KEY, preferred INT NOT NULL, value VARCHAR(20) NOT NULL ) GO BEGIN TRANSACTION DECLARE @cnt INT SET @cnt = 1 WHILE @cnt <= 100000 BEGIN INSERT INTO [20090728_random].records VALUES ( @cnt, CASE @cnt % 1000 WHEN 0 THEN 1 ELSE 0 END, 'Record ' + CAST(@cnt AS VARCHAR) ) SET @cnt = @cnt + 1 END COMMIT [/sourcecode] There are <strong>100,000</strong> rows in the table, with <strong>100</strong> of them marked as <code>preferred</code>. Let's see what's happening when we try to select <code>TOP 5 * ORDER BY RANDOM()</code> in a <strong>CTE</strong> and then <code>UNION ALL</code> two <strong>CTE</strong>s: WITH rows AS ( SELECT TOP 5 r.*, 1 AS rn FROM [20090728_random].records r ORDER BY NEWID() ) SELECT * FROM rows UNION ALL SELECT * FROM rows
id | preferred | value | rn | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13696 | 0 | Record 13696 | 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
81375 | 0 | Record 81375 | 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
97171 | 0 | Record 97171 | 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
15569 | 0 | Record 15569 | 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
65057 | 0 | Record 65057 | 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
93492 | 0 | Record 93492 | 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
63785 | 0 | Record 63785 | 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
36144 | 0 | Record 36144 | 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
52825 | 0 | Record 52825 | 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
53930 | 0 | Record 53930 | 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 rows fetched in 0.0005s (0.1386s) |
We have two completely different sets. This is because the CTE was reevaluated in the second query.
This means we will not be able to use the CTE to check the 6th record. The CTE will be reevaluated, and the check will fail.
We need to build our query this way:
- Build a CTE called
randoms
to select 5 random records - Build a CTE called
prefs
to select all preferred records FULL OUTER JOIN
these CTEs onid
- Select
TOP 6
results, ordering byrandoms.id DESC
JOIN
the table usingCOALESCE(randoms.id, prefs.id)
This will select each record exactly once from each of the CTEs, even it belongs to both of them.
Ordering by randoms.id DESC
makes sure that 5 random records will be selected first.
The 6th record will not be among the 5 random ones (or it would satisfy the FULL OUTER JOIN condition and would be selected earlier), and it's guaranteed to have preferred = 1
(there may be at most 5 records not satisfying this condition, and they already had been selected).
And here's the query:
WITH randoms AS ( SELECT TOP 5 r.* FROM [20090728_random].records r ORDER BY NEWID() ), prefs AS ( SELECT id FROM [20090728_random].records WHERE preferred = 1 ) SELECT TOP 6 r.* FROM randoms FULL OUTER JOIN prefs ON randoms.id = prefs.id JOIN [20090728_random].records r ON r.id = COALESCE (randoms.id, prefs.id) ORDER BY randoms.id DESC
id | preferred | value | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
83022 | 0 | Record 83022 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
79921 | 0 | Record 79921 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
78241 | 0 | Record 78241 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
58192 | 0 | Record 58192 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
18954 | 0 | Record 18954 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1000 | 1 | Record 1000 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6 rows fetched in 0.0004s (0.1766s) |
First 5 records are random, the 6th one is preferred.