Archive for March 14th, 2009
Deleting duplicates
Comments enabled. I *really* need your comment
Microsoft Knowledge Base has an article KB139444 on how to delete duplicate rows from a table that has no primary key.
Though it works, it's quite an overkill to delete duplicate rows with grouping, temporary tables, deleting and reinserting.
With SQL Server 2005 and above, there is much more elegant solution.
Let's create a table and fill it with duplicate records:
CREATE TABLE t_duplicate (id INT NOT NULL, value VARCHAR(50) NOT NULL) GO SET NOCOUNT ON BEGIN TRANSACTION DECLARE @i INT SET @i = 1 WHILE @i < 5000 BEGIN INSERT INTO t_duplicate VALUES (@i / 1000 + 1, 'Value ' + CAST(@i AS VARCHAR)) SET @i = @i + 1 END COMMIT SELECT * FROM t_duplicate [/sourcecode] <table class="terminal"> <tr><th>id</th><th>value</th></tr> <tr><td>1</td><td>Value 1</td></tr> <tr><td>1</td><td>Value 2</td></tr> <tr><td>1</td><td>Value 3</td></tr> <tr class="break"><td colspan="100"></td></tr> <tr><td>5</td><td>Value 4997</td></tr> <tr><td>5</td><td>Value 4998</td></tr> <tr><td>5</td><td>Value 4999</td></tr> </table> And now let's delete the duplicates: WITH q AS ( SELECT d.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rn FROM t_duplicate d ) DELETE FROM q WHERE rn > 1 SELECT * FROM t_duplicate
id | value |
---|---|
1 | Value 1 |
2 | Value 1000 |
3 | Value 2000 |
4 | Value 3000 |
5 | Value 4000 |
Done in a single query.