Deleting duplicates
Comments enabled. I *really* need your comment
Microsoft Knowledge Base has an article KB139444 on how to delete duplicate rows from a table that has no primary key.
Though it works, it's quite an overkill to delete duplicate rows with grouping, temporary tables, deleting and reinserting.
With SQL Server 2005 and above, there is much more elegant solution.
Let's create a table and fill it with duplicate records:
CREATE TABLE t_duplicate (id INT NOT NULL, value VARCHAR(50) NOT NULL)
GO
SET NOCOUNT ON
BEGIN TRANSACTION
DECLARE @i INT
SET @i = 1
WHILE @i < 5000
BEGIN
INSERT
INTO t_duplicate
VALUES (@i / 1000 + 1, 'Value ' + CAST(@i AS VARCHAR))
SET @i = @i + 1
END
COMMIT
SELECT *
FROM t_duplicate
[/sourcecode]
<table class="terminal">
<tr><th>id</th><th>value</th></tr>
<tr><td>1</td><td>Value 1</td></tr>
<tr><td>1</td><td>Value 2</td></tr>
<tr><td>1</td><td>Value 3</td></tr>
<tr class="break"><td colspan="100"></td></tr>
<tr><td>5</td><td>Value 4997</td></tr>
<tr><td>5</td><td>Value 4998</td></tr>
<tr><td>5</td><td>Value 4999</td></tr>
</table>
And now let's delete the duplicates:
WITH q AS
(
SELECT d.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rn
FROM t_duplicate d
)
DELETE
FROM q
WHERE rn > 1
SELECT *
FROM t_duplicate
| id | value |
|---|---|
| 1 | Value 1 |
| 2 | Value 1000 |
| 3 | Value 2000 |
| 4 | Value 3000 |
| 5 | Value 4000 |
Done in a single query.
Subscribe in a reader