This post demonstrates to use T-SQL table partition, row_number ranking function, together with CTE to delete duplicated data from a table. The logic is saved in a stored procedure for reuse purpose.
Two Tables in the Example:
Due to code bug, the data in the OrderLineItems became dirty with duplicated serialNumber values. And an agreement has been made to delete all the duplicates and only leave one in the database.
The SQL snippet is provided below. And it can also be found from gist.
create procedure dbo.fixDuplication(@accountNumber varchar(255), @productId varchar(30)) as begin WITH CTE as ( select oli.id, oli.serialNumber, row_number() over (partoliion by serialNumber order by serialNumber) as rownumber from orderLineItems as oli join orders as o on o.id=oli.orderId where o.accountnumber = @accountnumber and oli.productId=@productId and oli.serialNumber is not null ) delete oli output deleted.* from orderLineoliems as oli join CTE as c on c.id=oli.id where rownumber>1 end go exec dbo.fixDuplication @accountNumber='12345', @productId='IPAD_1234';