Use CTE and Table Partition and Ranking Functions to Delete Rows with Duplicates

This post demonstrates to use T-SQL table partition, row_number ranking function, together with CTE to delete duplicated data from a table. The logic is saved in a stored procedure for reuse purpose.

Two Tables in the Example:

  • Orders table uses id as its primary key, and it also contains an accountNumber colume.
  • OrderLineItems table uses orderId as its foreign key. It also has a serialNumbercolumn which is supposed to be unique.
  • Problem:
    Due to code bug, the data in the OrderLineItems became dirty with duplicated serialNumber values. And an agreement has been made to delete all the duplicates and only leave one in the database.

    The SQL snippet is provided below. And it can also be found from gist.

    create procedure dbo.fixDuplication(@accountNumber varchar(255), @productId varchar(30))
    	WITH CTE as
    	select, oli.serialNumber, row_number() 
    	over (partoliion by serialNumber order by serialNumber) as rownumber
    	from orderLineItems as oli
    	join orders as o on
    	where o.accountnumber = @accountnumber 
    		and oli.productId=@productId
    		and oli.serialNumber is not null 
    	delete oli 
    	output deleted.*
    	from orderLineoliems as oli
    	join CTE as c on
    	where rownumber>1
    exec dbo.fixDuplication @accountNumber='12345', @productId='IPAD_1234';
    This entry was posted in Database and tagged , , , , , . Bookmark the permalink.

    Leave a Reply

    Fill in your details below or click an icon to log in: Logo

    You are commenting using your account. Log Out /  Change )

    Google photo

    You are commenting using your Google account. Log Out /  Change )

    Twitter picture

    You are commenting using your Twitter account. Log Out /  Change )

    Facebook photo

    You are commenting using your Facebook account. Log Out /  Change )

    Connecting to %s