KNOWLEDGE BASE

Deleting Duplicated Rows in SQL Server Without Primary Key


When using Microsoft SQL Server, if your table does not include a primary key, it is possible to insert duplicated rows of data.  Removing these duplicated row(s) can then become a quite a problem ... with no easy way of selecting just one of these duplicates 

Consider the following example: first we create a simple database with a single table for membership that contains: a Member's ID, Membership Type, Firstname, Lastname and Birthday:

CREATE TABLE [dbo].[Membership](
[MemberID] [varchar](20) NOT NULL,
[MemberType] [int] NOT NULL,
[Firstname] [varchar](80) NOT NULL,
[Lastname] [varchar](80) NOT NULL,
[Birthday] [date] NOT NULL
) ON [PRIMARY]

If we then insert the following rows into this table:

INSERT INTO dbo.Membership (MemberID, Firstname, Lastname, MemberType, Birthday) VALUES
('001', 'Fred', 'Bloggs', 1, convert(datetime, '10/23/1995', 0))

INSERT INTO dbo.Membership (MemberID, Firstname, Lastname, MemberType, Birthday) VALUES
('002', 'Joe', 'Bloggs', 1, convert(datetime, '05/10/1973', 0))

INSERT INTO dbo.Membership (MemberID, Firstname, Lastname, MemberType, Birthday) VALUES
('002', 'Joe', 'Bloggs', 1, convert(datetime, '05/10/1973', 0))

INSERT INTO dbo.Membership (MemberID, Firstname, Lastname, MemberType, Birthday) VALUES
('003', 'Fannie', 'Mae', 1, convert(datetime, '02/14/1983', 0))

We now have duplicated Joe Blogg's entry (MemberID 2), with no obviously safe way of removing the duplicate, while leaving at least one entry intact. Now imagine if you had an entire database of data with hundreds or thousands of duplicates - fixing this by hand quickly becomes an impossible task!

To solve this, first we can add an column using the IDENTITY keyword so we now have a unique way of addressing each row:

ALTER TABLE dbo.Membership ADD AUTOID INT IDENTITY(1,1)

We can then we can use:

SELECT * FROM dbo.Membership WHERE AUTOID NOT IN (SELECT MIN(AUTOID) _
FROM dbo.Membership GROUP BY MemberID)

Which will correctly select all duplicated records in our database (always worth checking before running the delete query!). Once we are happy this is working for us, we can then run the delete query:

DELETE FROM dbo.Membership WHERE AUTOID NOT IN (SELECT MIN(AUTOID) _
FROM dbo.Membership GROUP BY MemberID)

 


Need an Umbraco Master?

Here at Simon Antony, we have an in house certified Umbraco Grand Master available for hire. Got a problem with your site, need architecture advice, give us a call to speak to Simon directly and see how we can help

Contact Simon Today!