LearnSqlServer.com Forums

How to remove partially duplicate rows from select query result set.
aparnagarg
Posted: Saturday, February 02, 2008 10:44:14 AM
Rank: Newbie

Joined: 1/28/2008
Posts: 2
Points: -94
Location: India
How to remove partially duplicate rows from select query result set.
dibyant
Posted: Sunday, February 03, 2008 11:40:14 PM
Rank: Newbie

Joined: 1/21/2008
Posts: 9
Points: -73
Location: usa
APARNA, this might help you:
Microsoft SQL Server tables should never contain duplicate rows, nor non-unique primary keys. For brevity, we will sometimes refer to primary keys as "key" or "PK" in this article, but this will always denote "primary key." Duplicate PKs are a violation of entity integrity, and should be disallowed in a relational system. SQL Server has various mechanisms for enforcing entity integrity, including indexes, UNIQUE constraints, PRIMARY KEY constraints, and triggers.

Despite this, under unusual circumstances duplicate primary keys may occur, and if so they must be eliminated. One way they can occur is if duplicate PKs exist in non-relational data outside SQL Server, and the data is imported while PK uniqueness is not being enforced. Another way they can occur is through a database design error, such as not enforcing entity integrity on each table.

Often duplicate PKs are noticed when you attempt to create a unique index, which will abort if duplicate keys are found. The error message is:
Msg 1505, Level 16, State 1 Create unique index aborted on duplicate key.
If you are using SQL Server 2000 or SQL Server 2005, you may receive the following error message:
Msg 1505, Level 16, State 1 CREATE UNIQUE INDEX terminated because a duplicate key was found for object name '%.*ls' and index name '%.*ls'. The duplicate key value is %ls.
This article discusses how to locate and remove duplicate primary keys from a table. However, you should closely examine the process which allowed the duplicates to happen in order to prevent a recurrence.

MORE INFORMATION
For this example, we will use the following table with duplicate PK values. In this table the primary key is the two columns (col1, col2). We cannot create a unique index or PRIMARY KEY constraint since two rows have duplicate PKs. This procedure illustrates how to identify and remove the duplicates.

create table t1(col1 int, col2 int, col3 char(50))
insert into t1 values (1, 1, 'data value one')
insert into t1 values (1, 1, 'data value one')
insert into t1 values (1, 2, 'data value two')


The first step is to identify which rows have duplicate primary key values:

SELECT col1, col2, count(*)
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1


This will return one row for each set of duplicate PK values in the table. The last column in this result is the number of duplicates for the particular PK value.

col1  col2  (count)
1     1     2
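
For anyone who wants to try this query without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The helper names (make_sample_db, find_duplicate_keys) and the dup_count alias are made up for illustration; the table and rows are the t1 example above.

```python
import sqlite3


def make_sample_db():
    """Build the t1 example table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (col1 INT, col2 INT, col3 CHAR(50))")
    conn.executemany(
        "INSERT INTO t1 VALUES (?, ?, ?)",
        [
            (1, 1, "data value one"),
            (1, 1, "data value one"),
            (1, 2, "data value two"),
        ],
    )
    return conn


def find_duplicate_keys(conn):
    """Return (col1, col2, count) for every key that occurs more than once."""
    return conn.execute(
        "SELECT col1, col2, COUNT(*) AS dup_count "
        "FROM t1 "
        "GROUP BY col1, col2 "
        "HAVING COUNT(*) > 1 "
        "ORDER BY col1, col2"
    ).fetchall()


conn = make_sample_db()
duplicates = find_duplicate_keys(conn)
print(duplicates)  # [(1, 1, 2)]
```

The key (1, 2) does not appear because it occurs only once.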


If there are only a few sets of duplicate PK values, the best procedure is to delete these manually on an individual basis. For example:

set rowcount 1
delete from t1
where col1=1 and col2=1
set rowcount 0 -- restore the default so later statements are not limited


The rowcount value should be n-1, where n is the number of duplicates for a given key value. In this example, there are 2 duplicates so rowcount is set to 1. The col1/col2 values are taken from the above GROUP BY query result. If the GROUP BY query returns multiple rows, the "set rowcount" query will have to be run once for each of these rows, each time setting rowcount to n-1 for that particular PK value.
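
The n-1 rule can be sketched as follows, again with Python's sqlite3 module standing in for SQL Server. SQLite has no SET ROWCOUNT, so this sketch limits the delete through SQLite's implicit rowid column instead; that substitution, and the helper name delete_extra_copies, are assumptions for illustration only.

```python
import sqlite3


def make_sample_db():
    """Build the t1 example table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (col1 INT, col2 INT, col3 CHAR(50))")
    conn.executemany(
        "INSERT INTO t1 VALUES (?, ?, ?)",
        [
            (1, 1, "data value one"),
            (1, 1, "data value one"),
            (1, 2, "data value two"),
        ],
    )
    return conn


def delete_extra_copies(conn, col1, col2):
    """Delete n-1 of the n rows that share the key (col1, col2)."""
    n = conn.execute(
        "SELECT COUNT(*) FROM t1 WHERE col1 = ? AND col2 = ?", (col1, col2)
    ).fetchone()[0]
    if n < 2:
        return 0  # nothing to remove
    # Pick any n-1 of the matching rows by rowid and delete only those.
    cur = conn.execute(
        "DELETE FROM t1 WHERE rowid IN ("
        "  SELECT rowid FROM t1 WHERE col1 = ? AND col2 = ? LIMIT ?)",
        (col1, col2, n - 1),
    )
    conn.commit()
    return cur.rowcount


conn = make_sample_db()
deleted = delete_extra_copies(conn, 1, 1)
remaining = conn.execute(
    "SELECT col1, col2, col3 FROM t1 ORDER BY col1, col2"
).fetchall()
print(deleted)    # 1
print(remaining)  # [(1, 1, 'data value one'), (1, 2, 'data value two')]
```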

Before deleting the rows, you should verify that the entire row is duplicate. While unlikely, it is possible that the PK values are duplicate, yet the row as a whole is not. An example of this would be a table with Social Security Number as the primary key, and having two different people (or rows) with the same number, each having unique attributes. In such a case whatever malfunction caused the duplicate key may have also caused valid unique data to be placed in the row. This data should be copied out and preserved for study and possible reconciliation prior to deleting the data.
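
One way to run that check is to count the distinct non-key values per key. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server; the helper name keys_with_conflicting_rows and the extra sample rows for key (2, 1) are invented for illustration. With more than one non-key column you would need to compare all of them, and COUNT(DISTINCT ...) ignores NULLs, so treat this as a starting point.

```python
import sqlite3


def keys_with_conflicting_rows(conn):
    """Return keys whose rows share (col1, col2) but differ in col3."""
    return conn.execute(
        "SELECT col1, col2, COUNT(DISTINCT col3) AS variants "
        "FROM t1 "
        "GROUP BY col1, col2 "
        "HAVING COUNT(DISTINCT col3) > 1 "
        "ORDER BY col1, col2"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 INT, col2 INT, col3 CHAR(50))")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        (1, 1, "data value one"),  # true duplicate: whole row repeats
        (1, 1, "data value one"),
        (1, 2, "data value two"),
        (2, 1, "first person"),    # same key, different data
        (2, 1, "second person"),
    ],
)

conflicts = keys_with_conflicting_rows(conn)
print(conflicts)  # [(2, 1, 2)]
```

Key (1, 1) is safe to collapse into one row; key (2, 1) needs a manual decision first.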

If there are many distinct sets of duplicate PK values in the table, it may be too time-consuming to remove them individually. In this case the following procedure can be used:
1. First, run the above GROUP BY query to determine how many sets of duplicate PK values exist, and the count of duplicates for each set.
2. Select the duplicate key values into a holding table. For example:

SELECT col1, col2, col3=count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1


3. Select the duplicate rows into a holding table, eliminating duplicates in the process. For example:
(SELECT DISTINCT * FROM ... is also a handy way to see the unique rows.)

SELECT DISTINCT t1.*
INTO holddups
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2


4. At this point, the holddups table should have unique PKs. However, this will not be the case if t1 had duplicate PKs, yet unique rows (as in the SSN example above). Verify that each key in holddups is unique, and that you do not have duplicate keys, yet unique rows. If you do, you must stop here and reconcile which of the rows you wish to keep for a given duplicate key value. For example, the query:

SELECT col1, col2, count(*)
FROM holddups
GROUP BY col1, col2


should return a count of 1 for each row. If yes, proceed to step 5 below. If no, you have duplicate keys, yet unique rows, and need to decide which rows to save. This will usually entail either discarding a row, or creating a new unique key value for this row. Take one of these two steps for each such duplicate PK in the holddups table.
5. Delete the duplicate rows from the original table. For example:

DELETE t1
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2


6. Put the unique rows back in the original table. For example:

INSERT t1 SELECT * FROM holddups
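
Steps 2 through 6 can be tried end to end with the sketch below, which uses Python's sqlite3 module as a stand-in for SQL Server. SQLite has no SELECT ... INTO or DELETE ... FROM join, so the sketch substitutes CREATE TEMP TABLE ... AS and a correlated EXISTS; those substitutions, the function name, the dup_count column, and the index name t1_key are assumptions for illustration. The unique index at the end is one way to keep the duplicates from coming back.

```python
import sqlite3


def make_sample_db():
    """Build the t1 example table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (col1 INT, col2 INT, col3 CHAR(50))")
    conn.executemany(
        "INSERT INTO t1 VALUES (?, ?, ?)",
        [
            (1, 1, "data value one"),
            (1, 1, "data value one"),
            (1, 2, "data value two"),
        ],
    )
    return conn


def dedupe_with_holding_tables(conn):
    cur = conn.cursor()
    # Step 2: duplicate key values go into a holding table.
    cur.execute(
        "CREATE TEMP TABLE holdkey AS "
        "SELECT col1, col2, COUNT(*) AS dup_count FROM t1 "
        "GROUP BY col1, col2 HAVING COUNT(*) > 1"
    )
    # Step 3: one copy of each duplicated row goes into a second table.
    cur.execute(
        "CREATE TEMP TABLE holddups AS "
        "SELECT DISTINCT t1.* FROM t1 "
        "JOIN holdkey ON t1.col1 = holdkey.col1 AND t1.col2 = holdkey.col2"
    )
    # Step 4: stop if any key still has more than one (non-identical) row.
    conflicts = cur.execute(
        "SELECT col1, col2, COUNT(*) FROM holddups "
        "GROUP BY col1, col2 HAVING COUNT(*) > 1"
    ).fetchall()
    if conflicts:
        cur.execute("DROP TABLE holdkey")
        cur.execute("DROP TABLE holddups")
        raise ValueError(f"duplicate keys with different data: {conflicts}")
    # Step 5: remove every row that has a duplicated key.
    cur.execute(
        "DELETE FROM t1 WHERE EXISTS ("
        "  SELECT 1 FROM holdkey "
        "  WHERE holdkey.col1 = t1.col1 AND holdkey.col2 = t1.col2)"
    )
    # Step 6: put the single surviving copies back.
    cur.execute("INSERT INTO t1 SELECT * FROM holddups")
    cur.execute("DROP TABLE holdkey")
    cur.execute("DROP TABLE holddups")
    conn.commit()


conn = make_sample_db()
dedupe_with_holding_tables(conn)
# With the duplicates gone, the key can now be enforced.
conn.execute("CREATE UNIQUE INDEX t1_key ON t1 (col1, col2)")
rows = conn.execute(
    "SELECT col1, col2, col3 FROM t1 ORDER BY col1, col2"
).fetchall()
print(rows)  # [(1, 1, 'data value one'), (1, 2, 'data value two')]
```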


--
DIBYANT UPADHYAY
Shilpa
Posted: Thursday, May 08, 2008 2:36:43 AM
Rank: Newbie

Joined: 5/8/2008
Posts: 1
Points: 3
You can simply write
SELECT distinct * FROM tablename

ShilpaReddy
dibyant
Posted: Thursday, May 08, 2008 10:49:45 AM
Rank: Newbie

Joined: 1/21/2008
Posts: 9
Points: -73
Location: usa
That will just give you the distinct rows as a result set; it will not remove the duplicates from the source table.
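
A quick way to see the difference, sketched with Python's sqlite3 module as a stand-in for SQL Server (the three sample rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (col1 INT, col2 INT, col3 TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        (1, 1, "data value one"),
        (1, 1, "data value one"),
        (1, 2, "data value two"),
    ],
)

# DISTINCT collapses the duplicates in the result set only.
distinct_rows = conn.execute(
    "SELECT DISTINCT * FROM tablename ORDER BY col1, col2"
).fetchall()

# The table itself still holds all three rows.
stored_rows = conn.execute("SELECT COUNT(*) FROM tablename").fetchone()[0]

print(len(distinct_rows))  # 2
print(stored_rows)         # 3
```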

Thanks
Stixoffire
Posted: Wednesday, May 14, 2008 1:55:33 PM
Rank: Newbie

Joined: 5/14/2008
Posts: 1
Points: 3
Location: o
Removing duplicates: I use this code in a production system because when we import from another database engine, it does not always give us what we want. Here is some code you can use to de-dupe:

Delete test_outer
from myDuplicateRowsTable as test_outer
Where exists (
-- the row is a duplicate and is not the copy with the lowest IDentityID
select 1
from myDuplicateRowsTable as test_Inner
Where test_Inner.myField = test_outer.myField
group by test_Inner.myField
having count(*) > 1 AND MIN(test_Inner.IDentityID) <> test_outer.IDentityID
);

HOPE THAT HELPS YOU !!
Scott Whigham
Posted: Wednesday, May 14, 2008 6:00:24 PM


Rank: Super Mod

Joined: 3/20/2006
Posts: 345
Points: 748
Location: Dallas, TX
Good on ya - one suggestion I would add: change the DELETE to a SELECT so you can test it first.

Just to make sure, right?

And of course try it in a copy of the live db first.
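
Here is a rough sketch of that "SELECT first, then DELETE" habit, using Python's sqlite3 module on a throwaway in-memory copy as a stand-in for SQL Server. The filter is a simplified restatement of the keep-the-lowest-IDentityID idea, not the exact statement posted above, and the sample rows and helper names are invented for illustration. Sharing one WHERE clause between the preview and the delete means the rows you inspect are the rows that get removed.

```python
import sqlite3

# Matches every row that is not the lowest IDentityID for its myField value.
DUPLICATE_FILTER = """
    WHERE IDentityID <> (
        SELECT MIN(test_inner.IDentityID)
        FROM myDuplicateRowsTable AS test_inner
        WHERE test_inner.myField = myDuplicateRowsTable.myField
    )
"""


def preview_rows_to_delete(conn):
    """Show what the delete would remove, without changing anything."""
    return conn.execute(
        "SELECT IDentityID, myField FROM myDuplicateRowsTable"
        + DUPLICATE_FILTER
        + " ORDER BY IDentityID"
    ).fetchall()


def delete_duplicates(conn):
    """Run the delete with the same filter and report the row count."""
    cur = conn.execute("DELETE FROM myDuplicateRowsTable" + DUPLICATE_FILTER)
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myDuplicateRowsTable ("
    "  IDentityID INTEGER PRIMARY KEY, myField TEXT)"
)
conn.executemany(
    "INSERT INTO myDuplicateRowsTable VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (5, "b"), (6, "c")],
)

preview = preview_rows_to_delete(conn)
print(preview)  # [(2, 'a'), (4, 'a'), (5, 'b')]

deleted = delete_duplicates(conn)
survivors = conn.execute(
    "SELECT IDentityID, myField FROM myDuplicateRowsTable ORDER BY IDentityID"
).fetchall()
print(deleted)    # 3
print(survivors)  # [(1, 'a'), (3, 'b'), (6, 'c')]
```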
© Copyright 2004-2007 LearnItFirst.com LLC. All rights reserved. All trademarks remain the property of their respective owners.
This site is not affiliated in any way with the Microsoft Corporation.