1

I'd like to select all records from a table (names) where lastname is not unique. Preferably I would like to delete all records that are duplicates.

How would this be done? Assume that I don't want to rerun one query multiple times until it quits.

Brian Tompsett - 汤莱恩
  • 5,438
  • 68
  • 55
  • 126
Josh K
  • 27,374
  • 20
  • 83
  • 132

4 Answers4

3

To find which lastnames have duplicates:

  SELECT lastname, COUNT(lastname) AS rowcount 
    FROM table 
GROUP BY lastname 
  HAVING rowcount > 1

To delete one of the duplicates of all the last names. Run until it doesn't do anything. Not very graceful.

DELETE FROM table 
 WHERE id IN (SELECT id 
                FROM (SELECT * FROM table) AS t 
            GROUP BY lastname 
              HAVING COUNT(lastname) > 1)
OMG Ponies
  • 314,254
  • 77
  • 507
  • 490
Edward Dale
  • 28,799
  • 13
  • 87
  • 129
2

The fastest and easiest way to delete duplicate records is my issuing a very simple command.

ALTER IGNORE TABLE [TABLENAME] ADD UNIQUE INDEX UNIQUE_INDEX ([FIELDNAME])

This will lock the table, if this is an issue, try:

delete t1 from table1 t1, table2 t2
where table1.duplicate_field= table2.duplicate_field (add more if need ie. and table.duplicate_field2=table2.duplicate_field2)
and table1.unique_field > table2.unique_field
and breakup into ranges to run faster

Gary
  • 2,806
  • 1
  • 16
  • 20
  • Locking the table isn't an issue. The issue is there already duplicate rows. – Josh K Jun 09 '10 at 14:38
  • If locking is not an issue, then executing ALTER IGNORE TABLE [TABLENAME] ADD UNIQUE INDEX UNIQUE_INDEX ([FIELDNAME]) will rebuild the table and remove the duplicate records. – Gary Jun 09 '10 at 15:07
  • 1
    You can't apply a constraint if the data doesn't satisfy it - your suggestion would not work. – OMG Ponies Jun 09 '10 at 15:47
  • +1 and accepted. Locked the table temporarily and went to work. No duplicates and no more will be added. – Josh K Jun 09 '10 at 15:58
  • OMG, it does work. The IGNORE is the key part of what you are missing. – Gary Jun 09 '10 at 16:00
0

dup How can I remove duplicate rows?

DELETE names
FROM names
LEFT OUTER JOIN (
   SELECT MIN(RowId) as RowId, lastname 
   FROM names
   GROUP BY lastname 
) as KeepRows ON
   names.lastname = KeepRows.lastname 
WHERE
   KeepRows.RowId IS NULL

assumption: you have an RowId column

Community
  • 1
  • 1
Glennular
  • 17,117
  • 8
  • 55
  • 75
0
SELECT COUNT(*) as mycountvar FROM names GROUP BY lastname WHERE mycountvar > 1;

and then

DELETE FROM names WHERE lastname = '$mylastnamevar' LIMIT $mycountvar-1

but: why don't you just flag the fielt "lastname" als unique, so it isn't possible that duplicates can come in?

oezi
  • 49,639
  • 10
  • 95
  • 114
  • Because duplicates are already in the table. I'm trying to add `lastname` as a `UNIQUE INDEX`. – Josh K Jun 09 '10 at 14:30