9

I made a mistake and I have unwanted duplicates.

I have a table with 4 key fields. A1, k1, k2, k3.

A1 is auto increment and the primary key.

the combination of k1, k2 and k3 is supposed to be unique and I have to delete the duplicate rows before I create a unique index. Some rows have one duplicate, some have many.

SELECT CONCAT(k1, k2, k) AS dup_value
  FROM myviews
 GROUP BY dup_value
HAVING (COUNT(dup_value) > 1)

shows me duplicates values that I need to deal with. But now I don't know how to keep one and delete the rest of each duplicate set.

Brian Tompsett - 汤莱恩
  • 5,438
  • 68
  • 55
  • 126
sdfor
  • 6,064
  • 12
  • 49
  • 61
  • 2
    There are a LOT of questions regarding this problem. This should be closed as duplicate - FAST. Do a search for "delete duplicate rows." Here's [the first result](http://stackoverflow.com/questions/3777633/delete-duplicate-rows-dont-delete-all-duplicate) for me. – rockerest May 23 '11 at 21:30
  • 1
    @rockerest Interestingly I tried the solution you linked to first and it doesn't work with MySQL. The answer from OMG Ponies did work for me on MySQL so that's where my +1 goes. – jsleuth Aug 13 '13 at 18:36

4 Answers4

15

Backup your data, then...

MySQL supports JOINs in DELETE statements. If you want to keep the first of the duplicates:

DELETE a
  FROM MYVIEWS a
  JOIN (SELECT MIN(t.a1) AS min_a1, t.k1, t.k2, t.k3
          FROM MYVIEWS t
      GROUP BY t.k1, t.k2, t.k3
        HAVING COUNT(*) > 1) b ON b.k1 = a.k1
                              AND b.k2 = a.k2
                              AND b.k3 = a.k3
                              AND b.min_a1 != a.a1

If you want to keep the last of the duplicates:

DELETE a
  FROM MYVIEWS a
  JOIN (SELECT MAX(t.a1) AS max_a1, t.k1, t.k2, t.k3
          FROM MYVIEWS t
      GROUP BY t.k1, t.k2, t.k3
        HAVING COUNT(*) > 1) b ON b.k1 = a.k1
                              AND b.k2 = a.k2
                              AND b.k3 = a.k3
                              AND b.max_a1 != a.a1
Community
  • 1
  • 1
OMG Ponies
  • 314,254
  • 77
  • 507
  • 490
2

You can create a new table with the same structure but empty, then create the unique key on it, then do a INSERT IGNORE / SELECT * FROM the original table into the new table, then delete the original table.

INSERT IGNORE will automatically ignore any primary or unique key issues and just skip the duplicates.

spanky
  • 1,349
  • 1
  • 9
  • 22
  • Not an ideal practice for Production systems -- queries will be depending on the old table, and there can't be more than one table in the database with the same name. This would require an outage. – OMG Ponies May 23 '11 at 21:45
  • Very true. I assumed OP's wasn't in production yet, due to altering keys and deleting tons of rows, but I shouldn't assume! – spanky May 23 '11 at 21:59
2

Someting like this?

DELETE FROM myviews WHERE EXISTS(SELECT CONCAT(k1, k2, k) AS dup_value
FROM myviews
GROUP BY dup_value
HAVING (COUNT(dup_value) > 1));
Automatico
  • 11,522
  • 8
  • 74
  • 107
  • Or maybe I misunderstood your question. Is it duplicity within one tuple? – Automatico May 23 '11 at 21:35
  • this is wrong (at least relative to the question) becouse it deletes all rows on duplicated sets – Luis Siquot Aug 26 '11 at 13:39
  • In addition to deleting all rows instead of all but one (as @LuisSiquot mentions) this will cause an error when run on tables: SQL Error (1093): You can't specify target table 'mytable' for update in FROM clause – ebyrob Nov 10 '16 at 16:52
0

You need a separator in your concat function, because otherwise "a", "b", and "cd" is the same as "abcd", "", "".

Nick ODell
  • 9,210
  • 2
  • 26
  • 54