10

I have a Notification table containing about 100 million rows, hosted on Amazon RDS with 1000 IOPS, and I want to delete the rows older than one month.

If I run DELETE FROM NOTIFICATION WHERE CreatedAt < DATE_SUB(CURDATE(), INTERVAL 30 DAY);, all the IOPS are consumed, the process takes hours, and many new entries cannot be inserted due to "Lock wait timeout exceeded; try restarting transaction".

I was trying the approach described here: http://mysql.rjweb.org/doc.php/deletebig. However, I am using UUIDs instead of an auto-increment ID.
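As far as I can tell, chunking on the CreatedAt index instead of the id would look roughly like this (just a sketch; it assumes an index on CreatedAt, and PURGE_NOTIFICATION is a made-up name):

#
# Sketch: delete in batches of 1000 so each transaction stays short
# and concurrent INSERTs are not blocked for long.
#
DELIMITER $$
CREATE PROCEDURE PURGE_NOTIFICATION()
BEGIN
  REPEAT
    DELETE FROM NOTIFICATION
    WHERE CreatedAt < DATE_SUB(CURDATE(), INTERVAL 30 DAY)
    LIMIT 1000;
  UNTIL ROW_COUNT() = 0 END REPEAT;
END$$
DELIMITER ;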

What is the correct and efficient way to delete those rows without affecting new data being inserted/updated?

Tianyi Cong
  • You are correct, ypercube, I have rectified it. Thanks for pointing that out! – Tianyi Cong Nov 20 '14 at 18:06
  • Deleting records in smaller chunks does not impact the insert operation. I tried it with a loop, and it finished deleting 70 million records in less than an hour: http://www.rathishkumar.in/2017/12/deleting-huge-number-of-records-in-mysql.html – Rathish Kumar B Dec 06 '17 at 08:13

4 Answers

12

Make a temp table, switch it in and out, and copy the last 30 days' data into it.

#
# Make empty temp table
#
CREATE TABLE NOTIFICATION_NEW LIKE NOTIFICATION;
#
# Switch in new empty temp table
#
RENAME TABLE NOTIFICATION TO NOTIFICATION_OLD,NOTIFICATION_NEW TO NOTIFICATION;
#
# Retrieve the last 30 days of data
#
INSERT INTO NOTIFICATION SELECT * FROM NOTIFICATION_OLD
WHERE CreatedAt >= DATE_SUB(CURDATE(), INTERVAL 30 DAY);

In your off hours, drop the old table

DROP TABLE NOTIFICATION_OLD;

Here are the advantages of doing DELETEs this way:

  1. NOTIFICATION is emptied instantly by switching in an empty table.
  2. NOTIFICATION is immediately available for new INSERTs.
  3. The last 30 days of data are copied back into NOTIFICATION while new INSERTs take place.
  4. Dropping the old version of NOTIFICATION does not interfere with new INSERTs.
  5. NOTE: I have recommended this bait-and-switch approach for table DELETEs before (see my July 19, 2012 post: Optimizing DELETE Query on MySQL MEMORY Table).

Give it a Try !!!

RolandoMySQLDBA
  • Thanks for the reply, Rolando! How does MySQL internally handle DROP TABLE? Does it first delete all the rows and then remove the table, or something else? Will it take much less time than deleting those rows? – Tianyi Cong Nov 19 '14 at 22:43
  • I am applying this strategy on staging; should the rename command start with "RENAME TABLE"? – Tianyi Cong Nov 20 '14 at 18:12
  • Would there be any difference if I used it inside a transaction block, keeping in mind the lock wait timeout? Also, would there be any cons if my actual table, which is now renamed and has to be dropped, is really huge? – Muhammad Omer Aslam May 20 '20 at 03:08
  • @MuhammadOmerAslam In that case, the ibdata1 file (system tablespace) would just grow in the undo logs. This is especially the case when ibdata1 starts getting bigger in filesize (see my old post https://dba.stackexchange.com/questions/40730/how-can-innodb-ibdata1-file-grows-by-5x-even-with-innodb-file-per-table-set/40734#40734). You need to exercise more care by using pt-archiver as mentioned in akuzminsky's answer. It can be throttled to delete rows in chunks, and then you can use pt-online-schema-change to run ALTER TABLE ENGINE=InnoDB to shrink the table (sketched after these comments). – RolandoMySQLDBA May 20 '20 at 13:41
  • @RolandoMySQLDBA So you mean that rather than following the temp-table approach you described above, I should use pt-archiver? I haven't used it before. – Muhammad Omer Aslam May 20 '20 at 14:40
  • @MuhammadOmerAslam The above approach is perfect when you have scheduled downtime. Follow the link in akuzminsky's answer (https://www.percona.com/doc/percona-toolkit/LATEST/pt-archiver.html); you can use pt-archiver to archive data or just delete data without archiving. – RolandoMySQLDBA May 20 '20 at 16:37
  • Thank you very much for your response, @RolandoMySQLDBA – Muhammad Omer Aslam May 20 '20 at 16:44
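A minimal sketch of the shrink step mentioned in the comments above (mydb is a placeholder database name; the no-op ALTER rebuilds the table online and reclaims space):

pt-online-schema-change \
  --alter "ENGINE=InnoDB" \
  D=mydb,t=NOTIFICATION \
  --execute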
4

My favorite is pt-archiver from Percona Toolkit. It takes care of MySQL load and replication lag.
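For example, something along these lines (a sketch only; mydbhost and myreplica are placeholder hosts, and --max-lag with --check-slave-lag pauses the run whenever the replica falls behind):

pt-archiver \
  --source h=mydbhost,D=mydb,t=NOTIFICATION \
  --purge \
  --where "CreatedAt < DATE_SUB(CURDATE(), INTERVAL 30 DAY)" \
  --limit 1000 --bulk-delete --commit-each \
  --max-lag 1s --check-slave-lag h=myreplica \
  --progress 10000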

akuzminsky
  • Thanks for the reply, akuzminsky! I will look into it. I gave Percona a try when I wanted to alter this notification table with pt-online-schema-change. However, it required the SUPER privilege to make the change, which is not provided by RDS. BTW, do you know any good way to alter a huge table? – Tianyi Cong Nov 19 '14 at 22:47
  • @TianyiCong You have a new question: please ask it as a new question and perhaps comment here with a link. Don't ask it in comments; that's not how this site works. – Jack Douglas Nov 20 '14 at 18:30
0

I agree with the answer above: use pt-archiver.

But configuring its options can be confusing! I played with a bunch of settings and found the following to work pretty well. You can turn this into a bash script and run it, ideally in a screen session, as it will take a while to finish.

#!/bin/bash
#
# Deletes data in the given table.
#
# This takes the strategy of deleting based on
# some indexed key, possibly the primary key.
# It will repeatedly delete the first 1000
# records found up to the maximum $MAXVAL.
#
# This seems to have OK performance when working
# against a small Aurora instance.
# It deletes around 100,000 records every minute.
#
# You'll need to have Percona tools installed.
# This was tested against percona-toolkit-3.3.1.
# Note the version of percona-toolkit installed by apt
# is old as of writing (11/22/22). I had to
# download it from Percona's website.

export HOST=
export USER=
export DB=
export TABLE=
export FIELD=
export MAXVAL=

# --where must use double quotes so $FIELD and $MAXVAL expand;
# --optimize takes s (source) and/or d (dest), not a table name.
pt-archiver \
  --user $USER \
  --source h=$HOST,D=$DB,t=$TABLE \
  --purge \
  --where "$FIELD < $MAXVAL" \
  --bulk-delete \
  --progress 10000 \
  --statistics \
  --commit-each \
  --limit 1000 \
  --skip-foreign-key-checks \
  --no-check-charset \
  --ask-pass \
  --optimize s

https://gist.github.com/eastside/26eb4a093b0bb507115d63376e3a792f

-2

create table notification_temp as select * from notification where CreatedAt >= DATE_SUB(CURDATE(), INTERVAL 30 DAY);

drop table notification;

RENAME TABLE notification_temp TO NOTIFICATION;