I have a table with 100M+ rows of telemetry data, and it frequently receives duplicate records. I can identify the duplicates using a ROW_NUMBER() window function partitioned by the natural key, and I estimate there are approximately 12M duplicate rows.
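For reference, this is roughly how I identify and count them, in T-SQL-style syntax (the `telemetry` table and its `id`, `device_id`, `recorded_at`, and `payload` columns are illustrative stand-ins for my real schema):

```sql
-- Rows sharing (device_id, recorded_at, payload) are duplicates;
-- rn > 1 marks every copy after the first.
WITH numbered AS (
    SELECT id,
           ROW_NUMBER() OVER (
               PARTITION BY device_id, recorded_at, payload
               ORDER BY id              -- keep the lowest id per group
           ) AS rn
    FROM telemetry
)
SELECT COUNT(*) AS duplicate_rows
FROM numbered
WHERE rn > 1;
```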
What is the best option for deleting those ~12M rows? I'm considering the following (rough sketches of what I mean by each follow the list):
- Insert the non-duplicate rows into a new table, drop the existing table, and recreate the indexes?
- Use a join to delete the duplicates in place?
- Delete in batches? If so, what batch size?
- Should I drop the indexes first?
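For the first option, I mean something like this (again T-SQL-style with illustrative names): copy one row per duplicate group into a fresh table, rebuild the indexes on it, then swap it in.

```sql
-- Keep the first row of each duplicate group in a new table.
SELECT id, device_id, recorded_at, payload
INTO telemetry_dedup
FROM (
    SELECT id, device_id, recorded_at, payload,
           ROW_NUMBER() OVER (
               PARTITION BY device_id, recorded_at, payload
               ORDER BY id
           ) AS rn
    FROM telemetry
) AS src
WHERE rn = 1;

-- Recreate indexes/constraints on telemetry_dedup here, then swap:
DROP TABLE telemetry;
EXEC sp_rename 'telemetry_dedup', 'telemetry';
```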
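For the join-style delete, I'd drive the DELETE off the same ROW_NUMBER() query (SQL Server allows deleting through the CTE directly; other engines would need a self-join on the key instead):

```sql
-- Single-statement delete: removes all ~12M duplicates in one transaction,
-- so the transaction log has to absorb the whole operation at once.
WITH numbered AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY device_id, recorded_at, payload
               ORDER BY id
           ) AS rn
    FROM telemetry
)
DELETE FROM numbered
WHERE rn > 1;
```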
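And for the batched approach, a chunked version of the same delete (SQL Server-style, using DELETE TOP and @@ROWCOUNT; 50k is just a guess at a starting batch size):

```sql
DECLARE @batch int = 50000;        -- tune against log growth / duration

WHILE 1 = 1
BEGIN
    WITH numbered AS (
        SELECT ROW_NUMBER() OVER (
                   PARTITION BY device_id, recorded_at, payload
                   ORDER BY id
               ) AS rn
        FROM telemetry
    )
    DELETE TOP (@batch) FROM numbered
    WHERE rn > 1;                  -- each iteration removes one chunk

    IF @@ROWCOUNT = 0 BREAK;       -- stop when no duplicates remain
END;
```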
There is no production requirement to keep the database online while this runs.
Thanks for your help.