
I have a data frame of about 52000 rows with some duplicates. When I use

df.drop_duplicates() 

I lose about 1000 rows, but I don't want to erase those rows — I want to know which rows are the duplicates.

  • Does this answer your question? [How do I get a list of all the duplicate items using pandas in python?](https://stackoverflow.com/questions/14657241/how-do-i-get-a-list-of-all-the-duplicate-items-using-pandas-in-python) – Abu Shoeb Apr 27 '21 at 17:08

1 Answer


You could use `duplicated` for that:

df[df.duplicated()]

You can specify the `keep` argument to control which occurrences are marked; from the docs:

keep : {‘first’, ‘last’, False}, default ‘first’

  • first : Mark duplicates as True except for the first occurrence.
  • last : Mark duplicates as True except for the last occurrence.
  • False : Mark all duplicates as True.
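For example, on a small hypothetical frame (the data here is made up for illustration), the three `keep` settings behave like this:

```python
import pandas as pd

# Illustrative frame: rows 0/1 and 2/3 are duplicate pairs
df = pd.DataFrame({"a": [1, 1, 2, 2, 3],
                   "b": ["x", "x", "y", "y", "z"]})

# keep='first' (default): marks each duplicate except its first occurrence
dupes = df[df.duplicated()]            # rows 1 and 3

# keep=False: marks every member of every duplicate group,
# which is usually what you want when inspecting the duplicates
all_dupes = df[df.duplicated(keep=False)]  # rows 0, 1, 2, 3
```

With `keep=False` you see both copies of each duplicated row, so nothing is hidden while you inspect them.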
– Anton Protopopov