I have a data frame of about 52,000 rows with some duplicates. When I use
df.drop_duplicates()
I lose about 1,000 rows, but I don't want to erase those rows; I want to know which rows are the duplicates.
You could use duplicated for that:
df[df.duplicated()]
You can specify the keep argument to control which rows are marked. From the docs:
keep : {‘first’, ‘last’, False}, default ‘first’
- first : Mark duplicates as True except for the first occurrence.
- last : Mark duplicates as True except for the last occurrence.
- False : Mark all duplicates as True.
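A minimal sketch of the difference, using a small made-up frame (your real data will have its own columns):

```python
import pandas as pd

# Toy frame with two duplicate groups: (1, "x") appears twice, (3, "z") three times.
df = pd.DataFrame({"a": [1, 1, 2, 3, 3, 3],
                   "b": ["x", "x", "y", "z", "z", "z"]})

# Default keep='first': only the extra copies are flagged (3 rows here).
extras = df[df.duplicated()]

# keep=False: every row that belongs to a duplicate group is flagged (5 rows here),
# which is what you want if you need to inspect all duplicated rows together.
all_dupes = df[df.duplicated(keep=False)]
print(all_dupes)
```

With keep=False you see the whole duplicate group, not just the rows that drop_duplicates() would remove, which makes it easier to compare the copies side by side.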