I have a data frame of about 52,000 rows with some duplicates. When I use
df.drop_duplicates()
I lose about 1,000 rows, but I don't want to erase those rows; I want to know which rows are the duplicates.
You could use duplicated for that:
df[df.duplicated()]
You can specify the keep argument to control which rows are marked. From the docs:
keep : {‘first’, ‘last’, False}, default ‘first’
- first : Mark duplicates as True except for the first occurrence.
- last : Mark duplicates as True except for the last occurrence.
- False : Mark all duplicates as True.
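A minimal sketch of the difference, using a small made-up frame (your real data will have its own columns):

```python
import pandas as pd

# Toy frame with two duplicate groups: (1, "x") appears twice, (3, "z") three times.
df = pd.DataFrame({"a": [1, 1, 2, 3, 3, 3],
                   "b": ["x", "x", "y", "z", "z", "z"]})

# Default keep='first': only the extra copies are flagged (3 rows here).
extras = df[df.duplicated()]

# keep=False: every row that belongs to a duplicate group is flagged (5 rows here),
# which is what you want if you need to inspect all duplicated rows together.
all_dupes = df[df.duplicated(keep=False)]
print(all_dupes)
```

With keep=False you see the whole duplicate group, not just the rows that drop_duplicates() would remove, which makes it easier to compare the copies side by side.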