Removing duplicated rows in a pandas dataframe without considering order

Asked Apr 12 '22 at 11:07

Active Apr 12 '22 at 11:13

Viewed 25 times

I have a dataframe of the form:

import pandas as pd

df_1 = pd.DataFrame({
  'A': [0, 0, 1, 1, 1, 2],
  'B': [0, 1, 0, 1, 2, 1],
  'C': ['a', 'a', 'b', 'b', 'c', 'c']
})

What I want to do is drop the rows of that dataframe where the pair of values from columns 'A' and 'B' duplicates the pair in an earlier row, regardless of order (so (0, 1) and (1, 0) count as the same pair).

So what I want is:

df_1 = pd.DataFrame({
  'A': [0, 0, 1, 1],
  'B': [0, 1, 1, 2],
  'C': ['a', 'a', 'b', 'c']
})

My idea was to add a column containing the sorted pair as a string and then use the dataframe's drop_duplicates method, but since I'm working with a very large dataframe this solution is very expensive.
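For reference, a minimal sketch of that string-key idea, assuming the key is built with a row-wise `apply` (the names `key` and `result` are just for illustration). It works on the example, but it runs a Python function once per row, which is likely where the cost comes from:

```python
import pandas as pd

df_1 = pd.DataFrame({
  'A': [0, 0, 1, 1, 1, 2],
  'B': [0, 1, 0, 1, 2, 1],
  'C': ['a', 'a', 'b', 'b', 'c', 'c']
})

# Build one string per row from the sorted (A, B) pair, e.g. (1, 0) -> '0-1'.
# This calls a Python function for every row, so it is slow on big frames.
key = df_1[['A', 'B']].apply(lambda row: '-'.join(map(str, sorted(row))), axis=1)

# Keep only the first row seen for each key.
result = df_1[~key.duplicated()].reset_index(drop=True)
print(result)
```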

Do you have any suggestions? Thanks for the answers.

alex sander

you can use `df_1[['A', 'B']].agg(sorted)` or similar to compute a sorted grouper – mozway Apr 12 '22 at 11:12
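One possible way to build such a sorted grouper without a per-row Python call is sketched below. It assumes both columns are numeric and uses `numpy.sort` instead of `agg`; the names `pairs`, `mask` and `result` are illustrative. Note that the sorting has to happen within each row (across 'A' and 'B'), not down each column:

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({
  'A': [0, 0, 1, 1, 1, 2],
  'B': [0, 1, 0, 1, 2, 1],
  'C': ['a', 'a', 'b', 'b', 'c', 'c']
})

# Sort the two values inside each row, so (1, 0) and (0, 1) both become (0, 1).
pairs = np.sort(df_1[['A', 'B']].to_numpy(), axis=1)

# Mark rows whose sorted pair has already appeared earlier in the frame.
mask = pd.DataFrame(pairs, index=df_1.index).duplicated()

# Drop the marked rows; the original values in 'A' and 'B' are left untouched.
result = df_1[~mask]
print(result)
```

On the example this keeps the rows with index 0, 1, 3 and 4, which matches the desired output apart from the index (add `.reset_index(drop=True)` if that matters).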