5

Below is my DataFrame:

X1  X2  X3  X4  X5
A   B   C   10  BAM
A   A   A   12  BAM
B   B   B   10  BAM
A   B   B   60  BAM

I want to keep only the rows that have the same value in all of the columns X1, X2 and X3. Here the 2nd and 3rd rows have the same value across those three columns. My desired output is:

X1  X2  X3  X4  X5
A   A   A   12  BAM
B   B   B   10  BAM

I tried the following:

yourdf1=df[df.nunique(0)==0]
print(yourdf1)

But this raises an error. Could anyone please help me?

ssp
  • No, it is not a duplicate. There, the rows have the same values across all the columns, but here I want that only for a few particular columns. – ssp May 18 '19 at 12:13
  • It doesn't matter. Selecting columns is a trivial step and not worth disputing closure over. – cs95 May 19 '19 at 05:52

3 Answers

11

Select the columns of interest with a list, count the number of unique values per row with DataFrame.nunique and axis=1, then keep the rows where that count is 1 by boolean indexing:

yourdf1 = df[df[['X1','X2','X3']].nunique(axis=1) == 1]
print(yourdf1)
  X1 X2 X3  X4   X5
1  A  A  A  12  BAM
2  B  B  B  10  BAM
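
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question (the construction itself is my assumption about the dtypes) and contrasts the two axes. With axis=0 the counts are indexed by column name, so they cannot line up with the rows, which is presumably why the attempt in the question fails; with axis=1 the counts are indexed like the rows and work as a row mask.

```python
import pandas as pd

# Sample data from the question (dtypes are an assumption)
df = pd.DataFrame({
    "X1": ["A", "A", "B", "A"],
    "X2": ["B", "A", "B", "B"],
    "X3": ["C", "A", "B", "B"],
    "X4": [10, 12, 10, 60],
    "X5": ["BAM"] * 4,
})

cols = ["X1", "X2", "X3"]

# axis=0: distinct values per column, indexed by column name
per_column = df.nunique(axis=0)

# axis=1: distinct values per row, indexed like df
per_row = df[cols].nunique(axis=1)

# Keep the rows where the selected columns hold a single distinct value
result = df[per_row == 1]
print(result)
```

Here per_row is 3, 1, 1, 2 for the four rows, so only the rows with index 1 and 2 survive the filter.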

Another solution is to use DataFrame.eq on the filtered DataFrame: compare every column with the first column, then keep the rows where all comparisons are True with DataFrame.all:

df1 = df[['X1','X2','X3']]
yourdf1 = df[df1.eq(df1.iloc[:, 0], axis=0).all(axis=1)]
print(yourdf1)

  X1 X2 X3  X4   X5
1  A  A  A  12  BAM
2  B  B  B  10  BAM
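
One caveat if the data can contain missing values: the two approaches may disagree. As far as I know, nunique ignores NaN by default (dropna=True), while an equality comparison with NaN is always False. A small sketch with a hypothetical frame (df_nan is made up for illustration, it is not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in the second row
df_nan = pd.DataFrame({
    "X1": ["A", "A", "B"],
    "X2": ["A", np.nan, "B"],
    "X3": ["A", "A", "C"],
})

# nunique skips NaN, so the row ("A", NaN, "A") counts as one distinct value
by_nunique = df_nan[df_nan.nunique(axis=1) == 1]

# eq treats NaN as unequal to everything, so the same row is dropped
by_eq = df_nan[df_nan.eq(df_nan.iloc[:, 0], axis=0).all(axis=1)]

print(by_nunique.index.tolist())
print(by_eq.index.tolist())
```

If rows with missing values should not count as "all the same", passing dropna=False to nunique is one option; which behaviour is right depends on the data.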
jezrael
0

Try

yourdf = df[~df.duplicated(subset=['X1','X2','X3'])]
Zoe stands with Ukraine
Quang Hoang
0

Please see the snippet below:

df[df[['X1','X2','X3']].duplicated(keep=False)]
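
A note for readers comparing the answers: as far as I understand it, duplicated compares each row against the other rows, not the values within a row, so it answers a different question than "X1, X2 and X3 are equal in this row". A quick check on the sample data from the question (the frame construction is my assumption) suggests the duplicated-based snippets do not reproduce the desired output:

```python
import pandas as pd

# Sample data from the question (dtypes are an assumption)
df = pd.DataFrame({
    "X1": ["A", "A", "B", "A"],
    "X2": ["B", "A", "B", "B"],
    "X3": ["C", "A", "B", "B"],
    "X4": [10, 12, 10, 60],
    "X5": ["BAM"] * 4,
})
cols = ["X1", "X2", "X3"]

# Drops rows whose (X1, X2, X3) combination already appeared in an earlier row
first_seen = df[~df.duplicated(subset=cols)]

# Keeps rows whose (X1, X2, X3) combination appears in more than one row
repeated = df[df[cols].duplicated(keep=False)]

# Keeps rows where X1, X2 and X3 hold the same value within the row
same_across_columns = df[df[cols].nunique(axis=1) == 1]

print(len(first_seen), len(repeated), len(same_across_columns))
```

No (X1, X2, X3) combination repeats in the sample, so the first filter keeps all four rows and the second keeps none, while the row-wise nunique filter keeps the two rows asked for.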