
I am trying to remove duplicate values in a pandas column and replace them with an empty value.

Originally, Header A (column A) contains the value A in every row. I would like to keep the first A and replace the repeated A values with an empty string "".

Header A  Header B
A         B
A         C
A         D
A         E
A         F

To this:

Header A  Header B
A         B
          C
          D
          E
          F

How do I do this in pandas? The values come from a CSV file.

2 Answers


Use:

df.loc[df['Header A'].duplicated(), 'Header A'] = ''
print(df)
  Header A Header B
0        A        B
1                 C
2                 D
3                 E
4                 F
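Since the question mentions a CSV file, here is a minimal end-to-end sketch of the same idea. The in-memory `csv_text` is only a stand-in for the real file, and its comma-separated layout is an assumption; in practice `pd.read_csv` would be given the actual path.

```python
import io

import pandas as pd

# Stand-in for the real CSV file (assumed layout: two comma-separated columns).
csv_text = "Header A,Header B\nA,B\nA,C\nA,D\nA,E\nA,F\n"
df = pd.read_csv(io.StringIO(csv_text))

# duplicated() is False for the first occurrence of a value
# and True for every later repeat.
mask = df['Header A'].duplicated()

# Overwrite only the repeated rows in that one column.
df.loc[mask, 'Header A'] = ''
print(df)
```

Note that `duplicated()` flags every later repeat of a value anywhere in the column, not only consecutive runs.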
jezrael

Replace with NaN:

df.loc[df['Header A'].duplicated(), 'Header A'] = np.nan  # needs: import numpy as np

Replace with empty string:

df.loc[df['Header A'].duplicated(), 'Header A'] = "" 
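To make the difference between the two replacements concrete, here is a small sketch that applies both to copies of the sample frame from the question. The variable names (`with_nan`, `with_empty`, and so on) are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Header A': ['A'] * 5, 'Header B': list('BCDEF')})
mask = df['Header A'].duplicated()

with_nan = df.copy()
with_nan.loc[mask, 'Header A'] = np.nan   # real missing values

with_empty = df.copy()
with_empty.loc[mask, 'Header A'] = ''     # zero-length strings

# In memory the two differ: isna() only recognises the NaN version.
n_missing_nan = int(with_nan['Header A'].isna().sum())
n_missing_empty = int(with_empty['Header A'].isna().sum())
print(n_missing_nan, n_missing_empty)

# Written back out with default to_csv settings, both give empty fields.
csv_nan = with_nan.to_csv(index=False)
csv_empty = with_empty.to_csv(index=False)
```

NaN is usually the better choice if the column will be processed further in pandas (for example with `isna()` or `fillna()`), while the empty string matches what the question literally asks for.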

If you want to blank several columns at once, in the rows where each of those columns repeats an earlier value:

df.loc[(df['Header A'].duplicated() & df['Header B'].duplicated()), ['Header A','Header B']] = ''
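The combined mask above only fires on rows where both columns are repeats. If instead each column should be blanked independently, one possible alternative (not part of the original answer) is to build a per-column mask with `df.apply(pd.Series.duplicated)` and pass it to `DataFrame.mask`. The sample data below is hypothetical, chosen so the two approaches give different results.

```python
import pandas as pd

# Hypothetical sample data: the two columns repeat in different rows.
df = pd.DataFrame({'Header A': ['A', 'A', 'B', 'B'],
                   'Header B': ['X', 'X', 'X', 'Y']})

# Combined mask: True only where BOTH columns repeat an earlier value.
both = df['Header A'].duplicated() & df['Header B'].duplicated()
combined = df.copy()
combined.loc[both, ['Header A', 'Header B']] = ''

# Per-column alternative: blank each column's own repeats independently.
per_column = df.mask(df.apply(pd.Series.duplicated), '')

print(combined)
print(per_column)
```

With this data the combined version blanks only the second row, whereas the per-column version blanks every repeated value in each column.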
dare_devils