
I have a problem similar to this question. However, I need to replace values in the same column under several different conditions, something like the code below:

for item in items:
    df.loc[df['A'] == item,'A'] = 'other'

where items is a list of different strings that I need to replace with 'other' in column 'A'. The problem is that my dataframe is very large, so this approach is very slow. Is there a faster way to do it?
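A minimal, self-contained reproduction of the loop approach from the question. The small DataFrame and the contents of items are made-up stand-ins for the real data:

```python
import pandas as pd

# Hypothetical sample data standing in for the large real DataFrame
df = pd.DataFrame({"A": ["x", "y", "z", "x", "w"]})
items = ["x", "z"]  # hypothetical list of values to replace

# Loop approach from the question: one full-column comparison
# and one .loc assignment per entry in items
for item in items:
    df.loc[df["A"] == item, "A"] = "other"

print(df["A"].tolist())
# ['other', 'y', 'other', 'other', 'w']
```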

jpp
user1571823

1 Answer


Use pd.Series.isin to index by a single Boolean series:

df.loc[df['A'].isin(items), 'A'] = 'other'

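The bottleneck in your code is evaluating df['A'] == item inside a loop: each iteration scans the whole column to build a new Boolean series and then performs a separate .loc assignment. The method above computes a single Boolean series with isin and uses it for one assignment.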
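A sketch that checks the isin version against the loop on made-up data and times both with timeit. The DataFrame size, the random labels, and the helper names replace_loop and replace_isin are assumptions for illustration only. Actual timings depend on the data and the machine, so none are quoted here:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 100,000 rows drawn from 50 distinct string labels
rng = np.random.default_rng(0)
labels = np.array([f"v{i}" for i in range(50)])
base = pd.DataFrame({"A": rng.choice(labels, size=100_000)})
items = [f"v{i}" for i in range(0, 50, 2)]  # hypothetical: every other label


def replace_loop(df, items):
    """Question's approach: one mask and one assignment per item."""
    df = df.copy()
    for item in items:
        df.loc[df["A"] == item, "A"] = "other"
    return df


def replace_isin(df, items):
    """Answer's approach: a single isin mask and a single assignment."""
    df = df.copy()
    df.loc[df["A"].isin(items), "A"] = "other"
    return df


loop_result = replace_loop(base, items)
isin_result = replace_isin(base, items)

# Both approaches should produce exactly the same column
assert loop_result.equals(isin_result)

# Average wall-clock time per call for each approach
for fn in (replace_loop, replace_isin):
    seconds = timeit.timeit(lambda: fn(base, items), number=5)
    print(f"{fn.__name__}: {seconds / 5:.4f} s per call")
```

Copying inside each helper keeps base unchanged between runs, so both functions see identical input.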

jpp