I have a list of values. How can I replace every value in a DataFrame column that is not in that list?

For example,

>>> df = pd.DataFrame(['D','ND','D','garbage'], columns=['S'])
>>> df
         S
0        D
1       ND
2        D
3  garbage

>>> allowed_vals = ['D','ND']

I want to replace every value in column S that is not in allowed_vals with the string 'None'. How can I do that?

banad

1 Answer

You can use isin to check membership in allowed_vals, ~ to negate the resulting mask, and then .loc to modify the column in place:

>>> df.loc[~df["S"].isin(allowed_vals), "S"] = "None"
>>> df
      S
0     D
1    ND
2     D
3  None

This works because isin returns a boolean mask that is True for the allowed values:

>>> df["S"].isin(allowed_vals)
0     True
1     True
2     True
3    False
Name: S, dtype: bool
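
If you would rather not modify the frame in place, the same mask can be passed to Series.where, which keeps the values where the mask is True and substitutes the replacement elsewhere. This is only a sketch of an alternative, not part of the approach above; the names mask and cleaned are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(["D", "ND", "D", "garbage"], columns=["S"])
allowed_vals = ["D", "ND"]

# Boolean mask: True where the value is one of the allowed values.
mask = df["S"].isin(allowed_vals)

# Series.where keeps entries where the mask is True and fills the rest
# with the replacement. It returns a new Series; df is left unchanged.
cleaned = df["S"].where(mask, "None")
print(cleaned)
```

Assign the result back with df["S"] = cleaned if you do want to update the column.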

If you want to modify the entire frame (not just the column S), you can make a frame-sized mask:

>>> df
         S   T
0        D   D
1       ND   A
2        D  ND
3  garbage   A
>>> df[~df.isin(allowed_vals)] = "None"
>>> df
      S     T
0     D     D
1    ND  None
2     D    ND
3  None  None
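
For reference, here is a self-contained sketch of the whole-frame case. It rebuilds the two-column frame shown above (the construction itself is an assumption, since only the printed frame is given) and uses DataFrame.where as a non-mutating alternative to the masked assignment; frame_mask and cleaned are illustrative names:

```python
import pandas as pd

# Assumed construction of the two-column frame printed above.
df = pd.DataFrame({
    "S": ["D", "ND", "D", "garbage"],
    "T": ["D", "A", "ND", "A"],
})
allowed_vals = ["D", "ND"]

# DataFrame.isin returns a frame-shaped boolean mask.
frame_mask = df.isin(allowed_vals)

# Keep cells where the mask is True; replace all others with "None".
# This returns a new frame and leaves df untouched.
cleaned = df.where(frame_mask, "None")
print(cleaned)
```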
DSM