Remove rows from a DataFrame where the contents could be one of several strings

Question

I can do something like:

data = df[ df['Proposal'] != 'C000' ]

to remove all proposals with the string C000, but how can I do something like:

data = df[ df['Proposal'] not in ['C000','C0001'] ]

to remove all proposals that match either C000 or C0001 (and so on)?

Use `~` and `isin`: `df.loc[~df.Proposal.isin(['C000', 'C0001'])]` – user3483203 Dec 07 '18 at 20:59

score 1 · Answer 1 · answered Dec 07 '18 at 22:02

You can try this, which drops the matching rows by their index labels:

df = df.drop(df[df['Proposal'].isin(['C000','C0001'])].index)

Or, to keep only the rows you want:

df = df[~df['Proposal'].isin(['C000','C0001'])]
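
To see both forms side by side, here is a minimal sketch. The sample data and the names `unwanted`, `dropped` and `kept` are made up for illustration; only the `Proposal` column and the two codes come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the 'Proposal' column name is from the question.
df = pd.DataFrame({
    "Proposal": ["C000", "C0001", "C0002", "C000", "C0003"],
    "Amount": [10, 20, 30, 40, 50],
})

unwanted = ["C000", "C0001"]

# Form 1: find the matching rows, then drop them by index label.
dropped = df.drop(df[df["Proposal"].isin(unwanted)].index)

# Form 2: keep only the rows whose 'Proposal' is not in the list.
kept = df[~df["Proposal"].isin(unwanted)]

print(kept)
```

With a unique index like the default one here, both forms should leave the same rows. If the index has duplicate labels, `drop` removes every row sharing a dropped label, so the boolean-mask form is probably the safer default.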

E. Zeytinci

score 0 · Answer 2 · answered Dec 07 '18 at 21:05

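import numpy as np

# Build the "not in" mask with NumPy and keep those rows (all columns)
data = df.loc[np.logical_not(df['Proposal'].isin({'C000', 'C0001'})), :]
# or, equivalently, negate the mask with the ~ operator
data = df.loc[~df['Proposal'].isin({'C000', 'C0001'}), :]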

S.V

Can you explain how your answer works? – rassar Dec 07 '18 at 22:07
`isin` checks whether each value of the Series is in some set (i.e. 'in'), `np.logical_not` or `~` negates the result (i.e. 'not in'), and `loc` selects rows of a DataFrame using a boolean array. – S.V Dec 07 '18 at 22:15
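
The three steps described in that comment can be sketched one at a time. The sample values and the intermediate names `in_set`, `not_in_set` and `same` are assumptions added for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Proposal": ["C000", "C0002", "C0001", "C0003"]})

# Step 1: isin gives a boolean Series, True where the value is in the set.
in_set = df["Proposal"].isin({"C000", "C0001"})

# Step 2: negate it; ~ and np.logical_not give the same mask.
not_in_set = ~in_set
same = np.logical_not(in_set)

# Step 3: loc selects the rows where the mask is True, keeping all columns.
data = df.loc[not_in_set, :]

print(in_set.tolist())
print(data)
```

Naming the mask like this also makes it easy to reuse, for example to count how many rows would be removed with `in_set.sum()`.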
