Background: I have the following dataframe

import pandas as pd
d = {'text': ["paid", "paid and volunteer", "other phrase"]}
df = pd.DataFrame(data=d)
df['text'] = df['text'].astype(str)  # assign back; apply(str) alone discards the result

Output:

                   text
0                  paid
1    paid and volunteer
2          other phrase

Goal:

1) Check each row and return a boolean: True if "paid" appears anywhere in the text column, and False if it does not. However, I want to exclude rows that contain the word "volunteer": if "volunteer" is present, the result should be False.

2) Create a new column with the results.

Desired Output:

                   text     result
0                  paid     True
1    paid and volunteer     False
2          other phrase     False

Problem: I am using the following code:

df['result'] = df['text'].astype(str).str.contains('paid') #but not volunteer

I checked "How to negate specific word in regex?", which shows how to exclude a word, but I am not sure how to incorporate that into my code.

Question: How do I alter my code to achieve 1) and 2) of my goal?

2 Answers

Using lambda:

df['result'] = df['text'].apply(lambda text: 'paid' in text and 'volunteer' not in text)
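
One caveat with the `in` test is that it raises a TypeError on non-string values such as NaN or None. Here is a minimal sketch of a guarded version. The helper name `is_paid_only`, the extra None row, and the choice to treat missing values as False are all assumptions for illustration, not part of the question:

```python
import pandas as pd

def is_paid_only(value):
    """Return True if 'paid' occurs in value and 'volunteer' does not."""
    # Assumption: non-string values (e.g. None or NaN) count as "no match".
    if not isinstance(value, str):
        return False
    return 'paid' in value and 'volunteer' not in value

# Same data as the question, plus one hypothetical missing value.
df = pd.DataFrame({'text': ["paid", "paid and volunteer", "other phrase", None]})
df['result'] = df['text'].apply(is_paid_only)
print(df)
```

If the column has already been converted with astype(str), as in the question, the guard is unnecessary and the one-line lambda is enough.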
niraj

You can combine both conditions with a logical AND (&), negating the second one with ~.

(df.text.str.contains('paid')) & (~df.text.str.contains('volunteer'))
Out[14]: 
0     True
1    False
2    False
Name: text, dtype: bool
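
To cover goal 2), the mask can be assigned straight to a new column. The sketch below does that, and also shows a single-regex alternative in the spirit of the linked regex question. The column name `result_regex` is only illustrative, and the lookahead pattern is one possible way to write it, not the only one:

```python
import pandas as pd

df = pd.DataFrame({'text': ["paid", "paid and volunteer", "other phrase"]})
text = df['text'].astype(str)

# Element-wise: contains 'paid' AND NOT contains 'volunteer'.
# regex=False treats both arguments as plain substrings.
df['result'] = (
    text.str.contains('paid', regex=False)
    & ~text.str.contains('volunteer', regex=False)
)

# Alternative: one pattern with a negative lookahead anchored at the start.
# (?!.*volunteer) rejects the string if 'volunteer' appears anywhere in it.
df['result_regex'] = text.str.contains(r'^(?!.*volunteer).*paid')

print(df)
```

Both versions match substrings, so "unpaid" would also count as containing "paid". If whole words are required, a pattern with word boundaries such as r'\bpaid\b' may be a better fit.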
Allen Qin