4

How can I extract the substring that follows the keyword am, is, or are in a string, without including the keyword itself?

string = 'I am John'

I used:

re.findall('(?<=(am|is|are)).*', string)

but it raises an error:

re.error: look-behind requires fixed-width pattern

What is the correct approach?

Wai Ha Lee
Chan

3 Answers

6
import re

s = 'I am John'

g = re.findall(r'(?:am|is|are)\s+(.*)', s)
print(g)

Prints:

['John']
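
As a hedged addition to the pattern above: without word boundaries, `am` also matches inside a word such as "Pam". The sketch below adds `\b` around the keywords, and also shows one way to keep the look-behind idea from the question by giving each keyword its own fixed-width look-behind. The constant names are illustrative, and the look-behind variant assumes exactly one space after the keyword.

```python
import re

# Same idea as the pattern above, with \b so only whole words count as keywords.
KEYWORD_THEN_REST = re.compile(r'\b(?:am|is|are)\b\s+(.*)')

# Look-behind variant: each alternative is its own fixed-width look-behind,
# which avoids "look-behind requires fixed-width pattern".
LOOKBEHIND = re.compile(r'(?:(?<=\bam )|(?<=\bis )|(?<=\bare )).*')

for text in ('I am John', 'Pam is here'):
    print(KEYWORD_THEN_REST.findall(text), LOOKBEHIND.findall(text))
```

For 'Pam is here', both patterns skip the `am` inside "Pam" and return only the text after `is`.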
Andrej Kesely
2

In cases like this I like to use finditer because the match objects it returns are easier to manipulate than the strings returned by findall. You can continue to match am/is/are, but also match the rest of the string with a second subgroup, and then extract only that group from the results.

>>> import re
>>> string = 'I am John'
>>> [m.group(2) for m in re.finditer("(am|is|are)(.*)", string)]
[' John']

Based on the structure of your pattern, I'm guessing you only want at most one match out of the string. Consider using re.search instead of either findall or finditer.

>>> re.search("(am|is|are)(.*)", string).group(2)
' John'
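
One caveat: `re.search` returns `None` when nothing matches, so calling `.group(2)` directly raises `AttributeError` on a string with no keyword. A minimal sketch of a guarded version follows; the helper name `text_after_keyword` is made up for illustration and is not part of `re`.

```python
import re

def text_after_keyword(text):
    """Return the text after the first am/is/are, or None if no keyword is found."""
    m = re.search("(am|is|are)(.*)", text)
    # re.search returns None on failure, so check before calling .group().
    return m.group(2) if m else None

print(repr(text_after_keyword('I am John')))
print(repr(text_after_keyword('Hello John')))
```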

If you're thinking "actually I want to match every instance of a word following am/is/are, not just the first one", that's a problem, because your .* component will match the entire rest of the string after the first am/is/are. E.g. for the string "I am John and he is Steve", it will match ' John and he is Steve'. If you want John and Steve separately, perhaps you could limit the character class that you want to match. \w seems sensible:

>>> string = "I am John and he is Steve"
>>> [m.group(2) for m in re.finditer(r"(am|is|are) (\w*)", string)]
['John', 'Steve']
Kevin
0

One solution is to use the str.partition method. Here is an example:

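string = 'I am John'
words = ['am', 'is', 'are']

for word in words:
    # partition returns (before, separator, after); separator is '' if word is absent
    before, sep, after = string.partition(word)
    if sep:  # print only when the keyword was actually found
        print(after)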

Output:

 John
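
Note that `str.partition` matches substrings, so `is` would also be found inside a word like "This". If that matters, one possible alternative is to compare whole words instead. This is only a sketch; the function name `after_first_keyword` is an assumption for illustration, and it normalizes whitespace as a side effect of `split` and `join`.

```python
def after_first_keyword(text, keywords=('am', 'is', 'are')):
    """Return the words after the first whole-word keyword, or None if none is found."""
    words = text.split()
    for i, word in enumerate(words):
        if word in keywords:
            # Rejoin everything that follows the keyword.
            return ' '.join(words[i + 1:])
    return None

print(after_first_keyword('I am John'))
print(after_first_keyword('This is fine'))
```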
Omer Tekbiyik