Extract substring with regular expression in Python

Question

How can I extract the substring that follows the keyword am, is, or are in a string, without including the keyword itself?

string = 'I am John'

I used:

re.findall('(?<=(am|is|are)).*', string)

This raises an error:

re.error: look-behind requires fixed-width pattern

What is the correct approach?

score 6 · Accepted Answer · answered Jun 11 '19 at 12:17

import re

s = 'I am John'

g = re.findall(r'(?:am|is|are)\s+(.*)', s)
print(g)

Prints:

['John']
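
As for why the original attempt fails: Python's re module only accepts a look-behind whose alternatives all have the same width, and am and is (2 characters) differ from are (3 characters). If you would rather keep a look-behind, a minimal sketch is to give each keyword its own fixed-width look-behind. The name LOOKBEHIND_PATTERN is only illustrative, and the \b word boundaries and the single whitespace character after the keyword are assumptions on my part:

```python
import re

# Illustrative name. Each keyword gets its own look-behind, so every
# look-behind has a fixed width (\b is zero-width, \s is exactly one char).
LOOKBEHIND_PATTERN = re.compile(r'(?:(?<=\bam\s)|(?<=\bis\s)|(?<=\bare\s)).*')

print(LOOKBEHIND_PATTERN.findall('I am John'))  # ['John']
```

The capture-group version above is usually easier to read; this variant mainly shows what the fixed-width restriction means in practice.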

Andrej Kesely

Kevin · Answer 2 · 2019-06-11T12:24:25.617

In cases like this I like to use finditer because the match objects it returns are easier to manipulate than the strings returned by findall. You can continue to match am/is/are, but also match the rest of the string with a second subgroup, and then extract only that group from the results.

>>> import re
>>> string = 'I am John'
>>> [m.group(2) for m in re.finditer("(am|is|are)(.*)", string)]
[' John']

Based on the structure of your pattern, I'm guessing you only want at most one match out of the string. Consider using re.search instead of either findall or finditer.

>>> re.search("(am|is|are)(.*)", string).group(2)
' John'
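
One caveat with re.search: it returns None when nothing matches, so calling .group(2) on the result directly raises AttributeError for a string without am/is/are. A minimal sketch of a guarded version follows. The helper name text_after_keyword is made up for illustration, and the \b word boundaries and the strip() call are extra assumptions (they skip the "is" inside "This" and drop the leading space):

```python
import re

# \b keeps the keywords from matching inside longer words such as "This".
KEYWORD_PATTERN = re.compile(r"\b(am|is|are)\b(.*)")

def text_after_keyword(text):
    """Return the text after the first standalone am/is/are, or None."""
    match = KEYWORD_PATTERN.search(text)
    if match is None:
        return None  # no keyword found; avoid calling .group() on None
    return match.group(2).strip()

print(text_after_keyword("I am John"))     # John
print(text_after_keyword("This is John"))  # John
print(text_after_keyword("Hello there"))   # None
```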

If you're thinking "actually I want to match every instance of a word following am/is/are, not just the first one", that's a problem, because your .* component will match the entire rest of the string after the first am/is/are. E.g. for the string "I am John and he is Steve", it will match ' John and he is Steve'. If you want John and Steve separately, perhaps you could limit the character class that you want to match. \w seems sensible:

>>> string = "I am John and he is Steve"
>>> [m.group(2) for m in re.finditer(r"(am|is|are) (\w*)", string)]
['John', 'Steve']

score 0 · Answer 3 · answered Jun 11 '19 at 12:25

One solution is to use the str.partition method. Here is an example:

string = 'I am John'
words = ['am', 'is', 'are']

for word in words:
    # partition returns (before, separator, after); separator is '' if word is absent
    before, sep, after = string.partition(word)
    if sep:
        print(after)
        break

Output:

 John
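
Note that partition looks for plain substrings, so 'am' would also be found inside a word such as 'James'. A rough sketch that compares whole whitespace-separated words instead is shown here. The function name after_keyword_words and the choice to return None when no keyword is present are my own assumptions:

```python
def after_keyword_words(text, keywords=('am', 'is', 'are')):
    """Return the words following the first standalone keyword, or None."""
    tokens = text.split()
    for index, token in enumerate(tokens):
        if token in keywords:
            # Re-join the remaining words; runs of whitespace collapse to one space.
            return ' '.join(tokens[index + 1:])
    return None

print(after_keyword_words('I am John'))      # John
print(after_keyword_words('James is here'))  # here
```

This does not handle punctuation attached to a keyword (for example "is,"), so the regex answers are the better fit for messier input.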

Omer Tekbiyik
