I have this string:

string = "STARTcandyFINISH  STARTsugarFINISH STARTpoisonFINISH STARTBlobpoisonFINISH STARTpoisonBlobFINISH"

I would like to match and capture all substrings that appear in between START and FINISH but only if the word "poison" does NOT appear in that substring. How do I exclude this word and capture only the desired substrings?

My current attempt captures every substring, including the ones that contain "poison":

re.findall(r'START(.*?)FINISH', string)

Desired captured groups:

candy
sugar
etayluz
  • @Sraw I'm not sure if this is a duplicate because my question is about avoiding a word - not just a single character (please correct me if I'm wrong - thank you) – etayluz Jan 01 '20 at 08:57

1 Answer

Using a tempered dot, we can try:

import re

string = "STARTcandyFINISH  STARTsugarFINISH STARTpoisonFINISH STARTBlobpoisonFINISH STARTpoisonBlobFINISH"
matches = re.findall(r'START((?:(?!poison).)*?)FINISH', string)
print(matches)

This prints:

['candy', 'sugar']

For an explanation of how the regex pattern works, we can have a closer look at:

(?:(?!poison).)*?

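This is the tempered dot trick. Before consuming each character, the negative lookahead (?!poison) asserts that the text starting at the current position is not "poison". The group therefore matches one character at a time and stops as soon as "poison" would begin, so any START...FINISH pair whose body contains "poison" fails to match and is skipped.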
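To see the difference in practice, here is a small sketch that runs the plain lazy pattern from the question next to the tempered version on the same input. The variable names (`text`, `naive`, `tempered`, `filtered`) are only illustrative, and the last approach, filtering in Python after matching, is an alternative added for comparison rather than part of the answer above.

```python
import re

text = ("STARTcandyFINISH  STARTsugarFINISH STARTpoisonFINISH "
        "STARTBlobpoisonFINISH STARTpoisonBlobFINISH")

# Plain lazy dot: captures every START...FINISH body, wanted or not.
naive = re.findall(r'START(.*?)FINISH', text)

# Tempered dot: (?!poison) is checked before each character is consumed,
# so the captured body can never contain "poison".
tempered = re.findall(r'START((?:(?!poison).)*?)FINISH', text)

# Alternative (an assumption, not from the answer): match everything,
# then drop the unwanted captures with an ordinary substring test.
filtered = [m for m in naive if 'poison' not in m]

print(naive)     # ['candy', 'sugar', 'poison', 'Blobpoison', 'poisonBlob']
print(tempered)  # ['candy', 'sugar']
print(filtered)  # ['candy', 'sugar']
```

The post-filter version may be easier to read when the exclusion rule is more complicated than a single word, while the tempered dot keeps everything in one pattern.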

Tim Biegeleisen