I have a set of lines, most of which follow this format:

STARTKEYWORD some text I want to extract ENDKEYWORD\n

I want to find these lines and extract information from them.

Note that the text between the keywords can contain a wide range of characters (Latin and non-Latin letters, digits, spaces, special characters), but never \n.

ENDKEYWORD is optional and is sometimes omitted.

My attempts revolve around this regex:

STARTKEYWORD (.+)(?:\n| ENDKEYWORD)

However, the capturing group (.+) consumes as many characters as possible, so it also captures ENDKEYWORD, which I do not need.
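To make the problem concrete, here is a minimal sketch in Python (the question does not name a language or regex flavor, so Python's `re` module is an assumption). It runs the pattern above, unchanged, against the sample line:

```python
import re

# The pattern from the question, unchanged.
greedy = re.compile(r"STARTKEYWORD (.+)(?:\n| ENDKEYWORD)")

line = "STARTKEYWORD some text I want to extract ENDKEYWORD\n"

# (.+) is greedy: it runs to the end of the line first, and the
# alternation is then satisfied by the trailing \n, so nothing
# forces the group to give ENDKEYWORD back.
captured = greedy.search(line).group(1)
print(repr(captured))  # 'some text I want to extract ENDKEYWORD'
```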

Is there a way to get `some text I want to extract` using regular expressions alone?

Alik

2 Answers

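Quantifiers are greedy by default, so (.+) consumes everything it can. You can make it non-greedy by adding ?, and use the $ anchor instead of \n, which also covers a last line that has no trailing newline (in most regex flavors, $ matches at the end of every line only when multiline mode is enabled):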

STARTKEYWORD (.+?)(?:$| ENDKEYWORD$)

If you specifically want to match \n, you can use:

STARTKEYWORD (.+?)(?:\n| ENDKEYWORD\n)
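As a rough illustration, the `$` variant can be tried in Python like this. Python and the `re.MULTILINE` flag are assumptions here, since the thread does not name a language, and the sample lines are made up for the demonstration:

```python
import re

# re.MULTILINE makes $ match at the end of every line,
# not only at the very end of the input.
pattern = re.compile(r"STARTKEYWORD (.+?)(?:$| ENDKEYWORD$)", re.MULTILINE)

# Hypothetical input: with ENDKEYWORD, without it, an unrelated line,
# and a last line without a trailing newline.
text = (
    "STARTKEYWORD first value ENDKEYWORD\n"
    "STARTKEYWORD second value\n"
    "unrelated line\n"
    "STARTKEYWORD third value ENDKEYWORDlalala"
)

# findall returns the contents of the single capturing group per match.
extracted = pattern.findall(text)
print(extracted)
# ['first value', 'second value', 'third value ENDKEYWORDlalala']
```

Because ENDKEYWORD is only accepted directly before the end of the line, a line such as the last one is captured whole instead of being cut at ENDKEYWORD.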

karthik manchala

You could use a lookahead-based regex. It is generally better to use the end-of-line anchor $, since the last line may not end with a newline character.

STARTKEYWORD (.+?)(?= ENDKEYWORD|$)

OR

STARTKEYWORD (.+?)(?: ENDKEYWORD|$)
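A small sketch of how the lookahead pattern behaves, again assuming Python's `re` module with `re.MULTILINE`. One case worth checking is a line where ENDKEYWORD is followed by more text. The `strict` pattern is a hypothetical variant, not part of the answer, that accepts ENDKEYWORD only at the end of the line:

```python
import re

# Lookahead pattern from the answer above.
lookahead = re.compile(r"STARTKEYWORD (.+?)(?= ENDKEYWORD|$)", re.MULTILINE)

# Hypothetical stricter variant: ENDKEYWORD must be the last thing on the line.
strict = re.compile(r"STARTKEYWORD (.+?)(?= ENDKEYWORD$|$)", re.MULTILINE)

samples = [
    "STARTKEYWORD some text ENDKEYWORD",
    "STARTKEYWORD some text",
    "STARTKEYWORD some text ENDKEYWORDlalala",
]

loose_results = [lookahead.search(s).group(1) for s in samples]
strict_results = [strict.search(s).group(1) for s in samples]

print(loose_results)   # ['some text', 'some text', 'some text']
print(strict_results)  # ['some text', 'some text', 'some text ENDKEYWORDlalala']
```

The two patterns agree on the first two samples and differ only on the third, where the original stops at ` ENDKEYWORD` even though more characters follow it.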

Avinash Raj
  • Gives false positives for `STARTKEYWORD some text I want to extract ENDKEYWORDlalala` – karthik manchala Apr 25 '15 at 16:54
  • Where? It works: https://regex101.com/r/rI5nF1/6. The OP wants to extract the text up to `ENDKEYWORD` only if the line contains the ENDKEYWORD string; otherwise he wants to capture the whole line. – Avinash Raj Apr 25 '15 at 16:56
  • "False positives"... it matched on something other than just ENDKEYWORD... :) – karthik manchala Apr 25 '15 at 16:58
  • @AvinashRaj While such strings cannot occur in my current setup, it would be great to handle them as well. – Alik Apr 25 '15 at 16:59
  • @AvinashRaj I mean it would be great if the regex matched all the text between `STARTKEYWORD` and `ENDKEYWORD\n` or `\n` – Alik Apr 25 '15 at 17:00
  • @AvinashRaj sorry if it was unclear from the beginning. I upvoted your answer. Thanks for helping – Alik Apr 25 '15 at 17:03