
My file follows a pattern like this:

....
BEGIN
any text1
any text2
END
....
BEGIN
any text3
garbage text
any text4
END
....
BEGIN
any text5
any text6
END
...

BEGIN and END are my markers, and I want to extract all the text between the markers only if the block does not contain 'garbage text'. So I expect to extract the blocks below:

any text1
any text2

any text5
any text6

How do I do it in awk? I know I can do something like:

awk '/BEGIN/{f=1;next}/END/{f=0;}f' file.log

to extract the lines between the two markers, but how do I refine the results further so that blocks containing 'garbage text' are left out?

Ashwin Prabhu

1 Answer

$ awk '/END/{if (rec !~ /garbage text/) print rec} {rec=rec $0 ORS} /BEGIN/{rec=""}' file
any text1
any text2

any text5
any text6

The above assumes every END is paired with a preceding BEGIN. With GNU awk, which supports a multi-character RS, you could alternatively do:

$ awk -v RS='END\n' '{sub(/.*BEGIN\n/,"")} RT!="" && !/garbage text/' file
any text1
any text2

any text5
any text6
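
The first command above can also be written out as a commented script, which makes the order of its three rules easier to follow. This is only a sketch under some assumptions: the `inblock` flag, the exact-match tests `$0 == "BEGIN"` and `$0 == "END"` (instead of the regex matches used above), the function name `extract_clean_blocks`, and the temporary sample file are all added here for illustration and are not part of the original one-liner.

```shell
# Hypothetical helper: print every BEGIN..END block that lacks "garbage text".
extract_clean_blocks() {
    awk '
        $0 == "END" {                  # closing marker: decide whether to print
            if (inblock && rec !~ /garbage text/)
                print rec              # rec already ends in ORS, so a blank line follows
            inblock = 0
        }
        inblock { rec = rec $0 ORS }   # collect lines while inside a block
        $0 == "BEGIN" {                # opening marker: start a fresh buffer
            inblock = 1
            rec = ""
        }
    ' "$1"
}

# Sample input mirroring the file in the question.
sample=$(mktemp)
printf '%s\n' '....' BEGIN 'any text1' 'any text2' END \
              '....' BEGIN 'any text3' 'garbage text' 'any text4' END \
              '....' BEGIN 'any text5' 'any text6' END '...' > "$sample"

extract_clean_blocks "$sample"
rm -f "$sample"
```

Because of the flag, an END with no preceding BEGIN prints nothing in this variant, and comparing the whole line means a data line that merely contains the word END is not treated as a marker.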

By the way, instead of:

awk '/BEGIN/{f=1;next}/END/{f=0;}f' file.log

your original code should be just:

awk '/END/{f=0} f; /BEGIN/{f=1}' file.log

See https://stackoverflow.com/a/17914105/1745001 for related idioms.
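
To illustrate why the rule order in the flag idiom matters, here is a small sketch. Rules run top to bottom for each line, so where the bare `f` (print if the flag is set) sits relative to the rules that set and clear the flag decides whether the marker lines themselves are printed. The function names `between_exclusive` and `between_inclusive` and the five-line sample are assumptions made up for this example.

```shell
# Clear the flag first, print, then set it: marker lines are excluded.
between_exclusive() {
    awk '/END/{f=0} f; /BEGIN/{f=1}' "$1"
}

# Set the flag first, print, then clear it: marker lines are included.
between_inclusive() {
    awk '/BEGIN/{f=1} f; /END/{f=0}' "$1"
}

demo=$(mktemp)
printf '%s\n' x BEGIN a b END y > "$demo"

between_exclusive "$demo"   # prints the lines a and b
between_inclusive "$demo"   # prints BEGIN, a, b and END
rm -f "$demo"
```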

Ed Morton