Split a file into multiple files based on differing start and end delimiters

Question

I have a file that I need to split into multiple files, using separate start and end delimiters.

For example, if I have the following file:

abcdef
START
ghijklm
nopqrst
END
uvwxyz
START
abcdef
ghijklm
nopqrs
END
START
tuvwxyz
END

I need 3 separate files:

file1

START
ghijklm
nopqrst
END

file2

START
abcdef
ghijklm
nopqrs
END

file3

START
tuvwxyz
END

I found this link, which showed how to do it with a starting delimiter, but I also need an ending delimiter. I have tried using some regex in the awk command, but I am not getting the result I want. I don't quite understand how to get awk to be 'lazy' or 'non-greedy' so that it pulls the file apart correctly.

I really like the awk solution, and something similar would be fantastic. I am reposting the solution here so you don't have to click through:

awk '/DELIMITER_HERE/{n++}{print > ("out" n ".txt")}' input_file.txt
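For reference, here is roughly what that start-delimiter-only command does to the sample input above, with DELIMITER_HERE replaced by START and the output file name parenthesized (the portable form of the redirection). It is a sketch for a POSIX shell and assumes `mktemp -d` is available. Because every line is printed somewhere, the text before the first START lands in out.txt (n is still empty there), and uvwxyz, which follows the first END, is appended to out1.txt. That trailing text is exactly what an end delimiter has to cut off.

```shell
#!/bin/sh
# Work in a throwaway directory so the out*.txt files don't clutter anything.
cd "$(mktemp -d)" || exit 1

# The sample input from the question.
cat > input_file.txt <<'EOF'
abcdef
START
ghijklm
nopqrst
END
uvwxyz
START
abcdef
ghijklm
nopqrs
END
START
tuvwxyz
END
EOF

# Start-delimiter-only split: n is bumped on every START line, and every
# line (inside a block or not) is written to the file for the current n.
awk '/START/{n++}{print > ("out" n ".txt")}' input_file.txt

# Show what ended up where.
for f in out*.txt; do
  echo "== $f"
  cat "$f"
done
```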

Any help is appreciated.

score 4 · Accepted Answer · answered Jan 27 '14 at 17:25


You can use this awk command:

awk '/^START/{n++;w=1} n&&w{print > ("out" n ".txt")} /^END/{w=0}' input_file.txt
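A small harness to check this against the expected file1/file2/file3 from the question might look like the sketch below (POSIX shell, assuming `mktemp -d` exists). The output file name is parenthesized, which is the portable form of awk's redirection. Because the /^END/ rule comes after the print rule, the END line is still written before w is cleared, and lines between blocks (abcdef, uvwxyz) are never written at all.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# The sample input from the question.
cat > input_file.txt <<'EOF'
abcdef
START
ghijklm
nopqrst
END
uvwxyz
START
abcdef
ghijklm
nopqrs
END
START
tuvwxyz
END
EOF

# w is set on a START line and cleared only after the END line is printed.
awk '/^START/{n++;w=1} n&&w{print > ("out" n ".txt")} /^END/{w=0}' input_file.txt

# Compare each output file with the expected block ("-" makes cmp read stdin).
printf 'START\nghijklm\nnopqrst\nEND\n'        | cmp -s - out1.txt && echo "out1.txt matches file1"
printf 'START\nabcdef\nghijklm\nnopqrs\nEND\n' | cmp -s - out2.txt && echo "out2.txt matches file2"
printf 'START\ntuvwxyz\nEND\n'                 | cmp -s - out3.txt && echo "out3.txt matches file3"
```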

answered Jan 27 '14 at 17:25

anubhava


I like this solution the best. It worked perfectly for me, and I only had to type the delimiters once in the command (the real delimiters are much, much longer than the START and END used in the example). Thank you. – jasonmclose Jan 27 '14 at 18:39

Using n&&w is an amazing trick, and it did the whole job. Nice one. – Obsidian Aug 28 '18 at 20:08
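Since the real delimiters are said to be much longer than START and END, one option (an assumption, not part of the answer) is to pass them in once with -v and compare whole lines with ==, so regex metacharacters in the delimiters don't matter. The delimiter strings and the sample file below are hypothetical, and the sketch assumes each delimiter occupies an entire line. Note that -v expands backslash escapes in its value, so a delimiter containing backslashes would need them doubled.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical long delimiters containing regex metacharacters ( ) [ ] * .
START_DELIM='-----BEGIN RECORD (v1.2) [section*]-----'
END_DELIM='-----END RECORD (v1.2) [section*]-----'

# Hypothetical input; the unquoted EOF lets the shell expand the delimiters.
cat > input_file.txt <<EOF
preamble
$START_DELIM
first block
$END_DELIM
between blocks
$START_DELIM
second block
$END_DELIM
EOF

# Same flag logic as the answer, but with exact string comparison
# instead of a regex match (-v does not interpret regex syntax).
awk -v s="$START_DELIM" -v e="$END_DELIM" '
  $0 == s { n++; w = 1 }
  w       { print > ("out" n ".txt") }
  $0 == e { w = 0 }
' input_file.txt

cat out1.txt out2.txt
```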

score 4 · Answer 2 · answered Jan 27 '14 at 17:25


awk '
    /START/ {p = 1; n++; file = "file" n}
    p { print > file }
    /END/ {p = 0}
' filename
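A possible variant of this (my addition, not part of the answer) closes each output file once its END line has been written. Each block is written exactly once, so closing it costs nothing, and it keeps only one output file open at a time; some non-GNU awks can run out of file descriptors when a large input produces many blocks. The sketch runs it on the sample input and writes file1, file2 and file3.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# The sample input from the question.
cat > filename <<'EOF'
abcdef
START
ghijklm
nopqrst
END
uvwxyz
START
abcdef
ghijklm
nopqrs
END
START
tuvwxyz
END
EOF

awk '
    /START/        { p = 1; n++; file = "file" n }
    p              { print > file }
    # Only close when this END actually terminates an open block.
    /END/ && p     { close(file); p = 0 }
' filename

for f in file1 file2 file3; do
  echo "== $f"
  cat "$f"
done
```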

answered Jan 27 '14 at 17:25

glenn jackman


score 1 · Answer 3 · answered Jan 27 '14 at 17:33

Here's another example using range notation:

awk '/START/,/END/ {if (/START/) n++; print > ("out" n ".txt")}' data

Or an equivalent that uses the conditional (?:) operator instead of an if statement:

awk '/START/,/END/ {print > ("out" (/START/ ? ++n : n) ".txt")}' data

Here's a version that doesn't repeat the /START/ regex, written after Ed Morton's comments just to see whether it would work:

awk '/START/ && ++n,/END/ {print > ("out" n ".txt")}' data
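As a quick check that the three range-pattern commands in this answer agree, the sketch below runs each on the sample input in its own subdirectory and compares the results with diff -r (POSIX shell, assuming `mktemp -d` exists). The output file names are parenthesized, the portable form of the redirection; otherwise the commands are unchanged. The last variant works because awk only evaluates the start pattern while it is outside a range, so ++n runs once per START line.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# The sample input from the question.
cat > data <<'EOF'
abcdef
START
ghijklm
nopqrst
END
uvwxyz
START
abcdef
ghijklm
nopqrs
END
START
tuvwxyz
END
EOF

mkdir v1 v2 v3
(cd v1 && awk '/START/,/END/ {if (/START/) n++; print > ("out" n ".txt")}' ../data)
(cd v2 && awk '/START/,/END/ {print > ("out" (/START/ ? ++n : n) ".txt")}' ../data)
(cd v3 && awk '/START/ && ++n,/END/ {print > ("out" n ".txt")}' ../data)

# diff -r prints nothing and exits 0 when the directory trees are identical.
diff -r v1 v2 && diff -r v1 v3 && echo "all three variants agree"
```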

The other answers are definitely better if your range excludes, or may ever need to exclude, the START and END lines themselves.
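To illustrate the non-inclusive case with the flag approach: leaving out the START and END lines only takes reordering the rules, so the END test clears the flag before the print rule runs and the START test sets it after. The sketch below is my own adaptation of the flag-based answers, not code from them.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# The sample input from the question.
cat > input_file.txt <<'EOF'
abcdef
START
ghijklm
nopqrst
END
uvwxyz
START
abcdef
ghijklm
nopqrs
END
START
tuvwxyz
END
EOF

# END clears w before printing, START sets w after printing,
# so neither delimiter line is written.
awk '/^END/{w=0} w{print > ("out" n ".txt")} /^START/{n++; w=1}' input_file.txt

cat out1.txt out2.txt out3.txt
```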

edited Jan 27 '14 at 19:43

answered Jan 27 '14 at 17:33

n0741337


Never use range notation: it makes the trivial stuff slightly briefer, but then requires a complete rewrite and/or duplication of conditions (as in this case) when things get even slightly more complicated. – Ed Morton Jan 27 '14 at 17:59
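To make that comment concrete, here is one way (a sketch, not from the thread) the range form could be bent to exclude the delimiter lines. Both /START/ and /END/ now appear twice, once in the range and once inside the action, whereas a flag-based version only needs its rules reordered.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# The sample input from the question.
cat > data <<'EOF'
abcdef
START
ghijklm
nopqrst
END
uvwxyz
START
abcdef
ghijklm
nopqrs
END
START
tuvwxyz
END
EOF

awk '
  /START/,/END/ {
    if (/START/) { n++; next }   # START line: begin a new file but skip the line
    if (/END/) next              # END line: the range still closes, line is skipped
    print > ("out" n ".txt")
  }
' data

cat out1.txt out2.txt out3.txt
```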
