2

i have a file that i need split into multiple files, and need it done via separate start and end delimiters.

for example, if i have the following file:

abcdef
START
ghijklm
nopqrst
END
uvwxyz
START
abcdef
ghijklm
nopqrs
END
START
tuvwxyz
END

i need 3 separate files of:

file1

START
ghijklm
nopqrst
END

file2

START
abcdef
ghijklm
nopqrs
END

file3

START
tuvwxyz
END

i found this link which showed how to do it with a starting delimiter, but i also need an ending delimiter. i have tried this using some regex in the awk command, but am not getting the result that i want. i don't quite understand how to get awk to be 'lazy' or 'non greedy', so that i can get it to pull apart the file correctly.

i really like the awk solution, and something similar would be fantastic. i am reposting the solution here so you don't have to click through:

awk '/DELIMITER_HERE/{n++}{print >"out" n ".txt" }' input_file.txt

any help is appreciated.

jasonmclose

3 Answers

4

You can use this awk command:

awk '/^START/{n++;w=1} n&&w{print >"out" n ".txt"} /^END/{w=0}' input_file.txt
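On the 'lazy' or 'non greedy' worry from the question: awk reads the input one line at a time, so there is no multi-line match to rein in. A flag simply records whether the current line sits inside a section. Below is a commented sketch of the same logic, run on the sample from the question. The file name input.txt and the parentheses around the output name are my own additions (some awk implementations parse an unparenthesized concatenation after > differently), and I test w alone because w can only be set once n is at least 1. Run it in an empty scratch directory.

```shell
# Sample input from the question.
cat > input.txt <<'EOF'
abcdef
START
ghijklm
nopqrst
END
uvwxyz
START
abcdef
ghijklm
nopqrs
END
START
tuvwxyz
END
EOF

awk '
    /^START/ { n++; w = 1 }                 # new section: bump the file number, start writing
    w        { print > ("out" n ".txt") }   # copy the line while inside a section
    /^END/   { w = 0 }                      # stop only after the END line has been written
' input.txt

ls out*.txt
```

The order of the three rules is what makes both delimiters land in the output: the START rule runs before the print rule, and the END rule runs after it.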
anubhava
  • i like this solution the best. it worked perfectly for me, and i only had to type in the delimiters one time for the command (the delimiters are much, much longer than my START and END delimiters used in the example). Thank you. – jasonmclose Jan 27 '14 at 18:39
  • using n&&w is an amazing trick and it did all the job. Nice One. – Obsidian Aug 28 '18 at 20:08
4
awk '
    /START/ {p = 1; n++; file = "file" n}
    p { print > file }
    /END/ {p = 0}
' filename
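If the input contains a large number of sections, some awk implementations may run out of open file descriptors, because every output file stays open until awk exits. A hedged variation on the script above closes each file once its END line has been written. The close() call, the guard on p, and the sample file name sections.txt are my additions, not part of the original answer. Run it in an empty scratch directory.

```shell
# Sample input from the question.
cat > sections.txt <<'EOF'
abcdef
START
ghijklm
nopqrst
END
uvwxyz
START
abcdef
ghijklm
nopqrs
END
START
tuvwxyz
END
EOF

awk '
    /START/    { p = 1; n++; file = "file" n }   # open a new section
    p          { print > file }                  # write lines while inside it
    p && /END/ { close(file); p = 0 }            # END already written: release the file
' sections.txt
```

Because n increases for every section, a closed file name is never reopened, so the > redirection cannot truncate earlier output.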
glenn jackman
1

Here's another example using range notation:

awk '/START/,/END/ {if(/START/) n++; print > "out" n ".txt"}' data

Or an equivalent that uses the ternary operator instead of the if statement:

awk '/START/,/END/ {print > "out" (/START/ ? ++n : n) ".txt"}' data

Following Ed Morton's comments, here's a version that doesn't repeat the /START/ regex. I just wanted to see if it would work:

awk '/START/ && ++n,/END/ {print > "out" n ".txt" }' data

The other answers are definitely better if your range is or will ever be non-inclusive of the ends.
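To illustrate the non-inclusive case: with the flag-based approach from the other answers, dropping the delimiter lines is only a matter of reordering the rules. This is a rough sketch, not something taken from those answers; the names input.txt and part N .txt are placeholders of mine. Run it in an empty scratch directory.

```shell
# Sample input from the question.
cat > input.txt <<'EOF'
abcdef
START
ghijklm
nopqrst
END
uvwxyz
START
abcdef
ghijklm
nopqrs
END
START
tuvwxyz
END
EOF

awk '
    /END/   { p = 0 }                        # checked first, so the END line is not written
    p       { print > ("part" n ".txt") }    # only lines strictly between the delimiters
    /START/ { p = 1; n++ }                   # checked last, so the START line is not written
' input.txt
```

Doing the same with range notation would mean repeating both regexes inside the action to filter out the end lines.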

n0741337
  • never use range notation - it makes the trivial stuff slightly briefer but then requires a complete re-write and/or duplication of conditions (as in this case) when things get even slightly more complicated. – Ed Morton Jan 27 '14 at 17:59