Linux shell, get all the matches from a file

Question

I have a file in the following format:

line one  
line two <% word1  %> text <% word2 %>  
line three <%word3%>

I want to use Linux shell tools such as awk or sed to get all the words enclosed in <% %>.
The result should look like this:

word1  
word2  
word3

Thanks for your help.

I forgot to mention: I am in an embedded environment, and grep has no -P option.

score 4 · Answer 1 · answered Aug 24 '13 at 12:34

With GNU awk, which lets us set RS to a multi-character regular expression:

$ gawk -v RS='<% *| *%>' '!(NR%2)' file
word1
word2
word3

With any modern awk:

$ awk -F'<% *| *%>' '{for (i=2;i<=NF;i+=2) print $i}' file
word1
word2
word3
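
To see why the loop starts at field 2 and steps by 2, here is a small sketch that prints every field of the second sample line with its index. The file name `sample.txt` and the helper name `show_fields` are assumptions made for illustration; the field separator is the one used above.

```shell
# Recreate the sample input from the question (sample.txt is an illustrative name).
printf '%s\n' 'line one' 'line two <% word1  %> text <% word2 %>' 'line three <%word3%>' > sample.txt

# show_fields is a hypothetical helper: it prints index=[field] for line 2 only.
show_fields() {
    awk -F'<% *| *%>' 'NR==2 {for (i=1;i<=NF;i++) printf "%d=[%s]\n", i, $i}' "$1"
}
show_fields sample.txt
```

The text outside the delimiters lands in the odd-numbered fields and the wanted words in the even-numbered ones, which is what `i=2; i+=2` selects.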

Ed Morton

score 2 · Answer 2 · answered Aug 24 '13 at 10:14

You could do it with grep:

$ grep -oP '(?<=<%).+?(?=%>)' file
 word1  
 word2 
word3

user000001

Thanks! I forgot to mention that I am in an embedded environment. grep has no -P option – alzhao Aug 24 '13 at 10:36
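
Where `-P` is missing but `-o` is available, a rough alternative is to let grep isolate each tag and let sed strip the delimiters. This is only a sketch under assumptions: `-o` is not POSIX (GNU grep has it, and BusyBox builds often do), the text inside a tag is assumed to contain no `%`, and `sample.txt` and `extract_with_grep` are illustrative names.

```shell
# Recreate the sample input from the question (sample.txt is an illustrative name).
printf '%s\n' 'line one' 'line two <% word1  %> text <% word2 %>' 'line three <%word3%>' > sample.txt

# extract_with_grep is a hypothetical helper name.
extract_with_grep() {
    # grep -o prints each <%...%> match on its own line;
    # sed then strips the delimiters and the spaces next to them.
    grep -o '<%[^%]*%>' "$1" | sed 's/^<% *//; s/ *%>$//'
}
extract_with_grep sample.txt
```

Unlike the lookaround version, this trims the surrounding spaces, so the three words come out without padding.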

score 2 · Answer 3 · answered Aug 24 '13 at 11:30

This works for your sample:

sed -ne 's/%>/&\n/gp' < sample.txt | sed -ne 's/.*<%\s*\(.*\)\s*%>.*/\1/p'

The first sed just puts a line break after every closing %>, as preparation.

The second sed extracts the part inside <% ... %> without the leading whitespace. Because \(.*\) is greedy, any trailing whitespace before %> stays in the output.

In both commands, the -n flag combined with s///p limits the data going through the pipe to the matching (relevant) lines only.

janos

Thanks. This works perfectly. – alzhao Aug 24 '13 at 11:54
Just be aware that there are two non-portable sed constructs in the above: a) use of `\n` as a newline in the replacement (a backslash followed by a literal newline is portable) and b) use of `\s` to represent a whitespace character (`[[:blank:]]` is POSIX, but in this case a literal blank char is probably adequate). I'm surprised your sed works with those when your grep doesn't support `-P`. – Ed Morton Aug 24 '13 at 13:04
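
Following the portability remarks in the comment above, the two-step sed approach might be written with POSIX constructs only. This is a sketch, not the answerer's own command: it assumes each tag holds a single word without blanks, and `sample.txt` and `extract_tags` are illustrative names.

```shell
# Recreate the sample input from the question (sample.txt is an illustrative name).
printf '%s\n' 'line one' 'line two <% word1  %> text <% word2 %>' 'line three <%word3%>' > sample.txt

# extract_tags is a hypothetical helper name.
# Step 1 breaks the line after every %>; a backslash followed by a literal
# newline is the portable way to put a newline in the replacement.
# Step 2 keeps only the word between <% and %>, using [[:blank:]] instead of \s.
extract_tags() {
    sed -n 's/%>/&\
/gp' "$1" |
    sed -n 's/.*<%[[:blank:]]*\([^[:blank:]]*\)[[:blank:]]*%>.*/\1/p'
}
extract_tags sample.txt
```

The line holding `/gp'` starts in column 0 on purpose: any indentation there would become part of the replacement text.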

score 2 · Answer 4 (accepted) · answered Aug 24 '13 at 12:10

Using awk:

awk -F '<% *| *%>' '{for(i=2; i<=NF; i+=2) print $i}' file
word1
word2
word3

edited Aug 24 '13 at 13:39

answered Aug 24 '13 at 12:10

anubhava

score 0 · Answer 5 · answered Aug 24 '13 at 19:56

This might work for you (GNU sed):

sed '/<%\s*/!d;s//\n/;s/[^\n]*\n//;s/\s*%>/\n/;P;D' file
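
For readers unfamiliar with the `P` and `D` commands, the same one-liner can be spread over several lines with comments. This is a reading of the command above, not an official explanation; it still needs GNU sed (for `\s` and `\n`), and `sample.txt` and `extract_gnu` are illustrative names.

```shell
# Recreate the sample input from the question (sample.txt is an illustrative name).
printf '%s\n' 'line one' 'line two <% word1  %> text <% word2 %>' 'line three <%word3%>' > sample.txt

# extract_gnu is a hypothetical helper name; requires GNU sed.
extract_gnu() {
    sed '
        # Delete the pattern space if no opening <% is left in it.
        /<%\s*/!d
        # The empty regex reuses the last one: turn the first <% into a newline.
        s//\n/
        # Throw away everything up to and including that newline.
        s/[^\n]*\n//
        # Turn the first closing %> (and blanks before it) into a newline.
        s/\s*%>/\n/
        # Print up to the first newline, i.e. the extracted word.
        P
        # Drop what was printed and rerun the script on the remainder.
        D
    ' "$1"
}
extract_gnu sample.txt
```

The `D` command restarts the script without reading a new line, which is what lets one input line yield several words.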

potong
