1

I have a file in the following format:

line one  
line two <% word1  %> text <% word2 %>  
line three <%word3%>  

I want to use Linux shell tools such as awk or sed to extract all the words enclosed in <% %>.
The result should look like this:

word1  
word2  
word3  

Thanks for the help.

I forgot to mention: I am in an embedded environment, and its grep has no -P option.

alzhao

5 Answers

4

With GNU awk, so we can set RS to a multi-character regular expression:

$ gawk -v RS='<% *| *%>' '!(NR%2)' file
word1
word2
word3

With any modern awk:

$ awk -F'<% *| *%>' '{for (i=2;i<=NF;i+=2) print $i}' file
word1
word2
word3
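
To see why the loop starts at field 2 and steps by 2, it may help to print every field of one line with its index. The sketch below assumes a temporary copy of the sample input from the question; `show_fields` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical sample file mirroring the input from the question.
sample=$(mktemp)
printf '%s\n' 'line one' \
              'line two <% word1  %> text <% word2 %>' \
              'line three <%word3%>' > "$sample"

# show_fields is an illustrative helper: it prints each field of the
# second input line together with its field number.
show_fields() {
    awk -F'<% *| *%>' 'NR == 2 { for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }' "$1"
}

show_fields "$sample"
# 1=[line two ]
# 2=[word1]
# 3=[ text ]
# 4=[word2]
# 5=[]
```

Because the opening and closing delimiters are both field separators, the text outside the tags lands in the odd-numbered fields and the tagged words in the even-numbered ones.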
Ed Morton
2

You could do it with grep:

$ grep -oP '(?<=<%).+?(?=%>)' file
 word1  
 word2 
word3
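
If grep lacks -P, as in the asker's environment, a rough equivalent can be built from `grep -o` plus sed. This is only a sketch: it assumes the local grep supports `-o` (worth checking on an embedded build) and that no `%` character appears inside a tag. `extract_tags` is a hypothetical helper name.

```shell
# Hypothetical sample file mirroring the input from the question.
sample=$(mktemp)
printf '%s\n' 'line one' \
              'line two <% word1  %> text <% word2 %>' \
              'line three <%word3%>' > "$sample"

extract_tags() {
    # Stage 1: print every <% ... %> span on its own line.
    # Stage 2: strip the delimiters and the spaces next to them.
    grep -o '<%[^%]*%>' "$1" | sed -e 's/^<% *//' -e 's/ *%>$//'
}

extract_tags "$sample"
```

Unlike the lookaround version, this also trims the spaces around each word.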
user000001
2

This works for your sample:

sed -ne 's/%>/&\n/gp' sample.txt | sed -ne 's/.*<%\s*\(.*\S\)\s*%>.*/\1/p'

The first sed just puts a line break after every closing %>, as preparation.

The second sed extracts the relevant part within <% ... %>, without leading or trailing whitespace.

In both commands, the -n flag combined with s///p limits the data going through the pipe to the matching (relevant) lines only.

janos
  • Thanks. This works perfect. – alzhao Aug 24 '13 at 11:54
  • Just be aware that there are two non-portable sed constructs in the above: a) use of `\n` as a newline in the replacement (a backslash followed by a literal newline is portable) and b) use of `\s` to represent a space character (`[[:blank:]]` is POSIX, but in this case a literal blank char is probably adequate). I'm surprised your sed works with those when your grep doesn't support `-P`. – Ed Morton Aug 24 '13 at 13:04
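
Following the portability note in the comment above, the same two-stage idea can be sketched with only portable constructs: a backslash followed by a literal newline in the replacement, and literal blanks instead of `\s`. This is an untested-on-your-device sketch, assuming the tags contain only spaces as padding; `extract_tags_posix` is a hypothetical helper name.

```shell
# Hypothetical sample file mirroring the input from the question.
sample=$(mktemp)
printf '%s\n' 'line one' \
              'line two <% word1  %> text <% word2 %>' \
              'line three <%word3%>' > "$sample"

extract_tags_posix() {
    # Stage 1: break the line after every closing %> (the replacement
    # ends with a backslash followed by a literal newline).
    # Stage 2: keep only the text between <% and %>, dropping the
    # blanks on both sides; [^ ] forces the capture to end on a non-blank.
    sed -n 's/%>/&\
/gp' "$1" | sed -n 's/.*<% *\(.*[^ ]\) *%>.*/\1/p'
}

extract_tags_posix "$sample"
```

The line holding `/gp'` has to start in the first column, since anything before it would become part of the replacement text.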
2

Using awk:

awk -F '<% *| *%>' '{for(i=2; i<=NF; i+=2) print $i}' file
word1
word2
word3
anubhava
0

This might work for you (GNU sed):

sed '/<%\s*/!d;s//\n/;s/[^\n]*\n//;s/\s*%>/\n/;P;D' file
potong