
I have a file with this HTML code inside:

 <p class="center-block"><img alt="ourpicture" class="picture" src="http://mypage.com/ourpicture123" /></p>

Now I would like to extract just the source URL, i.e. http://mypage.com/ourpicture123. How can I do this with sed? Ideally it would match everything between 'src="' and the next '"'.

  • possible duplicate of [Easiest way to extract the urls from an html page using sed or awk only](http://stackoverflow.com/questions/1881237/easiest-way-to-extract-the-urls-from-an-html-page-using-sed-or-awk-only) – NeronLeVelu Feb 10 '15 at 11:35

2 Answers


Through sed,

$ sed -n 's/.*\bsrc="\([^"]*\)".*/\1/p' file
http://mypage.com/ourpicture123

Through grep,

grep -oP '\bsrc="\K[^"]*(?=")' file

The sed command above won't work as intended if a line contains more than one src attribute: because the leading .* is greedy, only the last src value on the line is printed. In the grep command, \K discards the already matched src=" characters, so only the URL itself is printed.
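If a line can hold several src attributes, one possible workaround is to let grep -o emit every match on its own line and then trim the surrounding text with sed. This is only a sketch: the temporary file and the two example URLs are made up for illustration, and the pattern would also match attributes that merely end in src, such as data-src.

```shell
# Hypothetical sample: two img tags on a single line.
tmp=$(mktemp)
printf '%s\n' '<p><img src="http://mypage.com/a" /><img src="http://mypage.com/b" /></p>' > "$tmp"

# grep -o prints each src="..." match on its own line;
# sed then strips the leading src=" and the trailing quote.
urls=$(grep -o 'src="[^"]*"' "$tmp" | sed 's/^src="//; s/"$//')
printf '%s\n' "$urls"

rm -f "$tmp"
```

With this sample input, each URL ends up on its own line.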

Avinash Raj

Here is an awk version:

awk -F'src="' '{split($2,a,"\"");print a[1]}' file
http://mypage.com/ourpicture123

Or like this:

awk -F'src="' '{sub(/".*$/,"",$2);print $2}' file
http://mypage.com/ourpicture123

If the file has several lines and you only need the ones containing src=, add an NF>1 condition:

awk -F'src="' 'NF>1{split($2,a,"\"");print a[1]}' file
http://mypage.com/ourpicture123
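
The same field-splitting idea can probably be stretched to lines with more than one src attribute by looping over every field after the first, since each of those fields begins with a URL. A minimal sketch, assuming a made-up two-line sample file:

```shell
# Hypothetical sample: one line without src, one line with two img tags.
tmp=$(mktemp)
printf '%s\n' 'no image here' \
  '<p><img src="http://mypage.com/a" /><img src="http://mypage.com/b" /></p>' > "$tmp"

# With src=" as the field separator, fields 2..NF each start with a URL;
# split on the next double quote and keep the part before it.
# Lines without src have NF == 1, so the loop body never runs for them.
all_urls=$(awk -F'src="' '{ for (i = 2; i <= NF; i++) { split($i, a, "\""); print a[1] } }' "$tmp")
printf '%s\n' "$all_urls"

rm -f "$tmp"
```

The loop makes the NF>1 guard unnecessary, because it simply does nothing on lines that contain no src=" at all.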
Jotne