I saw an answer on this forum that is close to my "request" but not quite enough (Regexp to capture string between delimiters).

My question is: I have an HTML page and I would like to get only the src of every "img" tag on the page and put them in one array, without using cheerio (I'm using Node.js).

The problem is that I would prefer to exclude the delimiters. How can I solve this?

budi

1 Answer

Yes, this is possible with a regex, but it would be much easier (and probably faster, but don't quote me on that) to use a native DOM method. Let's start with the regex approach. A capture group lets us pull out the src of an img tag without the surrounding quotes:

var html = `test<div>hello</div>
<img src="first">
<img class="test" src="second" data-lang="en">
test
<img src="third" >`;
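var srcs = [];
// replace() is used here only to visit each match; $1 is the capture group, i.e. the src without its quotes
html.replace(/<img[^<>]*src=['"](.*?)['"][^<>]*>/g, (m, $1) => { srcs.push($1); });

console.log(srcs); // ["first", "second", "third"]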
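
For Node.js, the same capture-group idea can also be written with String.prototype.matchAll, which avoids calling replace() purely for its side effect. This is only a sketch: the helper name extractImgSrcs and the stricter pattern (a backreference so the closing quote matches the opening one, and whitespace required before src so that data-src is not picked up) are my own assumptions, not something from the question, and a regex will still miss unusual markup such as unquoted attributes.

```javascript
// Hypothetical helper for illustration; the name is not from the original post.
function extractImgSrcs(html) {
  // Group 1 remembers the opening quote and \1 requires the same closing quote.
  // Group 2 is the src value itself, without the delimiters.
  const pattern = /<img\b[^<>]*?\ssrc\s*=\s*(['"])(.*?)\1[^<>]*>/gi;
  return Array.from(html.matchAll(pattern), (match) => match[2]);
}

const sample = `test<div>hello</div>
<img src="first">
<img class="test" src="second" data-lang="en">
test
<img src="third" >`;

console.log(extractImgSrcs(sample)); // [ 'first', 'second', 'third' ]
```

matchAll needs a regex with the g flag and is available in Node 12 and later.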

However, the better way is to use getElementsByTagName wherever a DOM is available, such as in a browser:
(Note: img.src returns an absolute URL, so the relative, fake srcs below come back resolved against the page's URL, but you get the idea.)

var srcs = [].slice.call(document.getElementsByTagName('img')).map(img => img.src);

console.log(srcs);
HTML used by the snippet above:

test<div>hello</div>
<img src="first">
<img class="test" src="second" data-lang="en">
test
<img src="third" >

Damon
  • Thank you very much. I'm not using cheerio because I think it's much slower. I fetch the HTML page with the request module, then I just want to extract the src of every image. – Davide Modesto Jun 17 '17 at 07:27