
I need a regular expression for use in PHP that extracts the src attribute values of all script tags.

I already have this regex, which I wrote to extract script src values, but I can't make it match only inside the head section:

/<script [^>]*src=["|\']([^"|\']+(\.js))/i

I'm hoping someone can check and test this before posting a new regex that works.

i333
  • Don't parse HTML with regex: http://stackoverflow.com/q/3577641/372239 – Toto Mar 25 '15 at 19:49
  • Thank you for your comment, but I specifically need a regex for this particular scenario, and I'm aware of the limitations. Thank you. – i333 Mar 26 '15 at 01:05

1 Answer

/html/head/script/@src

Easy peasy. Obviously that's not a regex, it's XPath. Bad things tend to happen when you try to parse HTML with regular expressions. Fortunately, a more capable HTML parser comes with PHP's DOM extension, exposed through DOMDocument's loadHTML() and loadHTMLFile() methods.

This lets you work with all the wonderful DOM methods as well as XPath for querying the document.


Example:

$html = <<<'HTML'
<html>
<head>
    <script src="foo.js"></script>
    <script src="bar.js"></script>
</head>
<body>
    <script src="baz.js"></script>
</body>
</html>
HTML;

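// Parse the markup into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select the src attribute of every <script> that is a direct child of <head>.
foreach ($xpath->query('/html/head/script/@src') as $src) {
    echo $src->value, "\n";
}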

Output:

foo.js
bar.js
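
For markup that is less tidy than the sample, the same query can be wrapped in a small function. This is only a sketch: the function name headScriptSources is made up for illustration, and the libxml_use_internal_errors() call is an addition that assumes you would rather collect parser warnings quietly than have them printed for malformed HTML.

```php
<?php
// Hypothetical helper (not part of the example above): returns the src values
// of <script> elements that are direct children of <head>.
function headScriptSources(string $html): array
{
    // Buffer libxml parse warnings instead of emitting them as PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the buffered warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $sources = [];
    foreach ($xpath->query('/html/head/script/@src') as $src) {
        $sources[] = $src->value;
    }

    return $sources;
}

// Usage: the inline script has no src attribute and the body script is
// outside <head>, so neither is selected by the query.
$html = '<html><head><script src="foo.js"></script><script>var x;</script></head>'
      . '<body><script src="baz.js"></script></body></html>';

print_r(headScriptSources($html));
```

Returning an array rather than echoing keeps the result usable by whatever code already consumes the script URLs.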
user3942918
  • Thank you for your answer. I specifically want a regex since I'm already using one in my application, so DOM parsers etc. won't work. – i333 Mar 26 '15 at 01:03
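
If a regex-only route really is a hard requirement, one possible sketch is to do it in two steps: first cut out the head section, then run a src pattern on just that fragment. The function name headScriptSourcesRegex is hypothetical, and the second pattern is a lightly tidied variant of the one in the question (the `|` is dropped from the character classes, where it would match a literal pipe). It inherits the usual fragility of regex on HTML: commented-out tags, unquoted attributes, or a missing closing head tag will give wrong or empty results.

```php
<?php
// Hypothetical helper: regex-only, two-step extraction limited to <head>.
function headScriptSourcesRegex(string $html): array
{
    // Step 1: capture everything between <head ...> and </head>.
    // 'i' = case-insensitive, 's' = let '.' match newlines, '.*?' = non-greedy.
    if (!preg_match('/<head\b[^>]*>(.*?)<\/head>/is', $html, $head)) {
        return []; // no head section found
    }

    // Step 2: collect quoted src values ending in .js from script tags
    // inside the captured head fragment only.
    preg_match_all('/<script\b[^>]*\bsrc=["\']([^"\']+\.js)/i', $head[1], $matches);

    return $matches[1];
}

// Usage with markup similar to the answer's sample.
$html = "<html>\n<head>\n"
      . "    <script src=\"foo.js\"></script>\n"
      . "    <script src='bar.js'></script>\n"
      . "</head>\n<body>\n"
      . "    <script src=\"baz.js\"></script>\n"
      . "</body>\n</html>";

print_r(headScriptSourcesRegex($html));
```

Splitting the work this way keeps each pattern simple; trying to express "only inside head" in a single pattern tends to need lookarounds that are harder to get right.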