I need to add rel="nofollow" to all external links, that is, links that do not point to my site or one of its subdomains.

I currently do this in two steps. First, I add rel="nofollow" to all links (including internal ones) using the following regular expression:

<a href="http([s]?)://(.*?)"

Then, in the second step, I remove rel="nofollow" from internal links (my site and its subdomains) using the following regular expression:

<a href="http([s]?)://(www\.|forum\.|blog\.)mysite\.com(.*?)" rel="nofollow"

Is it possible to do this in a single step? If so, how?

Palec
Ahmad
  • How about using an html parser? – Antony Jul 13 '13 at 12:04
  • Better yet, how about using the search function? Possible duplicate of [RegEx expression to find a href links and add NoFollow to them](http://stackoverflow.com/q/2450985) or [How to add rel="nofollow" to links with preg\_replace()](http://stackoverflow.com/q/5037592) – mario Jul 13 '13 at 12:09

1 Answer

The DOM way:

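$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings from real-world HTML instead of using @
$doc->loadHTMLFile($url);         // $url: path or URL of the HTML file
libxml_clear_errors();
$links = $doc->getElementsByTagName('a');

// Matches http(s) URLs whose host is NOT mysite.com or one of its subdomains.
// The lookahead must reach the end of the host, otherwise every URL would match.
$external = '~^https?://(?!(?:[\w-]+\.)*mysite\.com(?:[:/?#]|$))~i';

foreach ($links as $link) {
    $href = $link->getAttribute('href');
    if (preg_match($external, $href)) {
        $link->setAttribute('rel', 'nofollow');
    }
}

$doc->saveHTMLFile($url);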
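
If a regex-only replacement is still preferred over the DOM approach, the two steps from the question can probably be merged with a negative lookahead. The following is only a sketch under the same assumptions as the question's patterns: href is the first attribute of the tag and is double-quoted. The function name addNofollowToExternalLinks and the default domain mysite.com are placeholders, not part of any library.

```php
<?php
// Hypothetical helper: adds rel="nofollow" to external links in one pass.
// $domain is an assumption; replace it with your own domain.
function addNofollowToExternalLinks(string $html, string $domain = 'mysite.com'): string
{
    // Escape regex metacharacters in the domain (e.g. the dot).
    $host = preg_quote($domain, '~');

    // Match <a href="http(s)://..." only when the host is NOT $domain or one
    // of its subdomains. The negative lookahead checks the whole host, up to
    // the character that ends it (port, path, query, fragment, closing quote).
    $pattern = '~<a href="(https?://(?!(?:[\w-]+\.)*' . $host . '[:/?#"])[^"]*)"~i';

    // $1 is the captured URL; relative links never match and stay untouched.
    return preg_replace($pattern, '<a href="$1" rel="nofollow"', $html);
}
```

Note that the sketch does not check for an existing rel attribute, so running it twice over the same markup would add rel="nofollow" twice. The DOM version does not have that problem, because setAttribute overwrites the attribute.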
Casimir et Hippolyte