
I am using PHP's file_get_contents to read some websites. Each site has only one Facebook link, and after fetching the whole page I would like to extract the complete Facebook URL.

So somewhere in the page there will be:

<a href="http://facebook.com/username" >

I want to get http://facebook.com/username, that is, everything between the opening quote (") and the closing quote ("). The username is variable (it could be username.somethingelse), and there could be other attributes before or after "href".

In case I am not being clear:

<a href="http://facebook.com/username" >  //I want http://facebook.com/username
<a href="http://www.facebook.com/username" >  //I want http://www.facebook.com/username
<a class="value" href="http://facebook.com/username. some" attr="value" >  //I want http://facebook.com/username. some

Any of the examples above could also use single quotes:

<a href='http://facebook.com/username' > //I want http://facebook.com/username

Thanks to all

Richard Pérez

2 Answers


Don't use regex on HTML. It's a shotgun that'll blow off your leg at some point. Use DOM instead:

$dom = new DOMDocument;
$dom->loadHTML(...);
$xp = new DOMXPath($dom);

$a_tags = $xp->query("//a");
foreach($a_tags as $a) {
   echo $a->getAttribute('href');
}
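
The loop above prints every link on the page. To narrow it to the single Facebook link the question asks about, the XPath query can filter on the href value. This is only a sketch: the $html sample and the $facebookUrl variable are made up for illustration, and contains() is case-sensitive, so a link written as "Facebook.com" would be missed.

```php
<?php
// Hypothetical sample markup standing in for the result of file_get_contents().
$html = <<<'HTML'
<html><body>
<a href="http://example.com/about">About</a>
<a class="value" href='http://www.facebook.com/username.some' attr="value">Facebook</a>
</body></html>
HTML;

// Real-world pages are rarely valid HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);

// Select the href attribute of every <a> whose href contains "facebook.com".
$hrefs = $xp->query("//a[contains(@href, 'facebook.com')]/@href");

// Keep the first match, or null when the page has no such link.
$facebookUrl = null;
if ($hrefs !== false && $hrefs->length > 0) {
    $facebookUrl = $hrefs->item(0)->nodeValue;
}

echo $facebookUrl, PHP_EOL;
```

Because the parser normalizes attributes, single or double quotes and extra attributes around href make no difference to the result.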
Marc B

I would suggest using DOMDocument for this very purpose rather than using regex. Here is a quick code sample for your case:

$dom = new DOMDocument();
$dom->loadHTML($content);

// To hold all your links...
$links = array();

$hrefTags = $dom->getElementsByTagName("a");
foreach ($hrefTags as $hrefTag) {
    $links[] = $hrefTag->getAttribute("href");
}

print_r($links); // dump all links
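
Since the goal is one Facebook URL rather than every link, the same loop can check each href's host and stop at the first match. The following is a rough sketch, not a definitive implementation: findFacebookUrl is a hypothetical helper name, and it assumes the link is absolute (starts with http:// or https://) as in the question's examples.

```php
<?php
/**
 * Return the first link in $html whose host is facebook.com or a subdomain
 * of it, or null when there is none. (Hypothetical helper for illustration.)
 */
function findFacebookUrl(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    // Collect parser warnings for sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        $host = parse_url($href, PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // relative link, mailto:, or a URL parse_url cannot handle
        }
        $host = strtolower($host);
        // Accept facebook.com itself and subdomains such as www.facebook.com.
        if ($host === 'facebook.com' || substr($host, -13) === '.facebook.com') {
            return $href;
        }
    }

    return null;
}

// Usage with a made-up fragment: single quotes and extra attributes are fine.
$page = "<a href=\"/local\">Home</a><a class=\"value\" href='http://facebook.com/username' attr=\"value\">fb</a>";
var_dump(findFacebookUrl($page));
```

Comparing the parsed host, rather than searching the whole href for "facebook.com", avoids false matches on links that merely mention Facebook in their path or query string.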
anubhava