
I've looked through the other Stack Overflow questions on this topic, and none of the solutions provided there seem to work for me.

I have an HTML page (scraped with file_get_contents()), and in that HTML is a div with an id of "main". I need to get the contents of that div with PHP's DOMDocument, or something similar. For this situation I can't use the SimpleHTMLDom parser, which complicates things a bit.

hakre
Charles Zink
  • When you say *I need to get the contents of that div*, do you mean the HTML? – alex Jun 20 '11 at 01:03
  • [DOMElement getElementById ( string $elementId )](http://php.net/manual/en/class.domdocument.php) – Ibu Jun 20 '11 at 01:04

2 Answers


DOMDocument + XPath variation:

// $temp holds the scraped HTML string (e.g. from file_get_contents())
$xml = new DOMDocument();
$xml->loadHTML($temp);
$xpath = new DOMXPath($xml);

$html = '';
foreach ($xpath->query('//div[@id="main"]/*') as $node)
{
    $html .= $xml->saveXML($node);
}

If you're looking for innerHTML() rather than the innerXML() shown above (see the PHP DOMDocument Reference Question), the XPath-based variant is given in this answer.

Here is that adaptation, with the changes underlined:

$html = '';
foreach ($xpath->query('//div[@id="main"]/node()') as $node)
                                          ######
{
    $html .= $xml->saveHTML($node);
                       ####
}
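
To see what the two queries do differently, here is a minimal self-contained sketch. The sample markup in `$temp` is made up for illustration. It assumes PHP 5.3.6 or later, since that is when `saveHTML()` started accepting a node argument. `/*` selects only child elements, so text sitting directly inside the div is dropped, while `/node()` also selects text nodes.

```php
<?php
// Hypothetical sample markup standing in for the scraped page.
$temp = '<html><body><div id="main">Hello <b>world</b>!</div></body></html>';

$xml = new DOMDocument();
$xml->loadHTML($temp);
$xpath = new DOMXPath($xml);

// Child elements only: text directly inside the div is skipped.
$elementsOnly = '';
foreach ($xpath->query('//div[@id="main"]/*') as $node) {
    $elementsOnly .= $xml->saveHTML($node);
}

// All child nodes, including text nodes.
$allNodes = '';
foreach ($xpath->query('//div[@id="main"]/node()') as $node) {
    $allNodes .= $xml->saveHTML($node);
}

echo $elementsOnly, "\n";
echo $allNodes, "\n";
```

The first string should contain only the `<b>` element, while the second should also carry the surrounding text.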
hakre

Using DOMDocument...

$dom = new DOMDocument;

$dom->loadHTML($html);

$main = $dom->getElementById('main');

To get the serialised HTML...

$html = '';
foreach($main->childNodes as $node) {
    $html .= $dom->saveXML($node, LIBXML_NOEMPTYTAG);
}

Use saveHTML($node) instead if your PHP version supports it (the node argument was added in PHP 5.3.6).
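
Put together, the steps above might look like the sketch below. `innerHtmlById()` is a hypothetical helper name, not part of DOMDocument, and the sample markup is made up. The `libxml_use_internal_errors()` calls are an addition on the assumption that scraped pages are often malformed and would otherwise raise warnings during `loadHTML()`. It also assumes PHP 5.3.6 or later for `saveHTML($node)`.

```php
<?php
/**
 * Hypothetical helper (not part of DOMDocument): returns the inner HTML
 * of the element with the given id, or null if no such element exists.
 */
function innerHtmlById($html, $id)
{
    // Collect libxml parse warnings instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $element = $dom->getElementById($id);
    if ($element === null) {
        return null;
    }

    // Serialise each child node and concatenate the results.
    $inner = '';
    foreach ($element->childNodes as $node) {
        $inner .= $dom->saveHTML($node);
    }
    return $inner;
}

// Made-up sample page for illustration.
$page = '<html><body><div id="main"><p>First</p><p>Second</p></div></body></html>';
echo innerHtmlById($page, 'main');
```

Returning null for a missing id keeps the caller from iterating over childNodes of a non-existent element.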

alex