<html>
    <head><title>bla bla</title></head>
    <body>
    <div id="mainContent" xmlns:h="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml">
        bla bla .....
    </div>
    </body>
</html>

I need to extract that division. How can I do it using PHP 5?

The HTML source is not correctly formatted. There are some undefined attributes.

Cœur
shibly

2 Answers


If your HTML is not well formed, you can still use DOMDocument, whose loadHTML() parser tolerates malformed markup, e.g.:

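// Collect parser warnings internally instead of printing them (the markup is malformed)
libxml_use_internal_errors(true);

$d = new DOMDocument;
$d->loadHTML($htmlstring);

$x = new DOMXPath($d);

// Print the text content of every div whose id is "mainContent"
foreach ($x->query('//div[@id="mainContent"]') as $node) {
    echo $node->nodeValue;
}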

Alternatively, just prefix the HTML with <!DOCTYPE html> so that you can use getElementById as per normal.
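A minimal sketch of that alternative, assuming the markup from the question sits in a hypothetical variable $htmlstring with the doctype already prepended. It prints the div's markup through saveXML($node) rather than only its text, which may be closer to "extracting the division":

```php
<?php
// Hypothetical input: the question's markup with a doctype prepended.
$htmlstring = '<!DOCTYPE html>' .
    '<html><head><title>bla bla</title></head><body>' .
    '<div id="mainContent" xmlns:h="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml">' .
    'bla bla .....' .
    '</div></body></html>';

// Keep warnings about malformed markup out of the output.
libxml_use_internal_errors(true);

$d = new DOMDocument;
$d->loadHTML($htmlstring);

// Look the element up by its id attribute.
$div = $d->getElementById('mainContent');

if ($div !== null) {
    // saveXML() with a node argument serializes that node and its children.
    echo $d->saveXML($div), "\n";
}
```

If only the text is needed, $div->nodeValue can be read instead of serializing the node.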

Ja͢ck

/<div id="mainContent".*?<\/div>/s

See http://regexr.com?30o0l if you want to capture everything from the div's opening tag to the closing tag.
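As a rough sketch of how such a pattern might be applied in PHP, assuming the markup is held in a hypothetical variable $htmlstring: preg_match() takes the pattern as a delimited string, so # is used as the delimiter here to avoid escaping the slash in </div>, and only the s modifier is kept, since PHP's PCRE functions have no g modifier (preg_match_all() plays that role). The sample input contains a nested div on purpose, to show where the lazy .*? stops:

```php
<?php
// Hypothetical input with a nested div inside the target element.
$htmlstring = '<body><div id="mainContent">bla <div>inner</div> bla</div></body>';

// "s" lets the dot match newlines; "#" is the pattern delimiter.
$pattern = '#<div id="mainContent".*?</div>#s';

if (preg_match($pattern, $htmlstring, $m)) {
    // The lazy quantifier ends the match at the first </div> it meets,
    // so the outer div is cut short here.
    echo $m[0], "\n";
}
```

With this input the match ends after the inner div's closing tag, so the approach is only dependable when the target div contains no other div.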

Jack
  • This will match anything till the **last** closing tag. It will work only for this very simple example. – stema Apr 23 '12 at 08:54