-2

Is there any way in PHP to extract the content of an HTML page that lies between <body> and </body>? If so, could anyone post some sample code?

Ariful Islam
bharathi

3 Answers

6

You should have a look at the DOMDocument reference.

This example reads an HTML document into a DOMDocument and gets the body element:

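libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTMLFile('http://example.com');
libxml_clear_errors(); // discard the collected warnings
libxml_use_internal_errors(false);

$body = $dom->getElementsByTagName('body')->item(0);

if ($body !== null) { // item(0) returns null when the page has no body element
    echo $body->textContent; // print all the text content in the body (tags are stripped)
}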
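
If the markup inside the body is wanted rather than only its text, one possible approach is to serialize each child node of the body with saveHTML. The following is a minimal sketch, assuming the HTML is already available as a non-empty string; the function name body_inner_html is hypothetical and not part of the DOM extension.

```php
<?php
// Hypothetical helper: returns the markup found between <body> and </body>.
function body_inner_html(string $source): string
{
    // Collect parser warnings internally and remember the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($source);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // no body element was found
    }

    $inner = '';
    foreach ($body->childNodes as $child) {
        // Passing a node to saveHTML serializes that node with its tags.
        $inner .= $dom->saveHTML($child);
    }
    return $inner;
}

echo body_inner_html('<html><body class="x"><p>Hello</p></body></html>');
```

Because the output is re-serialized by the parser, it may differ slightly from the original source, for example in whitespace or attribute quoting.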

You should also check out the following resources:

DOM API Documentation
XPATH language specification

Cyclonecode
1

You can also try a non-DOM solution based on the strpos family of functions:

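$html  = file_get_contents($url);          // $url is the address of the page to fetch
$start = stripos($html, '<body');          // also matches <body class="...">
$start = strpos($html, '>', $start) + 1;   // skip past the end of the opening tag
$end   = strripos($html, '</body>');       // last closing body tag
$html  = substr($html, $start, $end - $start);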

stripos is the case-insensitive version of strpos, and strripos is the case-insensitive version of strrpos, which finds the rightmost position.
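
The string-search idea can also be wrapped in a function that tolerates attributes on the opening tag and reports when no body is found. This is only a sketch under those assumptions; extract_body is a hypothetical name, and a '>' character inside an attribute value of the body tag would still confuse it.

```php
<?php
// Hypothetical helper: returns the text between the opening and closing
// body tags, or null when either tag cannot be located.
function extract_body(string $html): ?string
{
    $open = stripos($html, '<body');      // start of the opening tag
    if ($open === false) {
        return null;
    }
    $start = strpos($html, '>', $open);   // end of the opening tag
    if ($start === false) {
        return null;
    }
    $start++;                             // first character after '>'

    $end = strripos($html, '</body>');    // rightmost closing tag
    if ($end === false || $end < $start) {
        return null;
    }
    return substr($html, $start, $end - $start);
}

echo extract_body('<html><BODY class="x"><p>Hi</p></BODY></html>');
```

Checking each result against false avoids silently cutting the string at the wrong offset when a tag is missing.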

Hope that it will help you!

Vlada Katlinskaya
1

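Try the PHP Simple HTML DOM Parser:

include 'simple_html_dom.php'; // the library file; adjust the path as needed
$html = file_get_html('http://www.example.com/');
$body = $html->find('body', 0); // index 0 returns the first match instead of an array
echo $body->innertext;          // the HTML between <body> and </body>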
Naveed