
I am trying to load XML content from a URL containing about 60MB of data. When I do that using the built-in SimpleXML library I keep getting the following error:

PHP Warning:  DOMDocument::loadXML(): internal error: Huge input lookup in Entity, line: 845125

The script then stops. What's wrong, and how can I deal with this?

Sample URL I use:

http://foo.com/feed.xml
– user99999
  • Either increase your PHP memory limit, or depending on what you want to do with the XML, try a forward-only reader rather than a full XML object – RiggsFolly Oct 24 '16 at 10:32
  • http://stackoverflow.com/questions/911663/parsing-huge-xml-files-in-php – Mohd Abdul Mujib Oct 24 '16 at 10:36
  • The target XML has an error and the browser didn't load it. – Mohammad Oct 24 '16 at 10:39
  • Use XMLReader to iterate the feed entries, expand them to DOM and use XPath to fetch the details for each entry. This way only a single entry node and its descendants will be loaded into memory. Here is an example: http://stackoverflow.com/a/23079179/2265374 (a rough sketch also follows this list) – ThW Oct 24 '16 at 12:05
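
A minimal sketch of that streaming approach (assuming an Atom-style feed whose items are <entry> elements and ignoring namespaces for brevity; the element name and the placeholder URL are assumptions, not from the question):

    <?php
    $reader = new XMLReader();
    $reader->open('http://foo.com/feed.xml');

    // Advance to the first <entry> element.
    while ($reader->read() && $reader->localName !== 'entry');

    while ($reader->localName === 'entry') {
        // Expand only the current entry into its own DOM document, so a
        // single entry node and its descendants are in memory at a time.
        $doc = new DOMDocument();
        $doc->appendChild($doc->importNode($reader->expand(), true));

        $xpath = new DOMXPath($doc);
        echo $xpath->evaluate('string(/entry/title)'), "\n";

        // Jump to the next sibling <entry> without re-reading this subtree.
        $reader->next('entry');
    }
    $reader->close();

Because each entry's tree is discarded after use, peak memory stays roughly constant regardless of the feed's total size.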

1 Answer


The libxml2 changelog contains "608773 add a missing check in xmlGROW (Daniel Veillard)", which seems to be related to input buffering. Note that I don't know anything about libxml2 internals, but it seems conceivable that you have hit a 2.7.6 bug that was fixed in 2.7.7.

Check if the behavior is any different when you use simplexml_load_file() directly, and try setting libxml parser-related options, e.g.

    simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_COMPACT | LIBXML_PARSEHUGE);

Specifically, you might want to try the LIBXML_PARSEHUGE flag.

From http://php.net/manual/en/libxml.constants.php: LIBXML_PARSEHUGE sets the XML_PARSE_HUGE flag, which relaxes any hardcoded limit from the parser. This affects limits like the maximum depth of a document or the entity recursion, as well as limits on the size of text nodes.
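
A rough sketch of both variants under those suggestions (the URL is the question's placeholder; the libxml_use_internal_errors() error collection is an addition for illustration):

    <?php
    libxml_use_internal_errors(true);

    // SimpleXML with the hardcoded libxml2 limits relaxed.
    $feed = simplexml_load_file(
        'http://foo.com/feed.xml',
        'SimpleXMLElement',
        LIBXML_COMPACT | LIBXML_PARSEHUGE
    );

    if ($feed === false) {
        foreach (libxml_get_errors() as $error) {
            echo trim($error->message), "\n";
        }
    }

    // The same flag works for DOMDocument, where the warning came from.
    $doc = new DOMDocument();
    $doc->load('http://foo.com/feed.xml', LIBXML_PARSEHUGE);

Note that this still builds the full 60MB tree in memory, so the memory_limit advice in the comments above still applies.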

– Bhupinder kumar