
Is there some way to differentiate XML from HTML with PHP DOMDocument?

I looked in the docs and didn't find anything.

I'm looking for a function like check($string) that returns 'is XML' or 'is HTML' for each $string.

The similar questions here on SO didn't help me.

James
  • I don't think so, because HTML is a type of XML, so they are the same! You have to find a solution by checking the code, the MIME type, etc. – CyC0der Aug 05 '15 at 19:44
  • @CyC0der: No, HTML is not a type of XML. XHTML is, but not HTML. – hakre Aug 05 '15 at 22:48

2 Answers


There is no such function, but you can be sure that a $string is well-formed XML when DOMDocument::loadXML() returns true (with recover set to false). A typical HTML document, which is not well-formed XML, fails that check.
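
A minimal sketch of that check follows. The helper name isWellFormedXml is hypothetical; it sets recover to false explicitly (that is already the default) and uses libxml_use_internal_errors() so that malformed input produces a false return value instead of a stream of PHP warnings. Empty input is rejected up front because newer PHP versions refuse an empty string in loadXML().

```php
<?php
// Hypothetical helper: true if $string parses as well-formed XML.
function isWellFormedXml(string $string): bool
{
    if ($string === '') {
        return false; // loadXML() does not accept empty input
    }
    $dom = new DOMDocument();
    $dom->recover = false; // do not let libxml repair broken markup
    $previous = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    $ok = $dom->loadXML($string);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $ok;
}
```

Note that a well-formed HTML document such as `<html><body><p>Hi</p></body></html>` also passes, which is why this check alone cannot tell the two apart.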

For HTML, you can use DOMDocument::loadHTML() to check whether a document can be loaded as HTML. The HTML parser is much less strict than the XML parser, so it accepts far more input.
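
To illustrate how lenient loadHTML() is, here is a small sketch with a hypothetical helper, loadsAsHtml, that returns the parsed document or null. Markup that loadXML() would reject (an unclosed `<p>`, a bare `<br>`) loads fine, and libxml's HTML parser adds the implied `<html>`/`<body>` wrapper elements.

```php
<?php
// Hypothetical helper: parse $string with the lenient HTML parser.
function loadsAsHtml(string $string): ?DOMDocument
{
    if ($string === '') {
        return null; // loadHTML() does not accept empty input
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // suppress parser warnings
    $ok = $dom->loadHTML($string);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok ? $dom : null;
}

// Not well-formed XML, but the HTML parser repairs it and wraps it in <html>.
$dom = loadsAsHtml('<p>unclosed paragraph<br>');
echo $dom->documentElement->tagName, "\n";
```

Because almost any non-empty string loads this way, a successful loadHTML() says little on its own; it is most useful as a fallback after loadXML() has failed.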

hakre
  • Thanks @hakre. It looks right, but the code `$dom = new DOMDocument(); $var = $dom->loadXML("Test"); print_r ($var);die();` prints 1. What's wrong? – James Aug 07 '15 at 19:16
  • It should return `bool(true)`, see here: https://eval.in/413856 - And that's fine as the string *is* well-formed XML. – hakre Aug 07 '15 at 19:21
  • Actually, you're right. I did not notice that the string is well-formed XML. I ran a test with other HTML and it works like a charm, returning `bool(false)`. – James Aug 07 '15 at 19:30
  • An HTML document can also be well-formed XML. In that case you may want to also check whether the `->documentElement` field's `DOMElement::$tagName` is "`html`" (compare case-insensitively). That would be a strong signal that this is an HTML document. – hakre Aug 07 '15 at 20:46
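
Combining the loadXML()/loadHTML() answer with the documentElement check from the last comment, the check($string) function the question asks for might be sketched as follows. The 'is neither' result for empty input and the decision to treat anything loadHTML() accepts as HTML are assumptions of this sketch; since loadHTML() accepts almost any non-empty text, the HTML branch is a fallback rather than real validation.

```php
<?php
// Hypothetical check($string) from the question, built on DOMDocument.
function check(string $string): string
{
    if (trim($string) === '') {
        return 'is neither'; // assumption: empty input is neither XML nor HTML
    }
    $previous = libxml_use_internal_errors(true); // collect parse errors silently
    try {
        $dom = new DOMDocument();
        if ($dom->loadXML($string)) {
            // Well-formed XML: an <html> root (any case) is a strong HTML signal.
            $root = $dom->documentElement;
            return ($root !== null && strcasecmp($root->tagName, 'html') === 0)
                ? 'is HTML'
                : 'is XML';
        }
        // Not well-formed XML: fall back to the lenient HTML parser.
        libxml_clear_errors();
        $dom = new DOMDocument();
        return $dom->loadHTML($string) ? 'is HTML' : 'is neither';
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous); // restore the caller's setting
    }
}
```

For example, `check('<note><to>Tove</to></note>')` yields 'is XML', while both the well-formed `<html><body><p>Hi</p></body></html>` and the non-well-formed `<HTML><BODY>Hi<br></BODY></HTML>` yield 'is HTML'.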

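Use preg_match() to look for a telltale marker: an <html> tag suggests HTML, and an XML declaration (<?xml ... ?>) suggests XML. Example: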

// HTML tag names are case-insensitive, hence the /i flag
if (preg_match('/<html[^>]*>/i', $string)) {
  // ... actions for HTML ...
} elseif (preg_match('/<\?xml[^?]*\?>/', $string)) {
  // ... actions for XML ...
} else {
  // ... actions for anything else ...
}
Quazer