I am using StAX to parse an XML document containing HTML and custom tags in Java.

The XML looks like this:

<html><div>Hello World</div><div><br /></div>
<div><br />
<Resource type="audio/m4a" height="72.00" id="lh6rde3c1d39148804cea99b054f4cc4bb990" width="72.00" />
<br /><br /></div>
<div><br />
</div><div>asfasdfasdfasdf</div><div><br /></div><div><br /></div><div><b>asdfasdfasdfasdf</b></div>
<div>
<b>adsfasdfasdf</b>
</div><div><b><br /></b></div><div><b><i>sdfasdfasdfas</i></b></div><div><i><b>asdfasdfasdfasdf</b>asdfasdfasdfasdf</i>
</div>
<Resource type="video/mp4" height="72.00" id="lh6rde3c1d39148804cesdfd2454f4cc4bb990" width="72.00" />
<div><i>asdfasdfasdfasdfasdf</i></div>
<div><ol><li><i>one</i></li><li><i>wto</i></li><li><i>three</i></li></ol><div>
<i>
asdfasdfasdfasdf</i>
</div><div>
<ul><li><i>one </i></li><li><i>thwo</i></li><li><i>three</i></li></ul></div>
</div></html>

I only need the Resource details (i.e. the attributes). Is there a better option available in terms of parsing speed?

Linesh Mohan

1 Answer

This question is excessively broad, so I had to downvote it. I don't know the circumstances under which you are parsing this XML, so this answer will be limited.

However, I can tell you that SAX, which the JDK exposes through JAXP, has classically been used; it doesn't strictly require a DTD, and with some clever enumerations you can parse just about anything.
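To make the SAX route concrete, here is a minimal sketch that collects the `id` attribute of every `Resource` element. The class name `SaxResourceIds`, the method `resourceIds`, and the shortened sample document with ids `a1` and `v1` are made up for illustration; only the JDK's own SAX classes are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of any library.
final class SaxResourceIds {

    /** Collects the id attribute of every <Resource> element, in document order. */
    static List<String> resourceIds(String xml) throws Exception {
        List<String> ids = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // The factory is not namespace-aware by default, so match on qName.
                if ("Resource".equals(qName)) {
                    ids.add(attributes.getValue("id"));
                }
            }
        });
        return ids;
    }

    public static void main(String[] args) throws Exception {
        // Shortened, made-up stand-in for the document in the question.
        String xml = "<html><div>Hello World</div>"
                + "<Resource type=\"audio/m4a\" id=\"a1\" />"
                + "<div><Resource type=\"video/mp4\" id=\"v1\" /></div>"
                + "</html>";
        System.out.println(resourceIds(xml));
    }
}
```

The handler ignores every callback except `startElement`, so the HTML content is skipped without building any tree in memory.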

JSoup, as mentioned by Rafael Cardoso, is generally an HTML parser, not an HTML-in-XML parser, but it may work for you. If all you're looking for are the attributes of a specific tag, along with (presumably) associated data, then the JDK may have all that you need.
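As a rough illustration of the JDK-only approach, the sketch below uses the StAX cursor API (`XMLStreamReader`) and reads attributes only when it reaches a `Resource` start tag. The class name `StaxResourceReader`, the method `readResources`, and the sample ids `a1` and `v1` are hypothetical; whether this is fast enough for your input is something you would need to measure.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of any library.
final class StaxResourceReader {

    /** Returns one attribute map per <Resource> element, in document order. */
    static List<Map<String, String>> readResources(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The sample document has no DTD, so DTD and external entity support are switched off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        List<Map<String, String>> resources = new ArrayList<>();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // Skip every event except the start tag of a Resource element.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "Resource".equals(reader.getLocalName())) {
                    Map<String, String> attrs = new LinkedHashMap<>();
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        attrs.put(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                    }
                    resources.add(attrs);
                }
            }
        } finally {
            reader.close(); // XMLStreamReader is not AutoCloseable
        }
        return resources;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Shortened, made-up stand-in for the document in the question.
        String xml = "<html><div>Hello World</div>"
                + "<Resource type=\"audio/m4a\" height=\"72.00\" id=\"a1\" width=\"72.00\" />"
                + "<div><Resource type=\"video/mp4\" height=\"72.00\" id=\"v1\" width=\"72.00\" /></div>"
                + "</html>";
        for (Map<String, String> resource : readResources(xml)) {
            System.out.println(resource);
        }
    }
}
```

The cursor API avoids allocating an event object per node, which is why it is usually the first thing to try when only a few elements in a large document matter.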

We also have JDOM, DOM4J, and a bunch of others, all of which have their strengths and weaknesses. This question, thus, isn't particularly constructive, and is basically a duplicate of this one, which you might take a look at.

I recommend looking at this tutorial, which explains how to build a parser with the standard library.

In the future, if possible, please specify the conditions your program is operating under, give us an objective and clearly defined question, and search Stack Overflow a little more thoroughly first. All the same, I hope this does it for you. Good luck!

Michael Macha