
I am looking for recommendations on the best API (or, failing that, tool) that will take a PDF and return corresponding HTML. By "best", I mean primarily the most reasonably structured output (e.g., generating an HTML heading tag when there's arguably a heading present in the PDF), although usability/accessibility is also a priority. Ideally, I would like to interface with this purely at the Java API level, as opposed to having Java code read the stdout of another executable. I realize this is not a trivial problem (http://discerning.com/hacks/docutils/pdf2xml/readme.html), and I'd like to understand whether there's any really decent solution out there short of Adobe itself. Even in terms of Adobe, I would welcome feedback on experience with MARS or other solutions.
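To make the "reasonably structured" criterion concrete, here is a minimal sketch of one common heuristic. It assumes the text lines and their font sizes have already been extracted from the PDF by some library. The most frequent font size is treated as body text, and noticeably larger lines are wrapped in heading tags. `HeadingHeuristic`, `TextLine`, and the 1.5x/1.2x thresholds are hypothetical and for illustration only; they are not part of any real PDF library's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeadingHeuristic {

    // Hypothetical input type: one line of text extracted from a PDF,
    // together with the font size (in points) it was rendered at.
    public static final class TextLine {
        final String text;
        final float fontSize;

        public TextLine(String text, float fontSize) {
            this.text = text;
            this.fontSize = fontSize;
        }
    }

    // Escape the characters that are special inside HTML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Assume the most frequent font size is the body-text size
    // (ties are broken arbitrarily).
    static float bodySize(List<TextLine> lines) {
        Map<Float, Integer> counts = new HashMap<>();
        for (TextLine l : lines) {
            counts.merge(l.fontSize, 1, Integer::sum);
        }
        float best = 0f;
        int bestCount = -1;
        for (Map.Entry<Float, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Map each line to <h1>, <h2> or <p> depending on how much larger its
    // font is than the body size. The thresholds are arbitrary guesses.
    public static String toHtml(List<TextLine> lines) {
        float body = bodySize(lines);
        StringBuilder html = new StringBuilder();
        for (TextLine l : lines) {
            String tag;
            if (l.fontSize >= body * 1.5f) {
                tag = "h1";
            } else if (l.fontSize >= body * 1.2f) {
                tag = "h2";
            } else {
                tag = "p";
            }
            html.append('<').append(tag).append('>')
                .append(escape(l.text))
                .append("</").append(tag).append(">\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        List<TextLine> lines = List.of(
            new TextLine("Introduction", 18f),
            new TextLine("Background & scope", 14f),
            new TextLine("Body text line one.", 10f),
            new TextLine("Body text line two.", 10f));
        System.out.print(toHtml(lines));
    }
}
```

Real documents defeat this quickly (bold body-size headings, multi-column layouts, running headers and footers), which is a large part of why the problem is hard; a production converter would also need to consider font weight, spacing, and reading order.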

kvista
  • Maybe you can find something useful here: http://stackoverflow.com/questions/1638937/how-can-i-convert-pdf-to-html – Uros K Jan 03 '11 at 18:07
  • Please define what you mean by best. – jzd Jan 03 '11 at 18:11
  • Clarified what I mean by "best". I realize even my clarification might be subjective, but I don't want to over-complicate my question. – kvista Jan 03 '11 at 18:25
  • @genesiss: thanks for the pointer (sorry I didn't notice it earlier); unfortunately, the answers there are light and some point to no-longer-functioning things (such as Adobe's online conversion), so I'm hoping the community can bear with me re-pinging the question, albeit in a slightly different vein. – kvista Jan 03 '11 at 18:29

1 Answer


Check out the tool named iTextSharp.

Carlos Valenzuela
  • Looks like it might be promising, but unfortunately it's .NET so I can't use it. Thanks for the suggestion, though. – kvista Jan 03 '11 at 18:27