I am looking for recommendations on the best API (ideally) or tool (I'll settle for one) that takes a PDF and returns corresponding HTML. By "best", I primarily mean the most reasonably structured output (e.g., generating an HTML heading tag when there is arguably a heading in the PDF), although usability/accessibility is also a priority. Ideally, I would like to interface with this purely at the Java API level, rather than having Java code read the stdout of another executable. I realize this is not a trivial problem (http://discerning.com/hacks/docutils/pdf2xml/readme.html), and I'd like to know whether there is any really decent solution out there short of Adobe itself. Even regarding Adobe, I would welcome feedback on experience with MARS or other solutions.
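To make the "structure" criterion concrete: unless a PDF is tagged, its content is essentially positioned text with font information, so a converter has to infer headings. A minimal sketch of one such heuristic follows, assuming text has already been extracted into lines annotated with font size. It marks a line as a heading when its font is noticeably larger than the most common (body) size. The `TextLine` record, the `toHtml` method and the 1.2 ratio are hypothetical illustrations, not the behavior of any particular library.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeadingHeuristic {

    // Hypothetical: one line of text already extracted from a PDF,
    // plus the font size (in points) it was rendered in.
    record TextLine(String text, float fontSize) {}

    // Assumption: a line at least 20% larger than body text counts as a heading.
    static final float HEADING_RATIO = 1.2f;

    // Most frequent font size, assuming body text dominates the page.
    // Ties are broken toward the smaller size so the result is deterministic.
    static float bodyFontSize(List<TextLine> lines) {
        Map<Float, Integer> counts = new HashMap<>();
        for (TextLine line : lines) {
            counts.merge(line.fontSize(), 1, Integer::sum);
        }
        float best = 0f;
        int bestCount = -1;
        for (Map.Entry<Float, Integer> e : counts.entrySet()) {
            int n = e.getValue();
            if (n > bestCount || (n == bestCount && e.getKey() < best)) {
                best = e.getKey();
                bestCount = n;
            }
        }
        return best;
    }

    // Minimal HTML escaping for element text ("&" must be replaced first).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Emits <h1> for lines noticeably larger than body text and <p> for the rest.
    static String toHtml(List<TextLine> lines) {
        float body = bodyFontSize(lines);
        StringBuilder html = new StringBuilder();
        for (TextLine line : lines) {
            String tag = line.fontSize() >= body * HEADING_RATIO ? "h1" : "p";
            html.append('<').append(tag).append('>')
                .append(escape(line.text()))
                .append("</").append(tag).append(">\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        List<TextLine> lines = List.of(
            new TextLine("Introduction", 18f),
            new TextLine("PDF stores positioned text.", 11f),
            new TextLine("Structure must be inferred.", 11f));
        System.out.print(toHtml(lines));
    }
}
```

Running `main` should print an `<h1>` for "Introduction" and `<p>` elements for the two 11pt lines. Real documents also need bold/italic, spacing, multi-line headings and heading levels taken into account; this single-ratio rule is only the simplest form of the idea.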
Maybe you can find something useful here: http://stackoverflow.com/questions/1638937/how-can-i-convert-pdf-to-html – Uros K Jan 03 '11 at 18:07
Please define what you mean by best. – jzd Jan 03 '11 at 18:11
Clarified what I mean by "best". I realize even my clarification might be subjective, but I don't want to over-complicate my question. – kvista Jan 03 '11 at 18:25
@genesiss: thanks for the pointer (sorry I didn't notice it earlier); unfortunately, the answers there are light and some point to no-longer-functioning things (such as Adobe's online conversion), so I'm hoping the community can bear with me re-pinging the question, albeit in a slightly different vein. – kvista Jan 03 '11 at 18:29
1 Answer
Check out the tool named iTextSharp.
Carlos Valenzuela
Looks like it might be promising, but unfortunately it's .NET, so I can't use it. Thanks for the suggestion, though. – kvista Jan 03 '11 at 18:27