
How can I extract text from a PDF document using PHP?

(I can't use other tools because I don't have root access.)

I've found some functions that work for plain text, but they don't handle Unicode characters well:

http://www.hashbangcode.com/blog/zend-lucene-and-pdf-documents-part-2-pdf-data-extraction-437.html

Sfisioza
  • link doesn't work! please rectify! – cwiggo Nov 25 '12 at 20:12
  • I don't see why this question is considered off-topic. It is very useful, and even if it may attract 'opinionated' answers, it is always better to see different points of view. It has a lot of hits too. – user3574492 Jun 04 '15 at 23:54

1 Answer


Download class.pdf2text.php from https://pastebin.com/dvwySU1a or http://www.phpclasses.org/browse/file/31030.html (registration required).

Code:

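<?php
// Load the PDF2Text class (downloaded from one of the links above)
require 'class.pdf2text.php';

$a = new PDF2Text();
$a->setFilename('filename.pdf'); // path to the PDF to read
$a->decodePDF();                 // parse the file and extract its text
echo $a->output();               // print the extracted text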

  • class.pdf2text.php Project Home

  • The pdf2text class doesn't work with all the PDFs I've tested. If it doesn't work for you, try PDF Parser.
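If you go the PDF Parser route, usage could look roughly like the sketch below. It assumes "PDF Parser" means the smalot/pdfparser package, installed with Composer into a vendor/ directory next to the script (Composer runs without root access). pdfToText is a hypothetical helper name of mine, and filename.pdf is a placeholder.

```php
<?php
// Assumption: smalot/pdfparser was installed with `composer require smalot/pdfparser`
require __DIR__ . '/vendor/autoload.php';

/**
 * Hypothetical helper: return the text of a PDF file,
 * or throw if the file cannot be read.
 */
function pdfToText(string $path): string
{
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read PDF: $path");
    }

    $parser = new \Smalot\PdfParser\Parser();
    $pdf = $parser->parseFile($path); // returns a \Smalot\PdfParser\Document

    return $pdf->getText();           // text of all pages in the document
}
```

Call it with echo pdfToText('filename.pdf'); in place of the pdf2text snippet. As with the first class, I'd test it on a few of your own PDFs before relying on it.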


Pedro Lobito