3

I'm on a Linux server and I need to convert MS Word 97-2003 .doc files to plain text .txt files using PHP.

I already tried these solutions:

How to extract text from word file .doc,docx,.xlsx,.pptx php

Extract text from doc and docx

But both work properly only for the .docx format.

The issue is that when I convert .doc files, I get junk characters at the end of the text. The number of unwanted characters varies with the length of the file. Also, if the file is fairly long, the output sometimes gets truncated.

Is there a simple way to do this conversion?

Dario Emerson

2 Answers

0

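I eventually settled on the following solution, which runs Antiword:

private function doc() {
    // Escape the path so it is safe to pass to the shell.
    $file = escapeshellarg($this->filename);
    // -w 0 disables line wrapping in Antiword's output.
    $text = shell_exec("/usr/sbin/antiword -w 0 $file");
    // Treat the output as ISO-8859-1 and convert it to UTF-8 (utf8_encode is deprecated as of PHP 8.2).
    return html_entity_decode(mb_convert_encoding(trim((string) $text), 'UTF-8', 'ISO-8859-1'));
}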
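For anyone who wants the same approach outside a class, here is a minimal standalone sketch with basic failure handling. The function name `docToText` and the default binary path are my own assumptions, not part of any library. It also assumes the `UTF-8.txt` mapping file that ships with Antiword is installed, so the output is already UTF-8 and needs no re-encoding.

```php
<?php
declare(strict_types=1);

/**
 * Convert a .doc file to plain text by running Antiword.
 * Returns null if the file is unreadable, the binary is missing,
 * or Antiword exits with a non-zero status.
 * The default $antiword path is an assumption; adjust it for your server.
 */
function docToText(string $path, string $antiword = '/usr/sbin/antiword'): ?string
{
    if (!is_readable($path) || !is_executable($antiword)) {
        return null;
    }

    // -w 0 disables line wrapping; -m UTF-8.txt selects Antiword's UTF-8 mapping
    // (assumes that mapping file is installed alongside Antiword).
    $cmd = escapeshellarg($antiword) . ' -w 0 -m UTF-8.txt ' . escapeshellarg($path);

    $lines = [];
    $exitCode = 1;
    // exec() fills $lines with the output, one entry per line, and sets the exit status.
    exec($cmd . ' 2>/dev/null', $lines, $exitCode);

    if ($exitCode !== 0) {
        return null;
    }

    return trim(implode("\n", $lines));
}
```

Usage would look like `$text = docToText('/path/to/file.doc');` followed by a null check before writing the .txt file. Checking the exit status makes a failed conversion visible instead of silently producing an empty string.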
Dario Emerson
-2

Answering my own question: after some searching, I found this library from iFile: http://www.isapp.it/ifile/it/APIDocument_v1.2/ifile/adapter-helpers/_adapter---helpers---class.doc2txt.php.html

It works very well for both .doc and .rtf files.

Dario Emerson