2

I have a document with a .doc extension that contains information in tables. Is there any way in Python to copy all of its data into a text file (.txt)?

Motti

1 Answer

-1

This question is a duplicate; I have just gathered the answers in one place.

For Linux users: use the textract library, which is not available on Windows.

import textract
text = textract.process("path/to/file.extension")
text = text.decode("utf-8") 

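For Windows users, if you know the file's encoding (this only works when the .doc file is really HTML or plain text saved with a .doc extension, not a binary Word file):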

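from bs4 import BeautifulSoup as bs

filename = "myfile.doc"  # placeholder path
with open(filename, encoding="utf-8") as f:  # use the file's actual encoding
    soup = bs(f.read(), "html.parser")
# Drop <style> and <script> elements so only the visible text remains
for s in soup(["style", "script"]):
    s.extract()
tmpText = soup.get_text()
# Remove tabs and newlines, then trim surrounding whitespace
text = tmpText.replace("\t", "").replace("\n", "").strip()
print(text)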

For Windows users only (requires Microsoft Word to be installed, plus the pywin32 package):

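import os
import win32com.client

word = win32com.client.Dispatch("Word.Application")
word.Visible = False
# Word resolves relative paths against its own working directory, so pass an absolute path
doc = word.Documents.Open(os.path.abspath("myfile.doc"))
print(doc.Range().Text)
# Close the document without saving and shut Word down so no hidden process is left behind
doc.Close(False)
word.Quit()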
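
The snippets above only print the extracted text, while the question asks for a .txt file. Below is a minimal sketch of that last step. The helper name `save_text` is hypothetical, and the clean-up assumes the text came from Word's `Range().Text`, where paragraphs end in a carriage return and table cells are marked with the `\x07` character; text from textract or BeautifulSoup passes through largely unchanged.

```python
from pathlib import Path


def save_text(text: str, out_path: str) -> None:
    # Hypothetical helper: write already-extracted text to a UTF-8 .txt file.
    # Assumption: Word output marks table cells with "\x07" and ends
    # paragraphs with "\r"; drop the former and normalize the latter.
    cleaned = text.replace("\x07", "")
    cleaned = cleaned.replace("\r\n", "\n").replace("\r", "\n")
    Path(out_path).write_text(cleaned, encoding="utf-8")
```

For example, `save_text(doc.Range().Text, "myfile.txt")` would write the Word output to disk, with each table cell ending up on its own line. This does not preserve the table layout; it only gets the cell contents into the file.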
RCvaram