19

Is it possible to read and write Word (2003 and 2007) files in Python without using a COM object?
I know that I can:

f = open('c:\file.doc', "w")
f.write(text)
f.close()

but Word will read it as an HTML file not a native .doc file.

Eli Courtwright
  • 176,085
  • 65
  • 206
  • 255
UnkwnTech
  • 83,811
  • 65
  • 181
  • 226

4 Answers4

38

See python-docx, its official documentation is available here.

This has worked very well for me.

user.dz
  • 892
  • 2
  • 18
  • 38
Damian
  • 2,490
  • 22
  • 19
  • 3
    But it is supported .doc format, i tried but it throws me a ValueError `ValueError: file '' is not a Word file, content type is 'application/vnd.openxmlformats-officedocument.themeManager+xml` – Shashank Feb 26 '16 at 13:04
  • It is called python-docx not python-doc, so no. :) – Damian Feb 26 '16 at 17:52
  • 5
    @Damian but the question is also about .doc files, so you should note that your answer only applies to .docx files. – oulenz Aug 31 '18 at 07:21
  • Any idea for opening .doc files then? – linello May 22 '20 at 07:39
11

If you only what to read, it is simplest to use the linux soffice command to convert it to text, and then load the text into python:

Community
  • 1
  • 1
markling
  • 1,126
  • 14
  • 22
7

I'd look into IronPython which intrinsically has access to windows/office APIs because it runs on .NET runtime.

Florian Bender
  • 159
  • 2
  • 4
auramo
  • 12,877
  • 12
  • 63
  • 88
3

doc (Word 2003 in this case) and docx (Word 2007) are different formats, where the latter is usually just an archive of xml and image files. I would imagine that it is very possible to write to docx files by manipulating the contents of those xml files. However I don't see how you could read and write to a doc file without some type of COM component interface.

fuentesjr
  • 48,512
  • 27
  • 73
  • 80