I would like to run a script on a folder full of Word documents that reads through each document and pulls out the images and their captions (the text directly below each image). From the research I've done, I think pywin32 might be a viable solution. I know how to use pywin32 to find strings and pull them out, but I need help with the images part. How can I read through a docx file and trigger some action whenever an image is found? Thank you for any help! I am using Python 2.7.
4 Answers
3
Find some inspiration in this post: "How can I search a word in a Word 2007 .docx file?"
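For the images part, it may help to know that a .docx file is an ordinary zip archive, and embedded pictures are normally stored under `word/media/` inside it. The following is only a minimal sketch using the standard library; the function name `extract_images` and the output directory are made up for illustration, and the code avoids Python 3 only syntax so it should also run on 2.7.

```python
import os
import zipfile


def extract_images(docx_path, out_dir):
    """Copy every file under word/media/ in the .docx into out_dir.

    Hypothetical helper, not part of any library. Returns the list of
    file paths that were written.
    """
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    written = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Embedded pictures are stored as ordinary files in word/media/.
            if name.startswith("word/media/") and not name.endswith("/"):
                target = os.path.join(out_dir, os.path.basename(name))
                with open(target, "wb") as handle:
                    handle.write(archive.read(name))
                written.append(target)
    return written
```

Calling `extract_images("report.docx", "report_images")` on each file in the folder would dump the pictures, but it says nothing about where in the text each picture appears, so it does not solve the caption part on its own.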
Fredrik Pihl
2
You can use the Python module docx2txt to extract both text and images from docx files.
Ankush Shah
-2
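import docx  # provided by the python-docx package

document = docx.Document(filepath)  # filepath: path to the .docx file
# inline_shapes lists the pictures placed inline with the text
for image in document.inline_shapes:
    print(image.width, image.height)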
Try this; it prints the width and height of each inline image in the document.
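The `inline_shapes` loop reports sizes only, so it does not yet pair an image with its caption. One possible way to do that, sketched here with the standard library rather than python-docx, is to walk the paragraphs in `word/document.xml`, look for `a:blip` elements, and resolve their `r:embed` ids through `word/_rels/document.xml.rels`. The function name `images_with_captions` is hypothetical, and the sketch assumes that the caption is simply the next paragraph containing text, that relationship targets are relative (such as `media/image1.png`), and that pictures are DrawingML rather than legacy VML. Pictures inside text boxes or `mc:AlternateContent` wrappers may be reported more than once.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by WordprocessingML packages.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
R = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"
REL = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def images_with_captions(docx_path):
    """Return (path_inside_zip, caption_text) pairs for one .docx file."""
    with zipfile.ZipFile(docx_path) as archive:
        rels = ET.fromstring(archive.read("word/_rels/document.xml.rels"))
        body = ET.fromstring(archive.read("word/document.xml"))

    # Map relationship ids (rId1, ...) to targets such as media/image1.png.
    targets = {}
    for rel in rels.iter(REL + "Relationship"):
        targets[rel.get("Id")] = rel.get("Target")

    paragraphs = list(body.iter(W + "p"))
    results = []
    for index, para in enumerate(paragraphs):
        for blip in para.iter(A + "blip"):
            rel_id = blip.get(R + "embed")
            if rel_id not in targets:
                continue  # linked (not embedded) picture, or unknown id
            # Assumption: the caption is the next paragraph that has text.
            caption = ""
            for following in paragraphs[index + 1:]:
                text = "".join(t.text or "" for t in following.iter(W + "t"))
                if text.strip():
                    caption = text.strip()
                    break
            results.append((posixpath.join("word", targets[rel_id]), caption))
    return results
```

Each returned path can be read from the same zip archive with `zipfile.ZipFile(docx_path).read(path)`, which is the point where the "event" for a found image could be triggered.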
André Kool
Sathish Kumar MK