Extract text from multiple PDF files at once in Python

Asked Aug 25 '21 at 14:39

I want to extract text from multiple PDF files at once. I have 100 PDF files, and I want to extract their text without having to pass each file name, because the files have irregular, unrelated names.

I tried this, but it relies on the file names:

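import textract

# Processes pdf1.pdf and pdf2.pdf; the names are built from a fixed pattern
for i in range(1, 3):
    pdf_name = 'pdf' + str(i) + '.pdf'
    # textract.process returns bytes, so decode before writing as text
    text = textract.process(pdf_name).decode('utf-8')
    txt_name = str(i) + '.txt'
    with open(txt_name, 'w', encoding='utf-8') as doc_file:
        doc_file.write(text)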

Please suggest a better method. I have heard that PyPDF2 can be used for this, but I don't know how.

codingx

By using `glob`, you can get the filenames in a specific directory. See the second answer to [this question](https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory). – shimo Aug 26 '21 at 06:21
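
The `glob` suggestion above could look roughly like the sketch below. It is only an illustration, not a tested answer: the function names (`extract_all`, `extract_with_textract`, `extract_with_pypdf2`) and the directory layout are made up for this example, and the PyPDF2 variant assumes PyPDF2 2.0 or later, where `PdfReader` and `page.extract_text()` are available. The libraries are imported inside the functions so that only the one you actually use needs to be installed.

```python
from pathlib import Path


def extract_with_textract(pdf_path):
    """Extract text with textract, the library used in the question."""
    import textract  # imported lazily; only needed if this extractor is used
    # textract.process returns bytes, so decode before treating it as text
    return textract.process(str(pdf_path)).decode("utf-8", errors="replace")


def extract_with_pypdf2(pdf_path):
    """Alternative extractor; assumes PyPDF2 2.0 or later (PdfReader API)."""
    from PyPDF2 import PdfReader  # imported lazily for the same reason
    reader = PdfReader(str(pdf_path))
    # extract_text() can come back empty for scanned, image-only pages
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_all(pdf_dir, out_dir, extract=extract_with_textract):
    """Write one .txt file per PDF found in pdf_dir and return the output paths."""
    pdf_dir, out_dir = Path(pdf_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # glob discovers the files, so no file name has to be known in advance
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        out_path = out_dir / (pdf_path.stem + ".txt")
        out_path.write_text(extract(pdf_path), encoding="utf-8")
        written.append(out_path)
    return written
```

Calling `extract_all("pdfs", "texts")` would then process every `*.pdf` file in a hypothetical `pdfs` folder, whatever the files are called, and `extract_all("pdfs", "texts", extract=extract_with_pypdf2)` would switch to PyPDF2. Note that the `*.pdf` pattern is case-sensitive on most Linux systems, so files ending in `.PDF` would need a second pattern.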
