I want to extract text from multiple PDF files at once. I have 100 PDF files, and I want to extract their text without having to pass each file name explicitly, because the files have irregular, unrelated names.
I tried this, but it depends on the file names following a fixed pattern:
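    import textract

    # Only works because the files are named pdf1.pdf, pdf2.pdf, ...
    for i in range(1, 3):
        string_temp = 'pdf' + str(i) + '.pdf'
        text = textract.process(string_temp)  # returns bytes
        string_temp1 = str(i) + '.txt'
        # Decode the bytes so the file does not contain a b'...' literal
        with open(string_temp1, 'w', encoding='utf-8') as doc_file:
            doc_file.write(text.decode('utf-8'))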
Please suggest a better method. I have heard that PyPDF2 can be used for this, but I don't know how.
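
One possible direction, offered as a rough sketch rather than a definitive answer: instead of building the names, list whatever PDFs are in a folder with `pathlib` and loop over those. The function name `extract_all`, its parameters, and the folder names in the usage comment are hypothetical. The extractor is passed in as an argument, so `textract.process` (as used above) or any other function that takes a path and returns text or bytes could be plugged in.

```python
from pathlib import Path


def extract_all(pdf_dir, out_dir, extract):
    """Write one .txt file per PDF found in pdf_dir and return the output paths.

    `extract` is any callable that takes a file path (str) and returns the
    extracted text as str or bytes (hypothetical parameter for this sketch).
    """
    pdf_dir, out_dir = Path(pdf_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # glob finds the files regardless of what they are called;
    # note that "*.pdf" is case-sensitive on most non-Windows systems.
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        raw = extract(str(pdf_path))
        # Some extractors return bytes, so decode before writing
        text = raw.decode("utf-8", errors="replace") if isinstance(raw, bytes) else raw
        # Reuse the PDF's own name for the output: "report final.pdf" -> "report final.txt"
        out_path = out_dir / (pdf_path.stem + ".txt")
        out_path.write_text(text, encoding="utf-8")
        written.append(out_path)
    return written


# Usage (assumes textract is installed; "pdfs" and "txt_out" are placeholder folders):
# import textract
# extract_all("pdfs", "txt_out", textract.process)
```

Naming each output after its source PDF avoids the numeric counter entirely, so the order and naming of the 100 files no longer matter.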