I am trying to extract the text from PDF files contained in a folder using pdftotext.exe (Xpdf). This process generates a .txt file with all the text for each PDF. There may be other file types present, so I am trying to apply the process recursively only to .pdf files.
I have worked out how to run pdftotext.exe on a single file. I would like to apply this in a recursive method.
import subprocess
subprocess.run('C:\\Users\\Tesla\\OneDrive\\Desktop\\exiftool\\pdftotext.exe C:\\Users\\Tesla\\OneDrive\\Desktop\\exiftool\\files\\Sample.pdf', shell=True)
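For comparison, here is a minimal sketch of the same single-file call written with an argument list instead of one shell string, which avoids quoting problems when a path contains spaces. The helper names `pdftotext_command` and `convert_one` are hypothetical and only for illustration. It assumes pdftotext's default behaviour of writing the .txt file next to the PDF when no output name is given.

```python
import subprocess
from pathlib import Path


def pdftotext_command(exe, pdf_path):
    """Build the argument list for one pdftotext call (hypothetical helper)."""
    return [str(exe), str(pdf_path)]


def convert_one(exe, pdf_path):
    """Run pdftotext on a single PDF and return the expected .txt path."""
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(pdftotext_command(exe, pdf_path), check=True)
    # Assumption: with no output name given, the text lands next to the PDF.
    return Path(pdf_path).with_suffix('.txt')
```

Calling `convert_one(pdftotext, 'C:\\Users\\Tesla\\OneDrive\\Desktop\\exiftool\\files\\Sample.pdf')` would then be the equivalent of the one-line call above.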
I am new to Python and trying to learn the different ways of applying recursion. I suck but am gradually learning. I have been working on this for days and would appreciate your assistance, as it will help me learn and solve my issue.
I am using Windows 10 and Python 3.9.
import os
import subprocess
# This is where the PDF files are located.
folder = 'C:\\Users\\Tesla\\OneDrive\\Desktop\\exiftool\\files'
entries = os.listdir(folder)
# This is where pdftotext.exe is located.
pdftotext = 'C:\\Users\\Tesla\\OneDrive\\Desktop\\exiftool\\pdftotext.exe'
# Iterate over the files in that directory.
for entry in entries:
    # Only run pdftotext on .pdf files; skip every other file type.
    if entry.lower().endswith('.pdf'):
        subprocess.run([pdftotext, os.path.join(folder, entry)])
        print(entry)
print('Finished')
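Since os.listdir only looks at the top level of one folder, a recursive version needs something that also descends into subfolders. The following is a rough sketch of that idea using pathlib's `rglob`. The function names `find_pdfs` and `convert_all` are hypothetical, and the sketch assumes pdftotext is called with just the PDF path, as in the single-file example.

```python
import subprocess
from pathlib import Path


def find_pdfs(root):
    """Return every .pdf file under root, including subfolders, sorted by path."""
    return sorted(
        p for p in Path(root).rglob('*')
        # Compare the extension case-insensitively so .PDF is matched as well.
        if p.is_file() and p.suffix.lower() == '.pdf'
    )


def convert_all(exe, root):
    """Run pdftotext (assumed to live at exe) on each PDF found under root."""
    for pdf in find_pdfs(root):
        subprocess.run([str(exe), str(pdf)], check=True)
        print(pdf)
```

With the paths from this post, `convert_all(pdftotext, 'C:\\Users\\Tesla\\OneDrive\\Desktop\\exiftool\\files')` would process the folder and everything below it, while files with other extensions are ignored.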
I know it looks like a mess. That is pretty much reflective of how I am at the moment.
Can anyone offer any suggestions?