I am trying to extract the text from PDF files contained in a folder using pdftotext.exe (Xpdf). For each PDF, this generates a .txt file with all of its text. There may be other file types present in the folder, so I am trying to apply the process recursively and only to .pdf files.

I have worked out how to run pdftotext.exe on a single file. I would like to apply this to every PDF in the folder.

import subprocess
subprocess.run('C:\\Users\\Tesla\\OneDrive\\Desktop\\exiftool\\pdftotext.exe C:\\Users\\Tesla\\OneDrive\\Desktop\\exiftool\\files\\Sample.pdf', shell=True)
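
A minimal sketch of the same single-file call with the arguments passed as a list, which avoids `shell=True` and quoting problems if a path ever contains spaces. The helper names `build_command`, `output_path` and `convert_one` are hypothetical, and the sketch assumes pdftotext's default behaviour of writing the .txt file next to the input PDF when no output name is given.

```python
import subprocess
from pathlib import Path


def build_command(pdftotext, pdf_path):
    """Return the argument list for converting one PDF (hypothetical helper)."""
    return [str(pdftotext), str(pdf_path)]


def output_path(pdf_path):
    """Assumed default output: same folder and name, with a .txt extension."""
    return Path(pdf_path).with_suffix('.txt')


def convert_one(pdftotext, pdf_path):
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(build_command(pdftotext, pdf_path), check=True)
    return output_path(pdf_path)
```

Calling `convert_one(pdftotext, 'C:\\Users\\Tesla\\OneDrive\\Desktop\\exiftool\\files\\Sample.pdf')` should then behave like the command above.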

I am new to Python and trying to learn the different ways of applying recursion. I suck but am gradually learning. I have been working on this for days and would appreciate your assistance, as it will help me learn and solve my issue.

I am using Windows 10 and Python 3.9.

import os
import subprocess

# This is where the PDF files are located.
folder = 'C:\\Users\\Tesla\\OneDrive\\Desktop\\exiftool\\files'
entries = os.listdir(folder)

# This is where pdftotext.exe is located.
pdftotext = 'C:\\Users\\Tesla\\OneDrive\\Desktop\\exiftool\\pdftotext.exe'

# Iterate over the files in that directory and convert only the PDFs.
for entry in entries:
    if entry.lower().endswith('.pdf'):
        # Pass the arguments as a list so paths with spaces are handled.
        subprocess.run([pdftotext, os.path.join(folder, entry)])
        print(entry)
print('Finished')

I know it looks like a mess. That is pretty much reflective of how I am at the moment.

Can anyone offer any suggestions?
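
One possible direction, as a minimal sketch rather than a definitive solution: if "recursive" is meant to include subfolders, `pathlib.Path.rglob` can walk the whole tree and the extension check can filter out everything that is not a PDF. The function names `find_pdfs` and `convert_folder` and the `run` parameter are hypothetical, introduced only for illustration. The `run` parameter defaults to `subprocess.run` and exists so the loop can be tried out without actually launching pdftotext.

```python
import subprocess
from pathlib import Path


def find_pdfs(folder):
    """Return every .pdf file under folder, including subfolders, sorted."""
    return sorted(
        p for p in Path(folder).rglob('*')
        # Compare the suffix in lower case so Sample.PDF is matched too.
        if p.is_file() and p.suffix.lower() == '.pdf'
    )


def convert_folder(folder, pdftotext, run=subprocess.run):
    """Run pdftotext on each PDF found and return the list of PDFs processed."""
    converted = []
    for pdf in find_pdfs(folder):
        # check=True raises an error if pdftotext reports a failure.
        run([str(pdftotext), str(pdf)], check=True)
        converted.append(pdf)
    return converted
```

With the paths from the question, this would be called as `convert_folder('C:\\Users\\Tesla\\OneDrive\\Desktop\\exiftool\\files', 'C:\\Users\\Tesla\\OneDrive\\Desktop\\exiftool\\pdftotext.exe')`. Replacing `rglob('*')` with `glob('*')` would limit the search to the top-level folder only.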

quamrana
