
I have a list of file names and want to detect which of them are present anywhere in a directory tree (the current directory and its subdirectories). I've gotten quite close, but I'm stuck at the last step (number 5).

Steps Taken

  1. Get File Names From Provided Text File
  2. Save file names as a list
  3. Loop through the previously saved file name list
  4. Loop through directories and sub-directories to identify if files are present or not
  5. Save file names in the second list that are found

The provided text file contains a list of names, for example:

  • testfile1.txt
  • testfile2.txt
  • testfile3.txt
  • testfile4.txt
  • testfile5.txt

Only testfile1.txt through testfile4.txt are actually present within the (sub)directories.

The expected output is a single list, for example ['testfile1.txt', 'testfile2.txt', 'testfile3.txt', 'testfile4.txt'].

Code

import os.path
from os import path
import sys

file = sys.argv[1]
#top_dir = sys.argv[2]
cwd = os.getcwd()

with open(file, "r") as f: #Step 1
    file_list = []
    for line in f:
        file_name = line.strip()
        file_list.append(file_name) #Step 2
    print(file_list)
    for file in file_list: #Step 3
        detected_files = []
        for dir, sub_dirs, files in os.walk(cwd): #Step 4
            if file in files:
                print(file)
                print("Files Found")
                detected_files.append(file) #Step 5
                print(detected_files)

What it prints out:

Files Found
testfile1.txt
['testfile1.txt']
Files Found
testfile2.txt
['testfile2.txt']
Files Found
testfile3.txt
['testfile3.txt']
Files Found
testfile4.txt
['testfile4.txt']
Biohacker
    You should have your results in the variable `detected_files`, do you not? – Robin De Schepper Apr 21 '22 at 13:36
  • @RobinDeSchepper I've updated my question, as this does not save them all, don't know if I'm printing in the wrong space.. – Biohacker Apr 21 '22 at 13:37
  • So your problem is reduced to saving the value of `detected_files` to a file? – TDG Apr 21 '22 at 13:37
  • @TDG I just want to save the final list in (step 5) as a file to keep track of what is present or not – Biohacker Apr 21 '22 at 13:38
  • Like this one - https://stackoverflow.com/questions/899103/writing-a-list-to-a-file-with-python? – TDG Apr 21 '22 at 13:39
  • yes like this, but this only saves the last item as posted in the question "testfile4.txt" and not everything that has been detected. – Biohacker Apr 21 '22 at 13:44
  • I would open and read the *`provided text file`* once and make a set of it, then while iterating with `.walk` use [set.intersection](https://docs.python.org/3/library/stdtypes.html#frozenset.intersection) to find the common files, *saving* the result to a container - I would probably use a set, but a list or tuple would work. – wwii Apr 21 '22 at 14:13
  • For your solution, I think `for file in file_list:` should be the inner loop and `for dir, sub_dirs, files in os.walk(cwd):` should be the outer loop. – wwii Apr 21 '22 at 14:16
  • This ignores the files in the original list – Biohacker Apr 21 '22 at 14:20
  • `detected_files = []` should be **before** the nested loops, not in it! – wwii Apr 21 '22 at 14:21
  • @wwii this ignores the conditional, thus appends everything from original list – Biohacker Apr 21 '22 at 14:24

1 Answer


Your current process looks like this:

with open(file, "r") as f: #Step 1
    ...
    for file in file_list: #Step 3
        detected_files = []
        ...
        for dir, sub_dirs, files in os.walk(cwd): #Step 4
            ...

On every iteration of `for file in file_list:` you create a new, empty detected_files list, which discards everything saved on earlier iterations. That is why each printed list holds only one name.

detected_files should be created once, before the loops:

detected_files = []
with open(file, "r") as f: #Step 1
    ...
    for file in file_list: #Step 3
        ...
        for dir, sub_dirs, files in os.walk(cwd): #Step 4
            ...
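
With the elided parts filled in, the corrected loop structure might look like the sketch below. The function name `find_listed_files` and its two parameters are made up for illustration and do not appear in the question; skipping blank lines and stopping the walk after the first match are also assumptions about what is wanted.

```python
import os


def find_listed_files(list_path, top_dir):
    """Return the names listed in list_path that exist somewhere under top_dir.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    with open(list_path, "r") as f:  # Steps 1 and 2
        # Assumption: blank lines in the text file should be ignored.
        file_list = [line.strip() for line in f if line.strip()]

    detected_files = []  # created once, before any loop
    for name in file_list:  # Step 3
        for _dir, _sub_dirs, files in os.walk(top_dir):  # Step 4
            if name in files:
                detected_files.append(name)  # Step 5
                break  # this name is found; no need to keep walking
    return detected_files
```

Called as `find_listed_files(sys.argv[1], os.getcwd())`, this would return one list containing every detected name, in the order the names appear in the text file.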

I would use a set for membership testing and keep all found filenames in a set (to avoid duplicates).

detected_files = set()
with open(file, "r") as f: #Step 1
    file_list = set(line.strip() for line in f)
for dir, sub_dirs, files in os.walk(cwd): #Step 4
    found = file_list.intersection(files)
    detected_files.update(found)

If you want, you can stop walking the directory tree as soon as every file in the list has been found.

for dir, sub_dirs, files in os.walk(cwd): #Step 4
    found = file_list.intersection(files)
    detected_files.update(found)
    if detected_files == file_list:
        break
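
Putting the set-based version together, and saving the result to a file as discussed in the comments, could look roughly like this. `detect_files` and `save_names` are hypothetical helper names, not part of the original code, and writing one name per line in sorted order is an assumption about the desired output format.

```python
import os


def detect_files(wanted, top_dir):
    """Return the subset of the wanted file names found under top_dir."""
    wanted = set(wanted)
    detected = set()
    for _dir, _sub_dirs, files in os.walk(top_dir):
        # Names that are both wanted and present in this directory.
        detected.update(wanted.intersection(files))
        if detected == wanted:
            break  # everything has been found; stop walking early
    return detected


def save_names(names, out_path):
    """Write the names to out_path, one per line.

    Sets are unordered, so the names are sorted for a stable result.
    """
    with open(out_path, "w") as out:
        for name in sorted(names):
            out.write(name + "\n")
```

Because the tree is walked only once regardless of how many names are in the list, this tends to scale better than nesting `os.walk` inside a loop over the names.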
wwii
  • Oh wow... I didn't even realize it, to be honest, thanks a lot for pointing out something so simple. I get the results that I was looking for – Biohacker Apr 21 '22 at 14:39