I am working on an ETL job (my first), and I need to fetch some files from the client's SFTP server. The number of files varies, so I need to check whether each file exists before getting it. The file names follow the pattern "file_YYYY-MM-DD-number-n", where YYYY-MM-DD is the current date and n is the number of the file, so if there are 7 files I have to look for names like:

  • file_2019-08-25-number-1
  • file_2019-08-25-number-2

So far I have found that I can list the remote directory like this:

import pysftp

cnopts = pysftp.CnOpts()
with pysftp.Connection(host=host, port=port, username=username, password=password, cnopts=cnopts) as sftp:
    files = sftp.listdir(directory)

How do I check whether the files I need are in that list?

Martin Prikryl
Carlos Salazar

2 Answers

To check for the existence of a file with pysftp, use the Connection.exists method:

with pysftp.Connection(...) as sftp:
    # exists() is a method of the connection and takes only the remote path
    if sftp.exists("file_2019-08-25-number-1"):
        print("1 exists")
    if sftp.exists("file_2019-08-25-number-2"):
        print("2 exists")

Obligatory warning: Do not set cnopts.hostkeys = None, unless you do not care about security. For the correct solution see Verify host key with pysftp.
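
Since the number of files varies, the two hard-coded checks above could be generalized into a loop. The following is only a minimal sketch, assuming the files are numbered consecutively from 1 with no gaps. The helper names `expected_name` and `find_numbered_files` and the `max_files` limit are hypothetical, not part of pysftp; the `exists` argument is any callable that takes a name and returns a boolean.

```python
from datetime import date


def expected_name(day, n):
    """Build a name such as 'file_2019-08-25-number-1'."""
    return f"file_{day:%Y-%m-%d}-number-{n}"


def find_numbered_files(exists, day, max_files=1000):
    """Collect consecutive file names for `day`, stopping at the first gap.

    `exists` is any callable returning True when the given name is present.
    `max_files` is an arbitrary safety limit (an assumption of this sketch).
    """
    found = []
    for n in range(1, max_files + 1):
        name = expected_name(day, n)
        if not exists(name):
            break  # assumes numbering has no gaps
        found.append(name)
    return found
```

Inside the `with` block this could be called as `find_numbered_files(sftp.exists, date.today())`. The names are resolved relative to the current remote directory, so you may need `sftp.chdir(directory)` first.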

Martin Prikryl

You can use Python's built-in re regular expression module to determine if a filename matches the general pattern you're looking for as the example immediately below does.

import re


files = [
    'file_2019-08-25-number-1',
    'foo.bar',
    'file_2019-08-25-number-2',
    'file_2018-02-28-number-42',
    'some_other_file.txt'
]

pattern = re.compile(r'file_\d{4}-\d{2}-\d{2}-number-\d+')

for filename in files:
    if pattern.match(filename):
        print(f'{filename!r} matches pattern')

Output:

'file_2019-08-25-number-1' matches pattern
'file_2019-08-25-number-2' matches pattern
'file_2018-02-28-number-42' matches pattern

If all you want to do is check for filenames with a specific date prefix, you could do something like this inside the loop:

if filename.startswith('file_2019-08-25-number-'):
    # Do something with filename.
    ...
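
To tie this back to the `sftp.listdir(directory)` call in the question, here is a minimal sketch that keeps only the files for one date and orders them by their number. The function name `files_for_day` is hypothetical, and the sketch assumes the names have no extension or other suffix after the number, as in the question's examples.

```python
import re
from datetime import date


def files_for_day(names, day):
    """Return names matching 'file_<day>-number-<n>', sorted by n."""
    # The date is inserted literally; only the trailing number is variable.
    pattern = re.compile(rf"file_{day:%Y-%m-%d}-number-(\d+)")
    matches = []
    for name in names:
        m = pattern.fullmatch(name)  # the whole name must match
        if m:
            matches.append((int(m.group(1)), name))
    # Sort numerically so that number-10 comes after number-2.
    return [name for _, name in sorted(matches)]
```

With the listing from the question, this could be used as `files_for_day(sftp.listdir(directory), date.today())`.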
martineau