import boto3

def get_latest_file_movement(**kwargs):
    # Sort key: last-modified time as epoch seconds
    get_last_modified = lambda obj: int(obj['LastModified'].strftime('%s'))
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my-bucket', Prefix='prefix')['Contents']
    # Newest first, then take the key of the first object
    last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True)][0]
    return last_added

The code above gets me the latest file; however, I only want files ending with 'csv'.

Elad Kalif

2 Answers

Filter by suffix

If the S3 object's key is a filename, its suffix is the filename extension (such as .csv).

So filter the objects to those whose key ends with .csv.

Use the built-in filter(predicate, iterable) with a lambda predicate that tests str.endswith(suffix). In Python 3, filter returns an iterator, so wrap it in list() before calling sort():

import boto3

def get_latest_file_movement(**kwargs):
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my-bucket', Prefix='prefix')['Contents']

    csvs = list(filter(lambda obj: obj['Key'].endswith('.csv'), objs))  # csv only
    csvs.sort(key=lambda obj: obj['LastModified'], reverse=True)  # last first, sort by modified-timestamp descending

    return csvs[0]  # newest csv object; its key is csvs[0]['Key']
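
To try the filter-then-sort step without an AWS connection, here is a minimal sketch that runs on plain dictionaries shaped like the entries of response['Contents']. The helper name latest_csv_key and the sample keys and dates are made up for illustration, and the helper assumes at least one key ends with .csv.

```python
from datetime import datetime, timezone


def latest_csv_key(objs):
    """Return the key of the most recently modified '.csv' object.

    Hypothetical helper; assumes objs holds at least one '.csv' key.
    """
    # filter() yields an iterator, so build a list that can be sorted in place
    csvs = list(filter(lambda obj: obj['Key'].endswith('.csv'), objs))
    csvs.sort(key=lambda obj: obj['LastModified'], reverse=True)  # newest first
    return csvs[0]['Key']


# Made-up sample data shaped like response['Contents']
sample = [
    {'Key': 'prefix/a.csv', 'LastModified': datetime(2022, 4, 1, tzinfo=timezone.utc)},
    {'Key': 'prefix/b.txt', 'LastModified': datetime(2022, 4, 3, tzinfo=timezone.utc)},
    {'Key': 'prefix/c.csv', 'LastModified': datetime(2022, 4, 2, tzinfo=timezone.utc)},
]

print(latest_csv_key(sample))  # prefix/c.csv
```

The newest object overall is b.txt, but it is filtered out before sorting, so the newest CSV wins.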

Note: To get the last-modified only

This solution reverses the sort direction with reverse=True (descending), so the first element is the most recently modified. You can also sort in the default (ascending) order and pick the last element with [-1], as answered by Kache in your preceding question.
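
As a quick check of the two sort directions, the sketch below compares them on made-up objects with distinct timestamps. It also shows the built-in max with a key function, which is not part of the answer above but picks the same element without sorting the whole list. The name by_modified and the sample data are illustrative only.

```python
from datetime import datetime, timezone

# Made-up objects with distinct timestamps
objs = [
    {'Key': 'old.csv', 'LastModified': datetime(2022, 1, 1, tzinfo=timezone.utc)},
    {'Key': 'new.csv', 'LastModified': datetime(2022, 3, 1, tzinfo=timezone.utc)},
    {'Key': 'mid.csv', 'LastModified': datetime(2022, 2, 1, tzinfo=timezone.utc)},
]


def by_modified(obj):
    """Sort key: the object's last-modified datetime."""
    return obj['LastModified']


first_of_descending = sorted(objs, key=by_modified, reverse=True)[0]
last_of_ascending = sorted(objs, key=by_modified)[-1]
newest_by_max = max(objs, key=by_modified)  # single pass, no full sort

print(first_of_descending['Key'], last_of_ascending['Key'], newest_by_max['Key'])
# new.csv new.csv new.csv
```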

Simplification

From the boto3 list_objects_v2 docs about the response structure:

Contents (list) ... LastModified (datetime) -- Creation date of the object.

Boto3 returns a datetime object for LastModified. See also Getting S3 objects' last modified datetimes with boto.

So why take the extra steps of formatting it as a string and then converting it to an int, as in int(obj['LastModified'].strftime('%s'))?

Python can sort datetime objects directly.
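
A small sketch of that point, using made-up timezone-aware datetimes in place of real LastModified values: sorting the datetimes directly gives the same order as sorting by epoch seconds. The epoch variant here uses datetime.timestamp() rather than strftime('%s'), because '%s' is not among Python's documented format codes and its behavior depends on the platform's C library.

```python
from datetime import datetime, timezone

# Made-up, timezone-aware timestamps standing in for LastModified values
stamps = [
    datetime(2022, 4, 29, 15, 56, tzinfo=timezone.utc),
    datetime(2021, 12, 31, 23, 59, tzinfo=timezone.utc),
    datetime(2022, 5, 2, 11, 25, tzinfo=timezone.utc),
]

direct = sorted(stamps)  # datetimes compare with < and > out of the box
via_epoch = sorted(stamps, key=lambda dt: int(dt.timestamp()))  # extra conversion step

print(direct == via_epoch)      # True
print(max(stamps).isoformat())  # 2022-05-02T11:25:00+00:00
```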

hc_dev
  • I like this answer, but you have obj and the lambda function swapped in the filter function. Filter function requires the first parameter to be the function that returns True/False and the second parameter to be the collection. – Danny Apr 29 '22 at 15:56
  • @Danny, thanks for spotting this. You always have to pay attention when using built-ins `filter` and `sorted` (the order of parameters is different). That's why I prefer [`list.sort()`](https://docs.python.org/3/howto/sorting.html) among others (modify in place, readability, etc.). – hc_dev May 02 '22 at 11:25

You can check if they end with .csv:

import boto3

def get_latest_file_movement(**kwargs):
    get_last_modified = lambda obj: int(obj['LastModified'].strftime('%s'))
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my-bucket', Prefix='prefix')['Contents']

    # Sort newest first, keep only keys ending with .csv, then take the first one
    last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True) if obj['Key'].endswith('.csv')][0]

    return last_added
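
One caveat with indexing [0]: it raises IndexError when no key ends with .csv, and as far as I know the response has no 'Contents' entry at all when nothing matches the prefix. Below is a minimal sketch of a more defensive variant that works on the response dictionary. The helper name latest_csv_key_or_none and the sample responses are hypothetical.

```python
from datetime import datetime, timezone


def latest_csv_key_or_none(response):
    """Return the newest '.csv' key in a list_objects_v2-style response, or None.

    Hypothetical helper; assumes 'Contents' may be absent when nothing matches.
    """
    objs = response.get('Contents', [])
    newest_first = sorted(objs, key=lambda obj: obj['LastModified'], reverse=True)
    # next() with a default avoids the IndexError that [0] raises on an empty list
    return next((obj['Key'] for obj in newest_first if obj['Key'].endswith('.csv')), None)


# Made-up responses: no objects, objects but no csv, and one csv
empty = {'KeyCount': 0}
no_csv = {'Contents': [
    {'Key': 'prefix/readme.txt', 'LastModified': datetime(2022, 4, 1, tzinfo=timezone.utc)},
]}
one_csv = {'Contents': [
    {'Key': 'prefix/readme.txt', 'LastModified': datetime(2022, 4, 2, tzinfo=timezone.utc)},
    {'Key': 'prefix/data.csv', 'LastModified': datetime(2022, 4, 1, tzinfo=timezone.utc)},
]}

print(latest_csv_key_or_none(empty))    # None
print(latest_csv_key_or_none(no_csv))   # None
print(latest_csv_key_or_none(one_csv))  # prefix/data.csv
```
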
Marcin