
So I have an S3 bucket containing several hundred videos, which I transcoded with Elastic Transcoder into a second, optimised bucket.

However, when I inspect the second bucket, it contains 40-50 fewer objects, and I cannot work out which ones are missing (the directory structure is deeply nested, etc.).

How can I get a diff of the files in the two buckets using aws s3api list-objects?

Perhaps there are files in the bucket that are not videos, which I somehow didn't know about.

– j_d
  • Do you have a naming convention? Get the list of all objects from the first bucket and the second bucket, massage the names, and then take the diff. – Mark Shehata Aug 04 '17 at 18:46
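
For reference, a rough sketch of that approach using the s3api command the question names. The bucket names are placeholders, and the extension-stripping step is only needed if the transcoder changed file extensions:

# List all keys, one per line (the CLI paginates automatically)
aws s3api list-objects --bucket bucket-1 --query 'Contents[].[Key]' --output text | sort > bucket_1_keys
aws s3api list-objects --bucket bucket-2 --query 'Contents[].[Key]' --output text | sort > bucket_2_keys

# If transcoded outputs differ only by extension, strip extensions first:
# sed 's/\.[^.]*$//' bucket_1_keys | sort -u > bucket_1_keys_noext

# Keys present in bucket-1 but missing from bucket-2
comm -23 bucket_1_keys bucket_2_keys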

3 Answers


You can use the sync command with the --dryrun option to compare instead of syncing.

aws s3 sync s3://bucket s3://bucket2 --dryrun

You can, of course, also use it to compare a local directory with a bucket.

aws s3 sync . s3://bucket2 --dryrun
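
As a rough check of how many objects differ, you can count the dryrun lines (a sketch; note that sync compares object size and modified time, so the output lists changed objects as well as missing ones):

aws s3 sync s3://bucket s3://bucket2 --dryrun | wc -l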


To display only the filenames (stripping the date, time, and size columns from aws s3 ls):

aws s3 ls s3://bucket-1 --recursive | awk '{$1=$2=$3=""; print $0}' | sed 's/^[ \t]*//' | sort > bucket_1_files
aws s3 ls s3://bucket-2 --recursive | awk '{$1=$2=$3=""; print $0}' | sed 's/^[ \t]*//' | sort > bucket_2_files

diff bucket_1_files bucket_2_files
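
One caveat: the awk step re-splits each line on whitespace, so keys containing runs of spaces will come out mangled. To see only the one-sided differences, comm on the two sorted files may read more cleanly than diff (a sketch, assuming the files generated above):

comm -23 bucket_1_files bucket_2_files   # keys only in bucket-1
comm -13 bucket_1_files bucket_2_files   # keys only in bucket-2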

– helloV

Inspired by @George's comment, you can use this to extract the list of key paths:

aws s3 sync s3://<main-bucket> s3://<second-bucket> --dryrun | awk 'match($3,"^(s3://[^/]+/)(.*)",a) {print a[2]}'

or, for local paths:

aws s3 sync <local-path> s3://<second-bucket> --dryrun | awk 'match($3,"^(./)?(.*)",a) {print a[2]}'
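
Note that the three-argument form of match() used above is a GNU awk (gawk) extension, and that $3 breaks on keys containing spaces. A portable alternative is to strip the prefixes with sed (a sketch, assuming the usual "(dryrun) copy: s3://bucket/key to ..." output format, which may vary by CLI version):

aws s3 sync s3://<main-bucket> s3://<second-bucket> --dryrun | sed -E 's|^\(dryrun\) copy: s3://[^/]+/(.*) to .*|\1|'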

– Khaled AbuShqear