208

I'm aware that with Boto 2 it's possible to open an S3 object as a string with: get_contents_as_string()

Is there an equivalent function in boto3?

taras
Gahl Levy

6 Answers

325

read() returns bytes. In Python 3, if you want a string, you have to decode the bytes using the right encoding:

import boto3

s3 = boto3.resource('s3')

obj = s3.Object(bucket, key)  # bucket: bucket name, key: object key
text = obj.get()['Body'].read().decode('utf-8')  # read() returns bytes; decode to str
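
The decode step can be tried without touching S3. The sketch below uses io.BytesIO as a stand-in for the object stored under 'Body' (an assumption made only so the example runs offline; the real body exposes read() in the same way). The helper name body_to_text is hypothetical.

```python
import io


def body_to_text(body, encoding="utf-8"):
    """Read a file-like body fully and decode the bytes to str."""
    raw = body.read()  # bytes in Python 3
    return raw.decode(encoding)


# Stand-in for obj.get()['Body']; assumed here so the sketch runs offline.
fake_body = io.BytesIO("caf\u00e9 from S3".encode("utf-8"))
text = body_to_text(fake_body)
print(type(text).__name__, len(text))
```

With a real object, the same helper would be called as body_to_text(s3.Object(bucket, key).get()['Body']). Passing the wrong encoding either raises UnicodeDecodeError or silently produces garbled text, so it is worth knowing how the object was written.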
Kamil Sindi
  • 1
    to get this answer to work, I had to `import botocore` as `obj.get()['Body']` is of type `botocore.response.StreamingBody` – Tzunghsing David Wong Sep 29 '17 at 02:45
  • 1
    @TzunghsingDavidWong you shouldn't have to import a package to call methods on an existing object, right? Was that maybe only necessary while experimenting? – Ken Williams Oct 06 '17 at 21:49
  • 1
    What is the value of key in obj = s3.Object(bucket, key)? Is bucket the bucket name and key the file name? Please correct me if I'm wrong. – Amaresh Jana Nov 21 '17 at 05:19
  • 1
    @Amaresh yes, bucket = bucket name and key = filename – Tipster Jan 26 '18 at 22:55
  • 1
    If the key is a PDF, does this work? If not, please suggest another way. I tried import textract; text = textract.process('path/to/a.pdf', method='pdfminer'), but it shows an import error. – Arun Kumar Feb 27 '18 at 05:01
  • 1
    @gatsby-lee's answer below is MUCH faster than this. I get 120mb/s vs 24mb/s – Jakobovski Nov 11 '20 at 14:43
151

I had trouble reading and parsing the object from S3 through .get() using Python 2.7 inside an AWS Lambda.

I added json to the example to show it became parsable :)

import boto3
import json

s3 = boto3.client('s3')

obj = s3.get_object(Bucket=bucket, Key=key)
j = json.loads(obj['Body'].read())

NOTE (for Python 2.7): My object is all ASCII, so I don't need .decode('utf-8')

NOTE (for Python 3.6+): We moved to Python 3.6 and discovered that read() now returns bytes, so if you want to get a string out of it, you must use:

j = json.loads(obj['Body'].read().decode('utf-8'))
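
Both notes can be checked offline. This is a minimal sketch, assuming Python 3.6 or later, with io.BytesIO standing in for obj['Body'] and a made-up payload used purely for illustration:

```python
import io
import json

payload = b'{"name": "example", "size": 3}'

# Explicit decode first, then parse the resulting str.
decoded = json.loads(io.BytesIO(payload).read().decode("utf-8"))

# Parse the bytes directly; json.loads accepts bytes since Python 3.6.
from_bytes = json.loads(io.BytesIO(payload).read())

print(decoded == from_bytes)
```

The explicit decode is still needed when a str is wanted for something other than JSON parsing, or when the object is stored in an encoding other than UTF-8, UTF-16 or UTF-32.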

EvgenyKolyakov
83

This isn't in the boto3 documentation. This worked for me:

object.get()["Body"].read()

object being an s3 object: http://boto3.readthedocs.org/en/latest/reference/services/s3.html#object

Gahl Levy
  • 1
    assuming "Body" contains string data, you can use object.get()["Body"].read() to convert to a Python string. – roehrijn Nov 24 '15 at 12:59
  • 32
    boto3 has terrible docs, as of 2016. – Andrew_1510 Feb 25 '16 at 16:50
  • 5
    http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Object.get tells us the return value is a dict with a key "Body" of type StreamingBody; searching for that on Read the Docs gets you to http://botocore.readthedocs.io/en/latest/reference/response.html, which tells you to use read(). – jeffrey Apr 04 '17 at 22:52
  • 8
    seems that now `get expected at least 1 arguments, got 0`. Remove the `get()` and access the "Body" object property directly – lurscher Dec 13 '18 at 16:33
40

Python 3 approach using the boto3 API.

Using the S3.Client.download_fileobj API and a Python file-like object, the S3 object's content can be retrieved into memory.

Since the retrieved content is bytes, it needs to be decoded to convert it to str.

import io
import boto3

client = boto3.client('s3')
bytes_buffer = io.BytesIO()
client.download_fileobj(Bucket=bucket_name, Key=object_key, Fileobj=bytes_buffer)
byte_value = bytes_buffer.getvalue()
str_value = byte_value.decode() #python3, default decoding is utf-8
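
The same steps can be wrapped in a small function and exercised offline. In this sketch, download_as_text and StubClient are hypothetical names: the stub only mimics the download_fileobj(Bucket=..., Key=..., Fileobj=...) call used above and is not part of boto3.

```python
import io


def download_as_text(client, bucket_name, object_key, encoding="utf-8"):
    """Download an object into an in-memory buffer and decode it to str."""
    buffer = io.BytesIO()
    client.download_fileobj(Bucket=bucket_name, Key=object_key, Fileobj=buffer)
    return buffer.getvalue().decode(encoding)


class StubClient:
    """Hypothetical offline stand-in for boto3.client('s3')."""

    def __init__(self, objects):
        self._objects = objects  # maps (bucket, key) -> bytes

    def download_fileobj(self, Bucket, Key, Fileobj):
        # Write the stored bytes into the caller's file-like object.
        Fileobj.write(self._objects[(Bucket, Key)])


stub = StubClient({("my-bucket", "notes.txt"): b"line one\nline two\n"})
text = download_as_text(stub, "my-bucket", "notes.txt")
print(text.splitlines())
```

With a real client, download_as_text(boto3.client('s3'), bucket_name, object_key) would make the same call as the snippet above.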
Gatsby Lee
1

Decoding the whole object body to one string:

obj = s3.Object(bucket, key).get()
big_str = obj["Body"].read().decode("utf-8")

Decoding the object body to strings line-by-line:

import csv

obj = s3.Object(bucket, key).get()
reader = csv.reader(line.decode("utf-8") for line in obj["Body"].iter_lines())
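
The line-by-line pattern can be tried without S3. In this sketch a plain list of byte strings stands in for obj["Body"].iter_lines() (an assumption for offline use; the sample rows are made up):

```python
import csv

# Stand-in for obj["Body"].iter_lines(): byte lines without trailing newlines.
raw_lines = [b"id,name", b"1,alpha", b"2,beta"]

# csv.reader accepts any iterable of str, so each line is decoded lazily.
reader = csv.reader(line.decode("utf-8") for line in raw_lines)
rows = list(reader)
print(rows)
```

Because the generator decodes one line at a time, the whole object never has to be held in memory as a single string.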

When decoding as JSON, no need to convert to string, as json.loads accepts bytes too, since Python 3.6:

obj = s3.Object(bucket, key).get()
json.loads(obj["Body"].read())
ericbn
-7

If the body contains an io.StringIO, you have to do the following:

object.get()['Body'].getvalue()
Pyglouthon