
I'm trying to set up a daily AWS Glue job that loads data into an RDS PostgreSQL DB. But I need to truncate my tables before loading data into them, since the jobs work on the whole dataset.

To do this, I'm implementing the solution given here: https://stackoverflow.com/a/50984173/11952393.

It uses the pure Python library pg8000. I followed the guidelines in that SO answer: downloading the library tar, unpacking it, adding an empty __init__.py, zipping the whole thing, uploading the zip file to S3, and adding the S3 URL as a Python library in the AWS Glue job config.

When I run the job, the pg8000 module seems to be imported correctly. But then I get the following error:

AttributeError: module 'pg8000' has no attribute 'connect'
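My code follows the linked answer; a minimal sketch of the failing call, with placeholder connection details:

    import pg8000

    # Placeholder credentials and endpoint -- the real values come from the job config.
    # This call raises the AttributeError above.
    conn = pg8000.connect(user="my_user", password="my_password",
                          host="my-rds-endpoint", database="my_db")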

I am most certainly doing something wrong... but I can't find what. Any constructive feedback is welcome!

John Rotenstein

2 Answers


Add

    install_requires=['pg8000==1.12.5']

to the setup.py file that generates the .egg file.

You should then be able to access the library.
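A minimal setup.py along these lines (the package name and version string below are illustrative, not from the answer):

    from setuptools import setup, find_packages

    setup(
        name="glue_pg8000_deps",  # hypothetical package name
        version="0.1",
        packages=find_packages(),
        # Declare pg8000 as a dependency, pinned to the version the answer uses
        install_requires=["pg8000==1.12.5"],
    )

Running python setup.py bdist_egg then produces the .egg file under dist/.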

Sandeep Fatangare
  • I've edited this answer to remove the bit about doubling percent signs, as that bit isn't correct:

        import pg8000.dbapi
        c = pg8000.dbapi.connect("postgres")
        cursor = c.cursor()
        cursor.execute("select 'hello' like '%ell%';")
        for r in cursor.fetchall():
            print(r)

    – Tony Locke Sep 07 '21 at 09:11

Here is what made it work for me.

  1. Do a pip install of the pg8000 package into a separate location:

    pip install -t /tmp/ pg8000

  2. You should see 2 directories in the /tmp directory:

    pg8000
    scramp
    
  3. Zip the above 2 directories separately:

    cd /tmp/
    zip -r pg8000.zip pg8000/
    zip -r scramp.zip scramp/
    
  4. Upload these 2 zip files to an S3 location

  5. While creating the job or the Dev Endpoint, list these 2 zip files in the Python Library Path field:

s3://<bucket>/<prefix>/pg8000.zip,s3://<bucket>/<prefix>/scramp.zip
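Once both zips are on the Python library path, the module resolves normally inside the Glue script. A minimal sketch of the truncate step using the pg8000.dbapi interface mentioned in the comment above (endpoint, credentials, and table name are placeholders):

    import pg8000.dbapi

    # Placeholder RDS connection details -- substitute your own endpoint and credentials.
    conn = pg8000.dbapi.connect(
        user="my_user",
        password="my_password",
        host="my-rds-endpoint.rds.amazonaws.com",
        port=5432,
        database="my_db",
    )
    cursor = conn.cursor()
    cursor.execute("TRUNCATE TABLE my_table;")  # clear the table before the daily full load
    conn.commit()
    conn.close()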