
I'm trying to set up a daily AWS Glue job that loads data into an RDS PostgreSQL DB. But I need to truncate my tables before loading data into them, since the jobs work on the whole dataset.

To do this, I'm implementing the solution given here: https://stackoverflow.com/a/50984173/11952393.

It uses the pure Python library pg8000. I followed the guidelines in that SO answer: downloading the library tar, unpacking it, adding an empty __init__.py, zipping the whole thing, uploading the zip file to S3, and adding the S3 URL as a Python library in the AWS Glue job config.

When I run the job, the pg8000 module seems to be imported correctly. But then I get the following error:

AttributeError: module 'pg8000' has no attribute 'connect'
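My code follows the linked answer; a minimal sketch of the failing call, with placeholder connection details:

    import pg8000

    # Placeholder credentials and endpoint -- the real values come from the job config.
    # This call raises the AttributeError above.
    conn = pg8000.connect(user="my_user", password="my_password",
                          host="my-rds-endpoint", database="my_db")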

I am most certainly doing something wrong... but I can't find what. Any constructive feedback is welcome!

John Rotenstein

2 Answers


Add

    install_requires=['pg8000==1.12.5']

to the setup.py file that generates the .egg file.

You should then be able to access the library.
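A minimal setup.py along these lines (the package name and version string below are illustrative, not from the answer):

    from setuptools import setup, find_packages

    setup(
        name="glue_pg8000_deps",  # hypothetical package name
        version="0.1",
        packages=find_packages(),
        # Declare pg8000 as a dependency, pinned to the version the answer uses
        install_requires=["pg8000==1.12.5"],
    )

Running python setup.py bdist_egg then produces the .egg file under dist/.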

Sandeep Fatangare
  • I've edited this answer to remove the bit about doubling percent signs, as that bit isn't correct:

        import pg8000.dbapi
        c = pg8000.dbapi.connect("postgres")
        cursor = c.cursor()
        cursor.execute("select 'hello' like '%ell%';")
        for r in cursor.fetchall():
            print(r)

    – Tony Locke Sep 07 '21 at 09:11

Here is what made it work for me.

  1. Do a pip install of the pg8000 package into a separate location:

    pip install -t /tmp/ pg8000

  2. You should see 2 directories in the /tmp directory:

    pg8000
    scramp
    
  3. Zip the above 2 directories separately:

    cd /tmp/
    zip -r pg8000.zip pg8000/
    zip -r scramp.zip scramp/
    
  4. Upload these 2 zip files to an S3 location

  5. While creating the job or the Dev Endpoint, list these 2 zip files in the Python Library Path field:

s3://<bucket>/<prefix>/pg8000.zip,s3://<bucket>/<prefix>/scramp.zip
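Once both zips are on the Python library path, the module resolves normally inside the Glue script. A minimal sketch of the truncate step using the pg8000.dbapi interface mentioned in the comment above (endpoint, credentials, and table name are placeholders):

    import pg8000.dbapi

    # Placeholder RDS connection details -- substitute your own endpoint and credentials.
    conn = pg8000.dbapi.connect(
        user="my_user",
        password="my_password",
        host="my-rds-endpoint.rds.amazonaws.com",
        port=5432,
        database="my_db",
    )
    cursor = conn.cursor()
    cursor.execute("TRUNCATE TABLE my_table;")  # clear the table before the daily full load
    conn.commit()
    conn.close()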