Python GDAL-OSGEO library: shapefile to PostGIS?

Question

I'm working on a Python script which needs to use the gdal python module, and not the subprocess call method.

I've been using this, but I can't find any example or good documentation online about how to load a local file to PostgreSQL. Most examples I've seen are the other direction: about exporting from PG to local file. I've tried:

from osgeo import gdal
conn_string = r'PG:"host=localhost dbname=postgres port=5432 user=admin password=xxxx'
in_file = 'polygon.shp'
gdal.VectorTranslate(
        out_table, # how do I set a PostgreSQL table name as the destination? How do I pass the PG connection string in? 
        in_file,
        format = 'PostgreSQL',
        geometryType = 'POLYGON',
        accessMode = 'overwrite',
        dstSRS=27700,
        reproject=True,
        layerCreationOptions=['SCHEMA=schema_name',
                            'PRECISION=NO',
                            'OVERWRITE=YES']
        )

But get errors like ERROR 1: You must specify at least a db name or a service name or ERROR 1: PostgreSQL driver doesn't currently support database creation. (I just want to create/overwrite a table). Any tips?

what is different to using psycopg2 https://gis.stackexchange.com/a/90110/276 — Mapperz, Jul 06 '23 at 00:45
Plenty of examples at https://gdal.org/drivers/vector/pg.html Either don't specify a table and the new layer (table) will named be the same as the input layer (in your example, "polygon"), or specify it using the -nln new_layer_name option. — user2856, Jul 06 '23 at 06:21
@user2856 that link shows command line examples (I already know how to use gdal that way), I need pythonic examples which use the ‘osgeo’ library and ‘VectorTranslate’ function. — Theo F, Jul 06 '23 at 06:57
Pretty much the same. Don't specify a tsble or specify a new layer name with layerName=something option e.g. VectorTranslate(etc..., layerName="something") — user2856, Jul 06 '23 at 07:20
have you tried providing the conn_string via options https://gdal.org/api/python/osgeo.gdal.html#osgeo.gdal.VectorTranslateOptions? — PyMapr, Jul 06 '23 at 07:40
e.g. opt = gdal.VectorTranslateOptions(options=[r'PG:"host=localhost, dbname=postgres, port=5432, user=admin, password=xxxx]) and then then pass opt to gdal.VectorTranslate(). This example is untested and the expected string format in the list might be slightly different. — PyMapr, Jul 06 '23 at 08:32
@user2856 as I understand it, VectorTranslate requires 2 arguments. 1: output file/name, and 2: source dataset. So I have to specify the output name. — Theo F, Jul 06 '23 at 09:06
The connection string is the output name. As I said - the new layer (table) will named be the same as the input layer (in your example, "polygon") unless you specify a different name via layerName="something". — user2856, Jul 06 '23 at 09:13
@user2856 ok thanks, that's what confused me- the connection string being passed as the output name. I've managed to get a working script, will post here along with a fix for PROJ: proj_identify errors which occur from running osgeo module which has it's own proj.db, on a system which has a core GDAL installation elsewhere. — Theo F, Jul 06 '23 at 10:18

Theo F · Answer 1 · 2023-07-14T09:02:34.257

With some steer from @user2856 and others in the comments, I've managed to form a working script, with some very important information on proj.db.

import os
from osgeo import gdal
"""
if you have a separate system installation of GDAL, you must make sure you only use the 
proj.db of the imported osgeo python package. We can use os.environ[] to do this. 
Otherwise python will try run your code using a different proj.db, which can produce errors like:
ERROR 1: PROJ: proj_identify: C:\Program Files\GDAL\projlib\proj.db lacks DATABASE.LAYOUT.VERSION.MAJOR / DATABASE.LAYOUT.VERSION.MINOR metadata. It comes from another PROJ installation.
"""
os.environ['PROJ_LIB'] = r'C:&lt;user>\AppData\Roaming\Python\Python39\site-packages\osgeo\data\proj'
conn_string = r'dbname=database host=localhost port=5432 user=username password=xxxx'
gdal.VectorTranslate(
    destNameOrDestDS=f'PG:{conn_string}',
    srcDS=r'folder\polygon.shp',
    layerName='polygon_table',  # destination table name
    accessMode='overwrite',
    geometryType='POLYGON',
    dstSRS=f'EPSG:27700',
    layerCreationOptions=['SCHEMA=schema_name',  # the destination schema
                          'PRECISION=NO',  # prevents field overflow
                          'OVERWRITE=YES',
                          'GEOMETRY_NAME=geom']  # name for geometry column (default is wkb_geometry)
)

The proj.db issue is very important if you're running these kind of scripts on a cron machine/scheduler. If you are, make sure the PROJ_LIB environment is set to somewhere your scheduler can reach (ie. C:\Program Files instead of C:<user docs>)

Python GDAL-OSGEO library: shapefile to PostGIS?

1 Answers1