
I'm using GDAL/OGR's ogr2ogr command-line tool to export data from a PostGIS-enabled PostgreSQL database to various GIS file formats, Shapefile among them. To create a shapefile with the default encoding (ISO8859-1 a.k.a. latin1, see "Which character encoding is used by the DBF file in shapefiles?"), I'm using a command line like this:

ogr2ogr \
    -f 'Esri Shapefile' \
    "$OUTPUT_PATH" \
    -t_srs "$OUT_SRS" \
    "PG:dbname=${DB_NAME} user=${DB_USERNAME} password=${DB_PASSWORD} schemas=${SCHEMA_NAME}"

But in my data, the attribute values may contain text in arbitrary languages and scripts. I'd like to preserve these values, so I think the *.dbf file of the exported Shapefile should be UTF-8 encoded. (The attribute names are guaranteed to be within 7-bit ASCII.)

How can I get ogr2ogr to write a UTF-8 encoded *.dbf file when exporting to Shapefile? The documentation of the GDAL/OGR "ESRI Shapefile / DBF" driver is explicitly ambiguous (sic!) about the ENCODING option in the "Layer Creation Options":

The default value is "LDID/87". It is not clear what other values may be appropriate.

And will the encoding used be indicated in the *.dbf file itself or in an accompanying *.cpg file? (I guess the latter, as UTF-8 probably isn't per se a valid (dBASE DBMS) DBF encoding.) If the latter, will ogr2ogr create the *.cpg file, or do I have to create it manually?

Taras
das-g
  • If this works: ogr2ogr output.shp input -lco ENCODING=UTF-8, then your question is a duplicate of http://gis.stackexchange.com/questions/15912/how-to-encode-shapefiles-from-latin1-to-utf-8. – user30184 Nov 04 '16 at 11:15
  • That works. Thanks, @user30184! But while the answer seems to be the same and the questions certainly overlap, I don't agree that the questions are duplicates of each other. (Just from the question title, mine is a bit more general: http://gis.stackexchange.com/q/15912/51574 just asks for a Shapefile-to-Shapefile conversion, while my question doesn't care (too much) about the input source and focuses on the output.) – das-g Nov 07 '16 at 16:15

1 Answer

How can I get ogr2ogr to write a UTF-8 encoded *.dbf file when exporting to Shapefile?

As in "How to encode shapefiles from LATIN1 to UTF-8?", this is possible with the layer creation option -lco ENCODING=UTF-8. So for my case:

ogr2ogr \
    -f 'Esri Shapefile' \
    "$OUTPUT_PATH" \
    -t_srs "$OUT_SRS" \
    "PG:dbname=${DB_NAME} user=${DB_USERNAME} password=${DB_PASSWORD} schemas=${SCHEMA_NAME}" \
    -lco ENCODING=UTF-8

And will the encoding used be indicated in the *.dbf file itself or in an accompanying *.cpg file? (I guess the latter, as UTF-8 probably isn't per se a valid (dBASE DBMS) DBF encoding.)

The latter, indeed: the encoding is indicated in *.cpg files (one per exported database table), each containing just

UTF-8

If the latter, will ogr2ogr create the *.cpg file or do I have to create it manually?

ogr2ogr will create the *.cpg files for you.
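
To double-check the result after an export, here is a minimal sketch in POSIX shell. It assumes that $OUTPUT_PATH is a directory holding one *.dbf per exported table; check_cpg is a hypothetical helper name, not part of GDAL. It prints the encoding declared in each *.cpg sidecar file, or "missing" where no such file exists:

```shell
# Hypothetical helper (not part of GDAL): for every *.dbf in the given
# directory, print the encoding declared in the sibling *.cpg file,
# or "missing" if there is no such file.
check_cpg() {
    dir=$1
    for dbf in "$dir"/*.dbf; do
        [ -e "$dbf" ] || continue      # glob matched nothing
        cpg="${dbf%.dbf}.cpg"          # same base name, .cpg extension
        if [ -f "$cpg" ]; then
            # Strip line endings so the output stays on one line
            printf '%s: %s\n' "$(basename "$dbf")" "$(tr -d '\r\n' < "$cpg")"
        else
            printf '%s: missing\n' "$(basename "$dbf")"
        fi
    done
}
```

Calling check_cpg "$OUTPUT_PATH" after the export above should then report UTF-8 for every layer, provided the assumption about the output directory holds.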

das-g