
I’m using the extent of one vector layer to clip another, with the GDAL “Clip vector by extent” tool from QGIS Processing. Both layers are encoded as UTF-8. So why is the resulting layer encoded as ISO-8859-1, even though my Linux system also uses UTF-8 by default?

I’ve seen similar questions, but my version of QGIS (3.18.1-Zürich) has no “Ignore shapefile encoding declaration” option under Settings > Options > Data Sources > Data source handling, and I can’t find an “Encoding” option under Project Properties either.

Rodrigo
  • 782
  • 5
  • 20
  • First, try doing it with QGIS's own Vector - Geoprocessing Tools - Clip menu option. – wingnut Apr 29 '21 at 03:52
  • @wingnut That was my first idea, but it doesn't clip by “extent”; it clips by the polygon itself, which is much slower. – Rodrigo Apr 29 '21 at 03:57
  • Read this question. It may help. https://gis.stackexchange.com/questions/15912/how-to-encode-shapefiles-from-latin1-to-utf-8 – wingnut Apr 29 '21 at 04:06
  • Why not save to GeoPackage, which is always UTF-8? – user30184 Apr 29 '21 at 06:54
  • @user30184 Yeah, I may try different formats. I just thought UTF-8 was already universal, at least on Linux. – Rodrigo Apr 29 '21 at 07:09
  • The DBF format is older than UTF-8 https://gis.stackexchange.com/questions/3529/which-character-encoding-is-used-by-the-dbf-file-in-shapefiles. Other character sets can be used, and they should also work in QGIS, but it is always complicated. So your issue is real, but if you don't strictly need shapefiles, a workaround may be better, or at least faster. – user30184 Apr 29 '21 at 07:24
  • @user30184 If the problem is DBF being older than UTF-8, then why does QGIS show the encoding of other shapefiles as UTF-8? This shouldn't be a problem at all. – Rodrigo Apr 29 '21 at 14:43
  • There are three ways to specify the codepage: in the DBF file itself (LDID/codepage), with a .cpg sidecar file, or with the SHAPE_ENCODING configuration option https://gdal.org/drivers/vector/shapefile.html. Obviously at least one of them is used with your existing files. – user30184 Apr 29 '21 at 15:22
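To make those three sources concrete, here is a minimal Python sketch that reports which codepage a shapefile declares. It is a simplified model, not GDAL's actual code: it assumes an explicit override (standing in for SHAPE_ENCODING) wins, then the .cpg sidecar, then the LDID byte at offset 29 of the DBF header. The function names are hypothetical, and only LDID 87 is mapped, because the GDAL shapefile driver page says it treats LDID 87 as ISO-8859-1; any other value is reported as LDID/<n>. If a clipped file has no .cpg and its DBF carries LDID 87, this sketch reports ISO-8859-1, which would match what the question describes.

```python
from pathlib import Path
from typing import Optional

# Deliberately tiny LDID table: only LDID 87 (0x57), which the GDAL shapefile
# driver documentation says it reads as ISO-8859-1. Real tables are larger.
LDID_TO_ENCODING = {87: "ISO-8859-1"}


def read_ldid(dbf_path: Path) -> int:
    """Return the Language Driver ID byte (offset 29) of a DBF header."""
    with open(dbf_path, "rb") as f:
        header = f.read(32)  # the fixed part of a dBase header is 32 bytes
    if len(header) < 32:
        raise ValueError(f"{dbf_path} is too short to be a DBF file")
    return header[29]


def guess_shapefile_encoding(shp_path: str, override: Optional[str] = None) -> Optional[str]:
    """Simplified model of where a shapefile's codepage can come from.

    Order assumed here: explicit override (like SHAPE_ENCODING), then the
    .cpg sidecar, then the LDID byte in the .dbf. Returns None when the
    DBF declares no codepage at all (LDID 0).
    """
    if override is not None:
        return override
    # with_suffix replaces only the last extension, so "my.data.shp" -> "my.data.cpg"
    cpg = Path(shp_path).with_suffix(".cpg")
    if cpg.exists():
        return cpg.read_text(encoding="ascii").strip()
    ldid = read_ldid(Path(shp_path).with_suffix(".dbf"))
    if ldid == 0:
        return None
    return LDID_TO_ENCODING.get(ldid, f"LDID/{ldid}")
```

For example, guess_shapefile_encoding("clipped.shp", os.environ.get("SHAPE_ENCODING")) uses the environment variable as the override when it is set. The sketch only looks for a lowercase .cpg; real files may use .CPG.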

1 Answer


Following the answer by unicoletti on the question wingnut linked in the comments, I found that I can use the following option:

-lco ENCODING=UTF-8

in the “Additional creation options” field. However, I believe this should be the default: it's 2021, and most of the world does not speak English.
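The same clip can also be run outside QGIS with ogr2ogr. The Python sketch below only builds the argument list; it assumes ogr2ogr is on the PATH and that the extent (xmin, ymin, xmax, ymax) has already been read from the clipping layer. It also assumes, without checking the QGIS source, that “Clip vector by extent” amounts to -spat plus -clipsrc spat_extent. The function name, file names and coordinates are hypothetical.

```python
import shlex
from typing import List, Sequence


def clip_by_extent_command(src: str, dst: str, extent: Sequence[float],
                           encoding: str = "UTF-8") -> List[str]:
    """Build an ogr2ogr call that clips src to a rectangle and writes dst.

    extent is (xmin, ymin, xmax, ymax) in the source layer's CRS.
    """
    xmin, ymin, xmax, ymax = extent
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("extent must be (xmin, ymin, xmax, ymax) with min < max")
    return [
        "ogr2ogr",
        # keep only features whose geometry intersects the rectangle...
        "-spat", str(xmin), str(ymin), str(xmax), str(ymax),
        # ...and cut their geometries to that same rectangle
        "-clipsrc", "spat_extent",
        # the layer creation option from the answer: declare the DBF encoding
        "-lco", f"ENCODING={encoding}",
        dst, src,  # ogr2ogr takes the destination before the source
    ]


if __name__ == "__main__":
    # Hypothetical file names and extent, for illustration only.
    cmd = clip_by_extent_command("input.shp", "clipped.shp", (-48.6, -27.7, -48.3, -27.3))
    print(shlex.join(cmd))
    # prints: ogr2ogr -spat -48.6 -27.7 -48.3 -27.3 -clipsrc spat_extent -lco ENCODING=UTF-8 clipped.shp input.shp
    # To actually run it (requires GDAL's ogr2ogr on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets subprocess.run pass the arguments without any shell quoting issues.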

Rodrigo