
I'm trying to read all shapefiles in a folder using this script:

import glob
import geopandas as gpd

infolder = r'E:/folder/test_input'
shapefiles = glob.iglob(infolder + '/*.shp')
gdfs = [gpd.read_file(file) for file in shapefiles]
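One way to narrow this down is to read the files one at a time and record which file raises which error, so the first exception does not abort the whole list comprehension. This is a minimal sketch: the function name read_all and the injected reader parameter are illustrative choices, not part of geopandas, and passing gpd.read_file as the reader is assumed to match how the original script calls it.

```python
import glob
import os
from typing import Callable, Dict, List, Tuple


def read_all(
    folder: str, reader: Callable[[str], object]
) -> Tuple[Dict[str, object], List[Tuple[str, str]]]:
    """Read every .shp in `folder`, collecting failures instead of stopping at the first one.

    `reader` is a hypothetical injection point; in practice it would be gpd.read_file.
    """
    frames: Dict[str, object] = {}
    failures: List[Tuple[str, str]] = []
    for path in sorted(glob.glob(os.path.join(folder, "*.shp"))):
        try:
            frames[path] = reader(path)
        except Exception as exc:  # e.g. UnicodeDecodeError or UnboundLocalError
            failures.append((path, f"{type(exc).__name__}: {exc}"))
    return frames, failures


# Usage (assumes geopandas is installed):
#   import geopandas as gpd
#   frames, failures = read_all(r"E:/folder/test_input", gpd.read_file)
#   for path, err in failures:
#       print(path, "->", err)
```

Printing the failures shows whether the same files fail every time, which helps separate a file-name problem from a problem in a particular file's attribute data.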

All of the shapefiles have Chinese/Japanese characters in their names and I am getting this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 8: invalid start byte

It works fine with shapefiles that have English names. How do I solve this issue?
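The error message is specific. In UTF-8, bytes 0x80 to 0xBF (bit pattern 10xxxxxx) can only continue a multi-byte character, never start one, so "0x92 ... invalid start byte" means the bytes being decoded were not written as UTF-8. Double-byte legacy code pages such as Shift_JIS/cp932 (Japanese) and GBK (Chinese) use that byte range as the first byte of a character. The sketch below reproduces the error and tries candidate codecs in order; the helper names and the candidate list (utf-8, cp932, gbk) are assumptions, since the question does not say which encoding the data actually uses.

```python
from typing import Iterable, Optional, Tuple


def is_continuation_byte(b: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF)."""
    return (b & 0b1100_0000) == 0b1000_0000


def first_working_codec(
    raw: bytes, candidates: Iterable[str] = ("utf-8", "cp932", "gbk")
) -> Tuple[Optional[str], Optional[str]]:
    """Return (codec, text) for the first candidate that decodes `raw` without error.

    Heuristic only: bytes written in one legacy code page can decode
    "successfully" but incorrectly under another, so candidate order matters.
    """
    for codec in candidates:
        try:
            return codec, raw.decode(codec)
        except UnicodeDecodeError:
            continue
    return None, None


# 0x92 can only continue a UTF-8 sequence, never start one:
try:
    b"\x92".decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0x92 in position 0: invalid start byte
```

Guessing like this is a diagnostic aid; the reliable fix is to find out the real encoding from the data provider or the dataset itself.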

EDIT: I am also getting this error:

UnboundLocalError: local variable 'crs_wkt' referenced before assignment

I'm not sure why, as all the shapefiles, including the ones with English names, use the same coordinate system (EPSG:6668).

hbk
  • This seems to be a codepage issue with Python, which is orthogonal to EPSG codes. – Vince Apr 23 '21 at 03:22
  • You might need to encode the file name before passing it to gpd.read_file(file). The r prefix on your folder string doesn't affect the file names, and isn't needed there. Try gpd.read_file(file.encode('utf-8')). – wingnut Apr 23 '21 at 05:39
  • @wingnut thanks for the reply. I tried what you suggested "gdfs = [gpd.read_file(file.encode('utf-8')) for file in shapefiles]" but it's returning the error "TypeError: cannot use a string pattern on a bytes-like object". Any suggestions? – hbk Apr 23 '21 at 06:35
  • Actually, that makes sense. Exactly where do you get the error? Which statement? – wingnut Apr 23 '21 at 06:58
  • See this advice. It's for Arc, but may solve the problem. https://gis.stackexchange.com/questions/3529/which-character-encoding-is-used-by-the-dbf-file-in-shapefiles – wingnut Apr 23 '21 at 07:00
  • Getting it exactly on "gdfs = [gpd.read_file(file.encode('utf-8')) for file in shapefiles]" – hbk Apr 23 '21 at 07:03
  • fiona is not able to determine the encoding of your file (and it is not UTF-8). If you know the encoding, you can pass it to read_file as gpd.read_file(file, encoding="utf-8"). I don't think file.encode will work on a shapefile, which is made up of multiple component files. See also https://stackoverflow.com/questions/48305400/cant-open-shape-file-with-geopandas – martinfleis Apr 23 '21 at 08:46
  • Very strangely, when I ran my original code on one shapefile at a time, without assigning or changing any encoding, it worked for all of them, but it didn't work when iterating through all of them. I'm not really sure why. Thanks for pointing to that advice @wingnut; I'm not sure I fully understand it yet, but I will try to. – hbk Apr 26 '21 at 06:29
  • Thank you @martinfleis for the suggestion also, will try that in the future – hbk Apr 26 '21 at 06:29
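Building on the suggestion to pass the encoding to read_file: a shapefile can declare the code page of its .dbf attributes in an optional .cpg sidecar file next to the .shp. The sketch below reads that file if present and otherwise falls back to a default. The helper name encoding_for and the cp932 default are assumptions for illustration (cp932 is the usual Japanese Windows code page, and EPSG:6668 is the Japanese JGD2011 datum), not something the question confirms about this data.

```python
import codecs
from pathlib import Path


def encoding_for(shp_path: str, default: str = "cp932") -> str:
    """Return the encoding declared in the shapefile's .cpg sidecar, else `default`.

    The "cp932" default is an assumption, not a property of the data.
    """
    cpg = Path(shp_path).with_suffix(".cpg")
    if cpg.is_file():
        declared = cpg.read_text(encoding="ascii", errors="ignore").strip()
        try:
            # Normalise names such as "UTF-8" to Python's canonical "utf-8".
            return codecs.lookup(declared).name
        except LookupError:
            pass  # Unrecognised code page name: fall back to the default.
    return default


# Usage (assumes geopandas is installed; `encoding` is the keyword from the comment above):
#   gdfs = [gpd.read_file(p, encoding=encoding_for(p))
#           for p in glob.glob(infolder + "/*.shp")]
```

If no .cpg exists, the default is only a guess, so it is worth spot-checking the decoded attribute text.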
