I'm using QGIS 2.18 on a Mac (Sierra).

I'm doing some pretty extensive data manipulation using a virtual column, and using the function editor (so Python) to do so: one of my functions, for example, expands French street-name abbreviations to their long form (e.g. ALL => Allée).

The result is encoded in UTF-8, and I can't seem to find a means to translate it into whatever encoding is needed to display it correctly on a map... The result of any string containing an accented character is 'null' in the attributes table.

Yet in another 'street name' layer, pulled directly from a PostGIS database, even accented labels (street names) display correctly.

Is this a common problem, and is there any way around this?

#!/usr/bin/env python
# -*- coding: utf-8 -*-


from qgis.core import *
from qgis.gui import *
from qgis.utils import qgsfunction
import re

@qgsfunction(args='auto', group='Custom')
def streettype_format(input, feature, parent):
    rep = {
        "CITE":"Cité",
        "ALL":"Allée",
        "RUE":"Rue",
        "RPT":"Rond-point",
        "SENT":"Sentier",
        "VLA":"Villa",
        "TERR":"Terrace",
        "CHEM":"Chemin",
        "CAR":"Carrefour",
        "HAM":"Hameau",
        "BD":"Boulevard",
        "AV":"Avenue",
        "CRS":"Cours",
        "VOIE":"Voie",
        "CHAU":"Chaussée",
        "ARC":"Arcade",
        "GAL":"Galérie"
    }
    rep = dict((re.escape(k), v) for k, v in rep.iteritems())
    pattern = re.compile("|".join(rep.keys()))
    input = pattern.sub(lambda m: rep[re.escape(m.group(0))], input)
    return input

Also, from some experimenting in the Python console:

>>> import sys
>>> sys.getdefaultencoding()
ascii
>>> u"Héy"
u'H\xe9y'
>>> "Héy"
'H\xc3\xa9y'
  • Can you give a sample of your code, please? You can set your codepage in a Python script: see https://stackoverflow.com/questions/2276200/changing-default-encoding-of-python and https://stackoverflow.com/questions/6179617/set-python-terminal-encoding-on-windows; the second is about Windows but should have some relevance on Mac. – Michael Stimson Jun 26 '17 at 21:46
  • Of course, I'll add it to the main question... hope it's not too long. Thanks! – Josef M. Schomburg Jun 26 '17 at 23:39
  • Read the 2nd hyperlink. The # -*- coding: utf-8 -*- line sets the script encoding, not the output encoding. Your string ' Allée' returns u' All\xe9e' with decode('UTF-8'), which looks OK when printed. I'm monolingual, so not an expert in character sets other than ASCII, but it seems to me that your input strings need to be defined as unicode or your output encoded/decoded. Sorry I can't put my finger on the exact method; you will have to try (with a shorter list) unicode, encoding and decoding, and when you've got it working, post your code as an answer. I would be very interested in seeing it. – Michael Stimson Jun 27 '17 at 00:11
  • Have you tried prepending a "u" (for Unicode) to strings containing special characters? For instance: rep = { "CITE": u"Cité", "ALL": u"Allée", ... } – Germán Carrillo Jun 27 '17 at 01:36
  • Yes, I have tried adding a 'u'... no dice, either. Michael Stimson is right: the code is executing fine internally as utf-8, but it's getting it back to the display encoding that seems to be the obstacle. – Josef M. Schomburg Jun 27 '17 at 06:16

2 Answers

1

Working, but only with the bottom solution; please see below.

// FALSE SOLUTION

I found a solution... I would like to know more about how and why it works, but it works. Germán Carrillo's "add a 'u' prefix" idea set me in that direction, so thanks.

Changing:

return input

at the end of my function to:

return u.input

...did the trick.

PS: just to follow that rabbit hole all the way, I eliminated the 'u's I had added to the dictionary's accented character strings... still works, so they weren't required there. Again, as long as the characters remain 'internal', they're okay (treated as utf-8).

Hypothesising: this must mean that QGIS has no means of determining the encoding of a custom function's output (or it doesn't even try), and the added 'u' gave it the information it needed... which means that QGIS falls back to 'ascii' when that information is not there.
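
A minimal sketch of that 'ascii' fallback, under an assumption I have not verified inside QGIS itself: as far as I understand, Python 2 decodes a byte string with the default 'ascii' codec whenever it gets mixed with a unicode string, and an error raised inside the function would then show up as NULL. The snippet below does that decode explicitly, so it behaves the same on Python 2 and 3:

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes for "Allée", which is what a plain "Allée" literal holds in Python 2.
raw = u"Allée".encode("utf-8")

# Do explicitly what Python 2 does implicitly when bytes meet unicode.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    # Byte 0xc3 is outside the ASCII range, so the decode fails.
    ascii_ok = False

# Decoding with the codec the bytes were written in succeeds.
decoded = raw.decode("utf-8")
```

If that assumption holds, unaccented entries such as "Rue" survive the fallback because they are pure ASCII, while anything with an accent fails.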

// additional roadblock-variables

Something in that worked 'temporarily', but after a while the entire function returned 'NULL', and I got a (quite logical, actually) "global variable 'u' undeclared" error. With the additional 'unexplainable' environment behaviour, I understand even less. This is maddening.
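
For reference, a small sketch of why `return u.input` raises that error: Python reads `u.input` as an attribute lookup on a name `u`, not as a unicode prefix (the prefix only applies to string literals). The function name `broken` is hypothetical:

```python
def broken(value):
    # "u.value" means: find a variable called u, then read its "value" attribute.
    # No such variable exists, so Python raises NameError at call time.
    return u.value

try:
    broken("Allée")
    message = None
except NameError as exc:
    message = str(exc)
```

This does not explain why it appeared to work for a while; it only shows why the error itself is expected.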

// REAL SOLUTION

This is the code that worked: I eliminated all possible 'reserved word' conflicts, and expanded my virtual column to an 'unlimited' text type (and -this- might have been the problem: UTF-8 uses up to four bytes per character, two for an accented letter like 'é', so the encoded result may have gone beyond the 50-char limit I had initially set).

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Define new functions using @qgsfunction. feature and parent must always be the
last args. Use args=-1 to pass a list of values as arguments
"""
from qgis.core import *
from qgis.gui import *
from qgis.utils import qgsfunction
import re

@qgsfunction(args='auto', group='Custom')
def streettype_format(shortvers, feature, parent):
    rep = {
        "CHAU":u"Chaussée",
        "CITE":u"Cité",
        "SQ":"Square",
        "PROM":"Promenade",
        "GAV":"Grande avenue",
        "PL":"Place",
        "PER":"Passerelle",
        "COUR":"Cour",
        "RTE":"Route",
        "PAS":"Passage",
        "PONT":"Pont",
        "RLE":"Ruelle",
        "PRT":"Porte",
        "IMP":"Impasse",
        "SOUT":"Souterrain",
        "QU":"Quai",
        "ESPL":"Esplanade",
        "ALL":u"Allée",
        "RUE":"Rue",
        "RPT":"Rond-point",
        "SENT":"Sentier",
        "VLA":"Villa",
        "TERR":"Terrace",
        "CHEM":"Chemin",
        "CAR":"Carrefour",
        "HAM":"Hameau",
        "BD":"Boulevard",
        "AV":"Avenue",
        "CRS":"Cours",
        "VOIE":"Voie",
        "ARC":"Arcade",
        "GAL":u"Galérie",
        "PARV":"Parvis"
    }
    rep = dict((re.escape(k), v) for k, v in rep.iteritems())
    pattern = re.compile("|".join(rep.keys()))
    longvers = pattern.sub(lambda m: rep[re.escape(m.group(0))], shortvers)
    return longvers
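
The same idea can be tried outside QGIS. This is only a sketch, not the function above: `expand_street_type` and the shortened table are hypothetical, every key and value is a unicode literal, and sorting the keys longest-first is my own addition so that a short key cannot shadow a longer one.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical subset of the abbreviation table; all strings are unicode.
REP = {
    u"ALL": u"Allée",
    u"CITE": u"Cité",
    u"CHAU": u"Chaussée",
    u"RUE": u"Rue",
}

# Build one alternation pattern, longest keys first.
PATTERN = re.compile(
    u"|".join(re.escape(k) for k in sorted(REP, key=len, reverse=True))
)

def expand_street_type(shortvers):
    """Replace each known abbreviation in shortvers with its long form."""
    return PATTERN.sub(lambda m: REP[m.group(0)], shortvers)

example = expand_street_type(u"ALL DES ROSES")
```

Like the original, this replaces a key wherever it occurs, including inside a longer word.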

BUT: the 'return u.input' thing I did earlier -did- work -before- I changed the field's character length limit.

Confusing.

0

Did you try this?

Settings->Options->Advanced->"I will be careful, I promise!" -> Processing -> encoding -> Set "Value" to UTF-8

I think this was the fix to a similar problem I had on Windows. Not sure though... I tried so many things. Encoding problems are a pain.

TurboGraphxBeige
  • Thanks, but the setting there is 'System', and the default encoding for Mac has been utf-8 for around... a decade now, I think. I tried changing it all the same, but no dice. – Josef M. Schomburg Jun 27 '17 at 05:31