UnicodeEncodeError on Linux but not on Windows

Question

I'm getting a `UnicodeEncodeError: 'ascii' codec can't encode` exception when I try to print a Unicode string on Linux. On Windows I do not get the error.

The code executed on Linux:

    my_str = u'\u4ece\u5165\u5e93'
    print "%r" % my_str  # output: u'\u4ece\u5165\u5e93'
    print "%s" % my_str  # output: UnicodeEncodeError: 'ascii' codec can't encode character u'\u4ece' in position 0: ordinal not in range(128)

On Windows I get:

    my_str = u'\u4ece\u5165\u5e93'
    print "%r" % my_str  # output: u'\u4ece\u5165\u5e93'
    print "%s" % my_str  # output: 从入库

What is the value of `import sys; sys.stdout.encoding`? `print` must encode Unicode values to the locale of your terminal or pipe. — Martijn Pieters, Feb 19 '16 at 10:01
utf-8, and I find that if I just run `python thefile.py` it works well, but if I use it in other people's project, I get the error. — Helpme123, Feb 21 '16 at 09:27
Then please provide a *reproducible example*. It sounds as if you are using a pipe or subprocess, in which case the default is ASCII. Set the `PYTHONIOENCODING` environment variable to override. — Martijn Pieters, Feb 21 '16 at 09:38

score 7 · Answer 1 · answered Feb 19 '16 at 15:31

It's very likely that your locale and/or environment is broken, not installed, not set, or set to C. Python uses the locale settings to choose the encoder for stdout, which allows Unicode strings to be encoded to the appropriate encoding.

If you're running Python from the command line, make sure your locale is healthy. Type `locale` and you should see something like:

    $ locale
    LANG=en_GB.UTF-8
    LANGUAGE=
    LC_CTYPE="en_GB.UTF-8"
    LC_NUMERIC="en_GB.UTF-8"
    LC_TIME="en_GB.UTF-8"
    LC_COLLATE="en_GB.UTF-8"
    LC_MONETARY="en_GB.UTF-8"
    LC_MESSAGES="en_GB.UTF-8"
    LC_PAPER="en_GB.UTF-8"
    LC_NAME="en_GB.UTF-8"
    LC_ADDRESS="en_GB.UTF-8"
    LC_TELEPHONE="en_GB.UTF-8"
    LC_MEASUREMENT="en_GB.UTF-8"
    LC_IDENTIFICATION="en_GB.UTF-8"
    LC_ALL=
    $

If you see error messages or if LANG=C or similar, Python will use an ASCII encoder, which rejects non-ASCII characters.
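The failure can be reproduced without involving the terminal by calling the encoder directly. The following is a minimal sketch rather than code from the thread: `try_encode` is a hypothetical helper, and the string is the one from the question.

```python
# The string from the question, written with escapes so this file stays pure ASCII.
text = u'\u4ece\u5165\u5e93'


def try_encode(value, encoding):
    """Hypothetical helper: return (True, bytes) on success or (False, message) on failure."""
    try:
        return True, value.encode(encoding)
    except UnicodeEncodeError as exc:
        return False, str(exc)


# ASCII cannot represent these characters; this is the same failure that
# print hits when stdout has been given an ASCII encoder.
ok_ascii, detail_ascii = try_encode(text, 'ascii')

# UTF-8 can represent these characters, so the encode step succeeds.
ok_utf8, detail_utf8 = try_encode(text, 'utf-8')

print(ok_ascii, detail_ascii)
print(ok_utf8, repr(detail_utf8))
```

The point of the sketch is that the exception comes from the encode step, not from the string itself, which is why changing the encoder chosen for stdout makes the error go away.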

To find the locales installed on your system, type `locale -a`. Select the appropriate locale, ideally one ending in "UTF-8", and set LANG accordingly. For example:

    LANG=en_GB.UTF-8

Then run `locale` again and check for errors. If you still get errors, you will need to research how to rebuild the locales for your distribution.

If you're running within an IDE, or you're unable to fix your locale, you may have success adding the following environment variable to your shell or IDE run configuration:

    export PYTHONIOENCODING=utf-8

This tells Python to ignore the locale and apply a UTF-8 encoder to stdout.
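One way to see the effect of `PYTHONIOENCODING` is to launch a child interpreter with its stdout connected to a pipe and compare two settings. This is a rough sketch under stated assumptions, not something from the thread: `run_child` and `CHILD_CODE` are hypothetical names, and the child simply prints the string from the question.

```python
import os
import subprocess
import sys

# Source for the child process. A raw string keeps the \u escapes literal, so
# the command line stays pure ASCII and the child builds the text itself.
CHILD_CODE = r"print(u'\u4ece\u5165\u5e93')"


def run_child(io_encoding):
    """Hypothetical helper: run CHILD_CODE with PYTHONIOENCODING set to io_encoding.

    Returns (returncode, stdout_bytes, stderr_bytes). stdout is a pipe, which
    mirrors the pipe/subprocess situation discussed in the comments.
    """
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,
    )
    out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    # With utf-8 the child succeeds and writes UTF-8 bytes to the pipe.
    print(run_child("utf-8"))
    # With ascii the child fails with UnicodeEncodeError on stderr.
    print(run_child("ascii"))
```

Forcing `ascii` here stands in for the situation described in the comments, where Python 2 falls back to ASCII when stdout is a pipe. Python 3 generally chooses a different default for pipes, so the explicit setting keeps the sketch deterministic.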

You can validate what Python is using for the locale by using the locale module in Python. My healthy locale returns:

    >>> import locale
    >>> locale.getdefaultlocale()
    ('en_GB', 'UTF-8')
    >>> locale.getpreferredencoding()
    'UTF-8'

An unhealthy locale will return an ASCII encoding name for `locale.getpreferredencoding()`, such as `US-ASCII` or, on glibc-based Linux systems, `ANSI_X3.4-1968`.
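The checks discussed in this thread can be gathered into one small diagnostic. This is only a sketch: `describe_io_encoding` is a hypothetical helper, and it reports values without judging whether they are healthy.

```python
import locale
import os
import sys


def describe_io_encoding():
    """Hypothetical helper: collect the settings that influence how print encodes text."""
    return {
        # The encoder print actually uses; may be None when stdout is not a normal stream.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # What the locale settings suggest as the text encoding.
        "preferred_encoding": locale.getpreferredencoding(),
        # Environment variables mentioned in the answer and comments.
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_io_encoding().items()):
        print("%s = %r" % (name, value))
```

Running it once from the terminal and once from inside the other project (or through a pipe) should show whether the two environments differ, which is the comparison the commenters were asking for.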

answered Feb 19 '16 at 15:31

Alastair McCormack

Thanks, `locale.getdefaultlocale()` outputs `('en_US', 'UTF8')` – Helpme123 Feb 21 '16 at 09:30
Your locale is broken. It should be `UTF-8` with a hyphen and match the definition in `locale -a`. Do: `export LANG=en_US.UTF-8`, then run `locale` to verify – Alastair McCormack Feb 21 '16 at 09:35
@AlastairMcCormack: in this case, the OP is almost certainly using a pipe, in which case there is no terminal and Python defaults to ASCII. – Martijn Pieters Feb 21 '16 at 09:37
@AlastairMcCormack thanks, but if I execute the .py file separately, it works well; if I use the file with my friend's project, I get errors. One clue: my friend works partly on Linux and partly on Windows – Helpme123 Feb 21 '16 at 09:42
As @MartijnPieters has asked, you need to provide more details and not just the small sample provided. Are you piping within the shell, i.e. `cat "data" | python my_script.py`? – Alastair McCormack Feb 21 '16 at 09:46
Sorry for my ignorance, I don't know what the "data" stands for. And I find that when I print "你好" on the Linux system it works fine, but printing u"你好" raises the error. – Helpme123 Feb 21 '16 at 12:34
And the key issue is not printing correctly, but the error. @MartijnPieters @Alastair McCormack – Helpme123 Feb 21 '16 at 12:38
@Helpme123 that is because one is *encoded bytes*, the other Unicode. Take a look at http://nedbatchelder.com/text/unipain.html to learn more about the difference. – Martijn Pieters Feb 21 '16 at 12:54
This is the correct answer, been struggling with that issue, the problem was locale settings – matshidis May 31 '21 at 16:43

score -1 · Answer 2 · edited Feb 19 '16 at 15:34

You can try:

    print u"{0}".format(str)

or

    print u"{0}".format(l.decode('utf-8'))

edited Feb 19 '16 at 15:34

Waylan

answered Feb 19 '16 at 10:17

Felix Martinez

This is not an issue with how the interpolated value is converted, because it isn't. `bytestr % (unicodestring,)` will implicitly *decode* the bytestring, which works for the OP because `"%s"` only contains ASCII characters. The OP has an *encoding* exception however, because the `print` statement has to encode the resulting unicode value to `sys.stdout.encoding`. – Martijn Pieters Feb 19 '16 at 10:19
