So I have a Python script that I'd like to run on both Python 3.2 and 2.7, just for convenience.

Is there a way to have unicode literals that work in both? E.g.

#coding: utf-8
whatever = 'שלום'

The above code would require a unicode string in Python 2.x (u''), but in Python 3.0 through 3.2 that little u causes a syntax error.

wjandrea
ubershmekel

2 Answers

Edit - Since Python 3.3, the u'' literal works again, so the u() function isn't needed.

The best option is to write a function that creates unicode objects from string objects in Python 2 but leaves string objects alone in Python 3 (as they are already unicode).

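import sys

if sys.version_info < (3,):
    import codecs

    def u(x):
        # Python 2: turn escapes such as \u00dc in a byte string into a unicode object.
        return codecs.unicode_escape_decode(x)[0]
else:
    def u(x):
        # Python 3: str is already unicode, so return it unchanged.
        return x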

You would then use it like so:

>>> print(u('\u00dcnic\u00f6de'))
Ünicöde
>>> print(u('\xdcnic\N{Latin Small Letter O with diaeresis}de'))
Ünicöde
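
Applied to the word from the question, this could look roughly like the sketch below. It repeats the u() helper so it runs on its own, and it spells the Hebrew text with \u escapes; the variable name whatever comes from the question, everything else is illustrative.

```python
import sys

if sys.version_info < (3,):
    import codecs

    def u(x):
        # Python 2: decode the backslash escapes into a unicode object.
        return codecs.unicode_escape_decode(x)[0]
else:
    def u(x):
        # Python 3: the literal is already a unicode string.
        return x

# Shin, lamed, vav, final mem, written as escapes so the literal stays ASCII.
whatever = u('\u05e9\u05dc\u05d5\u05dd')

print(len(whatever))  # number of characters, not of encoded bytes
print(whatever)
```

The escapes matter on Python 2: as far as I know, unicode_escape treats raw non-ASCII bytes as Latin-1, so passing the UTF-8 encoded word directly to u() would likely give mojibake rather than the Hebrew text.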
ubershmekel
Lennart Regebro

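To cover 3.0, 3.1, and 3.2 (where u'' is a syntax error), put this at the top of the file. It is also available in Python 2.6 and 2.7, where it makes plain string literals unicode: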

from __future__ import unicode_literals

Source: ubershmekel, in the question. See revision 4 for the original.
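
A minimal sketch of how the question's snippet might look with this import, assuming the file is saved as UTF-8. The text_type name is only a helper for the check and is not part of any library.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# No u prefix needed: with the import, this is a unicode string on 2.6+ and 3.x.
whatever = 'שלום'

# type('') is unicode on Python 2 (because of the import) and str on Python 3.
text_type = type('')
assert isinstance(whatever, text_type)

print(len(whatever))  # counts characters rather than UTF-8 bytes
```

Note that the import must come before any other statement in the module, and that it affects every unprefixed literal in the file, so code that needs byte strings on Python 2 has to mark them with b''.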

wjandrea