How to specify string variables as unicode strings for pattern and text in regex matching?

Question

>>> import re
>>> re.match(u'^[一二三四五六七]、', u'一、')

If the pattern and the text are stored in variables (for example, they were read from text files),

>>> myregex='^[一二三四五六七]、'
>>> mytext='一、'

How should I pass myregex and mytext to re.match so that it behaves the same way as re.match(u'^[一二三四五六七]、', u'一、')? Thanks.

Your working example uses Unicode strings while your non-working example uses byte strings and that's wrong in your case. — dlask, Jun 16 '15 at 03:41
Did you just create a duplicate of [your own question](http://stackoverflow.com/questions/30857742/unicode-regex-to-match-a-character-class-of-chinese-characters)? — Raniz, Jun 16 '15 at 03:44
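
To illustrate dlask's point about Unicode strings versus byte strings, here is a minimal sketch written for Python 3, where the two are separate types (`str` and `bytes`). The variable names are made up for the example. It assumes the byte strings are UTF-8 encoded, in which case the character class ends up listing individual bytes rather than whole characters.

```python
import re

pattern_text = '^[一二三四五六七]、'
sample_text = '一、'

# Unicode (text) strings: the class matches one whole character.
unicode_match = re.match(pattern_text, sample_text)

# UTF-8 byte strings: the class now lists single bytes, so it consumes
# only the first byte of '一' and the following '、' bytes do not line up.
pattern_bytes = pattern_text.encode('utf-8')
sample_bytes = sample_text.encode('utf-8')
bytes_match = re.match(pattern_bytes, sample_bytes)

print(unicode_match is not None)  # True
print(bytes_match is not None)    # False
```

The same reasoning applies to Python 2, where a plain `'...'` literal is a byte string and `u'...'` is a Unicode string.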

styvane · Accepted Answer · 2015-06-16T03:38:22.100


Simply decode both byte strings to Unicode before matching:

re.match(myregex.decode('utf-8'), mytext.decode('utf-8'))
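
A rough sketch of the same idea for the case where the pattern comes from a text file, written for Python 3. The temporary file and its contents are hypothetical stand-ins, and the sketch assumes the data really is UTF-8; a different encoding would need a different argument to `decode`.

```python
import os
import re
import tempfile

# Hypothetical file standing in for the text file that holds the pattern.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write('^[一二三四五六七]、'.encode('utf-8'))
    regex_path = handle.name

try:
    # Read raw bytes, then decode once so the pattern is a text string.
    with open(regex_path, 'rb') as f:
        myregex = f.read().decode('utf-8')
finally:
    os.remove(regex_path)

# UTF-8 bytes for '一、', decoded the same way as the pattern.
mytext = b'\xe4\xb8\x80\xe3\x80\x81'.decode('utf-8')

match = re.match(myregex, mytext)
print(match is not None)
```

Decoding right after reading keeps everything downstream working on text. Opening the file with an explicit encoding (`io.open(path, encoding='utf-8')` in Python 2, or the built-in `open` in Python 3) should give Unicode strings directly and make the separate `decode` step unnecessary.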

edited Jun 16 '15 at 03:38

answered Jun 16 '15 at 03:24


Thanks, but that doesn't match anything, while `re.match(u'^[一二三四五六七]、', u'一、')` does. – Tim Jun 16 '15 at 03:28
When is `re.U` needed then? – Tim Jun 16 '15 at 03:49
