
date = re.search(r'<td>([\x\d\w-.\s,()&\"]+|)<br><font',page_data)

I am migrating some code from PHP to Python and am using the regular expression above with re.search. It doesn't work; Python raises this error:

raise error, v # invalid expression

It works with PHP's preg_match and also on http://www.gskinner.com/RegExr. Any idea why this is happening? Thanks!

nubela

1 Answer

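\x on its own is invalid in Python: a \x escape must be followed by exactly two hex digits (for example \x41 for "A"), so Python throws an exception, while PHP's engine is more lenient and accepts it without complaint. Try removing it, and also move the - to the end of the character class. Otherwise \w-. is read as a range starting at \w, which Python rejects as well: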

date = re.search(r'<td>([\d\w.\s,()&\"-]+|)<br><font',page_data)
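To see that both changes matter, each variant of the character class can be compiled on its own. This is a sketch assuming Python 3 (its error messages differ from the Python 2 traceback in the question); the page_data string is a made-up fragment, not the asker's real page.

```python
import re

# Character-class variants from this thread, tested in isolation.
variants = {
    "original": r'[\x\d\w-.\s,()&\"]',         # bare \x plus the \w-. "range"
    "bare-x removed": r'[\d\w-.\s,()&\"]',     # still contains \w-.
    "hyphen moved last": r'[\d\w.\s,()&\"-]',  # the fix suggested above
}

results = {}
for name, pattern in variants.items():
    try:
        re.compile(pattern)
        results[name] = True
        print(f"{name}: compiles")
    except re.error as exc:
        results[name] = False
        print(f"{name}: re.error: {exc}")

# The corrected pattern applied to a hypothetical HTML fragment.
page_data = '<tr><td>May 26, 2010 (Wed)<br><font size="1">x</font></td></tr>'
date = re.search(r'<td>([\d\w.\s,()&\"-]+|)<br><font', page_data)
print(date.group(1) if date else None)  # May 26, 2010 (Wed)
```

Only the last variant compiles; the first fails on \x and the second on the \w-. range.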

In any case, you won't be very happy if you try to parse HTML with regular expressions.
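One alternative is Python's standard-library html.parser module. The following is a minimal sketch, assuming (like the regex) that the wanted text is the content of a <td> up to its first <br>; the class name DateCellParser and the sample markup are hypothetical. Unlike the regex, it does not restrict which characters the text may contain and does not require a <font> tag after the <br>.

```python
from html.parser import HTMLParser


class DateCellParser(HTMLParser):
    """Hypothetical helper: collect the text between an opening <td> and the next <br>."""

    def __init__(self):
        super().__init__()
        self.in_cell = False  # inside a <td> and before its first <br>
        self.buffer = []      # text fragments for the current cell
        self.cells = []       # one entry per <td> that contained a <br>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.buffer = []
        elif tag == "br" and self.in_cell:
            # The <br> ends the part of the cell we care about.
            self.cells.append("".join(self.buffer))
            self.in_cell = False

    def handle_endtag(self, tag):
        # A cell that closes without any <br> is not recorded.
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.buffer.append(data)


page_data = '<table><tr><td>May 26, 2010 (Wed)<br><font size="1">x</font></td></tr></table>'
parser = DateCellParser()
parser.feed(page_data)
parser.close()
print(parser.cells)
```

A parser like this tolerates changes in attributes, whitespace, and tag case that would silently break the regex.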

Tim Pietzcker
    RE: Parsing X?HTML with regexes: [DON'T DO IT](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). – Hank Gay May 26 '10 at 17:49