303

I want to use input from a user as a regex pattern for a search over some text. It works, but how can I handle cases where the user enters characters that have a special meaning in a regex?

For example, the user wants to search for Word (s): the regex engine will treat the (s) as a group. I want it to be treated as the literal string "(s)". I could run replace on the user input and replace ( with \( and ) with \), but the problem is that I would need to do this for every possible regex symbol.

Is there a better way?

martineau
MichaelT

4 Answers

404

Use the re.escape() function for this:

4.2.3 re Module Contents

escape(string)

Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.

A simplistic example: search for any occurrence of the provided string, optionally followed by 's', and return the match object.

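import re

def simplistic_plural(word, text):
    # Escape the word so any regex metacharacters in it are matched literally.
    word_or_plural = re.escape(word) + 's?'
    return re.search(word_or_plural, text)  # search anywhere, not only at the start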
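Applied to the case from the question, this could look like the following minimal sketch. The helper name `find_literal` and the sample text are hypothetical, chosen only for illustration.

```python
import re

def find_literal(user_input, text):
    """Search text for user_input taken literally, not as a regex."""
    # re.escape backslash-escapes metacharacters such as ( and ),
    # so "(s)" is matched as three characters rather than as a group.
    return re.search(re.escape(user_input), text)

text = "Replace Word (s) here"
match = find_literal("Word (s)", text)
print(match.group(0) if match else None)

# Without escaping, "(s)" is a capturing group that matches just "s",
# so the raw pattern "Word (s)" does not find the literal text.
print(re.search("Word (s)", text))
```

The escaped pattern can still be combined with ordinary regex syntax, as the optional 's' suffix above does.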
200_success
ddaa
70

You can use re.escape():

re.escape(string) Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.

>>> import re
>>> re.escape('^a.*$')
'\\^a\\.\\*\\$'

In Python versions before 3.7, re.escape() also escapes non-alphanumeric characters that are not part of regular expression syntax.

In Python versions from 3.3 up to (but not including) 3.7, the one exception is the underscore (_), which is left unescaped.
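Because the exact escaped form depends on the Python version, it is safer to rely on the matching behaviour than on the precise output string. A small sketch of that check follows; the sample strings are arbitrary, and `re.fullmatch` assumes Python 3.4 or later.

```python
import re

samples = ["^a.*$", "Word (s)", "a_b", "price: $5 [approx]", "C:\\temp\\x"]

for s in samples:
    pattern = re.escape(s)
    # The exact escaped form varies by Python version, so it is printed
    # rather than compared against a fixed string.
    print(repr(s), "->", repr(pattern))
    # Whatever the form, the escaped pattern matches the original text exactly.
    assert re.fullmatch(pattern, s) is not None
```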

gimel
8

Unfortunately, re.escape() is not suited for the replacement string (the output below is from a Python version earlier than 3.3, where '_' was still escaped):

>>> re.sub('a', re.escape('_'), 'aa')
'\\_\\_'

A solution is to put the replacement in a lambda:

>>> re.sub('a', lambda _: '_', 'aa')
'__'

because the return value of the lambda is treated by re.sub() as a literal string.
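The difference matters most when the replacement contains backslashes, since re.sub() interprets sequences such as \1 or \n in a replacement string. Here is a minimal sketch of two ways to keep the replacement literal; the helper names `sub_literal` and `sub_literal_escaped` are hypothetical.

```python
import re

def sub_literal(pattern, replacement, text):
    """Replace matches of pattern with replacement taken verbatim."""
    # A callable's return value is inserted as-is, with no template processing.
    return re.sub(pattern, lambda _match: replacement, text)

def sub_literal_escaped(pattern, replacement, text):
    """Same result, obtained by doubling backslashes in the template."""
    return re.sub(pattern, replacement.replace("\\", "\\\\"), text)

replacement = r"\1 and \n"  # contains backslashes that should stay literal
print(sub_literal(r"(a)", replacement, "a-a"))
print(sub_literal_escaped(r"(a)", replacement, "a-a"))
```

Both calls leave the backslash sequences in the output untouched instead of expanding them as a group reference and a newline.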

Owen
  • 4
    The `repl` argument to `re.sub` is a string, not a regex; applying `re.escape` to it doesn't make any sense in the first place. – tripleee Jan 29 '18 at 06:54
  • 9
    @tripleee That's incorrect, the `repl` argument is not a simple string, it is parsed. For instance, `re.sub(r'(.)', r'\1', 'X')` will return `X`, not `\1`. – Flimm Apr 20 '18 at 13:45
  • 9
    Here's the relevant question for escaping the `repl` argument: https://stackoverflow.com/q/49943270/247696 – Flimm Apr 20 '18 at 13:54
  • 7
    Changed in version 3.3: The '_' character is no longer escaped. Changed in version 3.7: [Only characters that can have special meaning in a regular expression are escaped.](https://docs.python.org/3/library/re.html#re.escape) (Why did it take so long?) – Cees Timmerman Aug 11 '18 at 21:58
-5

Please give this a try:

\Q and \E as anchors

Use an OR condition (alternation) to match either a full word or the regex.

Ref Link : How to match a whole word that includes special characters in regex
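Note that Python's re module does not support the \Q...\E quoting syntax found in Perl, Java, and PCRE, so in Python the literal part would have to be quoted with re.escape() instead. A hedged sketch of whole-word matching for a word containing special characters is below. The helper name `find_whole_word` is hypothetical, and lookarounds are my own substitute for word boundaries here, not something taken from the linked question.

```python
import re

def find_whole_word(word, text):
    """Find word literally, not embedded in a longer run of word characters."""
    # \b is unreliable when the word starts or ends with a non-word
    # character, so lookarounds are used instead.
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    return re.search(pattern, text)

print(find_whole_word("C++", "I write C++ daily"))
print(find_whole_word("C++", "I write C++x daily"))
```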

guru