3

I have a regular expression '[\w_-]+' which matches alphanumeric characters, underscores, and hyphens.

I have a set of words in a Python list which I don't want to allow:

listIgnore = ['summary', 'config']

What changes need to be made in the regex?

P.S: I am new to regex

Prasoon Saurav

2 Answers

3
>>> import re
>>> line = "This is a line containing a summary of config changes"
>>> listIgnore = ['summary', 'config']
>>> patterns = "|".join(listIgnore)
>>> print(re.findall(r'\b(?!(?:' + patterns + r'))[\w_-]+', line))
['This', 'is', 'a', 'line', 'containing', 'a', 'of', 'changes']
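One caveat with the pattern above: the lookahead rejects any word that merely starts with an excluded word, so "configure" would be dropped along with "config", and the words from the list are not escaped. Here is a minimal sketch of a stricter variant, assuming that whole-word exclusion is what is wanted. The helper name `find_allowed` is hypothetical and not part of the original answer.

```python
import re

def find_allowed(line, ignore):
    # Escape each word so regex metacharacters in the list are taken literally.
    alternatives = "|".join(re.escape(word) for word in ignore)
    # (?<![\w-])            only start a match at the beginning of a token
    # (?!(?:...)(?![\w-]))  reject the token only if it is exactly an ignored word
    pattern = r"(?<![\w-])(?!(?:" + alternatives + r")(?![\w-]))[\w-]+"
    return re.findall(pattern, line)

print(find_allowed("a summary of config changes, configure it",
                   ["summary", "config"]))
# ['a', 'of', 'changes', 'configure', 'it']
```

The lookbehind replaces `\b` because `-` is part of the token class, and `\b` would treat the position after a hyphen as a fresh word start.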
devnull
2

This question intrigued me, so I set about finding an answer:

'^(?!summary)(?!config)[\w_-]+$'

Now this only works if you want to match the regex against a complete string. Note that it also rejects any string that merely starts with one of the words:

>>> re.match(r'^(?!summary)(?!config)[\w_-]+$', 'config_test')  # no match, returns None
>>> re.match(r'^(?!summary)(?!config)[\w_-]+$', 'confi_test')
<_sre.SRE_Match object at 0x21d34a8>

So to use your list, just add one (?!<word here>) for each word after the ^ in your regex. These are called negative lookaheads. Here's some good info.
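The step described above, one lookahead per word, can be generated from the list rather than typed by hand. This is a minimal sketch, assuming the same full-string matching as in the examples above. The variable names are illustrative only.

```python
import re

list_ignore = ["summary", "config"]

# One negative lookahead per excluded word, placed right after the ^ anchor.
# re.escape keeps any regex metacharacters in the words literal.
lookaheads = "".join("(?!" + re.escape(word) + ")" for word in list_ignore)
pattern = re.compile("^" + lookaheads + r"[\w_-]+$")

print(pattern.pattern)                     # ^(?!summary)(?!config)[\w_-]+$
print(bool(pattern.match("config_test")))  # False
print(bool(pattern.match("confi_test")))   # True
```

Compiling the pattern once avoids rebuilding it for every string that is checked.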

If you're trying to match within a string (i.e. without the ^ and $), then I'm not sure it's possible with this pattern. Without the anchors, the regex will just pick a substring that doesn't start with an excluded word. Example: it matches ummary inside summary.

Obviously, the more exclusions you add, the more inefficient it will get. There are probably better ways to do it.

korylprince
  • Probably, filtering all found values - like in thefourtheye's answer - will be more effective (re may be a memory-crunching bitch) – volcano Nov 07 '13 at 06:27
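The filtering idea from the comment can be sketched roughly like this: match every token with the original pattern, then drop the unwanted ones with a set lookup. This is only an illustration of the idea, not the code from the answer the comment refers to. The function name `allowed_words` is hypothetical.

```python
import re

def allowed_words(line, ignore):
    ignored = set(ignore)  # set membership checks are O(1) on average
    # Find every token first, then filter out the exact ignored words.
    return [word for word in re.findall(r"[\w-]+", line) if word not in ignored]

print(allowed_words("This is a line containing a summary of config changes",
                    ["summary", "config"]))
# ['This', 'is', 'a', 'line', 'containing', 'a', 'of', 'changes']
```

Because the comparison is on whole tokens, words such as "configure" are kept, and the regex stays the same no matter how long the ignore list grows.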