I have a sentence:

'hi how <unk> are you'

I need to remove <unk> from it.

Here is my code:

re.sub(r'\b{}\b'.format('<unk>'), '', 'agent transcript str <unk> with chunks for key phrases')

Why doesn't my RegEx work for <...>?

illuminato

1 Answer

There is no word boundary between a space and < or >. You could instead try:

re.sub(r'(\s*)<unk>(\s*)', r'\1\2', your_string)

Or, if you don't want two spaces left behind, you may try:

re.sub(r'(\s*)<unk>\s+', r'\1', your_string)
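
To make the difference between the two substitutions concrete, here is a small sketch that runs both on the sentence from the question. The variable names are only illustrative.

```python
import re

sentence = 'hi how <unk> are you'

# First pattern: whitespace on both sides is captured and put back,
# so two spaces remain where <unk> used to be.
both_kept = re.sub(r'(\s*)<unk>(\s*)', r'\1\2', sentence)
print(repr(both_kept))   # 'hi how  are you'

# Second pattern: the trailing whitespace is consumed rather than captured,
# so only the leading space survives.
one_kept = re.sub(r'(\s*)<unk>\s+', r'\1', sentence)
print(repr(one_kept))    # 'hi how are you'
```

Note that the second pattern requires at least one whitespace character after <unk>, so it leaves a <unk> at the very end of a string untouched.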


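Remember that \b matches at a word boundary, i.e. a position with a word character (\w, which is [A-Za-z0-9_] for ASCII text) on one side and a non-word character (\W) or the start or end of the string on the other. In your original string, the characters on either side of each \b are a space and < (or > and a space). All of these are non-word characters, so \b cannot match there.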
See a demo on regex101.com.
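
The boundary behaviour can also be checked directly in Python. The sketch below is only an illustration: the lookaround pattern at the end is an assumed alternative that is not part of the answer above, shown as one possible way to get a "boundary" test for tokens made of non-word characters.

```python
import re

sentence = 'hi how <unk> are you'

# \b needs a word character on exactly one side. Around '<unk>' the
# neighbours are ' ' and '<' (then '>' and ' '), all non-word characters.
print(re.search(r'\b<unk>\b', sentence))   # None

# Inside the token, \b does match: 'u' and 'k' are word characters
# sitting next to '<' and '>'.
print(re.search(r'<\bunk\b>', sentence))   # a match object for '<unk>'

# Assumed alternative (not from the answer): whitespace lookarounds
# require that the token is not glued to other non-space characters.
token = re.escape('<unk>')
cleaned = re.sub(r'(?<!\S){}(?!\S)\s*'.format(token), '', sentence)
print(repr(cleaned))   # 'hi how are you'
```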
Jan