How can I capture a word wrapped in angle brackets using word boundaries, like this: `\b<WORD-IN-ANGLE-BRACKETS>\b`? There appears to be some kind of bug: if an angle bracket is touching a word boundary, the pattern doesn't match.

Look at this example:

import re

re.findall(r"\b(<\w+>)\b", "<A> B <C>")              # 1. output: []
re.findall(r"(<\w+>)", "<D> E <F>")                  # 2. output: ['<D>', '<F>']
re.findall(r"\b(\w+)\b", "G H I")                    # 3. output: ['G', 'H', 'I']
re.findall(r"\b(z<\w+>z)\b", "z<D>z z<E>z z<F>z")    # 4. output: ['z<D>z', 'z<E>z', 'z<F>z']

As you can see in #4, if I put a word character (here `z`) between the angle bracket and the word boundary, it works, so this only happens when they are touching.
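To make the difference between #1 and #4 visible, the positions where `\b` can match can be listed directly. This is only a minimal sketch; the helper name `boundary_positions` is made up for illustration and is not part of `re`.

```python
import re

def boundary_positions(text):
    """Return every index in `text` where the zero-width \\b assertion matches."""
    # \b matches only between a word character (\w) and a non-word character,
    # or at a string edge that is next to a word character.
    return [m.start() for m in re.finditer(r"\b", text)]

# Input from #1: there is no boundary at index 0 (before the first '<')
# or at index 9 (after the last '>').
bracketed = boundary_positions("<A> B <C>")
print(bracketed)   # [1, 2, 4, 5, 7, 8]

# One token from #4: the surrounding 'z' puts a boundary at both ends.
padded = boundary_positions("z<D>z")
print(padded)      # [0, 1, 2, 3, 4, 5]
```

In the first string, `<` and `>` are non-word characters sitting next to whitespace or the string edge, so there is no word/non-word transition on the outside of the brackets for `\b` to match.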

What is going on here? Why doesn't #1 work but #4 works? How can I make #1 work?

anthonybell
  • #1 makes perfect sense. – Wiktor Stribiżew Jun 22 '16 at 22:15
  • It is up to you, just do not remove your questions often. I'd keep it. – Wiktor Stribiżew Jun 22 '16 at 22:20
  • I am noticing the solution to that question does not solve my problem since `re.findall(r"(?:\b|\s+)()(?:\b|\s+)", " ")` prints `['']`. If I try to use look-ahead/look-behind I get an error: "look-behind requires fixed-width pattern" – anthonybell Jun 22 '16 at 22:23
  • Right now, I do not see that problem clearly stated in the question. Could you please narrow it down to this specific issue? Perhaps, you mean the *How can I match a word with angle brackets with word boundaries enforced?* question at the end... Anyway, I believe you need to replace `\b...\b` with `(? – Wiktor Stribiżew Jun 22 '16 at 22:24
  • updated to include a concise question: why does this work `\b(z<\w+>z)\b` but not this: `\b(<\w+>)\b`. – anthonybell Jun 22 '16 at 22:30
  • Actually, it is still a duplicate because you just ask for an explanation of why `\b B "` (answer: because none of the `(?!\w)'` or define what word boundary you need (define it for this task). – Wiktor Stribiżew Jun 22 '16 at 22:35
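The look-around idea raised in the comments can be sketched as follows. It assumes that "word boundary" for this task means "not directly adjacent to a word character"; that definition is an assumption, not something settled in the thread. Both assertions are one character wide, so they should not trigger the "look-behind requires fixed-width pattern" error.

```python
import re

# (?<!\w): no word character immediately before the '<'
# (?!\w):  no word character immediately after the '>'
pattern = re.compile(r"(?<!\w)(<\w+>)(?!\w)")

standalone = pattern.findall("<A> B <C>")
print(standalone)   # ['<A>', '<C>']

# Brackets glued to a word character on either side are rejected.
mixed = pattern.findall("x<A> <B>y <C>")
print(mixed)        # ['<C>']
```

If a different notion of boundary is needed (for example, requiring whitespace on both sides), the two look-arounds would have to be adjusted accordingly.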
