I need to construct a regular expression dynamically in order to perform an exact match in R using grep(). This is the code I have:

names = c('John Doe', 'John-Doe', 'Doe John', 'Doe-John', 'John', 'Doe')

for(name in names) {
    pattern = paste('(?<![A-z]-)^\\b', name, '\\b$(?!-[A-z])', sep = '')
    index = grep(pattern, names)

    print(index)
}

Desired output:

  • each name must match exactly one element of the names vector
  • e.g., John should return only index 5 and nothing else

I tested my regular expression here https://regex101.com/r/uJhJwS/2 and it appears to work fine. However, I get the following error in R:

Error in grep(pattern, names) : 
  invalid regular expression '(?<![A-z]-)John Do$(?!-[A-z])', reason 'Invalid regexp'

What is going wrong?

Toto
Mihai
  • Your regex does not work as you think it does; it is equivalent to [`^John$`](https://regex101.com/r/uJhJwS/3), and that is what you are looking for: an exact match. – Wiktor Stribiżew Nov 05 '17 at 21:14
  • You can do this without regex, using `sapply` to examine each element of `names` and `which` to look for exact matches: `sapply(names, function(x) which(names == x))` – neilfws Nov 05 '17 at 21:18
  • @WiktorStribiżew I was using `\bJohn\b` prior to this and it didn't work. That's the only reason why I tried the `lookarounds`. But the `^$` helped me solve it. However, why did you say that the `regex` is equal to ^John$? Thanks. – Mihai Nov 05 '17 at 21:51
  • @F.Gran The pattern in the regex101 fiddle is `(? – Wiktor Stribiżew Nov 05 '17 at 22:13
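
The suggestions in the comments can be sketched as follows. This is a minimal sketch, not a definitive answer. It assumes the names contain no regex metacharacters, and the variable names `anchored`, `exact` and `pcre` are made up for illustration. The third option rests on something the thread does not state: R's default regex engine does not support lookarounds such as `(?<!...)`, which would explain the 'Invalid regexp' error, and passing `perl = TRUE` to `grep()` switches to the PCRE engine that regex101 emulates.

```r
names <- c('John Doe', 'John-Doe', 'Doe John', 'Doe-John', 'John', 'Doe')

# Option 1: anchor the pattern with ^ and $ so the whole element must match.
# Assumption: the names contain no regex metacharacters (., +, (, etc.).
anchored <- sapply(names, function(name) {
    grep(paste0('^', name, '$'), names)
})

# Option 2: skip regex entirely and compare the strings directly.
exact <- sapply(names, function(x) which(names == x))

# Option 3: keep the original lookarounds, but ask grep() for PCRE syntax.
# Assumption: the error comes from the default engine rejecting (?<! ... ).
pcre <- sapply(names, function(name) {
    pattern <- paste0('(?<![A-z]-)^\\b', name, '\\b$(?!-[A-z])')
    grep(pattern, names, perl = TRUE)
})

print(unname(anchored))
print(unname(exact))
print(unname(pcre))
```

With the six names above, each option should pair every name with its own index only, so `John` maps to 5 and nothing else. Since `^` and `$` already pin the match to the whole string, the lookarounds and `\\b` in option 3 add nothing beyond option 1.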