I'm having trouble understanding the syntax of meta-characters used in R regex functions. For example, I have the following character vector:

x <- c("See 053 2001",
       "Related to 0324 2015")

To match the numbers separated by a space character with the grep function, my basic understanding of regular expressions suggests that the following should work:

grep("[0-9]{4}\s[0-9]", x, value = TRUE)
# Error: '\s' is an unrecognized escape in character string starting ""[0-9]{4}\s"

Okay, that's confusing to me, given that the R documentation page "Regular Expressions as used in R" states:

Symbols \d, \s, \D and \S denote the digit and space classes and their negations...

However, from playing around and reading other [regex] questions on Stack Overflow, I found that either of the following does find the match I'm looking for:

grep("[0-9]{4}\\s[0-9]", x) # or
grep("[0-9]{4} [0-9]", x)
[1] 2

I'd guess this has to do with the way R processes a character string, which has its own escape characters, before the pattern ever reaches the regex engine. Can someone elaborate on this?
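One way to check that guess, as a minimal sketch using only base R: print the string as it is stored after R has parsed the literal. The short pattern `pat` below is a simplified stand-in chosen for illustration, not the exact pattern from the question.

```r
# R parses escapes in a string literal before grep() ever sees the pattern.
pat <- "[0-9]\\s[0-9]"   # illustrative pattern (an assumption, not from the question)

# writeLines() shows the stored characters, i.e. what the regex engine receives:
# a single backslash followed by s.
writeLines(pat)

# "\\s" is two characters in memory: a backslash and the letter s.
n_double <- nchar("\\s")

# An escape that R itself recognizes, such as "\n", collapses to one character.
n_newline <- nchar("\n")

print(c(n_double, n_newline))
```

Since `\s` is not one of the escapes R's own parser knows, writing it with a single backslash fails at parse time, before any regex matching happens.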

k-dubs
  • From the linked answer `"R treats backslashes as escape values for character constants. (... and so do regular expressions. Hence the need for two backslashes when supplying a character argument for a pattern. The first one isn't actually a character, but rather it makes the second one into a character."` – Pierre L Jun 06 '16 at 23:49
  • Thanks! I didn't catch this in my search. – k-dubs Jun 07 '16 at 05:03
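To make the two layers of escaping from the comment concrete, here is a hedged sketch of three ways to write the same whitespace match against the vector from the question. The raw-string form assumes R 4.0.0 or later and will not parse on older versions; the POSIX class is a swapped-in alternative, not something the question used.

```r
x <- c("See 053 2001",
       "Related to 0324 2015")

# Doubled backslash: R reduces "\\s" to \s, which the regex engine
# then interprets as the whitespace class.
idx_escaped <- grep("[0-9]{4}\\s[0-9]", x)

# POSIX character class: no backslash involved, so nothing to escape.
idx_posix <- grep("[0-9]{4}[[:space:]][0-9]", x)

# Raw string (assumes R >= 4.0.0): backslashes inside r"(...)" are literal.
idx_raw <- grep(r"([0-9]{4}\s[0-9])", x)

print(list(idx_escaped, idx_posix, idx_raw))
```

All three should select only the second element, because the first string has just three digits before the space.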
