
I have the following data set, D7:

name sex_age eye_color height
1    J    M.34     Other     61
2    A    F.55      Blue     59
3    T    M.76     Brown     51
4    D    F.19     Other     57

I want to separate the column sex_age into sex column and age column, so I type

separate(D7,sex_age,c('sex','age'),sep='.')

But it generates

name sex age eye_color height
1    J             Other     61
2    A              Blue     59
3    T             Brown     51
4    D             Other     57
Warning message:
Too many values at 4 locations: 1, 2, 3, 4 

Also, when I modify my original data set D7 into D8

name sex_age eye_color height
1    J    M_34     Other     61
2    A    F_55      Blue     59
3    T    M_76     Brown     51
4    D    F_19     Other     57

And I type D7 %>% separate(sex_age,c('sex','age'),sep="_") it gives

name  sex  age eye_color height
1    J M.34 <NA>     Other     61
2    A F.55 <NA>      Blue     59
3    T M.76 <NA>     Brown     51
4    D F.19 <NA>     Other     57
Warning message:
Too few values at 4 locations: 1, 2, 3, 4 

Did I misuse the separate function? I am very puzzled. Thank you for any suggestions.

ftxx

1 Answer


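The sep argument is interpreted as a regular expression, and . is a special character there: it matches any single character. With sep='.', every character in sex_age is treated as a separator, so each value splits into more than two empty pieces, which is why the new columns come out blank and the "Too many values" warning appears. To match a literal period, escape it with \\ so that it is read as a normal character: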

separate(D7, sex_age, into = c("sex", "age"), sep = "\\.")
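
As a minimal sketch of the fix, the example below rebuilds D7 from the printed output in the question (the column types are an assumption) and splits it with the escaped separator. It also builds D8 and splits it on "_", which has no special meaning in a regular expression. The "Too few values" warning shown in the question appears to come from piping D7 rather than D8 into the "_" call, since the output still contains values such as M.34. The names D7_split and D8_split are used here only for illustration.

```r
library(tidyr)

# Rebuild D7 as printed in the question (column types are assumed)
D7 <- data.frame(
  name      = c("J", "A", "T", "D"),
  sex_age   = c("M.34", "F.55", "M.76", "F.19"),
  eye_color = c("Other", "Blue", "Brown", "Other"),
  height    = c(61, 59, 51, 57),
  stringsAsFactors = FALSE
)

# Escape the period so the regex matches it literally
D7_split <- separate(D7, sex_age, into = c("sex", "age"), sep = "\\.")

# Build D8 by replacing the literal period with an underscore
D8 <- D7
D8$sex_age <- sub(".", "_", D8$sex_age, fixed = TRUE)

# "_" is an ordinary character in a regex, so no escaping is needed
D8_split <- separate(D8, sex_age, into = c("sex", "age"), sep = "_")

print(D7_split)
print(D8_split)
```

By default the new sex and age columns are character vectors; passing convert = TRUE to separate() asks it to convert age to a numeric type.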
joel.wilson
  • Thanks! I am new to regular expressions, learning it now. – ftxx Jan 24 '17 at 21:02
  • But what about the warning generated by `_`? I wasn't aware that `_` had a special meaning in regex. Even adding `\\\` doesn't get rid of the warning. – user2739472 Jun 08 '18 at 13:39