
If you have a string in R that looks something like

messystuffSample0001moremessystuff

and you would like to extract Sample0001, what would be a good way of doing that, especially if messystuff and moremessystuff vary in content and length? The part of interest is "Sample" followed by 4 digits.

Mark

2 Answers


You can use str_extract() from the stringr package.

library(stringr)

mess <- "messystuffSample0001moremessystuff"

str_extract(mess, "Sample\\d{4}")
# [1] "Sample0001"

This returns the first match of the regular expression Sample\\d{4}, which matches "Sample" followed by 4 digits. If a string contains no match, str_extract() returns NA for it.
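The doubled backslash in "Sample\\d{4}" is only R string-literal escaping; the regex engine actually receives Sample\d{4}. A small base R check of that, as a sketch (the raw-string form assumes R 4.0.0 or later, and the second test string is a made-up example):

```r
# The R string literal "Sample\\d{4}" holds the regex Sample\d{4}:
# "\\" in source code is a single backslash in the resulting string.
pattern <- "Sample\\d{4}"
cat(pattern, "\n")   # prints Sample\d{4}

# Raw string literal (R >= 4.0.0): no escaping needed, same resulting string.
raw_pattern <- r"(Sample\d{4})"
identical(pattern, raw_pattern)   # TRUE

# The pattern matches "Sample" plus four digits anywhere in the string.
grepl(pattern, "messystuffSample0001moremessystuff")  # TRUE
grepl(pattern, "messystuffSample01moremessystuff")    # FALSE, only two digits
```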

As Frank pointed out, this can also be accomplished using base R:

regmatches(mess, regexpr("Sample\\d{4}", mess))
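One difference worth knowing when the input is a vector: regmatches() with a regexpr() result returns only the elements that matched, so its output can be shorter than the input, whereas str_extract() keeps one NA per non-match. A minimal base R sketch of keeping the results aligned (the vector x is hypothetical sample data):

```r
# Hypothetical inputs; the third one contains no "Sample" + 4 digits.
x <- c("messystuffSample0001moremessystuff", "xSample1234y", "noSampleHere")
pattern <- "Sample\\d{4}"

m <- regexpr(pattern, x)   # start position per element, -1 where no match
regmatches(x, m)           # only the matches: "Sample0001" "Sample1234"

# Keep one result per input, with NA where nothing matched.
ids <- rep(NA_character_, length(x))
ids[m != -1] <- regmatches(x, m)
ids                        # "Sample0001" "Sample1234" NA
```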
Alex A.
  • @Frank: Neat, I didn't know about `regmatches()`. If you post that as an answer you'll get +1 from me. Otherwise I'll include it here with your permission. – Alex A. May 20 '15 at 19:26
  • @Frank: There's almost always a base R way, it just may not be elegant. :) But in this case, `regmatches()` makes the base solution more elegant than I figured it would be. – Alex A. May 20 '15 at 19:35
  • Even though I got penalized for asking this question, I'm glad I did. str_extract is what I needed, and what I use now! – Mark May 21 '15 at 15:13
  • @Mark: Great, I'm glad it works for you. – Alex A. May 21 '15 at 15:20

You can use sub() from base R, capturing the part you want and replacing the whole string with that capture (\\1):

sub(".*(Sample\\d{4}).*", "\\1", "messystuffSample0001moremessystuff")
# [1] "Sample0001"
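One caveat with the sub() approach: when a string does not match the pattern, sub() returns it unchanged rather than NA, so non-matching inputs pass through silently. A sketch of guarding against that with grepl() (the input vector is a hypothetical example):

```r
# Hypothetical inputs; the second one contains no "Sample" + 4 digits.
x <- c("messystuffSample0001moremessystuff", "noSampleHere")
pattern <- ".*(Sample\\d{4}).*"

sub(pattern, "\\1", x)   # "Sample0001" "noSampleHere", the miss is returned as-is

# Substitute only where the pattern matches; otherwise return NA.
res <- ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
res                      # "Sample0001" NA
```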
blakeoft