How do I keep parts of a string that match a regular expression in R?

Question

Suppose you have a string in R that looks something like

messystuffSample0001moremessystuff

and you would like to extract Sample0001.

What would be a good way of doing that, especially if messystuff and moremessystuff vary in content and length? The part of interest is "Sample" followed by 4 digits.

Alex A. · Accepted Answer · 2015-05-20T19:32:41.323

3

You can use str_extract() from the stringr package.

library(stringr)

mess <- "messystuffSample0001moremessystuff"

str_extract(mess, "Sample\\d{4}")
# [1] "Sample0001"

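This returns the first substring that matches the regular expression Sample\\d{4}, i.e. "Sample" followed by 4 digits. The backslash is doubled because the pattern is written as an R string literal. If there is no match, str_extract() returns NA.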

As Frank pointed out, this can also be accomplished using base R:

regmatches(mess, regexpr("Sample\\d{4}", mess))
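
One thing to watch with the base R version when the input is a vector: regmatches() drops elements that have no match, so the result can be shorter than the input. The following is a minimal sketch of one way to keep the result aligned with the input; the vector mess_vec and the names found and aligned are made up for illustration and are not part of the original answer.

```r
# Hypothetical input: the second element contains no "Sample" + 4 digits.
mess_vec <- c("messystuffSample0001moremessystuff",
              "nothing here",
              "xxSample0042yy")

# regexpr() returns the start of the first match, or -1 if there is none.
m <- regexpr("Sample\\d{4}", mess_vec)

# regmatches() only returns the matches that exist,
# so 'found' has 2 elements for 3 inputs.
found <- regmatches(mess_vec, m)

# To keep one result per input, fill an NA vector at the matching positions.
aligned <- rep(NA_character_, length(mess_vec))
aligned[m != -1] <- found

print(found)
print(aligned)
```

With this approach the base R result behaves like str_extract(), which also gives NA for elements without a match.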

edited May 20 '15 at 19:32

answered May 20 '15 at 19:15

Alex A.


@Frank: Neat, I didn't know about `regmatches()`. If you post that as an answer you'll get +1 from me. Otherwise I'll include it here with your permission. – Alex A. May 20 '15 at 19:26

@Frank: There's almost always a base R way, it just may not be elegant. :) But in this case, `regmatches()` makes the base solution more elegant than I figured it would be. – Alex A. May 20 '15 at 19:35
Even though I got penalized for asking this question, I'm glad I did. str_extract is what I needed, and what I use now! – Mark May 21 '15 at 15:13
@Mark: Great, I'm glad it works for you. – Alex A. May 21 '15 at 15:20

blakeoft · Answer 2 · 2015-05-20T19:19:36.113

1

You can use sub() with a capture group, replacing the whole string with just the captured part:

sub(".*(Sample\\d{4}).*", "\\1", "messystuffSample0001moremessystuff")
# [1] "Sample0001"
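
Note that sub() returns its input unchanged when the pattern does not match, which can be hard to tell apart from a successful extraction. A possible guard, sketched here with a hypothetical helper name extract_sample (not from the original answer), is to check with grepl() first and return NA otherwise:

```r
# Hypothetical helper: returns the captured "Sample" + 4 digits,
# or NA when the pattern does not occur in the string.
extract_sample <- function(x) {
  pattern <- ".*(Sample\\d{4}).*"
  ifelse(grepl(pattern, x),
         sub(pattern, "\\1", x),   # keep only the capture group
         NA_character_)            # no match: do not return the input as-is
}

unmatched <- sub(".*(Sample\\d{4}).*", "\\1", "nothing here")
print(unmatched)  # the original string comes back unchanged

result <- extract_sample(c("messystuffSample0001moremessystuff", "nothing here"))
print(result)
```

Also keep in mind that the leading .* is greedy, so if a string contained several candidates, this pattern would most likely capture the last one rather than the first.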

edited May 20 '15 at 19:19

answered May 20 '15 at 19:17

blakeoft


You can match multiple digits in a row using `\\d{4}` rather than `\\d` four times. – Alex A. May 20 '15 at 19:18
@AlexA. Thanks for the tip. I'm sure it'll be nice for me to use one day. – blakeoft May 20 '15 at 19:20
