I have a data frame whose row names contain space-separated strings. I would like to extract the last five parts of each row name and save them in a new column.

hsa-let-7f-5p TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt

To get the first part, I do this:

read.table(text=rownames(df))$V1

What I want:

TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt
user2300940
    Relevant posts: [Splitting a dataframe string column into multiple different columns](http://stackoverflow.com/questions/18641951), [Split a column of a data frame to multiple columns](http://stackoverflow.com/questions/4350440). – zx8754 Jun 22 '16 at 07:32

3 Answers


We can split the string with strsplit, take the last 5 elements with tail, and paste them back together:

 paste(tail(strsplit(str1, "\\s+")[[1]],5), collapse=" ")
 #[1] "TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"

If there are multiple strings, we loop over the list returned by strsplit and do the same for each element:

 sapply(strsplit(rep(str1,2), " "), function(x) paste(tail(x, 5), collapse=" "))
 #[1] "TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt" "TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"
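To connect this back to the data frame in the question, here is a minimal sketch in base R. The data frame df, its count column, and the second row name are made up for illustration; only the first row name comes from the question.

```r
# Hypothetical data frame; the second row name is invented for illustration
df <- data.frame(
  count = c(10, 20),
  row.names = c("hsa-let-7f-5p TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt",
                "id2 AAAA 1 I-BB 2 ccc")
)

# Split every row name on runs of whitespace
parts <- strsplit(rownames(df), "\\s+")

# Keep the last five pieces of each row name and store them in a new column
df$last5 <- vapply(parts,
                   function(x) paste(tail(x, 5), collapse = " "),
                   character(1))

df$last5
#[1] "TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt" "AAAA 1 I-BB 2 ccc"
```

vapply is used here instead of sapply only so that the result is guaranteed to be a character vector with one entry per row.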

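Or use str_extract from stringr. The pattern matches four words, each followed by whitespace, and then a final word anchored at the end of the string: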

 library(stringr)
 str_extract(str1, "(\\S+\\s+){4}\\S+$")
 #[1] "TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"

Part of the same pattern can be used in sub from base R

sub(".*\\s+((\\S+\\s+){4})(\\S+)$", "\\1\\3", str1)
#[1] "TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"
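If you would rather extract the match than substitute around it, the str_extract pattern can also be used in base R with regexpr and regmatches. This is a sketch of an alternative, not part of the approaches above:

```r
str1 <- "hsa-let-7f-5p TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"

# regexpr locates the match of "four words plus a final word at the end";
# regmatches then pulls the matched text out of the string
last5 <- regmatches(str1, regexpr("(\\S+\\s+){4}\\S+$", str1))

last5
#[1] "TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"
```

One caveat: regmatches drops elements that do not match, so for a vector containing strings with fewer than five words the result can be shorter than the input.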

data

str1 <- "hsa-let-7f-5p TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"
akrun

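We can use word from stringr, where x is the single string from the question. Negative indices count backwards from the last word: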

library(stringr)
paste(word(x, -5:-1), collapse = ' ')
#[1] "TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"
Sotos

You can count the words with stri_count and pass the positions of the fifth-from-last and last word to word:

library(stringr)
library(stringi)
word(V1,stri_count(V1,regex="\\S+")-4,stri_count(V1,regex="\\S+"))

Data

V1<-"hsa-let-7f-5p TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"
user2100721