Removing characters at the end of a string using R

Asked Nov 18 '16 at 21:01

Active Nov 18 '16 at 22:11

I have a large dataset and I would like to remove the suffix at the end of each string that starts with e, v or i (the letter plus the digits after it). My dataset looks like this:

P*01:01:05e1
P*01:01:05e2
P*01:01:05e3
P*01:01:05e10
P*02:02v1
P*02:02v2
P*02:01:03v2
P*05:01:01i1
P*05:01:01i8

and I want the values to become P*01:01:05, P*02:02, P*02:01:03 and P*05:01:01. I first tried removing the 'e' letters using

> xdata$gene <- gsub("e*", "", xdata$gene, perl = TRUE)

but I get this error message

Error in `$<-.data.frame`(`*tmp*`, "gene", value = character(0)) : 
  replacement has 0 rows, data has 58

It appears I cannot replace 'e' with nothing. Any suggestions?
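
A side note on the pattern itself: in a regular expression `e*` means "zero or more e's", so even once the assignment works, `gsub("e*", ...)` deletes every `e` in the string and leaves the trailing digits in place. A small check on three of the sample values:

```r
# Three values taken from the question
x <- c("P*01:01:05e1", "P*01:01:05e10", "P*02:02v1")

# "e*" matches zero or more e's, so gsub() strips every "e"
# but keeps the digits that followed it; the "v1" suffix is untouched.
res <- gsub("e*", "", x, perl = TRUE)
print(res)
```
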

Data

xdata <- read.table(header = TRUE, stringsAsFactors = FALSE,
                    text = "gene
                    P*01:01:05e1
                    P*01:01:05e2
                    P*01:01:05e3
                    P*01:01:05e10
                    P*02:02v1
                    P*02:02v2
                    P*02:01:03v2
                    P*05:01:01i1
                    P*05:01:01i8")
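
A minimal sketch of one way to get the desired values, assuming every suffix is a single `e`, `v` or `i` followed by one or more digits with nothing after it: anchor the pattern to the end of the string with `$` and use `sub()`, since each value has at most one such suffix.

```r
# Same values as the question's data
xdata <- data.frame(
  gene = c("P*01:01:05e1", "P*01:01:05e2", "P*01:01:05e3", "P*01:01:05e10",
           "P*02:02v1", "P*02:02v2", "P*02:01:03v2",
           "P*05:01:01i1", "P*05:01:01i8"),
  stringsAsFactors = FALSE
)

# [evi]  : one of the letters e, v or i
# [0-9]+ : followed by one or more digits
# $      : only at the very end of the string
xdata$gene <- sub("[evi][0-9]+$", "", xdata$gene)

print(unique(xdata$gene))
```
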

Edited Nov 18 '16 at 22:11 by rawr; asked by Mona.

Try `stringr::str_split_fixed(df1$V1, pattern = "e|v|i", n = 2)` – zx8754 Nov 18 '16 at 21:05
What about: `strings – William Nov 18 '16 at 21:08
@zx8754 OP wants to remove not split – Sotos Nov 18 '16 at 21:11
@Sotos split then get 1st column? I will leave to community if this needs re-opening. `stringr::str_split_fixed(df1$V1,pattern = "e|v|i", n = 2)[, 1]` – zx8754 Nov 18 '16 at 21:14
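
The split-then-take-the-first-column idea can also be sketched in base R with `strsplit()`, which avoids the stringr dependency. This assumes the part before the suffix never contains an `e`, `v` or `i`, because the split happens at the first occurrence of any of those letters, not only at the end of the string.

```r
x <- c("P*01:01:05e10", "P*02:02v1", "P*05:01:01i8")

# Split each string wherever e, v or i occurs ...
parts <- strsplit(x, "[evi]")

# ... and keep only the first piece of each split (the "1st column").
res <- vapply(parts, `[`, character(1), 1)
print(res)
```
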
Yeah, I guess that's one way of doing it. So many dupes for this kind of question – Sotos Nov 18 '16 at 21:16
@Sotos Exactly my point, many many dupes, agreed target is not 100% dupe, but gives enough knowledge to go towards the right solution. – zx8754 Nov 18 '16 at 21:17
no one really addressed the error... @Mona I feel like you are misspelling the column name in your `gsub`, for example I get that error if I use `xdata$gene – rawr Nov 18 '16 at 22:10
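
The misspelled-column explanation can be checked directly: if the column name on the right-hand side is wrong, `xdata$Gene` is `NULL`, `gsub()` on `NULL` returns `character(0)`, and assigning a zero-length vector to a column of a non-empty data frame fails with the same kind of error as in the question. The misspelling `Gene` is just an example; a case mismatch is used because `$` on a data frame can partially match a prefix such as `gen`.

```r
xdata <- data.frame(gene = c("P*01:01:05e1", "P*02:02v1"),
                    stringsAsFactors = FALSE)

# "Gene" (capital G) is not a column, so xdata$Gene is NULL
# and gsub() returns a zero-length character vector.
bad <- gsub("e*", "", xdata$Gene, perl = TRUE)
print(length(bad))

# Assigning it back to a 2-row data frame reproduces the error.
msg <- tryCatch({
  xdata$gene <- bad
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```
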
I spotted the error and edited the formula and it worked. FYI the formula is: > data$B_newY – Mona Nov 18 '16 at 22:45
Also, I have 10 columns of data but only want to apply the formula to 9 of them. Any suggestions?
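
For applying the replacement to only some columns, one common pattern is to pick the columns to change and `lapply()` over them. This is a sketch, not rawr's `sub_fun` (which is cut off in the reply below); the data frame and the column names `id` and `gene1` to `gene9` are hypothetical, and the pattern assumes the same e/v/i plus digits suffix as in the question.

```r
# Hypothetical data: one "id" column to leave alone plus nine gene columns.
df <- data.frame(id = c("sample_v1", "sample_v2"), stringsAsFactors = FALSE)
for (i in 1:9) {
  df[[paste0("gene", i)]] <- c("P*01:01:05e1", "P*02:02v1")
}

# Every column except "id"
cols <- setdiff(names(df), "id")

# lapply() runs sub() on each selected column; assigning back through
# df[cols] keeps df a data frame and leaves "id" untouched.
df[cols] <- lapply(df[cols], function(x) sub("[evi][0-9]+$", "", x))

print(df)
```

If the column to skip is identified by position instead, `cols <- names(df)[-1]` selects everything except the first column.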
`sub_fun – rawr Nov 18 '16 at 23:32