
I have a large dataset and I would like to remove the trailing suffix (the letter e, v, or i followed by a number) from the end of each string. My dataset looks like this:

P*01:01:05e1
P*01:01:05e2
P*01:01:05e3
P*01:01:05e10
P*02:02v1
P*02:02v2
P*02:01:03v2
P*05:01:01i1
P*05:01:01i8

and I want the values to become P*01:01:05, P*02:02, P*02:01:03, and P*05:01:01. I first tried removing the 'e' characters using:

> xdata$gene <- gsub("e*", "", xdata$gene, perl = TRUE)

but I get this error message

Error in `$<-.data.frame`(`*tmp*`, "gene", value = character(0)) : 
  replacement has 0 rows, data has 58

It appears I cannot replace 'e' with nothing. Any suggestions?
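A note on the error itself: the message reports 58 rows and a replacement of length 0, which is what happens when `gsub()` is handed `NULL`. One possible cause, and this is only an assumption because the real `xdata` is not shown, is that the data frame has no column named exactly `gene`. A minimal sketch with a hypothetical data frame `df` whose column is called `allele`:

```r
# Hypothetical data frame: the column is called "allele", not "gene".
df <- data.frame(allele = c("P*01:01:05e1", "P*02:02v1"),
                 stringsAsFactors = FALSE)

missing_col <- df$gene                            # NULL, no such column
res <- gsub("e*", "", missing_col, perl = TRUE)   # zero-length character vector

# Assigning a zero-length vector to a column raises the same kind of error.
msg <- tryCatch({
  df$gene <- res
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

If that is what is happening, checking `names(xdata)` for the exact column name (case, stray spaces) would be a reasonable first step.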

Data

xdata <- read.table(header = TRUE, stringsAsFactors = FALSE,
                    text = "gene
                    P*01:01:05e1
                    P*01:01:05e2
                    P*01:01:05e3
                    P*01:01:05e10
                    P*02:02v1
                    P*02:02v2
                    P*02:01:03v2
                    P*05:01:01i1
                    P*05:01:01i8")
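For reference, the desired transformation can be sketched with an anchored pattern. This is one possible approach, not necessarily the best for the full dataset; it assumes every suffix is a single e, v, or i followed by one or more digits at the very end of the string. The vector `gene` below is a stand-in for `xdata$gene`.

```r
# Stand-in for xdata$gene, using values from the question.
gene <- c("P*01:01:05e1", "P*01:01:05e10", "P*02:02v2",
          "P*02:01:03v2", "P*05:01:01i8")

# [evi]   one of the three suffix letters
# [0-9]+  one or more digits
# $       anchored to the end of the string
cleaned <- sub("[evi][0-9]+$", "", gene)

print(unique(cleaned))
```

By contrast, the pattern `"e*"` is not anchored and matches zero or more `e` characters anywhere, so even when it runs it only deletes the letter and leaves the digits behind.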
rawr
Mona
