I have a vector mynumbers with several strings of numbers, say:

mynumbers <- c("122212", "134134", "134134", "142123", "212141", "213243", "213422", "214231", "221233")

My goal is to translate such strings into strings of letters following these relationships:

1=A
2=C
3=G
4=T

I'd like to encapsulate this in a function so that:

myletters <- translate_function(mynumbers)

myletters would thus be:

myletters <- c("ACCCAC", "AGTAGT", "AGTAGT", "ATCACG", "CACATA", "CAGCTG", "CAGTCC", "CATCGA", "CCACGG")

I'm thinking of a function like this, obviously not correct... I start to get confused when dealing with strsplit and lists...

translate_function <- function(numbers){
  map_df <- data.frame(num=1:4, nuc=c('A','C','G','T'))
  #strsplit numbers
  split_numbers <- strsplit(numbers, '')
  letters <- paste(sapply(split_numbers, function(x) map_df$nuc[which(map_df$num==x)]), collapse='')
  
  return(letters)
}

What would be the easiest and most elegant way to accomplish this? Thanks!

DaniCee

4 Answers

This is easily done with chartr:

chartr("1234" , "ACGT", mynumbers)
[1] "ACCCAC" "AGTAGT" "AGTAGT" "ATCACG" "CACATA" "CAGCTG" "CAGTCC" "CATCGA"
[9] "CCACGG"
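
As a quick check of the chartr approach, here is a minimal sketch that compares the result with the expected vector from the question and shows what happens to a character outside the mapping. The names `expected` and `leftover`, and the input "12x5", are made up for illustration.

```r
# Sample input from the question
mynumbers <- c("122212", "134134", "134134", "142123", "212141",
               "213243", "213422", "214231", "221233")

# chartr(old, new, x): each character of `old` is replaced by the
# character at the same position in `new`; x may be a whole vector
myletters <- chartr("1234", "ACGT", mynumbers)

# Expected result as given in the question
expected <- c("ACCCAC", "AGTAGT", "AGTAGT", "ATCACG", "CACATA",
              "CAGCTG", "CAGTCC", "CATCGA", "CCACGG")
stopifnot(identical(myletters, expected))

# Characters that do not appear in `old` pass through unchanged
leftover <- chartr("1234", "ACGT", "12x5")
print(leftover)
```

Because unmapped characters are silently kept, it may be worth validating the input first if anything other than the digits 1 to 4 could appear.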
Park

You may use stringr::str_replace_all. Create a named vector from map_df and pass it as the set of replacements.

map_df <- data.frame(num=1:4, nuc=c('A','C','G','T'))
stringr::str_replace_all(mynumbers, setNames(map_df$nuc, map_df$num))

#[1] "ACCCAC" "AGTAGT" "AGTAGT" "ATCACG" "CACATA" "CAGCTG" "CAGTCC" "CATCGA" "CCACGG"
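
The same named-vector lookup can be sketched in base R, which also shows one way to repair the strsplit attempt from the question. That attempt seems to go wrong in two places: it compares all four map numbers against a whole vector of digits at once, and it calls paste once over the entire sapply result instead of once per string. The names `nuc_map` and `translate_lookup` below are hypothetical, chosen only for this example.

```r
# Named character vector: names are the digits, values are the letters
nuc_map <- c("1" = "A", "2" = "C", "3" = "G", "4" = "T")

# translate_lookup is a hypothetical name used only for this sketch
translate_lookup <- function(numbers) {
  # strsplit returns a list with one character vector of digits per string
  digit_list <- strsplit(numbers, "")
  # Look up every digit by name, then glue each result back into one string
  vapply(digit_list,
         function(digits) paste(nuc_map[digits], collapse = ""),
         character(1), USE.NAMES = FALSE)
}

mynumbers <- c("122212", "134134", "134134", "142123", "212141",
               "213243", "213422", "214231", "221233")
translate_lookup(mynumbers)
```

Indexing by name does the lookup for all digits of a string in one step, and vapply guarantees exactly one string per input element. A digit that is missing from nuc_map would come back as NA and be pasted in as the text "NA", so this version assumes clean input.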
Ronak Shah

Use chartr in a function this way:

translate_function <- function(numbers){
  map_df <- data.frame(num=1:4, nuc=c('A','C','G','T'))
  letters <- chartr(paste(map_df$num, collapse=''), paste(map_df$nuc, collapse=''), numbers)
  return(letters)
}
translate_function(mynumbers)

Output:

[1] "ACCCAC" "AGTAGT" "AGTAGT" "ATCACG" "CACATA" "CAGCTG" "CAGTCC" "CATCGA"
[9] "CCACGG"

But it's better without a data frame:

translate_function <- function(numbers){
  letters <- chartr("1234", "ACGT", numbers)
  return(letters)
}
translate_function(mynumbers)

Output:

[1] "ACCCAC" "AGTAGT" "AGTAGT" "ATCACG" "CACATA" "CAGCTG" "CAGTCC" "CATCGA"
[9] "CCACGG"
U12-Forward

Using gsubfn:

library(gsubfn)
gsubfn("(\\d)", setNames(as.list(c("A", "C", "G", "T")), 1:4), mynumbers)
[1] "ACCCAC" "AGTAGT" "AGTAGT" "ATCACG" "CACATA" "CAGCTG" "CAGTCC" "CATCGA" "CCACGG"
akrun