I know there are similar questions out there, but I have spent all day searching and cannot find an answer to my issue. I have a GMT file in which I need to replace the Ensembl IDs with gene symbols to run a gene set analysis, and I have a data frame that lists the Ensembl IDs with their matching gene symbols. I can run this code for one column and it works:

GMTdf$V3 <- Gene_list$hgnc_symbol[match(GMTdf$V3, Gene_list$ensembl_gene_id)]

But what I cannot figure out is how to loop it over the 495 columns of the GMT file I have. I have tried many things and nothing works. The only thing that looked promising was the following code, but it replaces everything with NAs.

GMTdf[,3:495] = Gene_list$hgnc_symbol[match(GMTdf[,3:495], Gene_list$ensembl_gene_id)]

I have tried using dplyr's mutate and advice given on Stack Overflow about replacing Ensembl IDs with gene symbols, but I am too much of an amateur coder to figure it out. Please help.

Cath

1 Answer

You can use lapply to apply a function to multiple columns.

cols <- 3:495
# Look up each column's Ensembl IDs and replace them with the matching symbols
GMTdf[cols] <- lapply(GMTdf[cols], function(x)
                      Gene_list$hgnc_symbol[match(x, Gene_list$ensembl_gene_id)])
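To see why the column-by-column version behaves differently from calling match on the whole block of columns, here is a small sketch on toy data. The IDs, symbols, and the two-row layout are made up for illustration and only stand in for the real GMTdf and Gene_list; the helper name whole_df is also hypothetical.

```r
# Toy stand-ins for the real objects (all values are invented for illustration)
Gene_list <- data.frame(
  ensembl_gene_id = c("ENSG01", "ENSG02", "ENSG03"),
  hgnc_symbol     = c("GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)
GMTdf <- data.frame(
  V1 = c("set1", "set2"),
  V2 = c("desc1", "desc2"),
  V3 = c("ENSG01", "ENSG03"),
  V4 = c("ENSG02", "ENSG99"),  # ENSG99 is deliberately missing from Gene_list
  stringsAsFactors = FALSE
)

cols <- 3:ncol(GMTdf)

# match() on a data frame treats each whole column as a single element,
# so nothing lines up with an individual ID and the lookup yields only NA
whole_df <- match(GMTdf[, cols], Gene_list$ensembl_gene_id)

# Column by column, each ID is looked up on its own
GMTdf[cols] <- lapply(GMTdf[cols], function(x)
  Gene_list$hgnc_symbol[match(x, Gene_list$ensembl_gene_id)])

print(GMTdf)
```

IDs that have no entry in the lookup table (ENSG99 in this toy data) still come back as NA, so it may be worth checking how many NAs remain after the replacement.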

In dplyr, you can do the same with across.

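library(dplyr)

# all_of() tells across() that cols is an external vector of column positions
GMTdf <- GMTdf %>% mutate(across(all_of(cols),
                   ~Gene_list$hgnc_symbol[match(., Gene_list$ensembl_gene_id)]))

Ronak Shah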