
I'm trying to optimize a for loop in my R code.

Summary: I have a data frame (say, df) with 19 million rows, one per gene, and 2 columns: 'Chromosome', holding each gene's chromosome, and 'Position', holding its position. I want to create a new column 'chr_pos' that gives each gene an alternative name of the form Chromosome_Position. Example: a gene A located on chromosome 1 at position 123456 would get the alternative name 1_123456.

Here is my code to do this:

for (i in seq_len(nrow(df))) { df$chr_pos[i] <- paste0(df$Chromosome[i], "_", df$Position[i]) }

I tried optimising using vectorisation but it's still ineffective.

Can this be optimised further?
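
For reference, here is a minimal sketch of what a fully vectorised version could look like, assuming df has the 'Chromosome' and 'Position' columns described above. The three-row data frame is made up purely for illustration. Since paste0() already operates element-wise on whole vectors, a single call can build every name without a loop:

```r
# Toy stand-in for the real 19-million-row data frame (values are illustrative only)
df <- data.frame(
  Chromosome = c("1", "2", "X"),
  Position   = c(123456L, 98765L, 4242L),
  stringsAsFactors = FALSE
)

# paste0() is vectorised: one call combines the two columns row by row
df$chr_pos <- paste0(df$Chromosome, "_", df$Position)

print(df$chr_pos)
```

One thing to watch: if 'Position' is stored as a double rather than an integer, some large round values may be rendered in scientific notation when converted to text, so converting the column with as.integer() first may be safer.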

Phil
Huy Nguyen

0 Answers