2

I've got a data frame with 34 columns and 12,964 rows, two of these columns are Gene.Name and Mutation_Frequency. For example:

Gene.Name Mutation_Frequency
CTLA4 0
TP53 4
CTLA4 2
CTLA4 2
TP53 4
TP53 6

I now want to create a column called "Highest_Mutation_Frequency" which tells me the highest mutation frequency for the Gene.Name, and puts it in a new column, like this:

Gene.Name Mutation_Frequency Highest_Mutation_Frequency
CTLA4 0 2
TP53 4 6
CTLA4 2 2
CTLA4 2 2
TP53 0 6
TP53 6 6

I realize I could probably use the max() command, but I'm not sure how to implement this. As always, any help is appreciated!

Edit: Although this is quite similar to another question: Select the row with the maximum value in each group this question also involves producing unique rows and placing them in another data frame.

Anoushiravan R
  • 18,699
  • 3
  • 13
  • 36
KLM117
  • 183
  • 6
  • 2
    Try looking at [this post](https://stackoverflow.com/questions/25314336/extract-the-maximum-value-within-each-group-in-a-dataframe) – Ben Aug 07 '21 at 16:55

3 Answers3

5

You could use:

library(dplyr)

data %>%
  group_by(Gene.Name) %>%
  mutate(Highest_Mutation_Frequency = max(Mutation_Frequency))
Martin Gal
  • 14,910
  • 4
  • 18
  • 37
5

An option with base R

aggregate(cbind(Highest_Mutation_Frequency = Mutation_Frequency) ~ Gene.Name, data, FUN = max)
akrun
  • 789,025
  • 32
  • 460
  • 575
3

Another base R option using unique + ave

unique(
  transform(
    df,
    Highest_Mutation_Frequency = ave(Mutation_Frequency,Gene.Name,FUN = max)
  )[c("Gene.Name","Highest_Mutation_Frequency")]
)
ThomasIsCoding
  • 80,151
  • 7
  • 17
  • 65