I have this data frame:

df<-data.frame(ID=c(1,1,2,2,2),A=c(1,2,1,2,3),B=c("A","T","T","A","G"))

  ID A B
1  1 1 A
2  1 2 T
3  2 1 T
4  2 2 A
5  2 3 G

and I need this summary table:

summary_df <- data.frame(ID = c(1,2), sort_factor_and_combin_B = c("A-T","A-G-T"))

  ID sort_factor_and_combin_B
1  1                      A-T
2  2                    A-G-T

1. Regardless of the order of column A, I want a column that concatenates, in alphabetical order, the values of column B that belong to each ID.

2. At the same time, I also want a column that joins the B values according to the order of A.

Do you have any ideas?

Thank you!

h-y-jp

2 Answers

We can use tapply():

tmp1 <- tapply(df$B, df$ID, function(x){
  paste(sort(x), collapse = "-")
})

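# Build the result as a data frame; take the IDs from names(tmp1)
# so they stay aligned with the grouped values
data.frame(ID = as.numeric(names(tmp1)),
           sort_factor_and_combin_B = as.vector(tmp1))

#   ID sort_factor_and_combin_B
# 1  1                      A-T
# 2  2                    A-G-T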
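
The snippet above covers the alphabetical column only. For the second request (joining B in the order of A), one possible base R sketch is to reorder the rows first and then paste without sorting. The names `ord`, `by_A`, `res_A` and `B_in_order_of_A` are made up for illustration and are not from the question.

```r
df <- data.frame(ID = c(1, 1, 2, 2, 2),
                 A  = c(1, 2, 1, 2, 3),
                 B  = c("A", "T", "T", "A", "G"))

# Reorder rows by ID, then by A, so each group's B values follow A
ord <- df[order(df$ID, df$A), ]

# Paste B without sorting it, which keeps the A order within each ID
by_A <- tapply(as.character(ord$B), ord$ID, paste, collapse = "-")

# Hypothetical result table; IDs come from the names of the tapply() result
res_A <- data.frame(ID = as.numeric(names(by_A)),
                    B_in_order_of_A = as.vector(by_A))
res_A
#   ID B_in_order_of_A
# 1  1             A-T
# 2  2           T-A-G
```

With the sample data, ID 1 gives the same string either way, because its B values already happen to be alphabetical in the order of A.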
Base_R_Best_R

For each ID, sort the B values and paste them together.

library(dplyr)

df %>%
  group_by(ID) %>%
  summarise(sort_factor_and_combin_B = paste0(sort(B), collapse = '-'))

#     ID sort_factor_and_combin_B
#* <dbl> <chr>                   
#1     1 A-T                     
#2     2 A-G-T                 

Base R aggregate():

aggregate(B~ID, df, function(x) paste0(sort(x), collapse = '-'))
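
If both columns from the question are wanted in one table (B sorted alphabetically, and B joined in the order of A), a minimal base R sketch could split the data by ID and build one row per group. This is only one way to do it; the column names `B_sorted` and `B_by_A` and the objects `pieces` and `both` are assumptions chosen for illustration.

```r
df <- data.frame(ID = c(1, 1, 2, 2, 2),
                 A  = c(1, 2, 1, 2, 3),
                 B  = c("A", "T", "T", "A", "G"))

# One row per ID, with both orderings side by side
pieces <- lapply(split(df, df$ID), function(g) {
  b <- as.character(g$B)  # guard against B being a factor
  data.frame(ID       = g$ID[1],
             B_sorted = paste(sort(b), collapse = "-"),        # alphabetical
             B_by_A   = paste(b[order(g$A)], collapse = "-"))  # order of A
})

both <- do.call(rbind, pieces)
rownames(both) <- NULL
both
#   ID B_sorted B_by_A
# 1  1      A-T    A-T
# 2  2    A-G-T  T-A-G
```

The key difference between the two columns is `sort(b)` versus `b[order(g$A)]`: the first ignores A entirely, the second uses A only to decide the position of each B value.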
Ronak Shah