I have a data frame like this:

TargetID        Gene
cg26365299        HOXA9
cg26476852        HOXA9
cg26492446      BHLHE23
cg26521404        HOXA9
cg26531174         CDX1
cg26595643         VAX1

And I want to reshape it into this:

Gene         TargetID
HOXA9        cg26365299;cg26476852;cg26521404
BHLHE23      cg26492446
CDX1         cg26531174
VAX1         cg26595643

I tried dcast, but it doesn't work.

user976991

3 Answers

Use aggregate. Assuming df is your data.frame:

> aggregate(TargetID~Gene, data=df, paste0, collapse=";")
     Gene                         TargetID
1 BHLHE23                       cg26492446
2    CDX1                       cg26531174
3   HOXA9 cg26365299;cg26476852;cg26521404
4    VAX1                       cg26595643
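
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The construction of `df` is my assumption about how the question's data is stored (character columns, not factors); the `aggregate` call is the one above, with `paste` in place of `paste0`, which should behave the same here because only one vector is being collapsed.

```r
# Rebuild the example data from the question.
# Assumption: both columns are plain character vectors.
df <- data.frame(
  TargetID = c("cg26365299", "cg26476852", "cg26492446",
               "cg26521404", "cg26531174", "cg26595643"),
  Gene     = c("HOXA9", "HOXA9", "BHLHE23", "HOXA9", "CDX1", "VAX1"),
  stringsAsFactors = FALSE
)

# One row per Gene; the TargetIDs in each group are joined with ";".
# `collapse` is forwarded to paste() through aggregate's `...` argument.
res <- aggregate(TargetID ~ Gene, data = df, FUN = paste, collapse = ";")
print(res)
```

Note that the result is sorted by Gene, so the row order differs from the order in which the genes first appear in the input.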
Jilber Urbina

Another possibility.

ll <- lapply(unstack(df), paste0, collapse = ";")
data.frame(Gene = names(ll), TargetID = unlist(ll), row.names = NULL)

#      Gene                         TargetID
# 1 BHLHE23                       cg26492446
# 2    CDX1                       cg26531174
# 3   HOXA9 cg26365299;cg26476852;cg26521404
# 4    VAX1                       cg26595643
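
To make the intermediate step visible, here is a rough sketch of what `unstack` produces before the `lapply`. The data frame is rebuilt from the question (my assumption about its column types), and the formula is written out explicitly instead of relying on the default; `groups` is just an illustrative name.

```r
# Example data from the question, as character columns (assumption).
df <- data.frame(
  TargetID = c("cg26365299", "cg26476852", "cg26492446",
               "cg26521404", "cg26531174", "cg26595643"),
  Gene     = c("HOXA9", "HOXA9", "BHLHE23", "HOXA9", "CDX1", "VAX1"),
  stringsAsFactors = FALSE
)

# Split TargetID by Gene. Because the groups have different sizes,
# unstack() returns a named list with one character vector per gene.
groups <- unstack(df, TargetID ~ Gene)
str(groups)

# Collapsing each list element gives one string per gene.
collapsed <- vapply(groups, paste, character(1), collapse = ";")
```

The list names are the genes and the list values are the TargetIDs, which is why the names go into the Gene column when the final data frame is built.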
Henrik

Another option using plyr:

library(plyr)
ddply(df, .(Gene), summarise, TargetID = paste(TargetID, collapse = ";"))
     Gene                         TargetID
1 BHLHE23                       cg26492446
2    CDX1                       cg26531174
3   HOXA9 cg26365299;cg26476852;cg26521404
4    VAX1                       cg26595643
agstudy