I have some data where the first column contains duplicated values, while another column holds values that are all different. I need to keep just one row per duplicated value in the first column and merge the differing values from the other column into it. For example:

Z = c( "a", "a", "b", "c", "d", "d", "d")
X = c( 10, 10, 0, 3, 4, 4, 4)
Y = c("ab", "bc", "dv", "mh", "op", "va", "po")
c = data.frame(Z,X,Y)

c

  Z  X  Y
1 a 10 ab
2 a 10 bc
3 b  0 dv
4 c  3 mh
5 d  4 op
6 d  4 va
7 d  4 po

I need to merge it into:

Z  X   Y
a 10  ab,bc
b  0  dv
c  3  mh
d  4  op, va, po

or even

Z  X   Y    L   V
a  10  ab  bc
b   0  dv
c   3  mh
d   4  op  va  po

Is it possible?

talat

3 Answers

We can try with data.table, grouping by Z (this assumes X is constant within each Z group):

library(data.table)
setDT(c)[, .(X = unique(X), Y = paste(Y, collapse = ",")), by = Z]
#  Z  X        Y
#1: a 10    ab,bc
#2: b  0       dv
#3: c  3       mh
#4: d  4 op,va,po
mtoto

The plyr package is handy in these situations:

library(plyr)
ddply(c, c("Z", "X"), summarise, Y = paste(Y, collapse = ","))
  Z  X        Y
1 a 10    ab,bc
2 b  0       dv
3 c  3       mh
4 d  4 op,va,po
csgillespie

In base R:

aggregate(Y ~ Z + X, data = c, toString)

which gives:

  Z  X          Y
1 b  0         dv
2 c  3         mh
3 d  4 op, va, po
4 a 10     ab, bc

Or with dplyr:

library(dplyr)
c %>% group_by(Z,X) %>% summarise(Y = toString(Y))

which gives the same result.
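
For the second (wide) layout from the question, one possible base R sketch is to number the rows within each group and then reshape. The names df, idx and wide are made up for this illustration, and the resulting columns are called Y.1, Y.2, Y.3 rather than Y, L, V; missing positions are filled with NA:

```r
# Example data from the question (called df here instead of c)
df <- data.frame(
  Z = c("a", "a", "b", "c", "d", "d", "d"),
  X = c(10, 10, 0, 3, 4, 4, 4),
  Y = c("ab", "bc", "dv", "mh", "op", "va", "po"),
  stringsAsFactors = FALSE
)

# Number the rows within each (Z, X) group: 1, 2, 3, ...
df$idx <- ave(seq_along(df$Y), df$Z, df$X, FUN = seq_along)

# Spread Y into one column per within-group position
wide <- reshape(df, idvar = c("Z", "X"), timevar = "idx", direction = "wide")
wide
```

This should give one row per Z/X combination, with as many Y columns as the largest group has rows.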

Jaap