I would like to generate indices that group observations based on two columns. Two observations should fall in the same group if they share a value in at least one of the two columns, and groups should chain together through such shared values. I can see how to group observations that share both values, but not just one of them.

For example, with the data frame:

dt <- data.frame(id = 1:10,
                 G1 = c("A","A","B","B","C","C","C","D","E","F"),
                 G2 = c("Z","X","X","Y","W","V","U","s","T","T"))

I would like to get the following group column:

1,1,1,1,2,2,2,3,4,4

I tried group_indices from dplyr, but couldn't get it to work.

zx8754
Malta

1 Answer

Use igraph to get the connected-component membership of each value, then map it back onto the rows by name:

library(igraph)

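# df1 is the question's data frame (named dt there)
df1 <- dt

# build a graph with one edge per row (G1 -> G2, id kept as an edge attribute),
# then get the connected-component id of each vertex
g <- graph_from_data_frame(df1[, c(2, 3, 1)])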
myGroups <- components(g)$membership

myGroups 
# A B C D E F Z X Y W V U s T 
# 1 1 2 3 4 4 1 1 1 2 2 2 3 4 

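# then look up each row's group by its G1 value;
# as.character() guards against G1 being a factor, which would index by position
df1$group <- myGroups[as.character(df1$G1)]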


df1
#    id G1 G2 group
# 1   1  A  Z     1
# 2   2  A  X     1
# 3   3  B  X     1
# 4   4  B  Y     1
# 5   5  C  W     2
# 6   6  C  V     2
# 7   7  C  U     2
# 8   8  D  s     3
# 9   9  E  T     4
# 10 10  F  T     4
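
One caveat with the approach above: vertices are identified by name, so a value that appears in both G1 and G2 (say "A" in each) would be treated as a single vertex and could merge groups that should stay apart. The following is a minimal sketch of a variant, not part of the original answer, that assumes the two columns are separate namespaces. The "G1_"/"G2_" prefixes and the names edges, raw_ids are illustrative choices only.

```r
library(igraph)

# Example data from the question.
dt <- data.frame(id = 1:10,
                 G1 = c("A","A","B","B","C","C","C","D","E","F"),
                 G2 = c("Z","X","X","Y","W","V","U","s","T","T"),
                 stringsAsFactors = FALSE)

# Prefix every value with its column name so that equal strings in
# different columns become distinct vertices (assumed to be desired).
edges <- data.frame(from = paste0("G1_", dt$G1),
                    to   = paste0("G2_", dt$G2),
                    stringsAsFactors = FALSE)

# One undirected edge per row; connected components are the groups.
g <- graph_from_data_frame(edges, directed = FALSE)
membership <- components(g)$membership

# Look up each row's component through its (prefixed) G1 value, then
# renumber the groups 1, 2, ... in order of first appearance.
raw_ids  <- unname(membership[edges$from])
dt$group <- match(raw_ids, unique(raw_ids))

dt$group
# 1 1 1 1 2 2 2 3 4 4
```

If the same value in G1 and G2 really does mean the same thing in your data, drop the prefixes and the original code is what you want.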
zx8754
  • Thank you, this answer looks good to me. I would rather have a dplyr answer, but it's fine anyway. I don't understand what you mean by "the input will change": won't your method work with other data? – Malta Jul 13 '17 at 11:58
  • @Malta As we don't have the real data, I cannot tell. If your data is similar to your example, then everything should work as expected. – zx8754 Jul 13 '17 at 11:59
  • My data is quite similar, so this is great, thank you! – Malta Jul 13 '17 at 12:03
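
For anyone who wants the same grouping without the igraph dependency, here is a rough base R sketch using a simple union-find over the distinct values. It is a swapped-in technique, not what the answer does, and group_by_shared_value is a hypothetical helper name. It assumes the two columns are separate namespaces and skips path compression, so it may be slow on very large inputs.

```r
# Hypothetical helper: rows that share a value in either column, directly
# or through a chain of other rows, receive the same group index.
group_by_shared_value <- function(a, b) {
  # Tag values by column so equal strings in different columns stay distinct.
  keys_a <- paste0("a:", a)
  keys_b <- paste0("b:", b)
  nodes  <- unique(c(keys_a, keys_b))
  parent <- seq_along(nodes)   # every node starts as its own root

  # Follow parent pointers up to the root of a node's set.
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  ia <- match(keys_a, nodes)
  ib <- match(keys_b, nodes)

  # Each row links its two values: merge their sets, keeping the smaller root.
  for (k in seq_along(ia)) {
    ra <- find_root(ia[k])
    rb <- find_root(ib[k])
    if (ra != rb) parent[max(ra, rb)] <- min(ra, rb)
  }

  # A row's group is the root of its first-column value, renumbered
  # 1, 2, ... in order of first appearance.
  roots <- vapply(ia, find_root, integer(1))
  match(roots, unique(roots))
}

dt <- data.frame(id = 1:10,
                 G1 = c("A","A","B","B","C","C","C","D","E","F"),
                 G2 = c("Z","X","X","Y","W","V","U","s","T","T"),
                 stringsAsFactors = FALSE)

dt$group <- group_by_shared_value(dt$G1, dt$G2)
dt$group
# 1 1 1 1 2 2 2 3 4 4
```

The result is a plain integer vector, so it can also be used inside dplyr::mutate() if you prefer that style.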