My dataset pairs customers with recipes, and I am trying to compute a distance matrix so that I can plot clusters. The strings must match exactly. I labeled the recipes A, B, C, but they could be "Pizza", "Pasta", "Salad", etc. I need to create a cluster chart that displays the connections between the recipes, but I need the distance matrix first. Right now I am using this:

    library(proxy)
    mat <- as.matrix(dist(data))

This gives me a 9x9 matrix, not the 3x3 matrix I want.

How can I obtain a distance matrix based only on the recipes that customers have in common, so that I can plot the connections between customers, and vice versa for the recipes?

hjpotter92
Buddy Holly
  • Welcome to SO. You could improve your question. Please read [how to provide minimal reproducible examples in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610). Then edit & improve it accordingly. A good post usually provides minimal input data, the desired output data & code tries - all copy-paste-run'able. – lukeA Jun 27 '16 at 21:37

1 Answer

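Here's how you could create a distance matrix. Reshape the data to one row per customer and one column per recipe, drop the ID column so that it does not count as a feature, and compute binary distances between the rows:

data <- read.table(sep=",", text="1,A
2,B
1,C
2,C
2,B
3,A
3,B
3,C
3,D")
# Reshape to wide format: one row per customer (V1), one column per recipe (V2)
data <- reshape2::dcast(
  data, 
  V1~V2, 
  fun.aggregate = length, 
  value.var="V2"
)
# Keep the customer ID as row names and leave it out of the distance computation
rownames(data) <- data$V1
# "binary" treats any non-zero count as "present"
(mat <- round(as.matrix(dist(data[, -1], method = "binary")), 2))
#      1    2   3
# 1 0.00 0.67 0.5
# 2 0.67 0.00 0.5
# 3 0.50 0.50 0.0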
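
For the "vice versa" part of the question (distances between recipes rather than customers), here is a rough sketch using only base R. `dist()` compares rows, which is why the nine-row long-format data produced a 9x9 matrix, so the idea is to build a customer-by-recipe table first and transpose it to compare recipes. The names `orders`, `customer`, `recipe`, and `incidence` are made up for illustration, and `hclust()` is only one of several ways to get something to plot:

```r
# Same toy data as in the answer; the column names are illustrative assumptions.
orders <- read.table(sep = ",", col.names = c("customer", "recipe"),
                     text = "1,A
2,B
1,C
2,C
2,B
3,A
3,B
3,C
3,D")

# Customer x recipe counts. Duplicate rows (2,B appears twice) give counts > 1,
# which the "binary" method treats the same as 1.
incidence <- unclass(table(orders$customer, orders$recipe))

# dist() compares rows, so this compares customers ...
customer_dist <- dist(incidence, method = "binary")
# ... and transposing the table compares recipes instead.
recipe_dist <- dist(t(incidence), method = "binary")

# Hierarchical clustering is one option for turning the distances into a chart.
plot(hclust(recipe_dist), main = "Recipes clustered by shared customers")
```

Calling `as.matrix(recipe_dist)` gives the full square matrix if you need it in that form for another plotting function.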
lukeA