I created dummy (binary) variables from categorical variables, and I want to partition N subjects into multiple classes with some clustering method. I computed the Jaccard similarity index for all pairs of subjects, which gives an N-by-N similarity matrix.
My question is whether it is OK to apply hierarchical clustering using the Euclidean distance measure on the Jaccard similarity index matrix.
The result looks very good and valid, in fact much better than when I use the Jaccard dissimilarity (1 - Jaccard index) matrix. I want to make sure that I am not creating mathematical nonsense.
"Euclidean distance measure on the Jaccard similarity index matrix": this is misty. Jaccard similarity is a proximity measure. Euclidean distance is another proximity measure. Maybe you meant "or", not "on", in that sentence? – ttnphns Sep 05 '17 at 15:33
[...] "on" the Jaccard index. I will try the Dice algorithm and check the performance. My reasoning would be that I create a continuous data set from nominal data (Jaccard, Dice) which can then be used with e.g. Euclidean distance to perform a hierarchical clustering. – dmeu Sep 06 '17 at 07:43
"using Euclidean distance measure on the Jaccard similarity index matrix" is not clear. It sounds as if you are going to see the Jaccard matrix as some dataset and compute Euclidean distances between its rows?? – ttnphns Sep 06 '17 at 08:41
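To make the two readings discussed in these comments concrete, here is a minimal sketch in plain Python. It assumes a small made-up binary data set `X` (not the asker's data) and hypothetical helper names `jaccard` and `euclidean`. It builds the Jaccard similarity matrix `S`, then derives two different distance matrices: `D_jaccard`, the usual 1 - Jaccard dissimilarity, and `D_rows`, the Euclidean distance between rows of `S`, i.e. treating each subject's similarity profile as a feature vector. Either matrix could in principle be passed to a hierarchical clustering routine, but they encode different notions of closeness.

```python
import math

def jaccard(a, b):
    """Jaccard similarity of two binary vectors (1 = attribute present)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention assumed here: two all-zero vectors are treated as identical.
    return both / either if either else 1.0

def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Hypothetical dummy-coded data: 4 subjects, 5 binary indicators.
X = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
]
n = len(X)

# N-by-N Jaccard similarity matrix.
S = [[jaccard(X[i], X[j]) for j in range(n)] for i in range(n)]

# Reading 1: use the Jaccard dissimilarity directly as the distance.
D_jaccard = [[1.0 - S[i][j] for j in range(n)] for i in range(n)]

# Reading 2: treat each row of S as a feature vector and take
# Euclidean distances between those rows.
D_rows = [[euclidean(S[i], S[j]) for j in range(n)] for i in range(n)]

for name, D in (("1 - Jaccard", D_jaccard), ("Euclidean on rows of S", D_rows)):
    print(name)
    for row in D:
        print("  " + "  ".join(f"{d:.3f}" for d in row))
```

In the second reading, two subjects are close when they are similar to the same other subjects, not only when they share attributes with each other, which may be one reason the two approaches give different clusterings.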