I have 17 numeric and 5 binary (0-1) variables, with 73 samples in my dataset. I need to run a cluster analysis. I know that the Gower distance is a good metric for datasets with mixed variables. However, I couldn't understand how the Gower distance calculates the difference between binary variables. It seems to me that it is not different from Euclidean distance.
2 Answers
How about binary attributes that have the values "m" and "f", for "male" and "female"?
You do realize that for a dichotomous variable, all you can get out is "same" or "different"? The key difference between distances is not whether the value is 1 or 0, but how multiple variables are combined.
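To make that concrete, here is a minimal sketch (the function name and the toy vectors are made up for illustration, not taken from the question's data). For a single 0/1 variable, Manhattan, Euclidean, and simple mismatch all give the same 0 or 1. The distances only diverge once several variables are combined.

```python
import numpy as np

def per_variable_distances(a, b):
    """Return (manhattan, euclidean, mismatch) for one 0/1 variable."""
    manhattan = float(abs(a - b))
    euclidean = float(np.sqrt((a - b) ** 2))
    mismatch = float(a != b)
    return manhattan, euclidean, mismatch

# For a single binary variable, all three agree on every pair of values.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, per_variable_distances(a, b))

# Hypothetical example: two samples described by 5 binary variables.
x = np.array([1, 0, 1, 1, 0])
y = np.array([0, 0, 1, 0, 1])
diffs = np.abs(x - y)                  # per-variable "same"/"different"

euclid = np.sqrt(np.sum(diffs ** 2))   # Euclidean: root of summed squares
gower_like = diffs.mean()              # Gower-style: average of per-variable scores

print(euclid, gower_like)
```

The per-variable scores are identical in both cases; only the aggregation step (root of a sum of squares versus a plain average bounded in [0, 1]) differs.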
Has QUIT--Anony-Mousse
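Gower distance uses the Manhattan distance, scaled by each variable's range, for continuous variables and the Dice distance for categorical variables. For a single categorical variable, the latter reduces to 0 if the two values match and 1 if they differ. The per-variable distances are then averaged.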
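A rough sketch of that computation, assuming symmetric binary variables scored by simple match/mismatch, no missing values, and a nonzero range for every numeric column. The function name `gower_distance` and the toy data are hypothetical, not from any particular library.

```python
import numpy as np

def gower_distance(x, y, is_numeric, ranges):
    """Gower-style distance between two samples with mixed variables.

    is_numeric: boolean mask, True for numeric columns.
    ranges: per-column (max - min) over the dataset; only the entries
            for numeric columns are used and they must be nonzero.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    is_numeric = np.asarray(is_numeric, dtype=bool)
    ranges = np.asarray(ranges, dtype=float)

    d = np.empty(x.shape[0])
    # Numeric columns: absolute difference scaled to [0, 1] by the range.
    d[is_numeric] = np.abs(x[is_numeric] - y[is_numeric]) / ranges[is_numeric]
    # Binary/categorical columns: 0 if equal, 1 if different.
    d[~is_numeric] = (x[~is_numeric] != y[~is_numeric]).astype(float)
    # Combine by averaging the per-variable scores.
    return float(d.mean())

# Toy data: 2 numeric columns followed by 2 binary columns.
X = np.array([
    [10.0, 1.0, 0, 1],
    [20.0, 3.0, 1, 1],
    [30.0, 2.0, 0, 0],
])
is_numeric = np.array([True, True, False, False])
ranges = X.max(axis=0) - X.min(axis=0)

print(gower_distance(X[0], X[1], is_numeric, ranges))
print(gower_distance(X[1], X[2], is_numeric, ranges))
```

Note that for a 0/1 variable the range is 1, so the range-scaled absolute difference and the mismatch score coincide. The contrast with Euclidean distance comes from scaling the numeric columns and from averaging instead of taking the root of a sum of squares.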
Sanjeet
Gower? http://stats.stackexchange.com/a/15313/3277 – ttnphns Oct 21 '14 at 16:48