I'm very new to cluster analysis. I'm using R for k-means clustering and I wonder what the between-cluster and total sums of squares (BSS and TSS) in the output are. Which is better: a smaller or a larger ratio between them?
1 Answer
It's basically a measure of the goodness of the clustering k-means has found. SS stands for Sum of Squares, so this is the usual decomposition of the total deviance (TSS) into a "Between" part (BSS) and a "Within" part (WSS). Ideally you want a clustering that has the properties of internal cohesion and external separation, i.e. a small WSS and a large BSS, so a larger BSS/TSS ratio is better and it should approach 1.
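To make the decomposition explicit, here is a short sketch in notation that is mine rather than the answer's: $x_i$ are the observations, $\bar{x}$ is the overall mean, and cluster $C_j$ has $n_j$ points and centroid $\bar{x}_j$, for $j = 1, \dots, k$.

$$
\underbrace{\sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2}_{\text{TSS}}
= \underbrace{\sum_{j=1}^{k} \sum_{i \in C_j} \lVert x_i - \bar{x}_j \rVert^2}_{\text{WSS}}
+ \underbrace{\sum_{j=1}^{k} n_j \lVert \bar{x}_j - \bar{x} \rVert^2}_{\text{BSS}}
$$

Since TSS is fixed by the data, BSS/TSS = 1 - WSS/TSS, so a ratio near 1 means the points sit close to their own centroids. In the object returned by R's kmeans, these three quantities are the components totss, tot.withinss and betweenss.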
For example, in R:
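data(iris)
set.seed(42)  # k-means starts from random centers, so fix the seed
km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)  # nstart: keep the best of 25 random starts
km$betweenss / km$totss  # BSS/TSS ratio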
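gives a BSS/TSS ratio of 88.4% (0.884), indicating a good fit. You should be careful though: the ratio generally goes up as you add clusters, so a high value alone doesn't tell you that you picked the right number. Since the number of clusters has to be specified beforehand, it's usually a good idea to plot the WSS against the number of clusters and look for the point where it stops dropping sharply.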
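A plot of WSS against the number of clusters could be sketched as follows. This is only an illustration, not part of the original answer: the range 1 to 8, the seed, nstart = 25 and the variable names x, ks and wss are my own choices.

```r
# Sketch: total within-cluster sum of squares (WSS) for several values of k.
data(iris)
x <- iris[, 1:4]

set.seed(1)  # arbitrary seed, only for reproducibility
ks <- 1:8    # candidate numbers of clusters (chosen for illustration)

# For each k, run k-means with 25 random starts and keep the total WSS.
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

# Look for the "elbow": the k after which WSS stops dropping sharply.
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

Using several random starts per k is a design choice to reduce the chance that a poor local optimum distorts the curve.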
lambda_vu