I have clustered a matrix of 30 random variables (each variable has 13,000 observations) and got 10 clusters.
Now I need to test how good the clustering is by calculating the variance within each cluster.
I can easily calculate the variance of each column in my matrix (i.e. the variance of each random variable), but I want to calculate the variance of the whole cluster.
Does anyone know how this can be done?
For example:
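library(plyr)  # provides dlply() and laply()

data <- data.frame(x = c(2, 2, 2, 3, 7),
                   y = c(30, 40, 40, 30, 10),
                   z = c(1, 2, 3, 4, 5),
                   cluster = c('a', 'a', 'c', 'a', 'c'))
# For each cluster, compute the variance of every column except the label (column 4)
candidates <- dlply(data, .(cluster), function(d) {
  laply(d[, -4], var)
})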
This gives the variance per column for each cluster label (a, c), but I don't think it's the right approach.
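
To make concrete what a single number per cluster might look like, here is a rough sketch in base R. It assumes "variance of the whole cluster" means the total variance, i.e. the sum of the per-column sample variances (the trace of the cluster's covariance matrix). That definition is only one possible reading, and the helper name cluster_total_variance is made up for illustration.

```r
# Toy data from the example above
data <- data.frame(x = c(2, 2, 2, 3, 7),
                   y = c(30, 40, 40, 30, 10),
                   z = c(1, 2, 3, 4, 5),
                   cluster = c('a', 'a', 'c', 'a', 'c'))

# Hypothetical helper: total variance of one cluster, taken here (as an
# assumption) to be the sum of the per-column sample variances
cluster_total_variance <- function(m) {
  sum(apply(m, 2, var))
}

# Split the numeric columns by cluster label and reduce each piece to one number
num_cols <- setdiff(names(data), "cluster")
total_var <- sapply(split(data[, num_cols], data$cluster),
                    function(d) cluster_total_variance(as.matrix(d)))

print(total_var)
# a = 36, c = 464.5
```

Under this definition, multiplying a cluster's value by (n - 1), where n is the number of rows in that cluster, gives its sum of squared distances to the cluster centroid. Note that columns on larger scales (y here) dominate the total, so scaling the variables first may be worth considering.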