I wonder how I can modify the K-means algorithm so that the clusters are allowed to have unequal volumes (variances). The K-means objective is to minimize the within-cluster sum of squares $\sum_{i=1}^{p} \lVert X_i - L_{Z_i} \rVert_2^2$, and this objective assumes that all cluster variances are the same. If we instead assume that the clusters are Gaussian with mean $L_{Z_i}$ and variance $\sigma_{Z_i}^2$, where $Z_i$ denotes the cluster assignment of data point $i$, then the objective for the cluster assignments becomes $\sum_{i=1}^{p} \frac{\lVert X_i - L_{Z_i} \rVert_2^2}{\sigma_{Z_i}^2}$. So I tried modifying K-means such that the $Z_i$ update is performed with this new rule, and the $\sigma_{Z_i}^2$ are also re-estimated in each iteration. However, when I run this modified K-means, almost all data points end up assigned to the same cluster, which is weird. What might be the problem with this approach? I know EM can be used for this unequal-volume GMM purpose, but I want a simpler, K-means-like approach, and I am really curious why what I tried does not work. Thanks!
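For concreteness, here is a minimal sketch of the modified algorithm described above (Python/NumPy; my reconstruction, not the asker's actual code, and `modified_kmeans`, `L`, `Z`, `sigma2` are hypothetical names). It also makes the failure mode easy to see: once one cluster's variance grows, its scaled distances shrink and it attracts even more points.

```python
import numpy as np

def modified_kmeans(X, k, n_iter=100, seed=0):
    """K-means variant that divides squared distances by per-cluster variances."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    L = X[rng.choice(n, size=k, replace=False)].astype(float)  # initial centroids
    sigma2 = np.ones(k)                                        # per-cluster variances

    for _ in range(n_iter):
        # Assignment step: Z_i = argmin_k ||X_i - L_k||^2 / sigma_k^2
        d2 = ((X[:, None, :] - L[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
        Z = np.argmin(d2 / sigma2, axis=1)

        # Update step: re-estimate each cluster's mean and variance
        for j in range(k):
            members = X[Z == j]
            if len(members) == 0:
                continue  # leave an empty cluster's parameters unchanged
            L[j] = members.mean(axis=0)
            # mean squared distance to the centroid; small floor avoids division by zero
            sigma2[j] = max(((members - L[j]) ** 2).sum(axis=1).mean(), 1e-12)

    return Z, L, sigma2
```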
-
Interesting question. I don't see the need for a variance assumption in k-means. In fact, I thought the objective was to minimize $ \sum_{i=1}^K\sum_{x_j\in S_i} ||x_j - \mu_i ||^2$. If so, there is no equal variance assumption. – Drew75 Mar 06 '14 at 07:07
-
In fact, since K-means does not assume that the data points are generated from a Gaussian mixture model, it doesn't deal with the variances explicitly, but it does include an equal-variance assumption implicitly: if all the denominators in the new objective function that I wrote in the question are equal, you can simply drop them and you are left with the K-means objective. – user5054 Mar 06 '14 at 08:11
-
Does your algorithm assign a case to the cluster whose centroid is closest by distance, inversely weighted by the variance of that cluster? – ttnphns Mar 07 '14 at 07:33
-
Yes, that's what the algorithm I suggest would do. Is there a valid algorithm like that? I mean, if we also learn the cluster variances in K-means and use them while assigning the data points to clusters, is the result still a valid algorithm? Mathematically, when you learn univariate GMM parameters by MLE with hard assignments, what I suggested looks correct, but intuitively it leads to a situation where one of the clusters has a high variance, so the scaled distances of the points to that cluster are small, and as a result more data points are assigned to it, and so on (a vicious circle). – user5054 Mar 14 '14 at 21:16
-
So I just wonder whether the algorithm I suggested is pathological in general, or whether it behaves this way only on the dataset I am working on. – user5054 Mar 14 '14 at 21:17
-
This belated comment may no longer matter, but, first off, I was confused by the distinction in word usage between volumes and variances; to me, volume refers to cluster frequencies. That said, the question left me wondering how you were drawing the samples between each iteration update. If you weren't bootstrapping or jackknifing them and were using the exact same data from one update to the next, then I'm not at all surprised that your answer didn't change. My experience is that k-means algorithms are quite immune to minor tweaks to the algorithm. Evaluating k-means using k-fold CV is needed. – user78229 Jun 09 '15 at 12:08
3 Answers
Use Gaussian Mixture Modeling (aka: EM Clustering) instead.
It allows different variances, depending on your model. It can even allow different covariances if you use the most complex models.
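For example, with scikit-learn's `GaussianMixture` (my illustration of this suggestion, not code from the answer), the `covariance_type` argument controls how flexible the component volumes are. A minimal sketch, assuming the data sit in a NumPy array `X`:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(300, 2))  # placeholder data

# 'spherical': one variance per component (unequal cluster volumes)
# 'diag'     : one variance per component and per dimension
# 'full'     : a full covariance matrix per component (the most complex model)
gmm = GaussianMixture(n_components=3, covariance_type="spherical", random_state=0)
labels = gmm.fit_predict(X)
print(gmm.covariances_)  # one fitted variance per component
```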
While not exactly K-means (see below), I ran across several clustering approaches (in addition to the EM clustering, already mentioned by @Anony-Mousse), which might be helpful. They include:
- weighted hierarchical clustering (search for WPGMA and WPGMC here);
- Dirichlet distribution-based clustering (Dirichlet mixture models and Bayesian hierarchical Dirichlet process models; see the sketch after this list);
- latent class (LC) clustering (see this page and this paper; LC clustering can be viewed as a generalization of K-means or, using the authors' terminology, a "probabilistic extension").
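As an illustration of the Dirichlet process item above (my sketch, not code from the linked references), scikit-learn's `BayesianGaussianMixture` with a Dirichlet process prior fits a mixture in which unneeded components are driven to near-zero weight:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

X = np.random.default_rng(1).normal(size=(300, 2))  # placeholder data

dpgmm = BayesianGaussianMixture(
    n_components=10,                      # an upper bound, not the final number of clusters
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    random_state=0,
)
labels = dpgmm.fit_predict(X)
print(dpgmm.weights_.round(3))  # components unsupported by the data get ~0 weight
```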
In a GMM, in the limit where the (shared, spherical) variance of every component goes to 0 and the mixing proportions are uniform, expectation maximization on the GMM reduces to K-means. Basically, one can see K-means as a very special case of a GMM. If you want to modify K-means to allow the possibility of a different variance in each component, I would say that you will end up with EM on a GMM.
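To make the limiting argument concrete (a sketch of the standard derivation; $r_{ik}$, $\pi_k$, $\mu_k$ denote the EM responsibilities, mixing proportions, and component means, notation not used in the answer itself): with a shared spherical variance $\sigma^2$ in every component,

$$ r_{ik} = \frac{\pi_k \exp\!\left(-\tfrac{1}{2\sigma^2}\lVert x_i-\mu_k\rVert^2\right)}{\sum_{j} \pi_j \exp\!\left(-\tfrac{1}{2\sigma^2}\lVert x_i-\mu_j\rVert^2\right)} \;\longrightarrow\; \begin{cases}1 & \text{if } k=\arg\min_j \lVert x_i-\mu_j\rVert^2,\\ 0 & \text{otherwise,}\end{cases} \quad \text{as } \sigma^2 \to 0, $$

so with uniform $\pi_k$ the soft E-step collapses to the hard nearest-centroid assignment of K-means, and the M-step for $\mu_k$ becomes the usual centroid update.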