This tutorial (http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_assumptions.html#example-cluster-plot-kmeans-assumptions-py) explains very clearly some limitations of the k-means clustering method that are rooted in its assumptions.

Specifically, k-means assumes that: 1) the variance of the distribution of each attribute (variable) is spherical; 2) all variables have the same variance; 3) the prior probability of all k clusters is the same, i.e., each cluster has a roughly equal number of observations.

Now, thanks to the silhouette method, I can handle the case with the wrong number of blobs, but I am in the dark about how to handle the other cases. Could you point me to some methods to solve the other issues and, if possible, how to blend them together, since in my datasets these behaviours coexist?
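For concreteness, the elongated/unequal-variance/unequal-size cases in the linked tutorial are commonly handled by switching from k-means to a Gaussian mixture model, which lets each cluster learn its own covariance and weight. A minimal sketch with scikit-learn (the data and parameters here are illustrative, not from the question):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Illustrative data: three blobs with very different variances and sizes,
# a situation where plain k-means tends to split the wide cluster.
X, y = make_blobs(n_samples=[600, 100, 100],
                  centers=[[0, 0], [5, 5], [5, 0]],
                  cluster_std=[2.0, 0.3, 0.3],
                  random_state=0)

# covariance_type="full" drops the "spherical, equal-variance, equal-size"
# assumptions: each component fits its own covariance matrix and mixing weight.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X)
```

The number of components can be chosen with an information criterion such as `gmm.bic(X)`, which plays the role the silhouette method plays for k-means.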

ttnphns
Asher11
    Your question might get more response if you listed the relevant assumptions that are given in your link. That said, Gaussian Mixture Models are a common approach for the types of data shown there. – GeoMatt22 Aug 21 '16 at 23:06
  • Prior to running k-means, it is customary to normalize all variables such that all of them have 0 mean and unit variance. This validates assumption 2). – PolBM Aug 22 '16 at 07:08
  • "k-means assumes the variance of the distribution of each attribute (variable) is spherical" What's that? A spherical univariate distribution, how can that be? 2) "all variables have the same variance" [for the whole sample] K-means has no such assumption. 3) "each cluster has roughly equal number of observations" K-means has no such assumption (rather, it assumes clusters of approximately equal diameter). – ttnphns Aug 22 '16 at 10:07
  • "…blobs but I am in the dark for how to handle the other cases. Could you point me to some methods to solve the other issues and, if possible, how to blend them together" It is not clear what you want, in the end. Your whole question looks rather dim in meaning to me. – ttnphns Aug 22 '16 at 10:10
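The standardization mentioned in the comments above can be sketched as follows; the data here is hypothetical, chosen only to show two features on very different scales:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical data: two features on very different scales.
rng = np.random.RandomState(0)
X = np.column_stack([rng.normal(0, 1, 300),      # feature with spread ~1
                     rng.normal(0, 1000, 300)])  # feature with spread ~1000

# Without scaling, the large-variance feature dominates the Euclidean
# distances that k-means minimizes. StandardScaler gives every feature
# zero mean and unit variance so both contribute comparably.
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
```

Note this equalizes variances over the whole sample only; it does not make the within-cluster variances equal or spherical, which is why the mixture-model route is still relevant for the other assumptions.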