Suppose you have a training set and a testing set. You learn a clustering model (e.g., k-means) from the training set, and then assign each observation in the testing set to its nearest cluster.
If the clustering you have learned is meaningful for the population that generated the training and testing sets, you would expect the proportion of observations allocated to each cluster to be fairly similar between the sets. For example, suppose that, for the training set, cluster 1 has 20% of the data, cluster 2 has 30% of the data, and cluster 3 has 50% of the data. If, for the testing set, cluster 1 had 80% of the data and cluster 2 and 3 each had 10%, that would suggest that the clustering was not meaningful.
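To make the setup concrete, here is a minimal sketch of the comparison I mean. The data, centroids, and cluster sizes are all made up for illustration; the nearest-centroid assignment stands in for a fitted k-means model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: three well-separated 2-D Gaussian blobs,
# with 20% / 30% / 50% of the training points in each.
train = np.vstack([rng.normal(c, 0.3, size=(n, 2))
                   for c, n in [((0, 0), 20), ((4, 0), 30), ((0, 4), 50)]])
test = np.vstack([rng.normal(c, 0.3, size=(n, 2))
                  for c, n in [((0, 0), 10), ((4, 0), 15), ((0, 4), 25)]])

# Stand-in for the centroids a k-means model would learn on the training set.
centroids = np.array([(0, 0), (4, 0), (0, 4)], dtype=float)

def assign(X, centroids):
    """Assign each row of X to its nearest centroid (Euclidean distance)."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

train_prop = np.bincount(assign(train, centroids), minlength=3) / len(train)
test_prop = np.bincount(assign(test, centroids), minlength=3) / len(test)
print(train_prop)  # here: [0.2 0.3 0.5]
print(test_prop)
```

The question is then how to judge, formally, whether `train_prop` and `test_prop` are "similar enough".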
Is there a statistical test or measure one can use to evaluate whether a clustering is meaningful in the sense described above?
I found a similar question that addresses the issue of ascertaining the quality of a clustering using a test set. But it does not consider the issue of cluster proportions.
If the learned cluster structure is meaningful for the population, we would expect the training-set proportion $p_i$ to be close to the test-set proportion $\hat{p}_i$, for every cluster $i$. Is there a statistical test or measure of how similar the sets of cluster proportions from a training set and a test set are?
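One candidate I have considered (just to illustrate the kind of comparison I mean; whether it is appropriate is exactly my question) is a chi-square goodness-of-fit test, treating the training proportions as the expected distribution and the test-set allocations as the observed counts. The counts below are made up; note this treats the training proportions as fixed and ignores their own sampling variability:

```python
import numpy as np
from scipy.stats import chisquare

train_prop = np.array([0.2, 0.3, 0.5])   # p_i estimated from the training set
test_counts = np.array([12, 14, 24])     # observed test-set allocations, n = 50

# Expected test counts if the test set followed the training proportions.
expected = train_prop * test_counts.sum()

stat, pval = chisquare(f_obs=test_counts, f_exp=expected)
print(stat, pval)  # a large p-value means no evidence of a proportion shift
```

A two-sample test of homogeneity on the $2 \times k$ table of train/test counts would be the natural way to account for sampling variability in both sets, if that turns out to matter here.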
– ostrichgroomer Jan 16 '17 at 23:51