
I have watched a video describing the differences between Cluster Analysis and Mixture Models. https://www.youtube.com/watch?v=HwsMZwhO7wU&t=2s

Cluster analysis finds compact clusters and assigns each person to exactly one of them, so the clusters are discrete and cannot overlap. Mixture models such as LCA are probabilistic: the model is fit to the data, and each person receives a probability of belonging to each class, so memberships can overlap. This is my understanding of the gist of the video.

However, when would you prefer one method over the other? Why might you prefer a cluster analysis method such as K-means over a mixture model such as LCA?

JElder
    There are some fuzzy clustering methods. See e.g. https://en.wikipedia.org/wiki/Fuzzy_clustering – Peter Flom Nov 18 '23 at 16:42

1 Answer


Broadly speaking, latent class analysis (LCA) is a type of finite mixture model (FMM) that derives clusters using probability models for categorical data (e.g., dichotomous and ordinal data), whereas cluster analysis (CA) derives clusters using some sort of distance measure$^1$. If your data are categorical, I would lean towards LCA, because CA methods such as K-means treat the data as continuous. That said, depending on the purpose of your analysis, K-means may also be an acceptable choice: recent research (e.g., Brusco, Shireman, & Steinley, 2017) has shown that K-means clustering can produce acceptable results even when the data are dichotomous.
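
To make the distance-versus-probability distinction concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not a reference implementation: the toy data, the function names `kmeans` and `lca`, and the deterministic starting values are all made up for this example. `kmeans` is Lloyd's algorithm with squared Euclidean distance, and `lca` is EM for a latent class model with conditionally independent Bernoulli items.

```python
import math

# Hypothetical toy data: 9 respondents answering 4 yes/no items.
X = [
    (1, 1, 1, 1), (1, 1, 1, 0), (1, 1, 0, 1), (1, 0, 1, 1),  # mostly "yes"
    (0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0),  # mostly "no"
    (1, 1, 0, 0),                                            # in between
]

def kmeans(X, k, iters=20):
    """Lloyd's algorithm with squared Euclidean distance; one hard label per row."""
    # Deterministic start (an assumption): evenly spaced rows are the initial centers.
    centers = [list(X[(i * len(X)) // k]) for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assignment step: each row goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
            for x in X
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def lca(X, k, iters=100, eps=1e-6):
    """EM for a latent class model with independent Bernoulli items.

    Returns, for each row, the posterior probability of each class.
    """
    n, d = len(X), len(X[0])
    weights = [1.0 / k] * k
    # Deterministic, asymmetric start (an assumption): class c has
    # item-endorsement probability (c + 1) / (k + 1) on every item.
    probs = [[(c + 1) / (k + 1)] * d for c in range(k)]
    post = []
    for _ in range(iters):
        # E-step: posterior class probabilities for every row.
        post = []
        for x in X:
            joint = [
                weights[c] * math.prod(p if xi else 1 - p for xi, p in zip(x, probs[c]))
                for c in range(k)
            ]
            total = sum(joint)
            post.append([j / total for j in joint])
        # M-step: update class sizes and item probabilities, clamped away from 0 and 1.
        for c in range(k):
            nc = sum(r[c] for r in post)
            weights[c] = nc / n
            probs[c] = [
                min(max(sum(r[c] * x[j] for r, x in zip(post, X)) / nc, eps), 1 - eps)
                for j in range(d)
            ]
    return post

hard_labels = kmeans(X, 2)  # one integer label per respondent
soft_probs = lca(X, 2)      # one probability vector per respondent

for x, lab, p in zip(X, hard_labels, soft_probs):
    print(x, "K-means label:", lab, "LCA posterior:", [round(v, 3) for v in p])
```

K-means returns a single label per respondent, so the in-between response pattern is forced into one cluster. LCA instead returns a probability for each class, and you decide afterwards whether and how to turn those into a classification.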

Finally, see this post for more information; it goes into further detail on the pros and cons of FMM vs. CA. One point made there is of particular note: FMMs are much more flexible than CA, as they allow for many extensions, such as the inclusion of covariates and hierarchical/nested data structures.

$^1$ For more information on CA distance measures and their differences see Steinley & Brusco (2008).

References

Brusco, M. J., Shireman, E., & Steinley, D. (2017). A comparison of latent class, K-means, and K-median methods for clustering dichotomous data. Psychological Methods, 22(3), 563.

Steinley, D., & Brusco, M. J. (2008). Selection of variables in cluster analysis: An empirical comparison of eight procedures. Psychometrika, 73, 125-144.