What is the relationship between SVD and factor analysis? How can I use the singular values and the other matrices from the SVD to perform factor analysis, or to cluster a document-term matrix, without using other clustering techniques?
3 Answers
Google brought me here, and I dislike how the comments just assume everyone knows that FA and PCA are related. So to answer your question: yes. See Tipping and Bishop, Probabilistic principal component analysis, 1996. This paper is great because:
- It discusses the connection between FA and PCA (Section 2.2)
- It discusses using the SVD to compute the ML parameters (Appendix A)
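To make the second bullet concrete, here is a minimal NumPy sketch of the closed-form maximum-likelihood PPCA solution computed from the SVD of the centered data. The function name `ppca_ml`, the synthetic data, and the choice R = I for the arbitrary rotation are assumptions made for illustration; the sketch also assumes more samples than variables and q smaller than the number of variables.

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form ML estimates (W, sigma2) for probabilistic PCA via SVD.

    Hypothetical helper: assumes n_samples >= n_features and q < n_features.
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)                      # center the data
    # Right singular vectors of Xc are eigenvectors of S = Xc.T @ Xc / n
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s**2 / n                           # eigenvalues of S, descending
    sigma2 = eigvals[q:].mean()                  # average discarded variance
    # W = U_q (Lambda_q - sigma2 I)^(1/2), taking the rotation R = I
    W = Vt[:q].T * np.sqrt(eigvals[:q] - sigma2)
    return W, sigma2

# Synthetic data (assumption): 2 latent factors, 5 observed variables
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 2))
A = rng.normal(size=(2, 5))
X = Z @ A + 0.1 * rng.normal(size=(500, 5))

W, sigma2 = ppca_ml(X, q=2)
C = W @ W.T + sigma2 * np.eye(5)                 # model covariance
print(W.shape, sigma2)
```

The model covariance `C` reproduces the top q eigenvalues of the sample covariance and replaces the remaining ones by their average. Factor analysis differs in that it allows a separate noise variance per variable, which is why it generally needs an iterative fit.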
Google brought me here too, and I found that scikit-learn, a widely used Python machine learning library, implements factor analysis using the SVD, with a small tweak to fit the data points.
Hence the answer is a big YES: you can use the SVD.
If you're keen on the implementation details, I suggest reading the FactorAnalysis source code of scikit-learn on GitHub. It computes the SVD with the SciPy library and adjusts the output for shape.
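For a quick look at this from the user side, the following sketch fits scikit-learn's `FactorAnalysis` with `svd_method="lapack"`, which selects the standard SciPy SVD rather than the default randomized one. The synthetic data, the number of factors, and the noise levels are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# Synthetic data (assumption): 2 latent factors, 6 observed variables,
# with a different noise level for each variable
F = rng.normal(size=(300, 2))
L = rng.normal(size=(2, 6))
noise = rng.normal(size=(300, 6)) * np.array([0.2, 0.4, 0.6, 0.8, 1.0, 1.2])
X = F @ L + noise

fa = FactorAnalysis(n_components=2, svd_method="lapack", random_state=0)
scores = fa.fit_transform(X)      # estimated factor scores per sample

print(fa.components_.shape)       # loadings, shape (n_components, n_features)
print(fa.noise_variance_)         # estimated noise variance per variable
```

The loadings in `components_` are only identified up to a rotation, so they should not be expected to match `L` entry by entry.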
In addition, I want to add some references on top of the Probabilistic Principal Component Analysis (PPCA) paper suggested by @gwg:
- David Barber, Bayesian Reasoning and Machine Learning, Algorithm 21.1 (textbook)
- Christopher M. Bishop, Pattern Recognition and Machine Learning, Section 12.2.4 (textbook, by one of the PPCA authors). Scikit-learn references it for its algorithm.
Oops, you're right. I gave the wrong link. I have updated the download link, but in any case you should be able to find the free PDF of the last reference through Google immediately. – Daniel Kurniadi Jun 27 '19 at 10:20
The idea: in some cases (i.e., for some objective functions), the factor analysis problem can be reformulated as a low-rank matrix approximation problem, and the SVD can be used to solve low-rank matrix approximation problems.
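To illustrate the low-rank approximation part, the sketch below builds the best rank-k approximation in the Frobenius norm from a truncated SVD (the Eckart-Young result). The helper name `best_rank_k` and the random test matrix are assumptions for illustration. Note that a plain truncated SVD does not model per-variable unique variances, so it corresponds to factor analysis only under the special objectives mentioned above.

```python
import numpy as np

def best_rank_k(X, k):
    """Return the rank-k truncated-SVD approximation of X and all singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the k largest singular triplets: U_k diag(s_k) V_k^T
    Xk = (U[:, :k] * s[:k]) @ Vt[:k]
    return Xk, s

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))       # arbitrary test matrix (assumption)

Xk, s = best_rank_k(X, k=2)
err = np.linalg.norm(X - Xk, "fro")

# The approximation error equals the norm of the discarded singular values
print(err, np.sqrt((s[2:] ** 2).sum()))
```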
factor analysis or cluster document-term matrix without using other clustering techniques - that is unclear. What is your interest ultimately - FA or Cluster analysis? – ttnphns Mar 06 '16 at 19:30
[...]
(10)
  9 (15)
 -3   6  (6)
 18  12  -7 (23)
 -7   5   8  -4  (8)
[lower triangle shown, diag. entries are parenth.]. [to cont.] – ttnphns Mar 06 '16 at 21:38
[...] 44.7361 22.5044 2.3908 -2.1720 -5.4592. But sing. values are: 44.7361 22.5044 5.4592 2.3908 2.1720. You see that svd fails to recognize the negative eigenvalues and takes (and sorts) them as if they were positive. – ttnphns Mar 06 '16 at 21:39
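The point in ttnphns's last comment can be checked numerically. This is a sketch that assumes the lower triangle quoted in the comment is read row by row into a symmetric 5x5 matrix; it compares the eigenvalues of that matrix with its singular values.

```python
import numpy as np

# Symmetric matrix rebuilt from the lower triangle in the comment (assumption:
# rows are read in order, parenthesized entries are the diagonal)
A = np.array([
    [10,  9, -3, 18, -7],
    [ 9, 15,  6, 12,  5],
    [-3,  6,  6, -7,  8],
    [18, 12, -7, 23, -4],
    [-7,  5,  8, -4,  8],
], dtype=float)

eigvals = np.sort(np.linalg.eigvalsh(A))[::-1]   # signed, sorted descending
singvals = np.linalg.svd(A, compute_uv=False)    # non-negative, descending

print(np.round(eigvals, 4))
print(np.round(singvals, 4))

# For a symmetric matrix the singular values are the absolute eigenvalues
print(np.allclose(singvals, np.sort(np.abs(eigvals))[::-1]))
```

So the SVD of a symmetric matrix coincides with its eigendecomposition only when no eigenvalue is negative, which matters if the matrix being factored is not a proper (positive semidefinite) covariance or correlation matrix.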