I have a $1600\times5000$ data matrix $X$ containing 1600 data points in 5000-dimensional space. Using MATLAB's built-in pca function, I get the loadings in coeff.
In theory, coeff*coeff' should give an almost-identity matrix. For example:
coeff = pca(rand(1000,1000));
coeff*coeff';
However, in my case, coeff*coeff' is far from the identity, with some of the diagonal entries as low as 0.01. As a result, if I try to reconstruct my data points, even with all the PCs, I worry that the results may be lousy.
What could explain this? And is there a way to get around the problem?
If you run coeff = pca(rand(1600,5000)), you will see that coeff is of size 5000x1599, meaning that with 1600 points you can only find 1599 principal components in 5000-dimensional space. Of course coeff*coeff' will then not be the 5000x5000 identity matrix, because it is low rank and has 5000-1599 zero eigenvalues. The reconstruction of your data points should still be perfect though (and not "lousy"), because all your points lie in this 1599-dimensional subspace. Does this make sense? – amoeba Apr 23 '15 at 17:09
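The point in the comment can be checked numerically. The following is only a rough sketch, not a definitive recipe: it uses a smaller random matrix as a stand-in for the real data (the sizes n and p and the variable names errOrth, rankP, Xrec and errRec are made up for illustration) and assumes pca is called with its default options, so that it centers the data and returns the column means as its sixth output.

```matlab
% Small stand-in for the 1600x5000 case: fewer points (n) than dimensions (p).
rng(0);                          % fixed seed, only for repeatability
n = 50; p = 200;                 % hypothetical sizes, chosen to run quickly
X = rand(n, p);

% pca centers the data by default; mu holds the column means it subtracted.
[coeff, score, ~, ~, ~, mu] = pca(X);

disp(size(coeff))                % p-by-(n-1): only n-1 components exist

% The columns of coeff are orthonormal, so coeff'*coeff is the identity ...
errOrth = norm(coeff'*coeff - eye(n-1), 'fro');

% ... but coeff*coeff' is a rank-(n-1) projector, not the p-by-p identity.
rankP = rank(coeff*coeff');

% Reconstruct from all available PCs, adding the mean back to each row.
Xrec = bsxfun(@plus, score*coeff', mu);
errRec = max(abs(Xrec(:) - X(:)));

fprintf('orthonormality error: %g\n', errOrth);
fprintf('rank of coeff*coeff'': %d\n', rankP);
fprintf('max reconstruction error: %g\n', errRec);
```

Both errors should come out at the level of floating-point round-off, while the rank stays at n-1. So the quantity to check is coeff'*coeff (or the reconstruction error itself) rather than coeff*coeff'. One component is lost because centering removes one degree of freedom from the n points.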