
Assume we have a matrix X = randn(5,3). I am doing two things:

1) [S D1 V1] = svd(X);

2) [V2 D2] = eig(X'*X);

I am getting:

V1 =

   -0.6220    0.5046    0.5987
   -0.6549   -0.7544   -0.0446
   -0.4292    0.4198   -0.7997

and

V2 =

    0.5987    0.5046    0.6220
   -0.0446   -0.7544    0.6549
   -0.7997    0.4198    0.4292

First question: How can we interpret the difference between V1 and V2? Why do some entries change sign, and why do the columns appear in reverse order?

Second question: In principal component analysis, one can compute the principal components (PCs) as Z = S*D1 or as Z = X*V2. But in this case S*D1 is equal to X*V1, not to X*V2. So the PCs are Z = X*V1 and not X*V2, right?
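
The identity behind the second question can be checked numerically. The following is a minimal sketch, assuming a fixed seed (`rng(0)`, not in the original post) so that the run is repeatable; the names `Z_svd`, `Z_proj`, and `err` are illustrative only. Since X = S*D1*V1' and V1 has orthonormal columns, X*V1 = S*D1*V1'*V1 = S*D1.

```matlab
rng(0);                       % assumed seed, only for reproducibility
X = randn(5,3);

[S, D1, V1] = svd(X);         % full SVD: X = S*D1*V1'

Z_svd  = S*D1;                % 5x3 scores built from the SVD factors
Z_proj = X*V1;                % the same scores, obtained by projecting X onto V1

% The two should agree up to floating-point round-off.
err = norm(Z_svd - Z_proj);
disp(err)
```

Note that `S*D1` is 5x3 here because the full SVD returns a 5x5 `S` and a 5x3 `D1`.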

amoeba
Christina
  • The sign of the components is arbitrary and does not matter, see here: http://stats.stackexchange.com/questions/88880. Regarding the order: Matlab's eig function tends to return the eigenvectors in order of increasing eigenvalues, whereas the svd function tends to return them in decreasing order. Hence the order is flipped. One should never rely on the returned ordering; instead, re-order the components based on the eigenvalues. You can compute Z as X*V1 or as X*V2 and you will get the same thing, just possibly with different signs and in a different order. – amoeba Feb 17 '16 at 14:47
  • Thank you for your comment. In fact, when using eig, V2 is likely to be complex if the matrix dimension becomes large, whereas with svd, V1 is always real. How can you interpret this fact? Do you think that computing Z=X*V1 is preferable? – Christina Feb 17 '16 at 14:57
  • SVD is numerically more stable and is usually the preferred way, see http://stats.stackexchange.com/a/87536. Complex values indicate some numerical problems along the way; I would guess that the imaginary part is around the machine precision and so you can write V2=real(V2) and it's going to be fine. But it's better to use SVD. – amoeba Feb 17 '16 at 15:03
  • So in principal component regression (PCR), one can write: Y = X*beta + e = S*D1*V1'*beta + e = Z*V1'*beta + e = Z*alpha + e, since Z = S*D1 = X*V1. Am I right? Thank you very much for your help. – Christina Feb 17 '16 at 15:10
  • Yes. But you would usually use only a few components in PCR, not all of them. – amoeba Feb 17 '16 at 15:13
  • Of course, it makes sense to discard the least informative components. My last question: I am wondering why the author of the article entitled "Use of the Singular Value Decomposition in Regression Analysis" complicated things on page 5. Why didn't he simply consider Z=U*theta? (I am only talking about page 5 of the article.) It would be nice if you could combine your comments into an answer :-) – Christina Feb 17 '16 at 15:18
  • Please give a link to the paper, I am not sure which one you are talking about. – amoeba Feb 17 '16 at 16:48
  • http://www.ime.unicamp.br/~marianar/MI602/material%20extra/svd-regression-analysis.pdf – Christina Feb 17 '16 at 21:13
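
The points made in the comments about ordering and sign can be illustrated with a short sketch. It assumes a fixed seed (`rng(0)`, not in the original post) and uses illustrative names (`idx`, `s`, `err`). It sorts the output of `eig` by decreasing eigenvalue rather than relying on the returned order, then flips the sign of each column of `V2` to agree with `V1`. This is only one possible way to align the two bases, and it assumes the singular values are distinct, which is generically the case for a random matrix.

```matlab
rng(0);                                   % assumed seed, only for reproducibility
X = randn(5,3);

[~, D1, V1] = svd(X);                     % singular values in decreasing order
[V2, D2]    = eig(X'*X);                  % eigenvalue order is not guaranteed

% Re-order the eigenvectors explicitly by decreasing eigenvalue.
[lambda, idx] = sort(diag(D2), 'descend');
V2 = V2(:, idx);

% Each column is defined only up to sign: flip columns of V2 to match V1.
s  = sign(sum(V1 .* V2, 1));              % +1 or -1 for each column
V2 = V2 * diag(s);

err = norm(V1 - V2);                      % small, up to round-off
disp(err)

% The eigenvalues of X'*X are the squared singular values of X.
disp(norm(sqrt(lambda) - diag(D1)))
```

After this alignment, `X*V1` and `X*V2` give the same scores up to round-off, which is why either can be used for Z once order and sign are fixed.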

0 Answers