I have a high-dimensional data matrix X in which the sample size is smaller than the number of variables. I want to use PCA as a dimensionality reduction method, but I cannot call it directly in R because the matrix X is rank-deficient. I saw the technique below in some R code for obtaining the principal component representation of a rank-deficient matrix:

1) Get U from svd(XX^T)

2) Get the principal component representation C by solving X = UC

I am new to PCA, and I have no idea why that makes sense. Could someone clarify how those two steps are equivalent to applying PCA to X?
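
For concreteness, here is a minimal R sketch of the two steps as I understand them (the data and variable names are my own, not from the original code):

```r
set.seed(1)
n <- 5; p <- 20                              # fewer samples than variables
X <- matrix(rnorm(n * p), n, p)
X <- scale(X, center = TRUE, scale = FALSE)  # center each variable

# Step 1: get U from the SVD of X X^T (an n x n matrix)
U <- svd(X %*% t(X))$u

# Step 2: solve X = U C for C; since U is orthogonal, C = U^T X
C <- t(U) %*% X
```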

user5054
    Probable duplicate, at least closely related: http://stats.stackexchange.com/q/147880/3277. – ttnphns May 24 '15 at 04:51
    There are many questions asked about the relation between PCA and SVD, but my specific question is that I would expect the 1st step to be svd(X) rather than svd(XX^T). – user5054 May 24 '15 at 04:56
    Note that since XX' (as well as X'X) is square symmetric its svd() = its eigendecomposition(). – ttnphns May 24 '15 at 04:59
  • Principal component analysis (PCA) is usually explained via "an eigen-decomposition of the covariance matrix (XX^T)" or via "a singular value decomposition (SVD) of the data matrix itself (X)". That's what confuses me. Is it okay to use either svd(X) or svd(XX^T) in the 1st step? – user5054 May 24 '15 at 05:12
  • Yes. (http://stats.stackexchange.com/q/79043/3277), you may use both. – ttnphns May 24 '15 at 05:22
  • Thanks, this helps a lot! Also, is there a problem with the 2nd part? It calculates the principal components using U^T * X, but what is explained in the link you sent is to calculate principal components using U * S, where S is the matrix of the singular values. – user5054 May 24 '15 at 05:56
    What do you mean you cannot call it directly? Just use prcomp and not princomp. – amoeba May 24 '15 at 06:59

1 Answer


PCA amounts to finding and interpreting a singular value decomposition (SVD) of $X$ (or any closely-related matrix obtained by various forms of "standardizing" the rows and/or columns of $X$). There are direct connections between any SVD of $X$ and any SVD of $XX^\prime$. I will exhibit the connections in both directions, by obtaining an SVD of one from an SVD of the other.
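
This equivalence is easy to check numerically in R; the following is a small sketch of my own, with made-up data (not from the question's code):

```r
set.seed(1)
X <- scale(matrix(rnorm(5 * 12), 5, 12), center = TRUE, scale = FALSE)

pc <- prcomp(X, center = FALSE)  # PCA of the already-centered data
s  <- svd(X)                     # X = U Sigma V'

# The principal component scores equal U Sigma, up to column signs:
all.equal(abs(unname(pc$x)), abs(s$u %*% diag(s$d)))  # TRUE up to rounding
```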


1. Any SVD of $X$ determines a unique SVD of $XX^\prime$.

By definition, an SVD of $X$ is a representation in the form

$$X = U\Sigma V^\prime$$

where $\Sigma$ is diagonal with non-negative entries and $U$ and $V$ are orthogonal matrices. Among other things, this implies $V^\prime V = \mathbb{I}$, the identity matrix.

Compute

$$XX^\prime = (U\Sigma V^\prime)(U\Sigma V^\prime)^\prime= (U\Sigma V^\prime)(V\Sigma U^\prime) = U\Sigma V^\prime V \Sigma U^\prime = U\Sigma \mathbb{I} \Sigma U^\prime = U (\Sigma^2) U^\prime.$$

Since $\Sigma^2$ is diagonal with non-negative entries and $U$ is orthogonal, this is an SVD of $XX^\prime$.
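
A quick numerical check of this correspondence in R (again a sketch of my own, with arbitrary data):

```r
set.seed(1)
X <- matrix(rnorm(4 * 7), 4, 7)  # a small 4 x 7 example

s <- svd(X)                      # X = U Sigma V'
e <- svd(X %*% t(X))             # SVD of X X'

# The singular values of X X' are the squared singular values of X:
all.equal(e$d, s$d^2)            # TRUE up to rounding
# The left singular vectors agree up to the sign of each column
# (assuming the singular values are distinct):
all.equal(abs(e$u), abs(s$u))    # TRUE up to rounding
```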


2. Any SVD of $XX^\prime$ gives (at least one) SVD of $X$.

Conversely, because $XX^\prime$ is symmetric and positive semi-definite, by means of an SVD or the Spectral Theorem it can be diagonalized via an orthogonal matrix $U$ (for which both $UU^\prime$ and $U^\prime U$ are identity matrices):

$$XX^\prime = U\Lambda^2 U^\prime.$$

In this decomposition, $\Lambda$ is an $n\times n$ diagonal matrix ($n$ is the number of rows of $X$) with $r = \text{rank}(X)$ non-zero entries which--without any loss of generality--we may assume are the first $r$ entries $\Lambda_{ii}, i=1,2,\ldots, r$. Define $\Lambda^{-}$ to be the diagonal matrix with entries $1/\Lambda_{ii}, i=1,2,\ldots, r$ (and zeros otherwise). It is a generalized inverse of $\Lambda$. Let
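
In R, such a generalized inverse is easy to form from the diagonal entries; here is a tiny sketch (the example values and the tolerance tol are my own choices, the latter guarding against entries that are zero only up to rounding):

```r
Lambda.diag <- c(3.2, 1.7, 0, 0)  # example singular values, some zero
tol <- 1e-10
# Invert the non-zero entries; leave the zero entries at zero:
Lambda.inv <- diag(ifelse(Lambda.diag > tol, 1 / Lambda.diag, 0))
```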

$$Y = \Lambda^{-}U^\prime X$$

and compute

$$YY^\prime = \Lambda^{-} U^\prime XX^\prime U \Lambda^{-} = \Lambda^{-} \Lambda^2 \Lambda^{-} = \mathbb{I}_{r;n}.$$

(I have employed the notation $\mathbb{I}_{r;n}$ for an identity-like $n\times n$ matrix of rank $r$: it has $r$ ones along the diagonal and zeros everywhere else.) This result implies $Y$ must be a block matrix of the form

$$Y = \pmatrix{W \\ 0}$$

with $W$ an $r\times p$ matrix having orthonormal rows ($WW^\prime = \mathbb{I}_r$), where $p$ is the number of columns of $X$; the zeros fill out the remaining $n-r$ rows of $Y$ (which has the same dimensions as $X$). Certainly $r \le p$. If $r \lt p$, we may extend the rows of $W$ to an orthonormal basis of $\mathbb{R}^p$ (via the Gram-Schmidt process, say), yielding a $p\times p$ orthogonal matrix $V^\prime$ whose first $r$ rows are the rows of $W$; if $r = p$, take $V^\prime = W$.

Since $\Lambda Y = \Lambda\Lambda^{-}U^\prime X = U^\prime X$ (the last $n-r$ rows of $U^\prime X$ vanish, because their squared lengths are the diagonal entries of $U^\prime XX^\prime U = \Lambda^2$ beyond the first $r$) and, by construction, $YV = \mathbb{I}_{r;n\times p}$, the $n\times p$ counterpart of $\mathbb{I}_{r;n}$, we may compute

$$U^\prime X V = \Lambda Y V = \Lambda\,\mathbb{I}_{r;n\times p} = \Lambda,$$

reading $\Lambda$ on the right-hand side as an $n\times p$ rectangular diagonal matrix.

Upon left-multiplying by $U$ and right-multiplying by $V^\prime$ we obtain

$$U\Lambda V^\prime = (UU^\prime) X (VV^\prime) = X,$$

which is an SVD of $X$.
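
To see the construction in action, here is an R sketch of my own. For simplicity it takes $X$ of full row rank, so that $r = n$ and $Y$ itself supplies the first $r$ rows of $V^\prime$:

```r
set.seed(1)
n <- 4; p <- 7
X <- matrix(rnorm(n * p), n, p)       # generically rank(X) = n here

e <- eigen(X %*% t(X), symmetric = TRUE)
U      <- e$vectors                   # orthogonal
Lambda <- sqrt(pmax(e$values, 0))     # the singular values of X

Y <- diag(1 / Lambda) %*% t(U) %*% X  # Y = Lambda^- U' X
all.equal(tcrossprod(Y), diag(n))     # Y Y' = I: Y has orthonormal rows

# U Lambda Y reproduces X, giving a (thin) SVD of X:
all.equal(U %*% diag(Lambda) %*% Y, X)  # TRUE up to rounding
```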


whuber