
In deriving the eigenvectors for PCA, each eigenvector is required to have unit length. Why is this so?

    To put it simply in words (and this agrees with both answers already given): the task is to split the information stored in the covariances into two clear parts, a part holding all the variability (the eigenvalues) and a part with "no" variability, i.e. unit variability, which shows the directions of that variability (the eigenvectors). – ttnphns Oct 03 '14 at 01:20
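The split described in the comment can be checked numerically. A minimal NumPy sketch, assuming hypothetical toy data: `np.linalg.eigh` returns unit-length (orthonormal) eigenvectors, so all of the scale of the covariance matrix ends up in the eigenvalues.

```python
import numpy as np

# Hypothetical toy data: 500 samples of a 3-dimensional random vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.array([[3.0, 0.0, 0.0],
                                           [1.0, 1.0, 0.0],
                                           [0.5, 0.2, 0.3]])
C = np.cov(X, rowvar=False)          # sample covariance matrix (3 x 3)

# eigh is for symmetric matrices; the columns of V are unit-length eigenvectors.
eigvals, V = np.linalg.eigh(C)

# The directions carry no scale: V is orthonormal.
print(np.allclose(V.T @ V, np.eye(3)))                 # True
# All of the variability sits in the eigenvalues: C = V diag(lambda) V^T.
print(np.allclose(V @ np.diag(eigvals) @ V.T, C))      # True
```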

2 Answers


The main aim of Principal Component Analysis (PCA) is to find the directions in $\mathbb{R}^p$ that maximize the variance of the projected random vector $X=(X_1,\ldots,X_p)$. Specifically, the first PC can be defined as the unit vector $v_{(1)}\in\mathbb{R}^p$ such that $$v_{(1)}=\arg\max_{v\in\mathbb{R}^p,||v||=1}\mathbb{V}\mathrm{ar}\big[v^TX\big].$$
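A short Lagrange-multiplier sketch makes the role of the constraint explicit. It assumes $X$ has covariance matrix $\Sigma$ and uses $\mu$ for the multiplier; both symbols are introduced here for illustration only:

$$\mathbb{V}\mathrm{ar}\big[v^TX\big]=v^T\Sigma v,\qquad \mathcal{L}(v,\mu)=v^T\Sigma v-\mu\,(v^Tv-1),$$
$$\nabla_v\mathcal{L}=2\Sigma v-2\mu v=0\;\Longrightarrow\;\Sigma v=\mu v,$$
$$\mathbb{V}\mathrm{ar}\big[v^TX\big]=v^T\Sigma v=\mu\,v^Tv=\mu.$$

So every stationary point is an eigenvector of $\Sigma$, its projected variance equals its eigenvalue $\mu$ precisely because $v^Tv=1$, and $v_{(1)}$ is the eigenvector with the largest eigenvalue.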

If you allow vectors that are not of unit norm in the maximization problem, the problem has no proper solution, because the variance of the projection grows without bound as the norm of the vector increases. For example, if $w=\lambda v$, with $v,w\in\mathbb{R}^p$ and $\lambda\to\infty$, then

$$\mathbb{V}\mathrm{ar}\big[w^TX\big]=\lambda^2\mathbb{V}\mathrm{ar}\big[v^TX\big]\to\infty\quad (\text{if }\mathbb{V}\mathrm{ar}\big[v^TX\big]\neq0).$$ This is why you need a unit-norm normalization to constrain the search and avoid improper solutions.
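Both halves of this argument can be seen in a minimal NumPy sketch, assuming hypothetical correlated data and a hypothetical helper `proj_var`: rescaling a direction by $\lambda$ multiplies the projected variance by $\lambda^2$, while under the unit-norm constraint no random unit vector beats the top eigenvector of the sample covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical correlated data: 2000 draws of a 3-dimensional X.
A = np.array([[2.0, 0.0, 0.0],
              [0.8, 1.0, 0.0],
              [0.3, 0.4, 0.5]])
X = rng.normal(size=(2000, 3)) @ A
Xc = X - X.mean(axis=0)

def proj_var(w):
    """Sample variance of the projection w^T X (hypothetical helper)."""
    return np.var(Xc @ w, ddof=1)

v = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)   # a unit vector
for lam in (1.0, 10.0, 100.0):
    # Var[(lam*v)^T X] / Var[v^T X] = lam**2 (up to rounding): unbounded in lam.
    print(lam, proj_var(lam * v) / proj_var(v))

# With ||v|| = 1 imposed, the maximum is attained at the top eigenvector.
eigvals, V = np.linalg.eigh(np.cov(X, rowvar=False))
v1 = V[:, -1]                                  # eigh sorts eigenvalues ascending
candidates = rng.normal(size=(2000, 3))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
best_random = max(proj_var(u) for u in candidates)
print(proj_var(v1) >= best_random)             # True
```

Without the normalization step on `candidates`, the search would simply favour whichever random vector happened to be longest.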

epsilone

It is not strictly true that they "should be of unit length": given your data $x$, PCA works fine without unit vectors, as long as you fix their length to some arbitrary value $l$.
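A minimal NumPy sketch of this point, assuming hypothetical random data and an arbitrary length `l = 3.7`: rescaling the top eigenvector to length $l$ keeps it an eigenvector of $C$ with the same eigenvalue (so the direction is unchanged), but the variance of the projection becomes $l^2$ times that eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data x (1000 samples, 4 variables) and an arbitrary fixed length l.
x = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))
C = np.cov(x, rowvar=False)
l = 3.7

eigvals, V = np.linalg.eigh(C)
v1 = V[:, -1]        # unit-length eigenvector with the largest eigenvalue
w1 = l * v1          # same direction, rescaled to length l

# w1 is still an eigenvector of C with the same eigenvalue ...
print(np.allclose(C @ w1, eigvals[-1] * w1))          # True
# ... but the projected variance is now l**2 times the eigenvalue.
print(np.isclose(w1 @ C @ w1, l**2 * eigvals[-1]))    # True
```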

That said, you want the eigenvectors $\alpha_k$ of your covariance matrix $C$ to be unit vectors, i.e. $\alpha_k^T \alpha_k = 1$, so that you can:

  1. Use the associated eigenvalue $\lambda_k$ as the variance of $\alpha_k^T x$.
  2. Use the eigenvectors as the axes of the ellipsoid fitted to $x$.
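Both uses can be checked in a minimal NumPy sketch on hypothetical centred data. For the second point it assumes the ellipsoid meant is the concentration ellipsoid $\{y : y^T C^{-1} y = 1\}$ (an interpretation, not spelled out above); the point $\sqrt{\lambda_k}\,\alpha_k$ lies on it only because $\alpha_k^T\alpha_k = 1$.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data; x is centred so the projections have mean zero.
x = rng.normal(size=(1500, 3)) @ np.array([[1.5, 0.0, 0.0],
                                           [0.6, 0.9, 0.0],
                                           [0.2, 0.3, 0.4]])
x = x - x.mean(axis=0)
C = np.cov(x, rowvar=False)
lam, alpha = np.linalg.eigh(C)     # columns alpha[:, k] satisfy alpha_k^T alpha_k = 1

# 1. Eigenvalue = variance of the projection alpha_k^T x (needs unit length).
proj_var = np.var(x @ alpha, axis=0, ddof=1)
print(np.allclose(proj_var, lam))                                     # True
# Doubling an eigenvector breaks this: the variance picks up a factor of 4.
print(np.isclose(np.var(x @ (2 * alpha[:, 0]), ddof=1), 4 * lam[0]))  # True

# 2. Axes of the assumed ellipsoid {y : y^T C^{-1} y = 1}: sqrt(lam_k) * alpha_k
#    lies on it, so alpha_k gives the axis direction and sqrt(lam_k) its half-length.
C_inv = np.linalg.inv(C)
for k in range(3):
    p = np.sqrt(lam[k]) * alpha[:, k]
    print(np.isclose(p @ C_inv @ p, 1.0))                             # True for every k
```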

The first chapter (Introduction) of Jolliffe's Principal Component Analysis gives a more detailed (and nicer) exposition of these issues.

usεr11852