
In the PCA transformation below, you can see a dataset with two original features and a transformed version with two principal components. I understand why we can have fewer PCs, but why not more? What am I missing?

[Figure: PCA example]
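To see this behaviour concretely, here is a minimal sketch with scikit-learn (the data below are random and purely illustrative): fitting PCA to a two-feature matrix yields exactly two components, and asking for a third is rejected.

```python
# Minimal sketch: PCA on a two-feature dataset (illustrative random data).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))      # 100 samples, 2 original features

pca = PCA().fit(X)                 # default n_components is min(n_samples, n_features) = 2
print(pca.components_.shape)       # (2, 2): two PCs, each a direction in the 2-D feature space

try:
    PCA(n_components=3).fit(X)     # asking for more components than features
except ValueError as err:
    print(err)                     # n_components must be <= min(n_samples, n_features)
```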

  • PCA calculates as many components as there are input variables. In a sense, PCA finds a new coordinate system for the data. It can't make up data where there isn't any. – COOLSerdash Dec 07 '22 at 19:56
  • See, for example, this post or this one for some graphical and intuitive explanations. – COOLSerdash Dec 07 '22 at 20:02
  • You could always introduce any number of additional "PCs," but they would have to be orthogonal to all the original PCs, whence they would be orthogonal to the data and therefore give you no additional information. It sounds like you really need some intuition, so please read https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues. – whuber Dec 07 '22 at 20:15
  • @whuber, if we already have two vectors in a two-dimensional space, for instance $v_1=\begin{bmatrix}0.5\\0.5\end{bmatrix}$ and $v_2=\begin{bmatrix}-0.5\\0.5\end{bmatrix}$, then how can we add additional orthogonal vectors? – Sextus Empiricus Dec 08 '22 at 11:00
  • @SextusEmpiricus By embedding the plane within a larger Euclidean space. – whuber Dec 08 '22 at 15:00
  • @whuber, so say I have data about squares and I measured the height and width, how do I add another dimension? This embedding seems a bit like an artificial and arbitrary operation. – Sextus Empiricus Dec 08 '22 at 16:31
  • @Sextus One way to add another dimension is to include another variable. After all, when you are doing PCA with an $n\times p$ matrix with $p\lt n,$ you are already working in a larger Euclidean space (of $n$ dimensions). Generally, even an abstract embedding (such as extending all the column vectors by zeros) is no more artificial or arbitrary than any other mathematical construction. – whuber Dec 08 '22 at 18:07
  • @whuber, when we are freely able to change the definition of a principal component, then why not just add a principal component that is not orthogonal at all? – Sextus Empiricus Dec 08 '22 at 18:40
  • @SextusEmpiricus I don't follow, because "not orthogonal" would violate the very essence of the PCA construction. Embedding the space in a larger space does not "freely change" the definition, because the solution doesn't change. All it would do (nominally) is introduce more PCs with zero eigenvalues. – whuber Dec 08 '22 at 18:44
  • @whuber I see now that you are right: you can embed the space into a larger space. I don't completely follow it myself either, but something feels artificial about it to me. I had always regarded PCA as a method to fit a plane to the data, and now we suddenly generate additional data dimensions (albeit zeros) that may have no relationship with the objects being studied. But even though this feels artificial to me, it has practical use (https://stats.stackexchange.com/questions/527258/embedding-data-into-a-larger-dimension-space). – Sextus Empiricus Dec 09 '22 at 07:05
  • @Sextus To see it needn't be artificial, imagine analyzing compositional data with $p$ variables. Many people would drop one variable first, knowing they must sum to unity, and perform PCA on the remaining $p-1$ variables. Others would proceed directly with all $p$ variables, knowing PCA will automatically identify the eigenvector $(1,1,\ldots,1)$ with zero eigenvalue. If you were given the $p-1$ column matrix you could "artificially" add one more "slack" column (increasing the dimension of the space) to create a sum-to-unity condition and thereby move from the first situation to the second. – whuber Dec 09 '22 at 14:33
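To make the compositional-data example from the comments above concrete, here is a minimal NumPy sketch (the Dirichlet-distributed data are purely illustrative): with four proportions that sum to one, the covariance matrix has a (numerically) zero eigenvalue whose eigenvector is proportional to $(1,1,1,1)$.

```python
# Minimal sketch of the compositional-data case (illustrative Dirichlet data).
import numpy as np

rng = np.random.default_rng(1)
X = rng.dirichlet(np.ones(4), size=200)   # 200 rows of 4 proportions, each row sums to 1

C = np.cov(X, rowvar=False)               # 4x4 covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)      # eigenvalues in ascending order

print(np.round(eigvals, 8))               # the smallest eigenvalue is (numerically) zero
print(np.round(eigvecs[:, 0], 3))         # its eigenvector is proportional to (1, 1, 1, 1)
```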

1 Answer


One way to think about this is via the orthogonality constraint imposed by PCA. From Wikipedia:

The principal components of a collection of points in a real coordinate space are a sequence of $p$ unit vectors, where the $i$-th vector is the direction of a line that best fits the data while being orthogonal to the first $i − 1$ vectors.

Since you are working in two-dimensional space, there is no other possible direction for a new principal component that would be orthogonal to the first two.
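As a quick numerical check, here is a minimal sketch (assuming NumPy and scikit-learn, with illustrative data): the two PC directions form an orthonormal basis of the plane, so the only vector orthogonal to both of them is the zero vector.

```python
# Minimal sketch: in 2-D there is no third direction orthogonal to the first two PCs
# (illustrative correlated Gaussian data).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3, 1], [1, 2]], size=500)

V = PCA().fit(X).components_     # rows are the two PC directions (unit vectors)
print(np.round(V @ V.T, 6))      # identity matrix: the PC directions are orthonormal

# A vector w orthogonal to both rows of V must satisfy V @ w = 0;
# V has full rank 2, so the only solution in 2-D is w = 0.
print(np.linalg.matrix_rank(V))  # 2
```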

Aman
  • On a side note, the orthogonality constraint is imposed to prevent two PCs from having overlapping variances, right? – GaryTheBaddy Dec 07 '22 at 21:19
  • @GaryTheBaddy Correct, we don't want two components explaining the same variance in the dataset. From a more mathematical point of view, orthogonality can be seen as a consequence of the symmetry of the covariance matrix (which means it has orthogonal eigenvectors). – Aman Dec 07 '22 at 22:57
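Here is a minimal NumPy sketch of that last point (illustrative random data only): the covariance matrix is symmetric, and its eigenvectors, as returned by an eigendecomposition, are mutually orthogonal.

```python
# Minimal sketch: a covariance matrix is symmetric, so its eigenvectors are orthogonal
# (illustrative random data with 5 variables).
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))

C = np.cov(X, rowvar=False)                 # 5x5 covariance matrix
eigvals, V = np.linalg.eigh(C)              # eigh is intended for symmetric matrices

print(np.allclose(C, C.T))                  # True: the covariance matrix is symmetric
print(np.allclose(V.T @ V, np.eye(5)))      # True: the eigenvectors are orthonormal
```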