
When performing PCA on a dataset and logging the fraction of total variance explained by each component, will the fraction for the first component drop if one or more features that appear in that component's loadings are removed and the PCA is re-computed?

Asked by one of my students. I said yes, but now I have doubts...

SebDL

1 Answer


Often that will happen, but not necessarily. When the removed feature constitutes most of the first PC, you are then essentially doing PCA on everything else. The new first PC will be close to the original second PC, and its fraction of the total variance could be just about anything from $1/(d-1)$ upward when $d-1$ variables remain. For $d\ge 3$ this makes a decrease in the variance proportion possible.

Let us, then, produce the smallest possible example, and let's make it simple. I begin with a large vector $(10,0,0)^\prime.$ Now adjoin two simple smaller vectors, say

$$X = \pmatrix{10&0&0\\0&1&1\\0&1&-1}.$$

Doing PCA directly on this matrix (no centering, no scaling) shows the first PC accounts for $100/(100+2+2) \approx 96.15\%$ of the total variance. Removing the first column leaves two orthogonal columns of equal size, so each of the two PCs contributes $50\%$ of the total: the first PC's share has dropped from about $96\%$ to $50\%$.
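A quick numerical check of this example (a sketch using numpy; with no centering or scaling, the variance fractions are the squared singular values of the matrix divided by their sum):

```python
import numpy as np

# The example matrix from the answer: one large column
# plus two small, equal-norm, orthogonal columns.
X = np.array([[10.0, 0.0,  0.0],
              [ 0.0, 1.0,  1.0],
              [ 0.0, 1.0, -1.0]])

def pc_fractions(M):
    """Fraction of total (uncentered) variance explained by each PC.

    Without centering or scaling, the PCs come from the SVD of M,
    and each PC's share is its squared singular value over the sum."""
    s = np.linalg.svd(M, compute_uv=False)
    return s**2 / (s**2).sum()

print(pc_fractions(X))          # first PC: 100/104, about 96.15%
print(pc_fractions(X[:, 1:]))   # first column removed: 50% each
```

Because the columns of $X$ are mutually orthogonal, the singular values are just the column norms $10, \sqrt 2, \sqrt 2$, which reproduces the $100/(100+2+2)$ calculation above.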

whuber
Ok, that's much clearer. I should get into the habit of designing these minimal examples... I always take something realistic instead, and then I get lost... – SebDL Apr 27 '22 at 13:52
How did you go from this matrix to the expression $100/(100+2+2)$? – hachiko Apr 30 '22 at 18:02
    @Hachiko All columns are orthogonal, which makes them principal components, and their contributions to the total are the sums of squares of their components. – whuber Apr 30 '22 at 21:02
  • @whuber I see how the columns in your example are orthogonal because (take the first two columns): A = 10^2 + 0^2 + 0^2 = 100 and B = 0^2 + 1^2 + 1^2 = 2 and A + B = (10 + 0)^2 + (0 + 1)^2 + (0 + 1)^2 = 102, so the first two columns are orthogonal and you can do the same for the other two combinations of columns... I guess what I didn't realize is that with a PC matrix the contribution of each column (in terms of variance explained) is that column's sum of squares versus the total sum of squares of all columns – hachiko May 01 '22 at 09:51
  • @hachiko Ordinarily PCA is done after centering the columns. In that case the variance of each column is proportional to its sum of squares (and the constant of proportionality is the same for every column). Thus, the proportion of total variance "explained" by any column is the same as its sum of squares as a proportion of the sum of all squares. For an algebraic demonstration of this relationship we can use the SVD of the matrix. – whuber May 01 '22 at 12:16
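The relationship described in the last comment is easy to verify numerically (a sketch; the matrix `A` is arbitrary made-up data, and the centering step is what makes the two sets of proportions coincide):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 4))
A -= A.mean(axis=0)  # center each column, as is ordinarily done before PCA

# After centering, each column's variance is its sum of squares divided by
# the same constant, so the two sets of proportions are identical.
var_prop = A.var(axis=0) / A.var(axis=0).sum()
ss_prop = (A**2).sum(axis=0) / (A**2).sum()
print(np.allclose(var_prop, ss_prop))  # True
```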