I have started learning Principal Component Analysis using R and applied it to the USArrests dataset. I read in an article that eigenvectors in R point in the negative direction by default, so we’ll have to multiply PCA values such as $pca\$rotation$ and $pca\$x$ by -1 to reverse the signs. I observe that after changing the signs of a principal component (say, PC1), the interpretation of the PC1 column changes for the variables (if I read a + sign as an increasing pattern and a - sign as a decreasing one). My question: is it necessary to multiply the PCA values by -1, or should I use the original values the software provides?

SKB
    This is a FAQ here on CV. You are free to multiply any selected set of principal components by $-1.$ In some contexts that flexibility can be exploited: see https://stats.stackexchange.com/questions/34396. – whuber Dec 11 '23 at 16:45

1 Answer

The direction of principal components is arbitrary - flipping the sign of any or all dimensions of a principal component decomposition yields an equally valid set of principal components. Principal components are simply dimensions that capture maximal variability, and changing the sign of a set of numbers changes nothing at all about their variability - both PC1 and the negation of PC1 are equally valid choices for dimensions which contain the highest amount of variability. Whether you choose to multiply your principal components by -1 makes no difference whatsoever - the sign of the principal components has no meaning to begin with. It doesn't matter if you describe Variable 1 as being positively correlated with PC1, or negatively correlated with the negation of PC1.
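
As a minimal sketch of the point above, assuming the PCA was fitted with `prcomp` on the scaled USArrests data (the question does not show the exact call, and names such as `rot_flipped` and `x_flipped` are illustrative), the following checks that negating the components changes neither the variance they capture nor the data they reconstruct:

```r
# USArrests ships with base R (datasets package); prcomp is in stats.
# Assumption: the variables are scaled, since the question does not say.
pca <- prcomp(USArrests, scale. = TRUE)

# Flip the sign of every component: loadings and scores together.
rot_flipped <- -pca$rotation
x_flipped   <- -pca$x

# The variance captured by each component is unchanged by the flip.
var_original <- apply(pca$x, 2, var)
var_flipped  <- apply(x_flipped, 2, var)
all.equal(var_original, var_flipped)

# Scores times transposed loadings rebuilds the same scaled data either way,
# because the two sign changes cancel.
recon_original <- pca$x %*% t(pca$rotation)
recon_flipped  <- x_flipped %*% t(rot_flipped)
all.equal(recon_original, recon_flipped)
```

Both `all.equal` calls should return `TRUE`. The same holds if only one column (for example PC1) is negated, as long as the matching columns of the loadings and the scores are flipped together.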

  • Link I have applied PCA on a protein dataset and got the results shown in the link. In (PC1), can we say that this component may represent a dietary pattern that includes higher consumption of animal-based products and starch as all the related variables have positive sign and lower consumption of cereals and nuts as they have negative sign? Is this the correct interpretation? @Nuclear Hoagie and if not can you please explain how to interpret these principal components in terms of the contribution of the variables? – SKB Dec 11 '23 at 16:15
  • @SKB Yes, higher consumption of RedMeat and Eggs is associated with higher values of PC1 (positive association), while higher consumption of Cereals and FrVeg is associated with lower values of PC1 (negative association). Interpreting PC1 as representing a meat/animal based dietary pattern dimension is very reasonable. Note that it would also be valid to flip all the signs on PC1 and describe the opposite of this dietary pattern, in which case PC1 would represent a dietary pattern that isn't meat/animal based. Just the directionality would flip. – Nuclear Hoagie Dec 11 '23 at 16:37
  • If the directionality flips, won't that change the actual interpretation of the principal components? I mean, which one is real then? If I summarize the countries by these principal components, using the pca$x values as protein intake parameters, won't the meaning change after changing the signs of the PC1 values? For example: at first countries a, b, c were higher in consumption of RedMeat and Eggs, but after changing the signs the values seem to say something else. @Nuclear Hoagie – SKB Dec 11 '23 at 16:45
  • @SKB You'll make one interpretation based on whichever way you prefer to set the signs, or whichever way your software does it. You may find Country A has the highest value in the PC1 meat dimension, or alternatively/equivalently the lowest value in the negative-PC1 non-meat dimension. How you describe it as a large positive "meat" value or a large negative "non-meat" value is different, but the overall interpretation either way is that Country A eats a lot of meat. – Nuclear Hoagie Dec 11 '23 at 16:53
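
The exchange above can be checked numerically. The protein dataset from the comments is not available here, so this sketch uses USArrests from the original question as a stand-in, again assuming a scaled `prcomp` fit; variable names such as `pc1_flipped` are illustrative. It shows that negating PC1 only reverses the ordering of the observations and the sign of each variable's correlation with the component:

```r
# Assumption: scaled PCA on USArrests, standing in for the protein data.
pca <- prcomp(USArrests, scale. = TRUE)

pc1         <- pca$x[, "PC1"]
pc1_flipped <- -pc1

# The state at one extreme of PC1 sits at the opposite extreme of -PC1.
names(which.max(pc1)) == names(which.min(pc1_flipped))

# Ordering states by PC1 is the exact reverse of ordering them by -PC1.
identical(names(sort(pc1)), rev(names(sort(pc1_flipped))))

# Correlations between each variable and the component only change sign.
cor_original <- cor(USArrests, pc1)
cor_flipped  <- cor(USArrests, pc1_flipped)
all.equal(cor_original, -cor_flipped)
```

Each of the three checks should evaluate to `TRUE`: the same observations stand out under either sign convention, and only the label attached to the direction differs.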