I have a data set in which two variables are collinear (r^2 ≈ 0.7). I decided to extract the principal components, and then include these in a regression analysis to see which of the two variables seemed to be more important.

When I run pca <- prcomp(dataset[, c("var1", "var2")], scale. = TRUE) and then print the pca object to view the factor loadings, the output is:

Standard deviations (1, .., p=2):
[1] 1.3199080 0.5077823

Rotation (n x k) = (2 x 2):
            PC1        PC2
var1 -0.7071068  0.7071068
var2 -0.7071068 -0.7071068

I know this isn't normal, but I'm really confused about what this is telling me about my data. Why are the loadings identical? I can't say anything about which variable is more important now, by the looks of things? When regressed, the first component is a highly significant predictor of the response variable whilst the second is not.

The raw data are shown in Figure 1, and plotted on the two principal components in Figure 2. I'd be grateful for any insight anyone could give.

Figure 1: the raw data.

  • These aren't loadings, but eigenvectors. Each entry is the cosine of a 45-degree rotation (1/√2 ≈ 0.7071). This is what PCA always amounts to with a 2x2 correlation matrix. – ttnphns Dec 16 '21 at 17:06
  • Ah, OK, thank you. Is there a way to tell which variable is 'dominating' in PC1 and which in PC2? Or is this not possible with a 2x2 design? – user265883 Dec 16 '21 at 17:10
  • Neither dominates, because there are only 2 variables and their variances are equal (I suppose you analyzed the correlation matrix, not the covariance matrix). – ttnphns Dec 16 '21 at 19:09
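
The point made in the comments can be checked numerically. For two standardized variables with correlation r, the correlation matrix has eigenvalues 1 + r and 1 - r, and its eigenvectors always have entries ±1/√2 whatever the value of r. The standard deviations reported in the question are consistent with this: 1.3199² ≈ 1.742 and 0.5078² ≈ 0.258, which implies r ≈ 0.74. The sketch below uses simulated data as a stand-in for var1 and var2 (the values, the seed and the sample size are assumptions, not the asker's data), so it only illustrates the pattern.

```r
# Simulated stand-ins for var1 and var2 (hypothetical data, not the asker's).
set.seed(1)
n    <- 200
var1 <- rnorm(n)
var2 <- 0.75 * var1 + rnorm(n, sd = 0.6)   # positively correlated with var1
dataset <- data.frame(var1 = var1, var2 = var2)

# PCA on standardized variables, i.e. on the correlation matrix.
pca <- prcomp(dataset[, c("var1", "var2")], scale. = TRUE)

r <- cor(dataset$var1, dataset$var2)

# Every eigenvector entry has absolute value 1/sqrt(2), whatever r is.
print(abs(pca$rotation))

# The component variances are 1 + r and 1 - r (here r > 0).
print(pca$sdev^2)
print(c(1 + r, 1 - r))

# Loadings in the factor-analysis sense: eigenvectors scaled by the
# component standard deviations. Both variables load equally on each PC.
loadings <- pca$rotation %*% diag(pca$sdev)
print(loadings)
```

Because both variables load on each component with the same magnitude, the components cannot show which variable matters more. PC1 is in effect their standardized sum and PC2 their standardized difference, up to sign.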
