
Imagine I've the following matrix, which gives the grades of students in the subjects German, Philosophy, Math and Physics:

ger = c(2,4,1,3,2,4,4,1,2,3)
phi = c(3,4,1,2,2,3,3,2,2,2)
mat = c(1,3,2,4,1,2,2,4,3,1)
phy = c(2,2,2,5,2,2,3,4,3,3)
A = cbind(ger, phi, mat, phy)  # bind the four grade vectors defined above as columns

I combine everything into a matrix and scale the data:

As = scale(A)

Now I run a PCA and print its summary:

summary(princomp(As), loadings = TRUE)

This returns the following output:

Importance of components:
                       Comp.1    Comp.2     Comp.3     Comp.4
Standard deviation     1.3257523 1.1657791 0.59600603 0.35793402
Proportion of Variance 0.4882275 0.3775114 0.09867311 0.03558799
Cumulative Proportion  0.4882275 0.8657389 0.96441201 1.00000000

Loadings [eigenvectors]:
    Comp.1 Comp.2 Comp.3 Comp.4
ger  0.496 -0.502  0.519  0.482
phi  0.548 -0.443 -0.423 -0.570
mat -0.430 -0.572 -0.546  0.435
phy -0.518 -0.474  0.503 -0.503

I have a few hints for the first component (based on the loadings [eigenvectors]):

  • There is a high positive correlation between German and philosophy, and there is also a high positive correlation between math and physics.
  • Students who are good at the language subjects (German and philosophy) are often worse at MINT, i.e. STEM (math and physics), and the other way around.
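These hints can be checked against the data directly by looking at the pairwise correlations rather than reading them off the eigenvector. A minimal sketch, reusing the grade vectors from above (the names `R_cor`, `lang_pair`, `stem_pair` and `cross` are only illustrative):

```r
# Grade vectors as given in the question
ger <- c(2, 4, 1, 3, 2, 4, 4, 1, 2, 3)
phi <- c(3, 4, 1, 2, 2, 3, 3, 2, 2, 2)
mat <- c(1, 3, 2, 4, 1, 2, 2, 4, 3, 1)
phy <- c(2, 2, 2, 5, 2, 2, 3, 4, 3, 3)
A <- cbind(ger, phi, mat, phy)

# Pearson correlation matrix of the four subjects
R_cor <- cor(A)
print(round(R_cor, 2))

# Within-group correlations (first hint)
lang_pair <- R_cor["ger", "phi"]
stem_pair <- R_cor["mat", "phy"]

# Cross-group correlations (second hint): language vs. math/physics
cross <- R_cor[c("ger", "phi"), c("mat", "phy")]
print(round(cross, 2))
```

The size of the entries in `cross` shows how strongly the "language versus math/physics" contrast is actually present in the raw grades.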

And an idea about the second one, which I cannot interpret:

  • It's a weighted arithmetic mean over all four variables.

But I have no idea how to interpret Comp. 2, Comp. 3 and Comp. 4 based on the loadings, especially because the values of Comp. 2 are all negative, i.e. they all have the same orientation. Can someone help me? Thanks in advance!

ttnphns
So S
  • The sign of each component is arbitrary and does not have any meaning: http://stats.stackexchange.com/questions/88880. So PC2 is approximately the mean. – amoeba Oct 01 '16 at 10:41
  • The last few components account for little variation and are probably noise. – mdewey Oct 01 '16 at 10:47
  • I edited your question (please re-edit if you don't agree). The table you show contains eigenvectors, not loadings; the R function uses the word "loadings" incorrectly. Search this site for "PCA loadings eigenvectors" to read about the distinction. Eigenvector values are not correlations (as in "There is a high positive correlation between..."); they are rotation cosines. – ttnphns Oct 01 '16 at 10:54
  • @amoeba: Okay. I know that the signs of each eigenvector can be flipped (thanks @ttnphns for changing loadings to eigenvectors). Nevertheless, I still believe that the relation between the signs within a component (e.g. one variable positive and another negative) is interpretable and can provide some information about the data. – So S Oct 01 '16 at 16:34
  • @JohnSmith Sure. I was talking about your PC2 where it seems you were confused about all signs being negative, weren't you? – amoeba Oct 01 '16 at 19:35
  • No. Maybe I didn't describe it clearly ;). If my hints about the interpretation of the first component were right, then how would I interpret component two (all variables having the same sign, regardless of which sign it is)? – So S Oct 01 '16 at 19:42
  • @JohnSmith Ah, okay. PC2 corresponds to the average grade, i.e., roughly speaking, it distinguishes smart people from dumb people. – amoeba Oct 01 '16 at 22:18
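The eigenvector/loading distinction raised in the comments can be made concrete with a small sketch. It assumes the common convention that loadings are eigenvectors scaled by the square roots of their eigenvalues, and it uses `eigen()` on the correlation matrix instead of `princomp()`; the names `V`, `L` and `scores` are illustrative only:

```r
# Grade vectors as given in the question
ger <- c(2, 4, 1, 3, 2, 4, 4, 1, 2, 3)
phi <- c(3, 4, 1, 2, 2, 3, 3, 2, 2, 2)
mat <- c(1, 3, 2, 4, 1, 2, 2, 4, 3, 1)
phy <- c(2, 2, 2, 5, 2, 2, 3, 4, 3, 3)
A <- cbind(ger, phi, mat, phy)

# Eigendecomposition of the correlation matrix (PCA on standardized data)
e <- eigen(cor(A))
V <- e$vectors                      # unit-length eigenvectors (what R prints as "loadings")

# Loadings in the stricter sense: eigenvectors scaled by sqrt(eigenvalue)
L <- V %*% diag(sqrt(e$values))

# Component scores of the standardized data
scores <- scale(A) %*% V

# Under this convention the loadings equal the variable-component correlations
same <- isTRUE(all.equal(unname(cor(A, scores)), L, check.attributes = FALSE))
print(same)
```

So the scaled loadings can be read as correlations between each subject and each component, whereas the raw eigenvector entries cannot.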

1 Answer


If $Ax=\lambda x$ then $A(-x) = \lambda(-x)$. Thus you may flip the sign of an entire eigenvector as you like.
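A short numerical check of this sign argument, as a sketch on the correlation matrix of the question's data (the variable names `R_cor`, `x`, `lambda`, `ok_pos` and `ok_neg` are illustrative):

```r
# Grade vectors as given in the question
ger <- c(2, 4, 1, 3, 2, 4, 4, 1, 2, 3)
phi <- c(3, 4, 1, 2, 2, 3, 3, 2, 2, 2)
mat <- c(1, 3, 2, 4, 1, 2, 2, 4, 3, 1)
phy <- c(2, 2, 2, 5, 2, 2, 3, 4, 3, 3)
R_cor <- cor(cbind(ger, phi, mat, phy))

e <- eigen(R_cor)
x <- e$vectors[, 1]      # first eigenvector
lambda <- e$values[1]    # its eigenvalue

# Both x and -x satisfy the eigenvector equation for the same eigenvalue
ok_pos <- isTRUE(all.equal(as.vector(R_cor %*% x), lambda * x))
ok_neg <- isTRUE(all.equal(as.vector(R_cor %*% (-x)), lambda * (-x)))
print(c(ok_pos, ok_neg))
```

Which of the two signs a given routine returns is an implementation detail, so only the relative signs within one eigenvector carry information.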

The last eigenvector is by necessity orthogonal to all previous ones, so you can't really interpret it.

I think your interpretation of the first is quite decent. People tend to be good in the first two only, or in the second two only.

The second is the overall tendency of having good grades everywhere, or bad grades everywhere (independent of the subject).
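One way to probe this reading of the second component is to correlate its scores with each student's average standardized grade. A minimal sketch, assuming the same `princomp` call as in the question (`pc2`, `avg` and `r` are illustrative names):

```r
# Grade vectors as given in the question
ger <- c(2, 4, 1, 3, 2, 4, 4, 1, 2, 3)
phi <- c(3, 4, 1, 2, 2, 3, 3, 2, 2, 2)
mat <- c(1, 3, 2, 4, 1, 2, 2, 4, 3, 1)
phy <- c(2, 2, 2, 5, 2, 2, 3, 4, 3, 3)
As <- scale(cbind(ger, phi, mat, phy))

pca <- princomp(As)
pc2 <- pca$scores[, 2]   # scores on the second component
avg <- rowMeans(As)      # average standardized grade per student

# The sign of r is arbitrary (see the sign argument above); its magnitude matters
r <- cor(pc2, avg)
print(abs(r))
```

A value of `abs(r)` close to 1 would support reading the second component as an overall grade level.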

The third and fourth factors simply give the remaining ways to deviate from these two factors. One is being good or bad in ger and phy together only; the other is being good or bad in ger and mat together only.

Good in everything but phy can be modeled as a linear combination of these factors.

  • +1. So PC1 corresponds to geekiness and PC2 corresponds to smartness. – amoeba Oct 01 '16 at 22:33
  • @Anony-Mousse: Is there also a way to interpret the values of the eigenvectors? I know that R skips values close to zero and leaves those fields blank, because they carry no meaning. As a consequence, I would assume that the numbers which are shown do have some meaning. How could I interpret them? – So S Oct 03 '16 at 13:24
  • The amount of variance. Search for "variance explained". – Has QUIT--Anony-Mousse Oct 03 '16 at 16:32
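For reference, the "Proportion of Variance" row in the summary can be reproduced by hand from the component standard deviations. A minimal sketch, assuming the same `princomp` call as in the question (`var_explained` and `from_eigen` are illustrative names):

```r
# Grade vectors as given in the question
ger <- c(2, 4, 1, 3, 2, 4, 4, 1, 2, 3)
phi <- c(3, 4, 1, 2, 2, 3, 3, 2, 2, 2)
mat <- c(1, 3, 2, 4, 1, 2, 2, 4, 3, 1)
phy <- c(2, 2, 2, 5, 2, 2, 3, 4, 3, 3)
A <- cbind(ger, phi, mat, phy)

pca <- princomp(scale(A))

# Variance of each component divided by the total variance
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 4))
print(round(cumsum(var_explained), 4))

# The same proportions follow from the eigenvalues of the correlation matrix
ev <- eigen(cor(A))$values
from_eigen <- ev / sum(ev)
```

Note that `princomp` divides by n rather than n - 1, so its standard deviations differ from the square roots of `ev` by a constant factor, while the proportions agree.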