Why am I getting different PCA loadings for the same data set using two different r packages?

Question

I have applied PCA to a dataset using two different R packages viz. "stats" and "FactoMineR". For both models, I am getting different loadings. What may be the probable reason behind such a difference?

prcomp {stats}
Rotation (n x k) = (6 x 6):
        PC1        PC2         

THN   0.5597064 -0.1736148
GRH    0.5289944 -0.2447140
SRD   -0.4187044 -0.2400785
THML  -0.2361008 -0.5718485
GLI    0.3308003  0.4284650
GRMA   0.2576972 -0.5845873

PCA {FactoMineR}
$coord
      Dim.1      Dim.2       

THN   0.9569614  0.2603887
GRH   0.9044514  0.3670237
SRD  -0.7158823  0.3600715
THML -0.4036748  0.8576624
GLI   0.5655878 -0.6426149
GRMA  0.4405993  0.8767682

#PCA using stats
myPr <- prcomp(my_data[,-1], center = TRUE, scale. = TRUE)
myPr
#PCA using FactoMineR 
library(FactoMineR)
pr <- PCA(my_data[,-1], scale.unit = TRUE, graph = TRUE)
pr$var

#Data subset
my_data <-
  structure(
    list(
      Genotype = c(
        "BHIMA",
        "BHIMA",
        "BHIMA",
        "BHIMA",
        "BHIMA",
        "NSDG",
        "NSDG",
        "NSDG",
        "NSDG",
        "NSDG",
        "NSLG",
        "NSLG",
        "NSLG",
        "NSLG",
        "NSLG"
      ),
      THN = c(
        4.875,
        5.75,
        5,
        3.75,
        3.66666666666667,
        9.0625,
        10.7,
        7.52941176470588,
        8,
        12,
        4.25,
        3.66666666666667,
        5.2,
        6,
        5.25
      ),
      GRH = c(
        102.4643125,
        124.635333333333,
        95.4393888888889,
        77.5246666666667,
        74.8633111111111,
        188.138770833333,
        170.288916666667,
        142.064431372549,
        152.74425,
        226.286,
        86.3530833333333,
        80.1745555555556,
        109.171533333333,
        134.698666666667,
        122.7548
      ),
      SRD = c(
        20.9620954613095,
        21.72740625,
        18.9397321428571,
        21.9752083333333,
        19.768965,
        20.6400036051417,
        17.4263925682651,
        18.8010196737519,
        18.8909826388889,
        18.6996699829932,
        20.685875,
        21.2876111111111,
        21.0639923809524,
        22.5462371428571,
        23.3662583333333
      ),
      THML = c(
        365.522017936508,
        336.32533125,
        343.027892857143,
        365.968166666667,
        352.466585,
        380.315924009015,
        438.888573775299,
        384.717647703081,
        376.74879890873,
        357.975157908163,
        481.1476125,
        497.496111111111,
        599.674862857143,
        694.572626666667,
        545.377958333333
      ),
      GLI = c(
        0.0962519040291448,
        0.185481840453265,
        0.210969297714791,
        0.127273546499529,
        0.128791133588149,
        0.104009842530975,
        0.105509742573322,
        0.0932643040680137,
        0.0952753840583971,
        0.214037130114979,
        0.0414932567676607,
        0.0429400045716451,
        0.0735321530011779,
        0.0355329427205008,
        0.0458192673112064
      ),
      GRMA = c(
        0.04171768575,
        0.0463768018,
        0.039131903,
        0.0427794242,
        0.0327366384,
        0.082782452375,
        0.0812153345833334,
        0.0690879901764706,
        0.065846674625,
        0.0885941995714286,
        0.04575302325,
        0.0408745306666667,
        0.0792771544,
        0.1083637692,
        0.0836376916
      )
    ),
    class = c("tbl_df",
              "tbl", "data.frame"),
    row.names = c(NA,-15L)
  )

Check this SO post: PCA from FactoMineR::PCA() not returning same as base::prcomp(). — User1865345, Jul 21 '23 at 07:51

J-J-J · Accepted Answer · 2023-07-21T10:48:44.940

The values you get from FactoMineR are not loadings: they are coordinates. Loadings are coordinates divided by the square root of eigenvalues.

I don't know if you can directly get loadings from FactoMineR. However, you can retrieve eigenvalues (in your example, through pr$eig), so you can calculate loadings from coordinates.

Example with your data, for the first dimension of the PCA:

library(FactoMineR) 
pr <- PCA(my_data[,-1], 
          scale.unit = TRUE,
          graph = TRUE) 
res = pr$var$coord[,"Dim.1"] / sqrt( pr$eig["comp 1","eigenvalue"])
res
THN        GRH        SRD       THML        GLI       GRMA 

0.5597064  0.5289944 -0.4187044 -0.2361008  0.3308003  0.2576972

-- which is in line with your results from prcomp, so there is no inconsistency between the two libraries (at least in this respect):

myPr <- prcomp(my_data[,-1], center = TRUE, scale. = TRUE)
myPr$rotation[,"PC1"]
     THN        GRH        SRD       THML        GLI       GRMA 
 0.5597064  0.5289944 -0.4187044 -0.2361008  0.3308003  0.2576972

Edit: After digging a bit, it is in fact possible to directly get loadings from the factoMineR package without resorting to the above code (I leave it as it is though, to illustrate the underlying calculations). In the data from the original question, you can find the loadings in pr$svd$V.

I have retrieved the loadings for the second dimension which is also in line with the prcomp result. The only difference that still remains is the same sign on the numbers. So should I ignore this fact and proceed further? because only the direction will be altered and not the relative difference. Which one should I prefer, prcomp or FactoMineR? — Chaitanya Borkar, Jul 21 '23 at 09:54
@ChaitanyaBorkar These answers should answer your own question: https://stats.stackexchange.com/questions/88880/does-the-sign-of-scores-or-of-loadings-in-pca-or-fa-have-a-meaning-may-i-revers and https://stackoverflow.com/questions/67258885/pca-scores-for-only-the-first-principal-components-are-of-wrong-sign . (In short, the question of differing signs is not really a problem, so you can use one or the other library). By the way, in the meantime, I found out how to directly get loadings from factoMineR, try: pr$svd$V. — J-J-J, Jul 21 '23 at 10:32
Thank you very much. My doubts are clear in this aspect. — Chaitanya Borkar, Jul 21 '23 at 10:56

score 11 · Answer 2 · edited Jul 21 '23 at 08:00

Principal components are the eigenvectors of the covariance matrix. Eigenvectors are only defined up to a multiplicative constant. In this case, both results are identical up to a multiplicative constant:

> x <- c(0.5597064, 0.5289944, -0.4187044, -0.2361008, 0.3308003, 0.2576972)
> y <- c(0.9569614, 0.9044514, -0.7158823, -0.4036748, 0.5655878, 0.4405993)
> abs(cor(x,y))
[1] 1

The result of prcomp is normalized to Euclidean length one, the result of FactoMiner not:

> sqrt(sum(x^2))
[1] 1
> sqrt(sum(y^2))
[1] 1.709756

Why am I getting different PCA loadings for the same data set using two different r packages?

2 Answers2