9

I have applied PCA to a dataset using two different R packages viz. "stats" and "FactoMineR". For both models, I am getting different loadings. What may be the probable reason behind such a difference?


prcomp {stats}

Rotation (n x k) = (6 x 6):

        PC1        PC2         

THN 0.5597064 -0.1736148

GRH 0.5289944 -0.2447140

SRD -0.4187044 -0.2400785

THML -0.2361008 -0.5718485

GLI 0.3308003 0.4284650

GRMA 0.2576972 -0.5845873


PCA {FactoMineR}

$coord

      Dim.1      Dim.2       

THN 0.9569614 0.2603887

GRH 0.9044514 0.3670237

SRD -0.7158823 0.3600715

THML -0.4036748 0.8576624

GLI 0.5655878 -0.6426149

GRMA 0.4405993 0.8767682


#PCA using stats
myPr <- prcomp(my_data[,-1], center = TRUE, scale. = TRUE)
myPr

#PCA using FactoMineR library(FactoMineR) pr <- PCA(my_data[,-1], scale.unit = TRUE, graph = TRUE) pr$var

#Data subset
my_data <-
  structure(
    list(
      Genotype = c(
        "BHIMA",
        "BHIMA",
        "BHIMA",
        "BHIMA",
        "BHIMA",
        "NSDG",
        "NSDG",
        "NSDG",
        "NSDG",
        "NSDG",
        "NSLG",
        "NSLG",
        "NSLG",
        "NSLG",
        "NSLG"
      ),
      THN = c(
        4.875,
        5.75,
        5,
        3.75,
        3.66666666666667,
        9.0625,
        10.7,
        7.52941176470588,
        8,
        12,
        4.25,
        3.66666666666667,
        5.2,
        6,
        5.25
      ),
      GRH = c(
        102.4643125,
        124.635333333333,
        95.4393888888889,
        77.5246666666667,
        74.8633111111111,
        188.138770833333,
        170.288916666667,
        142.064431372549,
        152.74425,
        226.286,
        86.3530833333333,
        80.1745555555556,
        109.171533333333,
        134.698666666667,
        122.7548
      ),
      SRD = c(
        20.9620954613095,
        21.72740625,
        18.9397321428571,
        21.9752083333333,
        19.768965,
        20.6400036051417,
        17.4263925682651,
        18.8010196737519,
        18.8909826388889,
        18.6996699829932,
        20.685875,
        21.2876111111111,
        21.0639923809524,
        22.5462371428571,
        23.3662583333333
      ),
      THML = c(
        365.522017936508,
        336.32533125,
        343.027892857143,
        365.968166666667,
        352.466585,
        380.315924009015,
        438.888573775299,
        384.717647703081,
        376.74879890873,
        357.975157908163,
        481.1476125,
        497.496111111111,
        599.674862857143,
        694.572626666667,
        545.377958333333
      ),
      GLI = c(
        0.0962519040291448,
        0.185481840453265,
        0.210969297714791,
        0.127273546499529,
        0.128791133588149,
        0.104009842530975,
        0.105509742573322,
        0.0932643040680137,
        0.0952753840583971,
        0.214037130114979,
        0.0414932567676607,
        0.0429400045716451,
        0.0735321530011779,
        0.0355329427205008,
        0.0458192673112064
      ),
      GRMA = c(
        0.04171768575,
        0.0463768018,
        0.039131903,
        0.0427794242,
        0.0327366384,
        0.082782452375,
        0.0812153345833334,
        0.0690879901764706,
        0.065846674625,
        0.0885941995714286,
        0.04575302325,
        0.0408745306666667,
        0.0792771544,
        0.1083637692,
        0.0836376916
      )
    ),
    class = c("tbl_df",
              "tbl", "data.frame"),
    row.names = c(NA,-15L)
  )
Nick Cox
  • 56,404
  • 8
  • 127
  • 185

2 Answers2

13

The values you get from FactoMineR are not loadings: they are coordinates. Loadings are coordinates divided by the square root of eigenvalues.

I don't know if you can directly get loadings from FactoMineR. However, you can retrieve eigenvalues (in your example, through pr$eig), so you can calculate loadings from coordinates.

Example with your data, for the first dimension of the PCA:

library(FactoMineR) 
pr <- PCA(my_data[,-1], 
          scale.unit = TRUE,
          graph = TRUE) 
res = pr$var$coord[,"Dim.1"] / sqrt( pr$eig["comp 1","eigenvalue"])

res

THN        GRH        SRD       THML        GLI       GRMA 

0.5597064 0.5289944 -0.4187044 -0.2361008 0.3308003 0.2576972

-- which is in line with your results from prcomp, so there is no inconsistency between the two libraries (at least in this respect):

myPr <- prcomp(my_data[,-1], center = TRUE, scale. = TRUE)
myPr$rotation[,"PC1"]
     THN        GRH        SRD       THML        GLI       GRMA 
 0.5597064  0.5289944 -0.4187044 -0.2361008  0.3308003  0.2576972 

Edit: After digging a bit, it is in fact possible to directly get loadings from the factoMineR package without resorting to the above code (I leave it as it is though, to illustrate the underlying calculations). In the data from the original question, you can find the loadings in pr$svd$V.

J-J-J
  • 4,098
  • I have retrieved the loadings for the second dimension which is also in line with the prcomp result. The only difference that still remains is the same sign on the numbers. So should I ignore this fact and proceed further? because only the direction will be altered and not the relative difference. Which one should I prefer, prcomp or FactoMineR? – Chaitanya Borkar Jul 21 '23 at 09:54
  • 1
    @ChaitanyaBorkar These answers should answer your own question: https://stats.stackexchange.com/questions/88880/does-the-sign-of-scores-or-of-loadings-in-pca-or-fa-have-a-meaning-may-i-revers and https://stackoverflow.com/questions/67258885/pca-scores-for-only-the-first-principal-components-are-of-wrong-sign . (In short, the question of differing signs is not really a problem, so you can use one or the other library). By the way, in the meantime, I found out how to directly get loadings from factoMineR, try: pr$svd$V. – J-J-J Jul 21 '23 at 10:32
  • 1
    Thank you very much. My doubts are clear in this aspect. – Chaitanya Borkar Jul 21 '23 at 10:56
11

Principal components are the eigenvectors of the covariance matrix. Eigenvectors are only defined up to a multiplicative constant. In this case, both results are identical up to a multiplicative constant:

> x <- c(0.5597064, 0.5289944, -0.4187044, -0.2361008, 0.3308003, 0.2576972)
> y <- c(0.9569614, 0.9044514, -0.7158823, -0.4036748, 0.5655878, 0.4405993)
> abs(cor(x,y))
[1] 1

The result of prcomp is normalized to Euclidean length one, the result of FactoMiner not:

> sqrt(sum(x^2))
[1] 1
> sqrt(sum(y^2))
[1] 1.709756
Nick Cox
  • 56,404
  • 8
  • 127
  • 185
cdalitz
  • 5,132