
I performed PCA using two different R functions, fviz() and ggbiplot(), to improve the visualization of my data. The sum of the first two PCs is the same for both methods, but the variance explained by each of these principal components differs depending on the method. ggbiplot() shows standardized PC1 and PC2, which I don't quite understand!

When is it important to use standardized PCs? If they are not needed, how can I get the unstandardized PC values back in ggbiplot()?

bobo
  • See https://stats.stackexchange.com/questions/53/pca-on-correlation-or-covariance?noredirect=1&lq=1 – utobi Dec 30 '22 at 23:15

1 Answer


According to this, ggbiplot() is just a visualization function and does not actually calculate principal components. The first argument supplied to ggbiplot() is an object returned by prcomp() (or princomp()). prcomp() provides two arguments that control whether the data are centered (center, default TRUE) and scaled (scale., default FALSE). Generally, centering and scaling are advised, particularly if the columns of your original data have very different scales. If the data are not scaled to unit variance, prcomp() still works, but the returned principal components will also have very different scales, which makes the visualization less intuitive.
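To illustrate how the scale. argument of prcomp() changes the variance explained per component, here is a minimal sketch in base R. It uses the built-in USArrests data set purely as an assumed example (it is not the asker's data), and prop_var() is a hypothetical helper defined only for this sketch. It does not call fviz() or ggbiplot(); it assumes, as the answer suggests, that the percentages those plotting functions display are derived from the prcomp() object they are given.

```r
# Compare PCA on unscaled data (covariance-based) with PCA on scaled data
# (correlation-based), using base R's prcomp().
# USArrests is an illustrative choice: its columns have very different
# variances (Assault is in the hundreds, Murder is below 20).
data(USArrests)

# center = TRUE is the default; scale. = FALSE is the default
pca_raw    <- prcomp(USArrests, center = TRUE, scale. = FALSE)
pca_scaled <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Hypothetical helper: proportion of variance explained by each component,
# i.e. the squared standard deviations (eigenvalues) divided by their sum
prop_var <- function(p) p$sdev^2 / sum(p$sdev^2)

round(prop_var(pca_raw), 3)     # unscaled fit
round(prop_var(pca_scaled), 3)  # scaled fit

# The same proportions appear (rounded) in the "Proportion of Variance" row
summary(pca_scaled)
```

In the unscaled fit, Assault, which has by far the largest variance, dominates PC1, so PC1 absorbs almost all of the variance; after scaling, the variance is spread more evenly across the components. The same data can therefore give quite different "% explained" values depending on how the prcomp() object was built.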

KirkD_CO
  • There are pros and cons to standardization for PCA. I don't think there is a reason to automatically choose one over the other. See: https://stats.stackexchange.com/questions/53/pca-on-correlation-or-covariance – Sycorax Dec 30 '22 at 22:57
  • Well, I did say "Generally, centering and scaling are advised..." and referred to the differences in scale as a potential issue, as mentioned in the linked thread. I did not bring up covariance vs correlation, which the linked thread does describe. I do concede that "automatically" choosing one or the other is not advised, and the linked thread does a far better job of fully exploring the options than I did in my short answer. – KirkD_CO Dec 30 '22 at 23:12