Normalizing all the variables vs. using scale=TRUE option in prcomp in R

Question

What is the difference between

normalizing the variables and then doing PCA; and
using the scale=TRUE option (without normalizing the variables) in the prcomp function in R?

I erased your last sentence/paragraph because it was very hard to understand while your question is very clear already without it. — amoeba, Mar 18 '17 at 15:59

score 9 · Accepted Answer · edited Apr 13 '17 at 12:44

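No difference, provided that by "normalizing" you mean centering each variable and dividing it by its standard deviation, which is what scale does. Type debug(prcomp) before running prcomp (prcomp is a generic, so the line in question lives in the default method, stats:::prcomp.default). The third line of the function reads: x <- scale(x, center = center, scale = scale.); i.e. the data are scaled either inside the function, if you set scale = TRUE in the call, or beforehand by you.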
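A minimal sketch of that equivalence, using made-up data (the data frame dat and its columns are assumptions for illustration, not from the question). The formal argument of prcomp is named scale.; writing scale = TRUE reaches it through partial argument matching. Signs of principal components are arbitrary, so the comparison uses absolute values.

```r
# Illustrative data: three variables on very different scales (made up for this sketch)
set.seed(42)
dat <- data.frame(
  a = rnorm(50, mean = 10,  sd = 1),
  b = rnorm(50, mean = 500, sd = 50),
  c = rnorm(50, mean = 0.1, sd = 0.01)
)

# Option 1: standardize by hand with scale(), then run PCA without further scaling
pca_manual <- prcomp(scale(dat), center = TRUE, scale. = FALSE)

# Option 2: pass the raw data and let prcomp do the scaling
pca_builtin <- prcomp(dat, center = TRUE, scale. = TRUE)

# Compare standard deviations, loadings and scores (up to sign)
same_sdev <- isTRUE(all.equal(pca_manual$sdev, pca_builtin$sdev))
same_rotation <- isTRUE(all.equal(abs(pca_manual$rotation), abs(pca_builtin$rotation),
                                  check.attributes = FALSE))
same_scores <- isTRUE(all.equal(abs(pca_manual$x), abs(pca_builtin$x),
                                check.attributes = FALSE))

print(c(sdev = same_sdev, rotation = same_rotation, scores = same_scores))
```

If "normalizing" means something else, for example rescaling each variable to the range [0, 1], the two routes would generally not agree.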

Having said that, when applying PCA in general it is a good idea to scale your variables. Otherwise the magnitude of certain variables dominates the associations between the variables in the sample. Unless all your variables are recorded on the same scale and/or the differences in variable magnitudes are of interest, I would suggest you normalise your data prior to PCA. This issue has been revisited multiple times within CV, e.g. 1, 2, 3.
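To illustrate the dominance effect, here is a small sketch with made-up data (the variable names height_m and income_usd and their distributions are assumptions chosen only to exaggerate the difference in units). Without scaling, the first component is essentially the large-magnitude variable; with scaling, both variables contribute equally.

```r
# Made-up data: two unrelated variables recorded in very different units
set.seed(123)
n <- 200
toy <- data.frame(
  height_m   = rnorm(n, mean = 1.7,   sd = 0.1),
  income_usd = rnorm(n, mean = 40000, sd = 10000)
)

pca_raw    <- prcomp(toy, scale. = FALSE)  # based on the covariance matrix
pca_scaled <- prcomp(toy, scale. = TRUE)   # based on the correlation matrix

# Absolute PC1 loadings: without scaling, PC1 is essentially the income axis
load_raw <- abs(pca_raw$rotation[, "PC1"])
# With two standardized variables and a nonzero sample correlation,
# both loadings equal 1/sqrt(2) in absolute value
load_scaled <- abs(pca_scaled$rotation[, "PC1"])

# Share of total variance attributed to PC1 in each case
share_raw    <- pca_raw$sdev[1]^2 / sum(pca_raw$sdev^2)
share_scaled <- pca_scaled$sdev[1]^2 / sum(pca_scaled$sdev^2)

print(round(rbind(raw = load_raw, scaled = load_scaled), 4))
print(round(c(raw = share_raw, scaled = share_scaled), 4))
```

In the unscaled fit, PC1 mostly reflects the choice of units: expressing income in thousands of dollars instead would change the picture, which is usually not what one wants from PCA.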

answered Mar 18 '17 at 12:19

usεr11852

What if all your variables are on the same scale? – Jack Armstrong Apr 30 '19 at 07:23

We probably do not need the normalisation in that case because the variables will be comparable in their original scales. Please read through the linked threads for more details. – usεr11852 Apr 30 '19 at 08:19

score 1 · Answer 2 · answered Mar 29 '21 at 15:42

Using the correlation matrix is equivalent to standardizing each of the variables (to mean 0 and standard deviation 1). In general, PCA with and without standardizing will give different results, especially when the variables are on different scales.

scale=TRUE bases the PCA on the correlation matrix, and scale=FALSE on the covariance matrix.

For example:

# my data
set.seed(1)
x <- rnorm(10, 50, 4)
y <- rnorm(10, 50, 7)
df <- data.frame(x, y)
# PCA based on the covariance matrix and on the correlation matrix
PCA_df.cov <- prcomp(df, scale = FALSE)
PCA_df.corr <- prcomp(df, scale = TRUE)
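The claim about the two matrices can be checked directly. This is a sketch rather than a proof: it rebuilds the same df and compares prcomp with an eigendecomposition of cov(df) and cor(df). The object names eig_cov, eig_cor and the *_match flags are introduced here for illustration. prcomp reports standard deviations, so they are squared before comparing with eigenvalues, and loadings are compared up to sign.

```r
# Same data as in the example above
set.seed(1)
x <- rnorm(10, 50, 4)
y <- rnorm(10, 50, 7)
df <- data.frame(x, y)

pca_cov <- prcomp(df, scale. = FALSE)
pca_cor <- prcomp(df, scale. = TRUE)

# Eigendecompositions of the covariance and correlation matrices
eig_cov <- eigen(cov(df))
eig_cor <- eigen(cor(df))

# Variances of the components equal the eigenvalues
cov_values_match <- isTRUE(all.equal(pca_cov$sdev^2, eig_cov$values))
cor_values_match <- isTRUE(all.equal(pca_cor$sdev^2, eig_cor$values))

# Loadings equal the eigenvectors, up to an arbitrary sign per component
cov_vectors_match <- isTRUE(all.equal(abs(unname(pca_cov$rotation)),
                                      abs(eig_cov$vectors),
                                      check.attributes = FALSE))
cor_vectors_match <- isTRUE(all.equal(abs(unname(pca_cor$rotation)),
                                      abs(eig_cor$vectors),
                                      check.attributes = FALSE))

print(c(cov_values = cov_values_match, cor_values = cor_values_match,
        cov_vectors = cov_vectors_match, cor_vectors = cor_vectors_match))
```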

Hamed Said

The question asks what the difference is between scaling the data and using scale=TRUE. Your code example just shows that toggling scale=TRUE and scale=FALSE produces different results, which doesn't do much to explain why those results are different. I think the code example would be clearer if you used it to demonstrate that scaling the data and setting scale=TRUE produce the same result, and to show that this is the same as performing PCA on the covariance matrix. In other words, use code to demonstrate the claims you make in text. – Sycorax Mar 29 '21 at 15:48
