
I would like to perform PCA on a dataset, but not all of the variables are on the same scale. Some variables are Height, Weight, and Age, while others are Dribbles per Game, Shots per Game, and Blocks per Game.

When I rescale my data, can I use the standardization method, so that each newly transformed variable has a mean of 0 and a standard deviation of 1? So my process would be take each value, subtract the mean, divide by the standard deviation.
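To make the procedure concrete, here is a minimal sketch of that standardization in plain Python. The `height` and `shots` values are made up for illustration and are not from my dataset, and the helper names are my own. It also computes the covariance of the standardized variables next to the correlation of the original ones, so the two can be compared.

```python
import math
import statistics

def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return [(v - mean) / sd for v in values]

def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))

# Hypothetical data: heights in cm and shots per game.
height = [180.0, 195.0, 201.0, 188.0, 210.0]
shots = [12.1, 8.4, 6.9, 10.2, 5.5]

z_height = standardize(height)
z_shots = standardize(shots)

# Covariance of the standardized variables vs. correlation of the originals.
cov_z = covariance(z_height, z_shots)
corr_raw = correlation(height, shots)
print(math.isclose(cov_z, corr_raw))
```

Under these assumptions the two quantities agree up to floating-point rounding, since dividing each variable by its standard deviation is exactly what turns a covariance into a correlation.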

Also, I have read that in some cases, for variables like height or weight, it is better to apply a log transformation. Is that right? Once you do that, though, the scale is no longer helpful.

When I perform the PCA in R, I plan to use the covariance matrix to see how much of the variance in the data is explained by my first 2 principal components.
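To illustrate why the scale matters for this step, here is a rough sketch in plain Python rather than R, reduced to two variables so that the eigenvalues of the 2x2 covariance matrix have a closed form. The `weight` and `blocks` numbers are hypothetical, and `first_pc_share` is just an illustrative helper, not a function from any library.

```python
import math
import statistics

def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def first_pc_share(xs, ys):
    """Share of total variance explained by the first principal component
    of the 2x2 covariance matrix of (xs, ys)."""
    a = covariance(xs, xs)  # variance of xs
    c = covariance(ys, ys)  # variance of ys
    b = covariance(xs, ys)
    # Largest eigenvalue of the symmetric matrix [[a, b], [b, c]].
    lam1 = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return lam1 / (a + c)  # total variance is the trace a + c

# Hypothetical data: weight in kg and blocks per game.
weight = [82.0, 95.0, 110.0, 88.0, 120.0]
blocks = [0.3, 1.1, 0.6, 1.4, 0.9]

share_raw = first_pc_share(weight, blocks)
share_std = first_pc_share(standardize(weight), standardize(blocks))
print(f"raw covariance matrix:  first PC explains {share_raw:.1%}")
print(f"standardized variables: first PC explains {share_std:.1%}")
```

With these made-up numbers, the first component of the raw covariance matrix is almost entirely weight, simply because weight has by far the larger variance. After standardizing, the share depends only on the correlation between the two variables.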

  • It's not a duplicate because I am not asking about correlation vs covariance. – Jack Armstrong Oct 10 '18 at 12:16
  • You could consider using maximum likelihood Factor Analysis. The results are not tied to the scale used in the same way. – conjectures Oct 10 '18 at 12:31
  • Yes, you are. "So my process would be take each value, subtract the mean, divide by the standard deviation." - this means using correlation. Please read the accepted answer in the linked thread. – amoeba Oct 10 '18 at 12:44
  • Since I have scaled my variables, are the correlation and covariance then the same? – Jack Armstrong Oct 10 '18 at 12:53
