One of the properties of PCA states that the sum of the variances of the principal components is equal to the sum of the variances of the explanatory variables. I wonder how to interpret this, as I've always thought that we do not consider the $X$'s as random variables. I'm quite new to probability theory and I need to get it straight: are explanatory variables random variables (or do we consider them fixed)? And if we don't consider them random, how is it possible to apply the variance operator to a variable that is not stochastic?
1 Answer
A variable is anything that does (or even can) take different values. Not all variables are random variables. When we do regression analyses, we consider the explanatory / predictor variables to be fixed, even when we sampled their values. This is because we are interested in understanding the response as a function of the explanatory variables. In another context, and with different goals, we can take the explanatory variables as stochastic, if appropriate. There is an interesting philosophical issue here, but it is a bit moot. Even in laboratory experiments, where all variables are controlled and set a priori at exactly fixed levels, the explanatory variables certainly do have variances (although PCA would be tremendously uninteresting in such a case).
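To make this concrete, here is a minimal sketch in Python with NumPy. It assumes a hypothetical fixed design (the matrix `X` below is invented for illustration, with both factors set at prespecified levels and nothing sampled). It computes the variance of each column, runs PCA through an eigendecomposition of the covariance matrix, and compares the two totals.

```python
import numpy as np

# Hypothetical fixed design: two factors set at prespecified levels.
# Nothing here is sampled; the values are chosen by the experimenter.
X = np.array([
    [10.0, 1.0],
    [10.0, 2.0],
    [20.0, 1.0],
    [20.0, 2.0],
    [30.0, 1.0],
    [30.0, 2.0],
])

# Variance of each column, computed simply as a property of the numbers.
column_variances = X.var(axis=0, ddof=1)

# PCA via eigendecomposition of the covariance matrix.
cov = np.cov(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Principal component scores: centered data projected onto the eigenvectors.
scores = (X - X.mean(axis=0)) @ eigenvectors
score_variances = scores.var(axis=0, ddof=1)

# The two totals agree up to floating-point error.
total_original = column_variances.sum()
total_components = score_variances.sum()
print(total_original, total_components)
```

The equality holds because the eigenvalues of the covariance matrix sum to its trace, and the trace is the sum of the column variances. No distributional assumption about `X` enters the calculation.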
Thanks gung. By the way, is it mathematically legitimate to take the variance of a variable which we do not assume to be random? – Pasato Jun 10 '14 at 21:48
Variance is simply a property of a set of numbers. The existence of a variance does not presuppose that the variable was stochastic. You can take a variance of a variable fixed at prespecified levels. What you might legitimately conclude from the variance may be another issue, though. – gung - Reinstate Monica Jun 10 '14 at 21:51
actual magnitude for variance. The thing is that when you do PCA on correlations, the "actual magnitude" for you is 1. Don't like it? Then don't do PCA on correlations. – ttnphns Jun 11 '14 at 19:07
"Interpretation of all variables contributing equally to the PCA concept of 'total variance'" is an ornate and somewhat obscured wording of "variables are taken as having equal variance" (1 or another magnitude, no difference). – ttnphns Jun 11 '14 at 19:27
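The remark about PCA on correlations can be checked numerically. The sketch below assumes hypothetical data (three simulated columns on very different scales, invented purely for illustration). It shows that each variable enters the correlation matrix with variance 1, so the "total variance" is just the number of variables.

```python
import numpy as np

# Hypothetical data: three columns on very different scales.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3)) * np.array([1.0, 10.0, 100.0])

# PCA on correlations works with the correlation matrix,
# whose diagonal entries (the standardized variances) are all 1.
corr = np.corrcoef(data, rowvar=False)
corr_eigenvalues = np.linalg.eigvalsh(corr)

# The eigenvalues sum to the trace, i.e. the number of variables.
n_variables = data.shape[1]
total_variance_corr = corr_eigenvalues.sum()
print(n_variables, total_variance_corr)
```

The original scales of the columns have no effect on this total, which is the sense in which all variables are taken as having equal variance.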