I'm trying to calculate the change in scores on a depression questionnaire, a very simple problem. However, what I care about is not the change in raw scores but the change in principal component (PC) scores for each subject. My pipeline is as follows:
- Conduct a PCA using the pre-treatment scores for each subject
- Calculate pre-treatment scores for each subject for PC1 through PC4
- Use the loadings for PC1-4 calculated in step 1 to calculate post-treatment scores for each subject for PC1-4
- Compute the difference between pre- and post-treatment scores for each subject
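The four steps above could be sketched in Python as follows. This is a minimal sketch, not the asker's code: it assumes `pre` and `post` are arrays of shape (subjects × items) with rows matched by subject, uses NumPy's SVD for the PCA, and treats the unit-length eigenvectors as the "loadings" (some software instead scales loadings by the square root of the eigenvalues). The function name `pc_scores_change` and the `scale_scores` option are hypothetical.

```python
import numpy as np


def pc_scores_change(pre, post, k=4, scale_scores=False):
    """Fit PCA on pre-treatment data, score both time points, return the change.

    pre, post: arrays of shape (n_subjects, n_items); row i is the same subject.
    (Hypothetical helper for illustration.)
    """
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    mu = pre.mean(axis=0)                        # pre-treatment item means
    # SVD of the centered pre-treatment data; rows of vt are the PC directions
    _, _, vt = np.linalg.svd(pre - mu, full_matrices=False)
    w = vt[:k].T                                 # (n_items, k) weights for PC1..PCk
    scores_pre = (pre - mu) @ w                  # mean zero by construction
    scores_post = (post - mu) @ w                # same centering, same weights
    if scale_scores:
        # Divide BOTH time points by the pre-treatment SD of each PC score
        sd = scores_pre.std(axis=0, ddof=1)
        scores_pre, scores_post = scores_pre / sd, scores_post / sd
    return scores_pre, scores_post, scores_post - scores_pre


# Simulated example: 30 subjects, 10 items, uniform +0.5 shift after treatment
rng = np.random.default_rng(0)
pre = rng.normal(size=(30, 10))
post = pre + 0.5
s_pre, s_post, change = pc_scores_change(pre, post, k=4)
```

Because the same mean and weights are applied at both time points, the change reduces to `(post - pre) @ w`; the centering cancels out. The sign of each PC from the SVD is arbitrary, but it is the same for both time points, so the change scores are consistent.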
However, the pre-treatment PC scores are centered (and scaled), whereas the post-treatment scores are not, because they are calculated with the PC loadings from the pre-treatment data but from the post-treatment data itself. Is this kosher?
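A short derivation may help with the "is this kosher" part. It assumes scores are computed as centered data times loadings, as in standard PCA. Write $x_i$ for subject $i$'s row of item scores, $\bar{x}^{\text{pre}}$ for the pre-treatment item means, and $V_k$ for the $p \times k$ matrix of pre-treatment loadings (notation introduced here for illustration). The two sets of scores in the pipeline are then

$$z_i^{\text{pre}} = \left(x_i^{\text{pre}} - \bar{x}^{\text{pre}}\right) V_k, \qquad z_i^{\text{post}} = \left(x_i^{\text{post}} - \bar{x}^{\text{pre}}\right) V_k .$$

Averaging over the $n$ subjects and subtracting gives

$$\bar{z}^{\text{pre}} = 0, \qquad \bar{z}^{\text{post}} = \left(\bar{x}^{\text{post}} - \bar{x}^{\text{pre}}\right) V_k, \qquad z_i^{\text{post}} - z_i^{\text{pre}} = \left(x_i^{\text{post}} - x_i^{\text{pre}}\right) V_k .$$

So the post-treatment scores not being centered is expected: their mean is exactly the average item-level change expressed in the pre-treatment PC coordinates, and re-centering them on their own mean would remove that shift. If the scores are also scaled, replacing $V_k$ with $V_k D^{-1}$ (with $D$ the diagonal matrix of pre-treatment PC score SDs) leaves the argument unchanged, provided the same $D$ is used at both time points.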
A follow-up question: is there a better way to calculate the change in principal component scores between time points? Could I instead calculate the factor loadings using all data (pre- and post-treatment) and then compute the pre- and post-treatment scores for PC1-4 from those loadings? Intuitively that seems wrong.
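For comparison, the pooled alternative could be sketched like this. It is a hypothetical helper under the same assumptions as a pre-only fit (rows matched by subject, unit-length eigenvectors as loadings). Only the fitted mean and weights change. One caveat is a general property of pooling rather than anything from this thread: the pooled covariance mixes within-time variation with the pre/post mean difference, so a large treatment effect can pull the leading components toward the treatment shift itself. In addition, each subject contributes two correlated rows.

```python
import numpy as np


def pc_scores_change_pooled(pre, post, k=4):
    """Alternative: fit PCA on pre and post stacked together (pooled loadings).

    Hypothetical helper for illustration; rows of pre and post are matched by subject.
    """
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    pooled = np.vstack([pre, post])              # 2n rows: every subject appears twice
    mu = pooled.mean(axis=0)                     # grand mean over both time points
    _, _, vt = np.linalg.svd(pooled - mu, full_matrices=False)
    w = vt[:k].T                                 # weights reflect pooled covariance
    scores_pre = (pre - mu) @ w
    scores_post = (post - mu) @ w
    return scores_pre, scores_post, scores_post - scores_pre


# Simulated example: 30 subjects, 10 items, uniform +0.5 shift after treatment
rng = np.random.default_rng(0)
pre = rng.normal(size=(30, 10))
post = pre + 0.5
p_pre, p_post, p_change = pc_scores_change_pooled(pre, post, k=4)
```

Here it is the pooled scores, not the pre-treatment ones, that average to zero. The change is again `(post - pre) @ w`, so the real choice is which covariance structure defines the axes.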
Any suggestions would be much appreciated!
"Is there a better way?" Nobody can decide that for you. Your decision should reflect what makes sense given the task at hand. Your current approach (obtaining the PC structure from the pre-treatment data and then imposing it on the post-treatment data) sounds not unreasonable. – ttnphns Apr 13 '16 at 08:10

... "new data points". Many PCA programs let you do this in one step: you enter both datasets in the PCA but mark the second one as "passive". Alternatively, you can compute PC scores for the "new" data points yourself (see how to compute PC scores). Think about how best to center/standardize the second dataset's points; usually people center/standardize them by the mean/st.dev. of the first dataset. – ttnphns Apr 13 '16 at 08:20
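The "passive" approach from the comment, with the second dataset standardized by the first dataset's mean and SD, could be sketched by hand as follows. This is a minimal sketch, not any particular program's implementation: it assumes a correlation-style PCA (items standardized before the SVD), assumes no item has zero variance in the pre-treatment data, and `project_passive` is a hypothetical name. The key point is that the passive (post-treatment) rows never contribute their own mean or SD.

```python
import numpy as np


def project_passive(active, passive, k=4, standardize=True):
    """Fit PCA on `active` rows only; project `passive` rows onto the same PCs.

    Both sets are centered (and, if standardize=True, scaled) with the mean and
    SD of the ACTIVE data only. Hypothetical helper for illustration.
    """
    active = np.asarray(active, dtype=float)
    passive = np.asarray(passive, dtype=float)
    mu = active.mean(axis=0)
    sd = active.std(axis=0, ddof=1) if standardize else np.ones(active.shape[1])
    za = (active - mu) / sd                      # standardized with active stats
    zp = (passive - mu) / sd                     # same transform, not its own stats
    _, _, vt = np.linalg.svd(za, full_matrices=False)
    w = vt[:k].T                                 # PC weights from active data only
    return za @ w, zp @ w


# Simulated example: 40 subjects, 8 items, uniform improvement of 0.8 after treatment
rng = np.random.default_rng(1)
pre = rng.normal(loc=2.0, scale=1.0, size=(40, 8))
post = pre - 0.8
a_scores, p_scores = project_passive(pre, post, k=3)
```

Standardizing `post` by its own mean and SD instead would force its scores to average zero as well, which would hide exactly the treatment shift the change scores are meant to capture.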