Using PC1 in subsequent analysis when all loadings are close to $+1$

Question

I did a PCA on some data as a data reduction technique. I had 8 original items. The results showed a single component accounted for most of the variance in the data, with all items loading onto a single component with loadings all $>.8$.

What I would like to do now is to use this component as an outcome in a regression analysis. My confusion is over what to use as the variable itself. Do I extract the component somehow (akin to factor scores in a CFA), or do I just go back to the raw items and create an index (by averaging them, for example)?

If you're going to "average raw items," it would seem that you have made no use whatsoever of the PCA results. So what were you hoping to accomplish with PCA? — whuber, Mar 16 '16 at 14:37
Well, great question. Here is my thinking:
This is an outcome variable for a multivariate regression model. We have 5 items. Before running the analyses, we used PCA as a data reduction technique to check the underlying dimensionality of the data. We found that one component fit the data, and all loadings were high.

Since I found one component, I'm wondering what the added benefit is of using the component (which I would define as the linear combination of the variables, each multiplied by its loading) versus the raw averaged scores as the measured outcome. Does that question make sense? — user1638567, Mar 16 '16 at 15:20
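For z-scored items, the "variables times loadings" composite and the PC1 score can be related directly. This is a sketch assuming the PCA was run on the item correlation matrix; the symbols below are introduced here for illustration. Let $z_{ij}$ be the standardized score of respondent $i$ on item $j$ ($j = 1,\dots,p$), $\lambda$ the largest eigenvalue of the correlation matrix, $v$ its unit-length eigenvector, and $a_j = \sqrt{\lambda}\,v_j$ the PC1 loadings. The unit-variance PC1 score is

$$ s_i = \frac{1}{\sqrt{\lambda}} \sum_{j=1}^{p} v_j z_{ij} = \frac{1}{\lambda} \sum_{j=1}^{p} a_j z_{ij}, \qquad \lambda = \sum_{j=1}^{p} a_j^2 , $$

so the loading-weighted sum equals the PC1 score up to the constant factor $1/\lambda$. If all loadings are roughly equal, $a_j \approx \bar a$, then $\lambda \approx p\,\bar a^2$ and

$$ s_i \approx \frac{\bar a}{p\,\bar a^2} \sum_{j=1}^{p} z_{ij} = \frac{1}{\bar a}\,\bar z_i, \qquad \bar z_i = \frac{1}{p} \sum_{j=1}^{p} z_{ij}, $$

that is, the PC1 score is nearly a rescaled average of the standardized items. Averaging the raw (unstandardized) items can differ more, because items with larger variances then carry more weight.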
"Do I extract the component somehow (akin to factor scores in a CFA)?" I wonder why you mention confirmatory FA at all. The immediate idea is to compute the PC1 component scores (if you find that this makes sense in the subsequent regression you're speaking about). Averaging the highly loaded items is possible, too. Think about the potential differences: see, see. — ttnphns, Mar 17 '16 at 20:36
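A minimal sketch of the two options, using NumPy on simulated data. The data, the helper name `pc1_scores_and_mean`, and running the PCA on the correlation matrix are all assumptions made for illustration. It computes unit-variance PC1 scores and the plain average of the z-scored items, then reports how strongly the two correlate:

```python
import numpy as np

def pc1_scores_and_mean(X):
    """PC1 loadings, unit-variance PC1 scores, and the mean of z-scored items.

    X: (n_obs, n_items) array of raw item scores.
    """
    X = np.asarray(X, dtype=float)
    # z-score each item, so the PCA is run on the correlation matrix
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    # eigh returns eigenvalues in ascending order, so PC1 is the last column
    eigvals, eigvecs = np.linalg.eigh(R)
    lam, v = eigvals[-1], eigvecs[:, -1]
    if v.sum() < 0:  # eigenvector sign is arbitrary; make loadings positive
        v = -v
    loadings = v * np.sqrt(lam)       # correlation of each item with PC1
    scores = Z @ v / np.sqrt(lam)     # PC1 scores rescaled to unit variance
    avg_z = Z.mean(axis=1)            # unweighted average of the z-scored items
    return loadings, scores, avg_z

# Hypothetical data: 8 items, each = common trait + independent noise
rng = np.random.default_rng(0)
n, p = 500, 8
trait = rng.normal(size=(n, 1))
X = 0.9 * trait + 0.4 * rng.normal(size=(n, p))

loadings, scores, avg_z = pc1_scores_and_mean(X)
print("loadings:", np.round(loadings, 2))
print("r(PC1 score, item average):", round(np.corrcoef(scores, avg_z)[0, 1], 4))
```

When loadings are as uniformly high as in the question, the two scores typically end up nearly interchangeable as a regression outcome. The PC1 weighting matters more when loadings differ noticeably across items.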

score 2 · Answer 1 · edited Mar 18 '16 at 02:07


People often do this. They carry out a PCA (although they often call it factor analysis) and then use it to justify replacing the calculated weights with ones chosen from $\{-1, 0, 1\}$. Although this seems at first glance a strange thing to do, there is a rather old paper

@ARTICLE{wainer76,
  author = {Wainer, H},
  year = 1976,
  title = {Estimating coefficients in linear models: it don't make no
          nevermind},
  journal = {Psychological Bulletin},
  volume = 83,
  pages = {213--217},
  keywords = {glm, regression}
}

which suggests that, in multiple regression, replacing the weights by integers gives a model that performs almost as well but may generalise better, since it is less reliant on chance features of the data. I do not know whether anyone has tried to replicate this for PCA.
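A small simulation in the spirit of this argument may help. It is an illustration only and does not reproduce the paper's analysis. Least-squares weights are estimated on a training sample, replaced by weights from $\{-1, 0, 1\}$, and both composites are then scored on held-out data. The data-generating coefficients and the 0.1 cut-off for zeroing a weight are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 5 predictors, the last one irrelevant (true weight 0)
n_train, n_test = 200, 2000
beta = np.array([0.5, 0.4, -0.3, 0.3, 0.0])
X = rng.normal(size=(n_train + n_test, beta.size))
y = X @ beta + rng.normal(size=n_train + n_test)

# Standardize predictors with training-sample means and SDs
mu = X[:n_train].mean(axis=0)
sd = X[:n_train].std(axis=0, ddof=1)
Xtr, Xte = (X[:n_train] - mu) / sd, (X[n_train:] - mu) / sd
ytr, yte = y[:n_train], y[n_train:]

# Ordinary least-squares weights on the training sample (intercept column first)
design = np.column_stack([np.ones(n_train), Xtr])
coef, *_ = np.linalg.lstsq(design, ytr, rcond=None)
b = coef[1:]

# Integer weights: keep only the sign, and set small weights to 0
# (the 0.1 cut-off is an arbitrary choice for this illustration)
w = np.where(np.abs(b) < 0.1, 0.0, np.sign(b))

# Compare out-of-sample correlations of each composite with the outcome
r_ols = np.corrcoef(Xte @ b, yte)[0, 1]
r_int = np.corrcoef(Xte @ w, yte)[0, 1]
print("integer weights:", w)
print(f"held-out r: OLS weights {r_ols:.3f}, integer weights {r_int:.3f}")
```

How close the two correlations come depends on the sample size, the number of predictors, and how unequal the true weights are. The sketch only shows the mechanics of the comparison.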

edited Mar 18 '16 at 02:07

ttnphns

answered Mar 17 '16 at 16:34

mdewey

I don't think it is nice to post (only) a link in the form of code in a particular language (R?) that not everybody knows. You could instead outline the article's arguments in your answer. In particular, what are the "weights", and how are they related to the component/factor loadings? – ttnphns Mar 17 '16 at 20:43
Weights = coefficients. If you can point me to the link or the computer code in my answer, I will expand on it. – mdewey Mar 17 '16 at 21:41
Oops, I may have been wrong about "code". But what a strange way of specifying a reference; I have never seen it before. Where is it used? – ttnphns Mar 18 '16 at 01:57
It is the format of BibTex, see https://en.wikipedia.org/wiki/BibTeX – mdewey Mar 18 '16 at 09:51
