Factor analysis for ordinal data converted from binary ordinal data

Question

Let’s begin with a simple illustration:

ID  A1  A2  A3  A4  SUM A1/SUM  A2/SUM  A3/SUM  A4/SUM
1   0   1   1   0   2   0       0.5     0.5     0
2   1   1   1   1   4   0.25    0.25    0.25    0.25
3   0   0   1   0   1   0       0       1       0
4   1   1   1   0   3   0.33    0.33    0.33    0

I’ve got a set of ordinal binary data, where 1 means that the respondent used to work in a given area and 0 means that he/she didn’t. There are 35 such variables, and I have to reduce this number by finding the most similar cases. (I will then use the groups to compute the mean of some Likert-scale indicators within each group, and the difference between each individual’s indicator and the mean indicator for the area, and use that difference as an independent variable in a logistic regression.)

Can I convert this binary data to numerical data by dividing each “1” response by the respondent’s overall sum of “1” responses (which would indicate “how much” a person used to work in some area), and then use these variables in factor analysis/principal component analysis to find the most similar groups?
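
The proposed transformation can be sketched in a few lines of Python. This is a minimal illustration that reproduces the example table above; the function name `row_normalize` is hypothetical, and treating an all-zero respondent as all zeros is an assumption, since the question does not say how such rows are handled.

```python
# Binary indicators from the example table: 1 = respondent worked in that area.
rows = {
    1: [0, 1, 1, 0],
    2: [1, 1, 1, 1],
    3: [0, 0, 1, 0],
    4: [1, 1, 1, 0],
}

def row_normalize(indicators):
    """Divide each indicator by the respondent's total number of 1s (SUM)."""
    total = sum(indicators)
    if total == 0:
        # Assumption: a respondent with no areas keeps all zeros.
        return [0.0] * len(indicators)
    return [x / total for x in indicators]

for rid, ind in rows.items():
    print(rid, sum(ind), [round(v, 2) for v in row_normalize(ind)])
# 1 2 [0.0, 0.5, 0.5, 0.0]
# 2 4 [0.25, 0.25, 0.25, 0.25]
# 3 1 [0.0, 0.0, 1.0, 0.0]
# 4 3 [0.33, 0.33, 0.33, 0.0]
```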

“There are 35 such variables, and I have to reduce this amount, by finding the most similar cases; and then use such variables in factor analysis/principal component analysis to find most similar groups.” I don’t quite understand what you want. By cases, do you mean respondents (I expect so)? Then cluster analysis is probably what you need. Or are you going to use PCA of the rows (Q-mode PCA) rather than of the columns? What are your reasons then? (comment by ttnphns, Aug 29 '16 at 01:53)
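
If “cases” means respondents, cluster analysis starts from a dissimilarity between respondents. A minimal sketch, assuming Jaccard distance (one common choice for presence/absence data, not something the thread specifies), computed on the four example respondents. Jaccard ignores areas that neither respondent worked in, so shared 0s do not make two people look alike.

```python
from itertools import combinations

# Binary profiles of the four example respondents (areas A1..A4).
respondents = {1: [0, 1, 1, 0], 2: [1, 1, 1, 1], 3: [0, 0, 1, 0], 4: [1, 1, 1, 0]}

def jaccard_distance(u, v):
    """1 - |areas both worked in| / |areas either worked in|."""
    both = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    if either == 0:
        return 0.0  # two all-zero profiles are treated as identical
    return 1.0 - both / either

for i, j in combinations(sorted(respondents), 2):
    print(i, j, round(jaccard_distance(respondents[i], respondents[j]), 3))
# 1 2 0.5
# 1 3 0.5
# 1 4 0.333
# 2 3 0.75
# 2 4 0.25
# 3 4 0.667
```

These distances could then be fed to a hierarchical clustering routine; in SciPy, for example, `scipy.spatial.distance.pdist(X, metric="jaccard")` on a boolean array followed by `scipy.cluster.hierarchy.linkage` covers both steps.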

T.E.G. · Answer 1 · 2023-12-09T20:49:08.043

I've waited for other, more experienced users to reply, but I'll do my best. I would like to start with the second part of the question, about using factor analysis/principal component analysis in this context.

I recommend reading the first answer here on "ordinal binary data". So, I prefer to treat these variables simply as binary. The question then is whether binary variables can be used in factor analysis/principal component analysis at all. Here is an answer to a similar question. It suggests two approaches for FA: FA on tetrachoric correlations, or Latent Trait Analysis instead of FA. Using tetrachoric correlations is also suggested here (for CFA). Using binary data in PCA is more permissible, but here you can find a mention of alternatives. You also need to decide which method to use, because FA and PCA results are interpreted differently.
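
Tetrachoric correlation assumes each binary item is a dichotomized latent normal variable. Proper estimates are obtained by maximum likelihood (for instance, `tetrachoric()` in R's psych package). As a dependency-free sketch only, the code below uses the older cos-pi approximation, r ≈ cos(π / (1 + √(ad/bc))), where a, b, c, d are the cells of the 2×2 table of two items; the function names are hypothetical and this approximation is not what the linked answers prescribe.

```python
import math

def tetrachoric_cos_pi(x, y):
    """Approximate tetrachoric correlation of two 0/1 lists (cos-pi approximation)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # both 1
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # x only
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # y only
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)  # both 0
    if a * d == 0 and b * c == 0:
        return float("nan")  # table too sparse (e.g. a constant item)
    if b * c == 0:
        return 1.0  # odds ratio is infinite
    if a * d == 0:
        return -1.0  # odds ratio is zero
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1.0 + math.sqrt(odds_ratio)))

def tetrachoric_matrix(columns):
    """Pairwise approximate tetrachoric matrix for a list of binary columns."""
    k = len(columns)
    return [[1.0 if i == j else tetrachoric_cos_pi(columns[i], columns[j])
             for j in range(k)] for i in range(k)]
```

Because the matrix is assembled pair by pair, it is not guaranteed to be positive semidefinite, which some FA routines require; with only a handful of respondents (as in the example table) most cells come out as ±1 or undefined, so this only makes sense on the full sample.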

More information would be needed to answer the first part of the question. The binary variables tell us whether a respondent used to work in some area or not; they say nothing about how much time each respondent spent in each of those areas. In the table, every nonzero entry in a row equals 1/SUM, so the ratios only encode how many areas a respondent worked in, spread evenly over those areas. So, based on the information you've provided, I don't think dividing each “1” response by the overall sum of “1” responses would indicate “how much” a person used to work in some area.
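
To make this concrete, here is a sketch with two hypothetical respondents whose years per area are invented purely for illustration (the survey does not collect them). Once the data are reduced to 0/1, both respondents get exactly the same ratios, even though their time was distributed very differently.

```python
def row_normalize(values):
    """Divide each value by the row total (all zeros if the total is 0)."""
    total = sum(values)
    return [v / total for v in values] if total else [0.0] * len(values)

# Hypothetical years spent in areas A1..A4 (NOT part of the original data).
years_p = [0, 10, 1, 0]
years_q = [0, 1, 10, 0]

# The survey only records whether each area was ever worked in.
binary_p = [1 if y > 0 else 0 for y in years_p]
binary_q = [1 if y > 0 else 0 for y in years_q]

print(row_normalize(binary_p), row_normalize(binary_q))  # identical ratios
print(row_normalize(years_p), row_normalize(years_q))    # different ratios
```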
