Let’s begin with a simple illustration:
ID  A1  A2  A3  A4  SUM  A1/SUM  A2/SUM  A3/SUM  A4/SUM
1   0   1   1   0   2    0       0.5     0.5     0
2   1   1   1   1   4    0.25    0.25    0.25    0.25
3   0   0   1   0   1    0       0       1       0
4   1   1   1   0   3    0.33    0.33    0.33    0
I’ve got a set of binary variables, where 1 means that the respondent used to work in a given area and 0 means that he/she didn’t. There are 35 such variables, and I have to reduce this number by finding the most similar cases. I will then use the resulting groups to compute the mean of some Likert-scale indicators within each group, and the difference between an individual’s indicator and the mean indicator in the area, which will serve as an independent variable in a logistic regression.
Can I convert this binary data to numerical data by dividing each “1” response by the respondent’s total number of “1” responses (this would indicate “how much” a person used to work in a given area), and then use the resulting variables in factor analysis/principal component analysis to find the most similar groups?
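To make the proposed transformation concrete, here is a minimal sketch in Python with NumPy that reproduces the A/SUM columns of the table above. The variable names (`X`, `row_sums`, `shares`) are made up for illustration, and leaving a respondent with no “1” responses as a row of zeros is my own assumption, since the question does not say how such a case should be handled.

```python
import numpy as np

# Toy data copied from the table: rows are respondents, columns are A1..A4.
X = np.array([
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 0],
], dtype=float)

# Number of "1" responses per respondent (the SUM column).
row_sums = X.sum(axis=1, keepdims=True)

# Divide each response by the respondent's total.
# Assumption: a respondent with no "1" responses stays a row of zeros
# instead of triggering a division by zero.
shares = np.divide(X, row_sums, out=np.zeros_like(X), where=row_sums > 0)

print(np.round(shares, 2))
```

By construction, every row with at least one “1” sums to 1 after this step, so the transformed values describe each respondent’s profile relative to his/her own total rather than the absolute number of areas worked in.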
“There are 35 such variables, and I have to reduce this amount, by finding the most similar cases”; “and then use such variables in factor analysis/principal component analysis to find most similar groups”. I could not understand what you want. By cases, do you mean respondents (I expect so)? Then cluster analysis is probably what you need. Or are you going to run PCA on the rows (Q-mode PCA) rather than the columns? If so, what are your reasons? – ttnphns Aug 29 '16 at 01:53