I have count data, and most of my explanatory variables are binary (they represent different types of support given to a group). Three of these variables seem to be correlated with each other. I don't want to drop any of them, because the categories are distinct and essential. So I thought I could combine them into a new variable: either a dummy that equals 1 when at least one of the three variables is 1, or a variable that holds the sum of their values. However, I am not sure whether this is the right way to handle the problem or what the alternatives are. Could anyone please help me with that?

Nickie

1 Answer

There are several possibilities. Which one you choose depends on how much data you have, how many dichotomous IVs you have, and the pattern of their values.

If you have a relatively large amount of data and a relatively small number of variables, you can make a single categorical variable that encodes every combination of the originals. E.g., if you had three dichotomous variables, the new variable would have the levels

  • YYY
  • YYN
  • YNY
  • YNN

and so on, with 8 levels.
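As a rough illustration of this first option, here is a minimal sketch in plain Python. The column names `support_a`, `support_b`, `support_c` and the data rows are made up for the example; only the idea of one level per Y/N combination comes from the text above.

```python
from itertools import product

# Hypothetical column names for the three binary support indicators.
COLUMNS = ["support_a", "support_b", "support_c"]

# Hypothetical data: one dict per observation.
rows = [
    {"support_a": 1, "support_b": 1, "support_c": 0},
    {"support_a": 0, "support_b": 0, "support_c": 0},
    {"support_a": 1, "support_b": 0, "support_c": 1},
]

def pattern(row, columns=COLUMNS):
    """Encode a row's binary indicators as a string such as 'YYN'."""
    return "".join("Y" if row[c] == 1 else "N" for c in columns)

# Every possible level of the combined variable: 2**3 = 8 of them.
all_levels = ["".join(p) for p in product("YN", repeat=len(COLUMNS))]

# The combined categorical variable, one value per observation.
patterns = [pattern(r) for r in rows]
```

The resulting `patterns` column would then enter the count model as a single categorical predictor with up to 8 levels, in place of the three correlated dummies.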

If you have less data or more variables, you could instead count how many kinds of support each case received. So, if you had five kinds of support, this variable would take a value from 0 to 5.
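A minimal sketch of the count, together with the "at least one" dummy mentioned in the question, might look like this. The five column names and the data are hypothetical.

```python
# Hypothetical column names for five binary support indicators.
COLUMNS = ["s1", "s2", "s3", "s4", "s5"]

# Hypothetical data: one dict per observation.
rows = [
    {"s1": 1, "s2": 1, "s3": 0, "s4": 0, "s5": 1},
    {"s1": 0, "s2": 0, "s3": 0, "s4": 0, "s5": 0},
    {"s1": 1, "s2": 1, "s3": 1, "s4": 1, "s5": 1},
]

def support_count(row, columns=COLUMNS):
    """Number of support types received: an integer from 0 to len(columns)."""
    return sum(row[c] for c in columns)

def any_support(row, columns=COLUMNS):
    """Dummy that is 1 if at least one support type was received, else 0."""
    return int(any(row[c] == 1 for c in columns))

counts = [support_count(r) for r in rows]
dummies = [any_support(r) for r in rows]
```

Note that the count treats every kind of support as interchangeable, and the dummy discards even the number of kinds, so both lose information relative to the full set of combinations.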

A more complicated option is to look at the patterns and make some sensible choices within the first idea, for example by merging some of the combinations. You could (and maybe should) do this using substantive knowledge and eyeballing the data, or you could perhaps use some kind of cluster analysis. See this thread for some ideas.
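One simple way to start eyeballing the patterns is to tabulate how often each combination occurs. The sketch below does that and then lumps rare combinations into a single level; the data, the `MIN_COUNT` threshold, and the "other" label are all assumptions for illustration, and lumping by frequency is only one possible rule, not a substitute for substantive judgment or for cluster analysis.

```python
from collections import Counter

# Hypothetical combined patterns, one per observation (see the Y/N coding above).
patterns = ["YYY", "YYY", "YNN", "NNN", "NNN", "NNN", "YYN", "NYN"]

# Assumed threshold: patterns seen fewer than this many times are merged.
MIN_COUNT = 2

# Frequency table of the observed patterns.
counts = Counter(patterns)

def collapse(p, counts=counts, min_count=MIN_COUNT):
    """Keep a frequent pattern as its own level; merge rare ones into 'other'."""
    return p if counts[p] >= min_count else "other"

collapsed = [collapse(p) for p in patterns]

# Print the table, most frequent pattern first, for inspection.
for level, n in counts.most_common():
    print(level, n)
```

In practice you would inspect the frequency table first and decide which combinations are meaningful to merge, rather than applying a cutoff mechanically.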

Peter Flom