
I am trying to create a classification model with independent variables IV1, IV2 and IV3 and dependent variable DV (DV ~ IV1 + IV2 + IV3).

Now the problem that I am facing is that IV2 exists only when IV1 takes a certain value. For example, IV1 may be whether a person owns a house. IV2 would then be the size of the house in square metres if IV1 is true; if IV1 is false, IV2 is not applicable.

The current approach that I am using to tackle this problem is to set IV2 to 0 whenever IV1 is false. I find this approach unsatisfactory, as it essentially turns IV2 into a mixture of a continuous and a discrete variable and introduces too much bias regardless of which statistical model I use.
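
For concreteness, here is a minimal sketch of what I am doing now, on made-up data, fitted as a logistic regression in Python. The column names (owns_house for IV1, house_size for IV2, iv3 for IV3, married for DV) and the simulated outcome are just hypothetical stand-ins for my real data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500

owns_house = rng.integers(0, 2, size=n)                   # IV1: 0/1
house_size = np.where(owns_house == 1,                     # IV2: defined only for owners
                      rng.uniform(30, 200, size=n), np.nan)
iv3 = rng.normal(size=n)                                   # IV3: some other covariate

# Made-up outcome: owners' marriage odds rise with house size,
# while non-owners sit at a moderate baseline (as in the example below).
logit_p = np.where(owns_house == 1,
                   -3.0 + 0.03 * np.nan_to_num(house_size),
                   0.0) + 0.5 * iv3
married = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_p)))  # DV: 0/1

df = pd.DataFrame({"owns_house": owns_house, "house_size": house_size,
                   "iv3": iv3, "married": married})

# The current approach: force IV2 to 0 whenever IV1 is false.
df["house_size_filled"] = df["house_size"].fillna(0.0)

# DV ~ IV1 + IV2 + IV3 as a logistic regression.
fit = smf.logit("married ~ owns_house + house_size_filled + iv3", data=df).fit()
print(fit.summary())
```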

Suppose DV represents marriage status. If IV1 is true (i.e. the person owns a house), then the chances of that person being married increase with IV2 (the size of the house). So if IV1 is true and IV2 is near 0, it is extremely unlikely that the person is married. However, if IV1 is false (the person doesn't own a house), there is still a good chance that the person is married. But because I set IV2 to 0 in that case, my model keeps predicting that such a person isn't married.

So my question is, how can I better handle such a problem where a continuous variable exists only when a discrete variable takes a certain value?

Thanks. (and the house-marriage analogy is fictional of course!)

  • How have you determined that such an approach introduces bias? Shouldn't make any difference whether you're setting IV2 to zero or 999 or anything else - it'll be compensated for in the coefficient estimate for IV1. See here – Scortchi - Reinstate Monica Apr 08 '15 at 16:08

0 Answers