I am trying to create a classification model with independent variables IV1, IV2 and IV3 and dependent variable DV (DV ~ IV1 + IV2 + IV3).
Now the problem I am facing is that IV2 exists only when IV1 takes a certain value. For example, IV1 might be whether a person owns a house. If IV1 is true, IV2 is the size of the house in square metres; if IV1 is false, IV2 is simply not applicable.
My current approach is to set IV2 to 0 whenever IV1 is false. I find this unsatisfactory: it essentially turns IV2 into a mixture of a continuous and a discrete variable, and it seems to introduce too much bias regardless of which statistical model I use.
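To make the encoding concrete, here is a minimal sketch of what I am doing (the column names, values, and the use of pandas are purely illustrative, not my real data):

```python
import numpy as np
import pandas as pd

# Toy data: IV1 = owns a house (0/1), IV2 = house size in m^2 (NaN when not applicable),
# IV3 = some other predictor, DV = the class label.
df = pd.DataFrame({
    "IV1": [1, 1, 0, 0, 1],
    "IV2": [120.0, 45.0, np.nan, np.nan, 200.0],
    "IV3": [30, 25, 40, 35, 50],
    "DV":  [1, 0, 1, 1, 1],
})

# My current workaround: zero-fill IV2 wherever IV1 is false.
df["IV2"] = df["IV2"].fillna(0.0)
```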
Suppose DV represents marital status. If IV1 is true (i.e. the person owns a house), then the chance of that person being married increases with IV2 (the size of the house). So if IV1 is true and IV2 is near 0, it is extremely unlikely that the person is married. However, if IV1 is false (the person doesn't own a house), there is still a good chance that the person is married. But because I set IV2 to 0 in that case, my model keeps predicting that such a person isn't married.
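To spell out the relationship I have in mind as code (the functional form and the numbers below are made up, just to show the conditional structure):

```python
import math

def p_married(owns_house: bool, size_m2: float | None = None) -> float:
    """Toy data-generating story, not my real data: for owners the chance of being
    married grows with house size, while non-owners keep a decent baseline chance."""
    if owns_house:
        # larger house -> higher probability of being married
        return 1 / (1 + math.exp(-(size_m2 - 150) / 40))
    # house size is simply not applicable here, yet the probability is not near zero
    return 0.5
```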
So my question is: how can I better handle a problem where a continuous variable exists only when a discrete variable takes a certain value?
Thanks. (and the house-marriage analogy is fictional of course!)