I'd like to model an interaction term between a continuous variable and a categorical variable, while accounting for possible aliasing among the variables. I was wondering what the best way to do this is.
As an example, suppose I have a data set containing damages incurred, car type (sedan, truck, etc.), car model, and car age.
| Damages incurred ($) | Type       | Model       | Age |
|----------------------|------------|-------------|-----|
| 1000                 | Sedan      | Hyundai G30S | 10  |
| 300                  | Truck      | Ford F150   | 3   |
| 500                  | Motorcycle | Yamaha F90  | 2   |
I'd like to include Age as a predictor, but I have reason to suspect that the effect of car age depends on the car type; for instance, age may affect losses very differently for sedans than for trucks. So I'd like to include an interaction term, Type:Age, to account for this.
I also want to include Model. However, once I know the car model I also know the car type, so I cannot include Type in the modeling equation because it would be aliased with Model.
At the same time, I don't want to use Model:Age in the modeling equation, because I have reason to believe that the car model doesn't add much more information than the car type; i.e. car type and age combined have the same effect as car model and age. Including Model:Age would also greatly increase the number of parameters (degrees of freedom used), since there are so many car models.
So is there a way to include Age:Type, Model, and Age in the GLM without running into aliasing problems in the model output? If not, what would be the best way around it?
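To make the aliasing concern concrete, here is a minimal sketch using NumPy only. The toy data (4 models nested in 2 types, random ages) and the helper `dummies` are made up for illustration and are not from the data set above. It builds the design matrix for Model + Age + Type:Age, checks that it has full column rank, and then shows that adding a Type main effect adds a column without adding rank. It also counts how many columns Model:Age would need compared with Type:Age.

```python
import numpy as np

# Hypothetical toy data: models 0,1 belong to type 0; models 2,3 belong to type 1.
rng = np.random.default_rng(0)
n = 40
model = np.arange(n) % 4
type_ = model // 2
age = rng.uniform(1, 15, size=n)

def dummies(codes):
    """Treatment-coded indicator columns, dropping the first level as reference."""
    levels = np.unique(codes)[1:]
    return (codes[:, None] == levels[None, :]).astype(float)

intercept = np.ones((n, 1))
X_model = dummies(model)          # 3 columns
X_type = dummies(type_)           # 1 column
X_age = age[:, None]              # 1 column
X_type_age = X_type * X_age       # Type:Age, 1 column
X_model_age = X_model * X_age     # Model:Age, 3 columns

# Proposed design: Model + Age + Type:Age (no Type main effect).
X_proposed = np.hstack([intercept, X_model, X_age, X_type_age])
# Same design with a Type main effect added.
X_with_type = np.hstack([X_proposed, X_type])

rank_proposed = np.linalg.matrix_rank(X_proposed)
rank_with_type = np.linalg.matrix_rank(X_with_type)

print("proposed:  columns =", X_proposed.shape[1], " rank =", rank_proposed)
print("with Type: columns =", X_with_type.shape[1], " rank =", rank_with_type)
print("Type:Age columns =", X_type_age.shape[1],
      " Model:Age columns =", X_model_age.shape[1])
```

In this toy setup the Type column is an exact sum of Model indicator columns, which is why it contributes no extra rank. The Type:Age column is not a combination of the other columns, so the interaction can still be estimated alongside Model.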
[...] Age:Type, Model, and Age as your variables in the GLM. Just think of it as a model including Type as well, but with the coefficients of Type constrained to 0. – Tim Mak May 07 '20 at 07:09
[...] Type is included in the model without collinearity, and you would obtain the same result. For example, suppose your Model takes values {1,2,3,4,5,6}, where Type = 1 when Model <= 3, and Type = 2 otherwise. You can then recode Model as Model2, with values {1,2,3,1,2,3}, such that 1,2,3 indicate "subtypes" within Type. – Tim Mak May 08 '20 at 01:46
[...] Model. – Tim Mak May 08 '20 at 01:47
[...] Model2, then you need to include Model2 and the interaction term Type*Model2 to make it equivalent to the original model. I assume all of these to be factor variables also (in R). – Tim Mak May 08 '20 at 01:53
[...] Model2, Type, Type:Model2, and Age in the final model? I'm wondering why the GLM doesn't treat the two 2's in the Model2 as the same, even though they are technically in the same column. – platypus17 May 08 '20 at 02:58
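The recoding described in the comments can be checked numerically. The sketch below uses the comment's example (Model in {1,...,6}, Type = 1 when Model <= 3, Model2 in {1,2,3,1,2,3}); the simulated response `y` and the helpers `dummies` and `fitted` are assumptions for illustration. It compares a design with Model alone against one with Type, Model2, and Type:Model2, and checks that both span the same column space and give the same least-squares fitted values.

```python
import numpy as np

# Example from the comments; the response y is simulated noise, for illustration only.
rng = np.random.default_rng(1)
n = 60
model = (np.arange(n) % 6) + 1            # values 1..6
type_ = np.where(model <= 3, 1, 2)        # Type = 1 when Model <= 3, else 2
model2 = (model - 1) % 3 + 1              # recoded as 1,2,3,1,2,3
y = rng.normal(size=n)

def dummies(codes):
    """Treatment-coded indicator columns, dropping the first level as reference."""
    levels = np.unique(codes)[1:]
    return (codes[:, None] == levels[None, :]).astype(float)

def fitted(X, y):
    """Least-squares fitted values for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

ones = np.ones((n, 1))
D_type = dummies(type_)                   # 1 column
D_m2 = dummies(model2)                    # 2 columns

# Design A: Model as a single 6-level factor.
X_a = np.hstack([ones, dummies(model)])
# Design B: Type + Model2 + Type:Model2.
X_b = np.hstack([ones, D_type, D_m2, D_type * D_m2])

rank_a = np.linalg.matrix_rank(X_a)
rank_b = np.linalg.matrix_rank(X_b)
rank_joint = np.linalg.matrix_rank(np.hstack([X_a, X_b]))
same_fit = np.allclose(fitted(X_a, y), fitted(X_b, y))

print("rank A =", rank_a, " rank B =", rank_b, " joint rank =", rank_joint)
print("identical fitted values:", same_fit)
```

On the question about the two 2's in Model2: the Model2 main-effect column does treat them as the same level. It is the Type:Model2 interaction columns that let level 2 within Type 1 differ from level 2 within Type 2, which is why the interaction is needed for the two designs to be equivalent.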