
I have a problem with a multiple regression I performed:

  • model without constant term;
  • one dependent continuous variable;
  • first set of dummies: derived from 2 continuous variables. I split each one at its median to obtain two binary variables, then built 4 dummies from these two binaries, one for each combination (10, 01, 00, 11);
  • second set of dummies: 3 dummies derived from one categorical variable;
  • two continuous variables.
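To make the first set of dummies concrete, here is a minimal sketch of the median split and the four combination dummies. The function name `combo_dummies`, the simulated inputs, and the choice to put values equal to the median in the "low" group are assumptions for illustration only, not my actual data or code.

```python
import numpy as np

def combo_dummies(a, b):
    """Median-split two continuous arrays and return 4 combination dummies.

    Columns are ordered 00, 01, 10, 11 (first digit: a above its median,
    second digit: b above its median). Ties at the median count as 0.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    hi_a = (a > np.median(a)).astype(int)
    hi_b = (b > np.median(b)).astype(int)
    code = 2 * hi_a + hi_b              # integer in 0..3, one per combination
    return np.eye(4, dtype=int)[code]   # one-hot rows

# Simulated stand-ins for the two continuous variables (hypothetical data).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
D = combo_dummies(x1, x2)

# Every observation falls in exactly one combination, so each row sums to 1.
row_sums = D.sum(axis=1)
```

Because every row sums to 1, the four dummies together reproduce a constant column, which matters when the model is fitted without a constant term.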

This model has an R-squared of 98% (with a similar adjusted R-squared). I think this is too high, but I don't know how to interpret it correctly or how to assess whether it is valid. I know that R-squared tends to increase with the number of explanatory variables, but I don't know whether the number of dummies affects its value or its validity as an indicator of a good regression. Moreover, this model has high VIF values, which indicate collinearity: are these measures still valid in that case?
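One possible source of the very high value is that, when a model has no constant term, software such as R's `lm` and statsmodels' `OLS` typically reports an uncentered R-squared, which compares the residuals with the sum of squares of $y$ around zero rather than around its mean. The sketch below uses simulated data (all numbers are assumptions, not my data) to show how far the two definitions can diverge for the same fit when the response has a mean far from zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical 4-level grouping, coded as a full set of 4 dummies (no constant).
group = rng.integers(0, 4, size=n)
D = np.eye(4)[group]

# Response with a large mean and only a weak dependence on the group.
y = 50.0 + 0.5 * group + rng.normal(scale=5.0, size=n)

# Ordinary least squares without a constant term.
beta, *_ = np.linalg.lstsq(D, y, rcond=None)
resid = y - D @ beta
ssr = float(resid @ resid)

# Uncentered R-squared: total sum of squares taken around zero.
r2_uncentered = 1.0 - ssr / float(y @ y)
# Centered R-squared: total sum of squares taken around the mean of y.
r2_centered = 1.0 - ssr / float(((y - y.mean()) ** 2).sum())

print(round(r2_uncentered, 3), round(r2_centered, 3))
```

In this sketch the residuals are identical in both lines; only the denominator changes, so the uncentered figure is close to 1 while the centered one stays small.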

I should add that I have also tested the model with a constant term (and $k-1$ and $n-1$ dummies), which has a very low R-squared (around 10%) but no collinearity problems. I would use this model if only I could separate the effects of the two reference dummies from the constant term, but I don't know how to do this.
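To compare the two specifications, here is a rough sketch under the assumption that the 4 and 3 dummies are the complete sets of categories, and leaving out the two continuous regressors for brevity. The factors `a` and `b`, the coefficients, and the sample size are all hypothetical. It checks the rank of each design matrix and whether the two models produce the same fitted values.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

a = rng.integers(0, 4, size=n)   # hypothetical 4-level factor (the combinations)
b = rng.integers(0, 3, size=n)   # hypothetical 3-level categorical variable
A = np.eye(4)[a]                 # full set of 4 dummies
B = np.eye(3)[b]                 # full set of 3 dummies
y = 10.0 + 1.0 * a - 2.0 * b + rng.normal(size=n)

# Model 1: no constant, all 4 + 3 dummies (7 columns).
X_full = np.hstack([A, B])
# Model 2: constant plus 3 + 2 dummies; the first level of each factor
# is the reference category (6 columns).
X_ref = np.hstack([np.ones((n, 1)), A[:, 1:], B[:, 1:]])

rank_full = np.linalg.matrix_rank(X_full)
rank_ref = np.linalg.matrix_rank(X_ref)

# lstsq returns a minimum-norm solution when the columns are linearly dependent.
coef_full = np.linalg.lstsq(X_full, y, rcond=None)[0]
coef_ref = np.linalg.lstsq(X_ref, y, rcond=None)[0]
fit_full = X_full @ coef_full
fit_ref = X_ref @ coef_ref

print(rank_full, rank_ref)
print(np.allclose(fit_full, fit_ref))
```

In this sketch both designs span the same column space, so the fitted values and residuals coincide even though the no-constant design has one redundant column. The constant `coef_ref[0]` is the fitted value for an observation at the reference level of both factors at once; the data identify only that combined value, not a separate contribution from each reference category.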

chl
  • 53,725
MatBi
  • 51
