
I'm working on a rent model for my employer. To keep it interpretable, we're using OLS. Feature engineering has gone well so far: R-squared has increased and AIC has decreased. However, I just tried including the percentage of bachelor's degree holders within a zip code and got results that are giving me pause.

In a bivariate regression, and judging by the correlation and a visual inspection, the percentage of bachelor's degree holders has a positive association with rent. Within the multivariable model, however, its coefficient is steeply negative.

I have to be slightly vague, but the other variables we are including are those intrinsic to the homes themselves (square footage, etc.), local home values, prevailing rents within a ten-mile radius, the standardized math scores at the closest schools, distance to the closest school and, as mentioned, the percentage of people in the zip code holding a bachelor's degree.

When I take out the local single-family home values, the sign of the bachelor's degree coefficient flips and becomes positive.
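
To make the pattern concrete, here is a minimal synthetic sketch of the kind of sign flip I'm describing. This is not my actual data: the variable names, the coefficients (0.8, 1.0, -0.3) and the noise levels are all made up, and the setup simply assumes that home values are strongly correlated with the bachelor's share and that rent depends mostly on home values. It only shows that such a flip can happen; I'm not claiming this is what is going on in my model.

```python
import numpy as np

def ols_coefs(X, y):
    """Return OLS coefficients (intercept first) via least squares."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 5000

# Hypothetical standardized predictors (made-up data, not the real model).
bachelors = rng.normal(size=n)
# Assumption: home values rise with the bachelor's share.
home_value = 0.8 * bachelors + 0.6 * rng.normal(size=n)
# Assumption: rent is driven mainly by home values, with a small
# negative direct contribution from the bachelor's share.
rent = 1.0 * home_value - 0.3 * bachelors + 0.5 * rng.normal(size=n)

# Bivariate fit: rent ~ bachelors
bivariate = ols_coefs(bachelors, rent)
# Multivariable fit: rent ~ bachelors + home_value
multi = ols_coefs(np.column_stack([bachelors, home_value]), rent)

print("bivariate slope on bachelors:     ", bivariate[1])
print("multivariable slope on bachelors: ", multi[1])
print("multivariable slope on home_value:", multi[2])
```

In this toy setup the bivariate slope on `bachelors` comes out positive because it absorbs the effect of the correlated `home_value`, while the multivariable slope on `bachelors` is negative once `home_value` is held fixed.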

Can anyone posit some reasons I may be seeing this outcome?

Jack
  • 334
Nye307
  • 11
