
I am doing linear regression, predicting y from a constant and x1, x2, x3, x4, ..., x10 (11 terms including the intercept). I am getting an R^2 of 88.7% and an adjusted R^2 of 85.1%. I noticed that all the x variables, including the intercept, have very high p-values, indicating that they are not significant. Therefore I decided to build a model as below.

Predict y using sum(x5...x10) (summing 6 variables to get a single predictor). Now I am getting an R^2 of 97.5% and an adjusted R^2 of 97.5%. I am not fitting an intercept in this case, because both R^2 and adjusted R^2 go down when I include the intercept, and the intercept's p-value is high, indicating that it is not significant.

In that case, if I average the absolute residuals for each model, the second model should have the lower average, right?
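A minimal synthetic sketch of what the comments below get at (the data and variable names here are made up for illustration, not the asker's actual dataset): when the intercept is dropped, most software (R included) computes R^2 against a total sum of squares taken about zero instead of about the mean of y, so a large mean in y inflates the denominator and hence the no-intercept R^2. The two R^2 values are therefore not comparable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(loc=10.0, size=n)          # predictor with a large mean
y = 2.0 * x + rng.normal(size=n)          # true model happens to pass through the origin

def r2_with_intercept(x, y):
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # conventional (centered) R^2: total SS about the mean of y
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def r2_no_intercept(x, y):
    X = x.reshape(-1, 1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # R's no-intercept convention: total SS about zero, so the nonzero
    # mean of y alone makes the denominator huge and R^2 look better
    return 1 - resid @ resid / (y @ y)

print(r2_with_intercept(x, y))
print(r2_no_intercept(x, y))   # larger, but via a different formula, not a better fit
```

The jump from one R^2 to the other here reflects the change of formula, not an improvement in fit, which is one reason the comments insist on keeping the intercept in both models.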

  • squared residuals, but yes probably the absolute ones too – John Madden Aug 29 '22 at 20:52
  • 1
    oh wait, are you not including an intercept in the second model? – John Madden Aug 29 '22 at 20:53
  • 1
    I am not including an intercept in the second model – user2543622 Aug 29 '22 at 21:00
  • 1
    That will mess with the $R^2$ values. Can you help us understand why you're not including an intercept? This should almost always be done. – John Madden Aug 29 '22 at 21:43
  • what if, after adding the intercept, the p-value of the intercept is very high, indicating that it is not significant? I also updated the question – user2543622 Aug 29 '22 at 22:32
  • 1
    If you omit a parameter that was insignificant when you build a new model, you are doing some kind of stepwise regression and thus distorting later inferences (including estimates of performance like $R^2_{adj}$) unless great care is taken. // R uses a different equation for $R^2$ when the intercept is omitted than when the intercept is included. I am. It sure which software package you’re using, but even if it is not R, it might also exhibit this behavior. – Dave Aug 29 '22 at 22:45
  • 2
    If you include an intercept in both models--and you should--then it is mathematically guaranteed that the $R^2$ cannot increase when going from the complicated to the simpler model. The adjusted $R^2$ could increase. That would be an indication that the simplification might be worthwhile, but there are better ways to assess that. – whuber Aug 29 '22 at 22:49
  • @user2543622 Oh yeah you should keep the intercept in no matter what its p-value is :) – John Madden Aug 30 '22 at 02:20
  • so if I keep the intercept, and the second model gets a better R square, then the average absolute residuals for the second model should be lower, right? – user2543622 Aug 30 '22 at 02:25
  • 1
    @user2543622 Not necessarily: Model 1 could beat Model 2 on $R^2$ yet lose to Model 2 on another metric like mean absolute deviation. – Dave Aug 30 '22 at 02:45
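To make whuber's guarantee concrete, here is a small sketch on random data (the variable layout is hypothetical, chosen to mirror the question): because sum(x5...x10) is a linear combination of the full model's columns, the one-predictor model is nested in the ten-predictor model, so with an intercept in both, its R^2 can never exceed the full model's.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))                       # columns x1..x10
y = X[:, 4:10].sum(axis=1) + rng.normal(size=n)   # y driven by x5..x10 plus noise

def r2(design, y):
    """Centered R^2 for an OLS fit whose design matrix includes an intercept column."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

ones = np.ones((n, 1))
full = np.column_stack([ones, X])                             # intercept + x1..x10
restricted = np.column_stack([ones, X[:, 4:10].sum(axis=1)])  # intercept + sum(x5..x10)

r2_full, r2_restricted = r2(full, y), r2(restricted, y)
print(r2_full, r2_restricted)   # r2_full is never smaller than r2_restricted
```

Adjusted R^2 could still favor the restricted model, since it charges the full model for its ten slope parameters; and, as Dave notes, neither comparison settles which model wins on a different loss such as mean absolute deviation.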

0 Answers