
I am trying to fit a multiple linear regression model in R and am having some problems. I want to model one variable ($y$) using six other variables ($x_1, \dots, x_6$), all of which are correlated to some degree.

Based on my understanding of the data, I ran a multiple linear regression for $y$ using $x_3$ and $x_5$. Here is an updated residual plot based on feedback:

$$y \sim x_3 + x_5 + x_3 \times x_5$$

[Image: residual plot for the model above]

How can I fix this? Would feasible GLS work? Or have I selected a bad combination of independent variables?

By the way, I have limited knowledge of regression and have never done a weighted regression before, so this is new to me.
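Feasible GLS and weighted regression both come up here. Below is a minimal sketch of one common FGLS recipe for heteroskedasticity of unknown form:

1. Fit OLS.
2. Regress the log of the squared residuals on the predictors.
3. Refit by weighted least squares, with weights equal to the inverse of the estimated error variance.

The data frame `dat` is simulated and purely hypothetical. The choice to model the variance with the same `x3 * x5` terms is an assumption, not something established for the real data.

```r
# Hypothetical simulated data (the real data are not available here)
set.seed(1)
n   <- 200
dat <- data.frame(x3 = runif(n, 1, 30), x5 = runif(n, 100, 500))
# Assumed: the error standard deviation grows with the mean (heteroskedastic)
mu    <- with(dat, x3 * x5)
dat$y <- mu + rnorm(n, sd = 0.05 * mu)

# Step 1: ordinary least squares fit
ols <- lm(y ~ x3 * x5, data = dat)

# Step 2: model the error variance by regressing log squared residuals on
# the predictors; exponentiating keeps the estimated variances positive
dat$log_e2 <- log(resid(ols)^2)
aux        <- lm(log_e2 ~ x3 * x5, data = dat)
dat$w      <- 1 / exp(fitted(aux))

# Step 3: weighted least squares, weighting each observation by the
# inverse of its estimated error variance
fgls <- lm(y ~ x3 * x5, data = dat, weights = w)

summary(fgls)
```

`nlme::gls()` with a variance function such as `varPower()` is a packaged alternative that estimates the variance structure together with the coefficients. Weighting only addresses unequal variance. If the residuals also show curvature, the mean model itself may need changing.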

Thanks for your advice!

Edit: Here are the additional plots:

[Image: additional plot 1]

[Image: additional plot 2]

[Image: additional plot 3]

Here are the residual plots for $x_1, \dots, x_6$ too:

[Image: residual plots against $x_1, \dots, x_6$]
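For anyone reproducing these diagnostics, here is a minimal sketch of how the model $y \sim x_3 + x_5 + x_3 \times x_5$ and its residual plots can be produced in R. The data frame `dat` is simulated and hypothetical, since the real data are not shown. The names `x3` and `x5` only mirror the notation in the question.

In R's formula syntax, `x3 * x5` already expands to `x3 + x5 + x3:x5`, so the model can be written more compactly as `y ~ x3 * x5`.

```r
# Hypothetical simulated data standing in for the real data (not shown here)
set.seed(1)
n   <- 200
dat <- data.frame(x3 = runif(n, 1, 30),     # stand-in for weeks claimed
                  x5 = runif(n, 100, 500))  # stand-in for average weekly benefit
mu    <- with(dat, x3 * x5)
dat$y <- mu + rnorm(n, sd = 0.05 * mu)      # assumed noise that grows with the mean

# x3 * x5 expands to x3 + x5 + x3:x5, so both calls fit the same model
fit_a <- lm(y ~ x3 + x5 + x3 * x5, data = dat)
fit_b <- lm(y ~ x3 * x5, data = dat)
all.equal(coef(fit_a), coef(fit_b))  # TRUE

# Residuals against fitted values and against each predictor;
# a funnel shape suggests heteroskedasticity, a bend suggests curvature
op <- par(mfrow = c(1, 3))
plot(fitted(fit_b), resid(fit_b), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
plot(dat$x3, resid(fit_b), xlab = "x3", ylab = "Residuals")
abline(h = 0, lty = 2)
plot(dat$x5, resid(fit_b), xlab = "x5", ylab = "Residuals")
abline(h = 0, lty = 2)
par(op)
```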

Hope this helps!

  • y is measuring benefits paid while x3 is measuring weeks claimed and x5 is measuring average weekly benefit. I should also mention there is a strong relationship between x3 (weeks claimed) and x4 (weeks compensated) and that when I run a regression with x4 and x5 I get an error due to an "essentially perfect fit". I will add a plot now. Thanks for your feedback! – user135784 Apr 22 '14 at 13:33
  • Ok I have updated my question with the graphs! – user135784 Apr 22 '14 at 13:44
  • I see curvature and heteroskedasticity in y vs x4 and y vs x5 which suggests an issue, but it also looks like something else is going on. – Glen_b Apr 22 '14 at 13:54
  • Should I post plots of y against the other variables? – user135784 Apr 22 '14 at 13:59
  • If they seem particularly relevant, yes. – Glen_b Apr 22 '14 at 14:13
  • All your image links seem to be broken. – Frames Catherine White Dec 16 '14 at 23:33