
I am working on a data analysis project that involves a multivariable regression model with 13 predictor variables. Before transforming or altering the data at all, I fitted a rough model in R. Here are the corresponding diagnostic plots:

[Figure: Residuals vs Fitted; Normal Q-Q; Scale-Location; Residuals vs Leverage]

What immediately concerned me was the slight quadratic shape in the residuals vs fitted plot. This strikes me as an issue of heteroskedasticity, but I am unsure how severe it is. I also know there are several other issues to address in the other diagnostic plots, but my real question is: what is the next step? How can I improve my model given these plots, and given the non-linearity displayed in the first plot? Must I use a transformation? Should I try to identify anomalies and remove them? Can anyone offer some insight? Any help would be much appreciated.

DPJ
    The only relevant plot is the first one, because the others are only interpretable once the first looks reasonable. You have an obvious problem of incorrect functional specification. You might try logging $Y$. Or perhaps one of the more important $X$ variables has a nonlinear relation with $Y$; you could check that with ordinary scatterplots and LOESS smooths. If so, perhaps that $X$ variable needs to enter in an alternative functional form (e.g., quadratic, log). – BigBendRegion Nov 29 '20 at 16:33
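To illustrate the functional-form point, here is a minimal Python sketch on simulated data. It does not use the asker's data, and the data, the helper names `ols_residuals` and `binned_residual_means`, and the choice of five bins are all hypothetical. One predictor truly enters quadratically. Fitting it linearly leaves a U-shaped pattern in the mean residual across bins of the fitted values, which is the numeric counterpart of a curved residuals-vs-fitted plot. Adding an x^2 term removes that pattern.

```python
import numpy as np

# Hypothetical simulated data (not the asker's): one predictor x whose true
# effect on y is quadratic, plus Gaussian noise.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 4, n)
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(0, 1.0, n)

def ols_residuals(X, y):
    """Fit y ~ X by ordinary least squares; return (fitted, residuals)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, y - fitted

def binned_residual_means(fitted, resid, n_bins=5):
    """Mean residual within equal-count bins of the fitted values.

    A correctly specified model gives means near 0 in every bin; a
    positive / negative / positive pattern is the numeric counterpart
    of a U-shaped smooth in a residuals-vs-fitted plot.
    """
    order = np.argsort(fitted)
    return np.array([resid[idx].mean() for idx in np.array_split(order, n_bins)])

ones = np.ones(n)
# Misspecified model: x enters linearly only.
f_lin, r_lin = ols_residuals(np.column_stack([ones, x]), y)
# Respecified model: add a quadratic term for x.
f_quad, r_quad = ols_residuals(np.column_stack([ones, x, x**2]), y)

lin_bins = binned_residual_means(f_lin, r_lin)
quad_bins = binned_residual_means(f_quad, r_quad)
print("linear x:  ", np.round(lin_bins, 2))
print("x + x^2:   ", np.round(quad_bins, 2))
```

In R, the same respecification would use a term such as `I(x^2)` or `poly(x, 2)` in the `lm` formula. The same before/after comparison can be repeated with `log(y)` as the response to check the other suggestion in the comment.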
    I'd inspect the model using so-called partial-residual plots (the crPlots function in the car package). These plots are frequently used to detect nonlinear associations between the predictors and the outcome. See also here. – COOLSerdash Nov 29 '20 at 16:34
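A partial-residual (component-plus-residual) plot for predictor j plots e_i + b_j x_ij against x_ij, where e_i are the residuals of the full model and b_j is the fitted coefficient. The minimal Python sketch below computes these values by hand on hypothetical simulated data. In R, `car::crPlots` draws the plots directly; up to a constant shift, it displays the same quantity. As a rough stand-in for eyeballing the smooth, the sketch summarizes each plot's curvature by the x^2 coefficient of a quadratic fit. `quad_coef` is a hypothetical helper, not part of any library.

```python
import numpy as np

# Hypothetical simulated data: x1 has a curved effect on y, x2 a linear one.
rng = np.random.default_rng(1)
n = 500
x1 = rng.uniform(0, 4, n)
x2 = rng.normal(0, 1, n)
y = 1.0 + 1.5 * x1**2 + 3.0 * x2 + rng.normal(0, 1.0, n)

# Fit the (misspecified) all-linear model y ~ x1 + x2.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Component-plus-residual (partial residual) for predictor j: e_i + b_j * x_ij.
partial_x1 = resid + beta[1] * x1
partial_x2 = resid + beta[2] * x2

def quad_coef(x, y):
    """Coefficient on x^2 from an OLS fit of y on (1, x, x^2)."""
    Z = np.column_stack([np.ones_like(x), x, x**2])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[2]

# A clearly nonzero x^2 coefficient flags curvature the linear term missed.
q1 = quad_coef(x1, partial_x1)
q2 = quad_coef(x2, partial_x2)
print("curvature in x1 partial residuals:", round(q1, 2))
print("curvature in x2 partial residuals:", round(q2, 2))
```

Here the x1 partial residuals show a clear quadratic trend, pointing to x1 as the term that needs a different functional form. The x2 partial residuals do not.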
