2

I am running a multiple regression with a continuous DV and a mix of dichotomous and continuous IVs (mostly dichotomous). This is the ZRESID vs ZPRED scatterplot. I think there is heteroscedasticity, but I'm not sure whether some of the patterns are due to the binary IVs. Any help would be much appreciated.

[Figure: ZRESID vs ZPRED scatterplot]

  • 2
    Standardization here, as often, just takes you one step further away from the data and makes it more difficult to follow what is going on. But from the underlying recipe, residual $=$ observed $-$ predicted, it follows that all data points with the same CONSTANT observed outcome lie on a straight line residual $=$ CONSTANT $-$ predicted, i.e. with negative slope. One such line is obvious here as the upper bound of the plot. – Nick Cox Feb 18 '22 at 09:52
  • 1
    I am led to suspect that you have an upper value often observed in practice and that you dealt with left skew by using some ad hoc transformation. If you had say an outcome varying between zero or very small and a maximum, it's best to use regression aimed at fractional or bounded outcomes. I have not used SPSS for many years but I think I've seen that it offers some rather unusual transformations that in my view are hard to defend. In short, part of your puzzlement arises, I guess, from having a bounded outcome variable. (Regarding the outcome as continuous does not solve this at all.) – Nick Cox Feb 18 '22 at 09:53
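
The point made in the comments, that cases sharing the same observed outcome fall on the line residual $=$ CONSTANT $-$ predicted, can be checked with a small simulation. This is only a sketch on made-up data: the ceiling of 10, the coefficients, and the single predictor are assumptions for illustration, not the asker's data, and it works with raw rather than standardized residuals (standardizing is a linear rescaling, so the line stays straight with negative slope).

```python
import random

random.seed(0)
CEILING = 10.0  # hypothetical upper bound of the outcome (an assumption)

# Made-up data: a latent linear outcome that is capped at the ceiling.
xs = [random.uniform(0, 10) for _ in range(500)]
ys = [min(CEILING, 2.0 + 1.0 * x + random.gauss(0, 2)) for x in xs]

# Closed-form least squares for a single predictor.
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * x for x in xs]
residual = [y - p for y, p in zip(ys, predicted)]

# Cases sitting exactly at the ceiling: residual = CEILING - predicted.
at_ceiling = [(p, r) for y, p, r in zip(ys, predicted, residual) if y == CEILING]
max_dev = max(abs(r - (CEILING - p)) for p, r in at_ceiling)

# No residual can lie above that line, because no outcome exceeds the ceiling.
bounded = all(r <= CEILING - p + 1e-12 for p, r in zip(predicted, residual))

print(f"{len(at_ceiling)} cases at the ceiling, max deviation from the line: {max_dev:.2e}")
print(f"all residuals on or below the line: {bounded}")
```

Plotting `residual` against `predicted` for such data shows the same kind of sharp diagonal upper edge as in the question's plot.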

2 Answers

2

At present your plot suffers from significant overplotting, which makes it impossible to see how much data is in the blob and lines appearing in the plot. So before you do anything else you should fix your plot by using transparency for the points, so the viewer can judge the volume of data in the different parts of the plot.

Even once you have done that, it can be quite difficult to judge homo/heteroskedasticity from the standard residual plot. It is more usual to construct a scale-location plot of the residuals for this purpose, since it is easier to see changes in variance by looking at the root-absolute residuals (see this related question).
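
As a rough illustration of both suggestions, here is a minimal sketch on simulated data (the data-generating process, the variable names, and the helper functions are all assumptions, not the asker's model). It computes root-absolute residuals using simple z-scored residuals, which are not leverage-adjusted, and includes an optional plotting helper that uses transparency via matplotlib's `alpha` argument.

```python
import math
import random

random.seed(1)

# Made-up heteroscedastic data: the error SD grows with x (an assumption).
xs = [random.uniform(0, 10) for _ in range(1000)]
ys = [1.0 + 2.0 * x + random.gauss(0, 0.5 + x) for x in xs]

def ols_fit(x, y):
    """Closed-form least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

intercept, slope = ols_fit(xs, ys)
fitted = [intercept + slope * x for x in xs]
resid = [y - f for y, f in zip(ys, fitted)]

# Simple z-scored residuals (not leverage-adjusted), then root-absolute values.
resid_sd = math.sqrt(sum(r * r for r in resid) / (len(resid) - 2))
root_abs = [math.sqrt(abs(r / resid_sd)) for r in resid]

# Crude numeric check: mean root-absolute residual below vs. above the median fit.
cut = sorted(fitted)[len(fitted) // 2]
low = [s for f, s in zip(fitted, root_abs) if f < cut]
high = [s for f, s in zip(fitted, root_abs) if f >= cut]
low_mean, high_mean = sum(low) / len(low), sum(high) / len(high)
print(f"mean sqrt|z|: low fitted = {low_mean:.2f}, high fitted = {high_mean:.2f}")

def plot_scale_location(fitted_values, root_abs_resid):
    """Optional scale-location plot; requires matplotlib to be installed."""
    import matplotlib.pyplot as plt
    # alpha < 1 makes dense regions darker, so overplotting becomes visible.
    plt.scatter(fitted_values, root_abs_resid, alpha=0.2, s=10)
    plt.xlabel("Fitted value")
    plt.ylabel("sqrt(|standardized residual|)")
    plt.show()
```

Calling `plot_scale_location(fitted, root_abs)` draws the plot; a clear upward or downward trend in the cloud suggests the residual spread changes with the fitted value.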

Ben
  • 124,856
2

"Heteroskedasticity" is the wrong question here, because heteroskedasticity is associated with standard (normally-distributed) linear regression. If that's what you're using, that's just the wrong model.

I think what you're looking for is Tobit regression, but I couldn't tell you without knowing what your dependent variable is.
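
To show what a Tobit model does differently, here is a minimal sketch of its log-likelihood for an outcome censored from above. Everything in it is an assumption for illustration: the ceiling of 10, the simulated data, the single predictor, and the function name `tobit_loglik`. Whether Tobit is appropriate still depends on what the dependent variable actually is.

```python
import math
import random

random.seed(2)
CEILING = 10.0  # hypothetical censoring point (an assumption)

# Made-up data: a latent linear outcome, observed only up to the ceiling.
xs = [random.uniform(0, 10) for _ in range(500)]
ys = [min(CEILING, 2.0 + 1.0 * x + random.gauss(0, 2.0)) for x in xs]

def tobit_loglik(b0, b1, sigma, x, y, ceiling):
    """Log-likelihood of a Tobit model that is right-censored at `ceiling`."""
    total = 0.0
    for xi, yi in zip(x, y):
        z = (yi - (b0 + b1 * xi)) / sigma
        if yi >= ceiling:
            # Censored case: contributes P(latent >= ceiling) = 1 - Phi(z).
            total += math.log(0.5 * math.erfc(z / math.sqrt(2)))
        else:
            # Uncensored case: contributes the usual normal log-density.
            total += -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z
    return total

ll_true = tobit_loglik(2.0, 1.0, 2.0, xs, ys, CEILING)  # generating values
ll_flat = tobit_loglik(2.0, 0.5, 2.0, xs, ys, CEILING)  # slope set too small
print(f"log-likelihood at generating values: {ll_true:.1f}")
print(f"log-likelihood with halved slope:    {ll_flat:.1f}")
```

In practice the parameters would be estimated by maximizing this function with a numerical optimizer; the sketch only evaluates it at two parameter settings to show that the censored cases enter through a tail probability rather than a density.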

  • I think you mean dependent variable. (One of my desired superpowers would be to oblige people (nicely) never to use the terms dependent variable or independent variable.) – Nick Cox Aug 26 '23 at 10:35
  • Thank you, I've corrected this. – Closed Limelike Curves Aug 26 '23 at 15:27
  • @NickCox I admit I use dependent and independent variable all the time. What do you suggest instead? – Peter Flom Aug 26 '23 at 15:53
  • 1
    Response or outcome; predictor or covariate or explanatory variable. Many other terms too. (NB some use predictor differently: in regression terms with outcome $y$ and model fit $Xb$, for some the columns of $X$ are predictors and for others the predictor is the linear combination $Xb$.) In essence, the objection to independent and dependent is in my view three-fold: (1) the terms are already overloaded; (2) many get the terms confused; (3) there are more evocative terms available. – Nick Cox Aug 26 '23 at 16:12