Homoscedastic and heteroscedastic data and regression models

Question

How should I understand homoscedasticity and heteroscedasticity in the context of regression models?
Is there a way to check these properties in R?

One place to start is here. Adding more search terms may help narrow the results. — Glen_b, Apr 11 '14 at 15:35
@Glen_b Including the terms "Breusch" and "Pagan" identifies several good answers. — whuber, Apr 11 '14 at 15:39
@whuber That would be a way to conduct a formal hypothesis test for *one particular form* of heteroskedasticity... but that formal test is not necessarily a good way to (a) understand heteroskedasticity, nor (b) do checking of regression heteroskedasticity assumptions in general - which as I read it are the two parts of this question. I think if one is in the part of model formulation where basic assumptions are a concern, probably some visual diagnostics would be the place to start. I had planned to come back and add an answer to that effect here, but now I can't. — Glen_b, Apr 11 '14 at 23:26
@whuber In fact I don't think that post is a suitable duplicate. There might be some near duplicates, but to my mind, that isn't one. — Glen_b, Apr 11 '14 at 23:29
@Glen_b Thank you for pointing out the differences: I therefore voted to re-open this question. — whuber, Apr 13 '14 at 15:15
possible duplicate of What does having "constant variance" in a linear regression model mean? — gung - Reinstate Monica, Apr 13 '14 at 21:01
Both bptest() and ncvTest() perform the Breusch-Pagan test against heteroskedasticity. — adonies, May 11 '18 at 01:42
Try arch.test package in R, which implements Engle's ARCH (Autoregressive conditional heteroskedasticity) test. — Aksakal, Apr 13 '14 at 16:06

Glen_b · Accepted Answer · 2018-05-11T02:59:36.883

In R, when you fit a regression or GLM (though GLMs are themselves typically heteroskedastic), you can check the model's variance assumption by plotting the fitted model object.

That is, when you fit the model you normally store it in a variable, on which you can then call summary to get the usual regression table for the coefficients. If you call plot on the same variable, you get some diagnostic plots.

For example, consider:

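# fit a simple linear regression of stopping distance on speed (built-in cars data)
carmdl <- lm(dist ~ speed, data = cars)
# draw the default diagnostic plots for the fitted model
plot(carmdl)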

The third of the default plots that it produces is the scale-location plot:

[Figure: scale-location plot (square root of absolute standardized residuals vs fitted values), used to identify heteroskedasticity. In this case it shows fairly constant spread, perhaps higher on the right than on the left, indicating slightly higher residual spread at the larger fitted values.]

[Other common choices for the y-axis in such a plot are the absolute residual and the log of the squared residual.]
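If you only want that one plot, or want to compute the plotted quantities yourself, something like the following should work. This is a minimal sketch using base R only; the variable names (scale_loc, abs_resid, log_sq_resid) are my own, chosen for illustration.

```r
# Refit the model so this snippet stands on its own
carmdl <- lm(dist ~ speed, data = cars)

# Draw only the scale-location plot (the third of the default plots)
plot(carmdl, which = 3)

# The same quantity computed by hand: sqrt of absolute standardized residuals
scale_loc <- sqrt(abs(rstandard(carmdl)))

# The two alternative y-axis choices mentioned above
abs_resid    <- abs(residuals(carmdl))
log_sq_resid <- log(residuals(carmdl)^2)

# Manual version of the plot, with a smooth trend line added
plot(fitted(carmdl), scale_loc,
     xlab = "Fitted values",
     ylab = "sqrt(|standardized residuals|)")
lines(lowess(fitted(carmdl), scale_loc), col = "red")
```

Swapping abs_resid or log_sq_resid in for scale_loc gives the other versions of the plot; the trend line is what you read for signs of changing spread.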

That's a basic visual diagnostic of the spread of standardized (for model-variance) residuals against fitted values, which is suitable for seeing if there's variability related to the mean (not already accounted for by the model). If the assumption of homoskedasticity is true, we should see roughly constant spread. In this case the indication of increase with fitted values is fairly mild.

A common form of heteroskedasticity to look for would be where there's an increase in spread against fitted values. That would show as an increasing trend in the plot above. It can also be formally tested by the Breusch-Pagan test (though formal hypothesis tests of model assumptions aren't necessarily the best choice).
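For reference, the studentized (Koenker) form of the Breusch-Pagan test can be sketched in base R as below: regress the squared residuals on the predictors and compare n times the R-squared of that auxiliary regression to a chi-squared distribution. The variable names are illustrative. As far as I know this matches the default behaviour of bptest() in the lmtest package, but treat that as an assumption and check against the package if it matters.

```r
# Refit the model so this snippet stands on its own
carmdl <- lm(dist ~ speed, data = cars)

# Auxiliary regression: squared residuals on the model's predictor(s)
aux_data <- data.frame(res2 = residuals(carmdl)^2, speed = cars$speed)
aux <- lm(res2 ~ speed, data = aux_data)

# LM statistic: n * R^2 of the auxiliary regression
n       <- nobs(carmdl)
bp_stat <- n * summary(aux)$r.squared

# Degrees of freedom: number of predictors in the auxiliary regression
bp_df <- length(coef(aux)) - 1

# Upper-tail chi-squared p-value
bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

cat("BP statistic:", bp_stat, " df:", bp_df, " p-value:", bp_p, "\n")
```

If the lmtest package is installed, lmtest::bptest(carmdl) gives a packaged version of the test. Either way, the caveat above still applies: a p-value is not a substitute for looking at the plot.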

There are other forms of heteroskedasticity that are possible, but that's the most common one to check for. For example, if changing spread against a particular predictor was expected, that would suggest plotting the residual spread measure above against that predictor.
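A rough sketch of that last idea follows. With a single predictor the plot against the predictor carries the same information as the plot against fitted values, so this example assumes a two-predictor model on the built-in mtcars data instead; the model and the choice of hp as the predictor of interest are purely for illustration and are not from the example above.

```r
# Hypothetical two-predictor model on the built-in mtcars data
mdl <- lm(mpg ~ wt + hp, data = mtcars)

# Same spread measure as in the scale-location plot
spread <- sqrt(abs(rstandard(mdl)))

# Plot the spread measure against one particular predictor
plot(mtcars$hp, spread,
     xlab = "hp",
     ylab = "sqrt(|standardized residuals|)")

# Smooth trend line; a clear slope suggests spread changing with hp
lines(lowess(mtcars$hp, spread), col = "red")
```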
