
I apologise for the trivial question, but I have got myself confused about how heteroskedasticity affects OLS regression and would be very thankful for your help.

In standard OLS, homoskedasticity is not a requirement of unbiasedness. Hence, under heteroskedasticity, the coefficient estimates will still be unbiased. The standard errors will however be wrong, which makes the t-test invalid.

But what about other metrics like F-test, R squared and adjusted R squared?

I am thinking that if the coefficients are consistent, then the estimated regression residuals ($u = y - \beta_0 - \beta_1 x_1 - \beta_2 x_2 - \dots - \beta_n x_n$) should also be unbiased. But in that case, nothing really changes with R squared or the F-test, as these are based on the SSR?
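For concreteness, the quantities I have in mind are the usual ones, with $SSR$ the sum of squared residuals, $SST$ the total sum of squares, $n$ the sample size and $k$ the number of regressors:

$$R^2 = 1 - \frac{SSR}{SST}, \qquad F = \frac{R^2/k}{(1 - R^2)/(n - k - 1)}.$$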

However, I know that for a single restriction $F = t^2$, and this would indicate that the F-test should also be invalid under heteroskedasticity. How, then, does all of this fit together?

Jhonny
  • R-squared is a goodness of fit measure. It is not really used for inference. Intuitively, as heteroskedasticity increases, the R-squared of a given model will decrease. This should be fairly clear from the formula. – lmo Jun 09 '19 at 11:33
  • "under heteroskedasticity, the coefficient estimates will still be unbiased". Intuitively it follows under this condition that the residuals will also be unbiased, and then the F-test's p-value (which is similar to the p-value for $R^2$) should result in an unbiased estimate. https://stats.stackexchange.com/questions/111602/does-r-squared-have-a-p-value – Alexey Burnakov Sep 09 '19 at 13:43
  • The validity of a test is not well defined - the t-test will certainly have power to reject the null even when true, even if variances are unequal. But control of the type 1 error rate should give us pause according to Dave's answer. Simulating data where the standard deviation differs 100-fold from the smallest to the largest observation should still be considered as about as extreme a form of heteroscedasticity as possible. – AdamO Nov 21 '22 at 21:39
  • @AlexeyBurnakov how is a residual "unbiased" and how is a p-value consequently "unbiased"? – AdamO Nov 21 '22 at 21:40

1 Answer


F-test

The usual F-stat has the claimed F-distribution under the null hypothesis when the error terms have equal variance (among other conditions). When the error terms have unequal variance, the F-stat no longer has the claimed F-distribution under the null hypothesis. Let's look at a simulation where we see what happens to the p-values when the null is true for equal and unequal error variances.

library(ggplot2)
set.seed(2022)
N <- 100                    # sample size
R <- 1000                   # number of simulated data sets
x <- seq(1, N, 1)
Ey <- rep(0, length(x))     # the true slope is zero, so the null hypothesis is true
ps1 <- ps2 <- rep(NA, R)
for (i in 1:R){
  e1 <- rnorm(N, 0, x)          # heteroskedastic errors: sd grows with x
  y1 <- Ey + e1
  L1 <- lm(y1 ~ x)
  ps1[i] <- summary(L1)$coef[2, 4]   # p-value for the slope
  e2 <- rnorm(N, 0, mean(x))    # homoskedastic errors: constant sd
  y2 <- Ey + e2
  L2 <- lm(y2 ~ x)
  ps2[i] <- summary(L2)$coef[2, 4]
}
d1 <- data.frame(pval = ps1, ECDF = ecdf(ps1)(ps1), Variance = "Unequal")
d2 <- data.frame(pval = ps2, ECDF = ecdf(ps2)(ps2), Variance = "Equal")
d <- rbind(d1, d2)
ggplot(d, aes(x = pval, y = ECDF, col = Variance)) +
  geom_line() + geom_point() +
  geom_abline(slope = 1, intercept = 0)

[Figure: empirical CDFs of the p-values under equal and unequal error variances, with the 45-degree reference line]

Since the null hypothesis is true, we want the p-values to have a uniform distribution, like we see from the blue graph that comes from the equal-variance errors. When the error variances are unequal, the p-values are skewed toward being small, as we see from the red graph. This means that the test rejects too often. The severity of this can be debated (keeping in mind that this is just one simulation). For instance, at the $\alpha=0.05$-level, this simulation rejects only $7.5\%$ of the time instead of the nominal $5\%$. You be the judge of how severe that is.
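These rejection rates can be read straight off the simulated p-values; a quick check, using the objects created in the simulation above:

mean(ps1 < 0.05)  # unequal variances: rejects roughly 7.5% of the time, above the nominal 5%
mean(ps2 < 0.05)  # equal variances: close to the nominal 5%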

Similar logic applies to t-testing the individual coefficients: when the variances are unequal, the usual t-stats no longer have their claimed t-distributions.
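As an aside, if one wants approximately valid t-tests despite the unequal variances, a common adjustment (not used in the simulations here) is heteroskedasticity-consistent "sandwich" standard errors. A minimal sketch, assuming the sandwich and lmtest packages are installed and L1 is a fitted lm object such as the one from the loop above:

library(sandwich)  # provides vcovHC() for heteroskedasticity-consistent covariance estimates
library(lmtest)    # provides coeftest() for redoing the t-tests with a supplied covariance matrix
coeftest(L1, vcov. = vcovHC(L1, type = "HC3"))  # coefficient t-tests with HC3 robust standard errors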

When the null hypothesis is false, we can lose out on some power, as the following simulation shows.

library(ggplot2)
set.seed(2022)
N <- 100
R <- 1000
x <- seq(1, N, 1)
Ey <- 0.25*x                # the true slope is 0.25, so the null hypothesis is false
ps1 <- ps2 <- rep(NA, R)
for (i in 1:R){
  e1 <- rnorm(N, 0, x)          # heteroskedastic errors: sd grows with x
  y1 <- Ey + e1
  L1 <- lm(y1 ~ x)
  ps1[i] <- summary(L1)$coef[2, 4]   # p-value for the slope
  e2 <- rnorm(N, 0, mean(x))    # homoskedastic errors: constant sd
  y2 <- Ey + e2
  L2 <- lm(y2 ~ x)
  ps2[i] <- summary(L2)$coef[2, 4]
}
d1 <- data.frame(pval = ps1, ECDF = ecdf(ps1)(ps1), Variance = "Unequal")
d2 <- data.frame(pval = ps2, ECDF = ecdf(ps2)(ps2), Variance = "Equal")
d <- rbind(d1, d2)
ggplot(d, aes(x = pval, y = ECDF, col = Variance)) +
  geom_line() + geom_point() +
  geom_abline(slope = 1, intercept = 0)

[Figure: empirical CDFs of the p-values when the null is false, for equal and unequal error variances, with the 45-degree reference line]

The blue graph, which corresponds to the equal-variance error terms, is skewed more toward small p-values, indicating greater power to reject. For instance, at the $\alpha=0.05$-level, blue and red have powers of $27.4\%$ and $25.7\%$, respectively. Again, you be the judge of how severe this is (keeping in mind that this is just one simulation).

$R^2$ and Adjusted $R^2$

This is a weird one, because there are many ways to interpret $R^2$.

  1. The square of the correlation between the feature and the outcome in a simple linear regression

  2. The square of the correlation between the predictions and the observations

  3. A measure of how large the error variance in your model is compared to the error variance of a model that predicts $\bar y$ every time (so the variance of all $y$ values pooled together)

  4. The proportion of variance explained by the model

Outside of a simple linear regression, #1 is irrelevant. For #2, this is a reasonable way to think of $R^2$, regardless of heteroskedasticity. In #3, there is no longer a common error variance, so this does not make sense. In #4, this still makes sense.
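A small sketch of #2, using simulated data rather than the simulations above: for any OLS fit with an intercept, $R^2$ equals the squared correlation between the fitted values and the observations, heteroskedastic errors or not.

set.seed(1)
x <- runif(200, 1, 10)
y <- 1 + 2 * x + rnorm(200, sd = x)  # error sd grows with x, so the errors are heteroskedastic
fit <- lm(y ~ x)
summary(fit)$r.squared               # the usual R^2
cor(fitted(fit), y)^2                # the same number: squared correlation of predictions and observations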

Depending on why there is heteroskedasticity in the error variances, you might consider whether measures of squared deviations (like $R^2$) are appropriate at all. For instance, I would consider it a bigger error to overestimate a restaurant bill by \$100 than to overestimate the cost of a house by the same amount.

Dave