8

I was just thinking about a linear regression situation where we assume each point $Y_i \sim \mathcal{N}(\beta x_i+c,\sigma^2)$ and all points are independent. What would the value of the t-statistic be, when testing the null hypothesis that the slope equals its true value $\beta$, as $\sigma \to 0$? I'm assuming we are using the formula $$t = \frac{\hat{\beta} - \beta}{\text{S.E.}(\hat{\beta})}.$$ My confusion is that $\hat{\beta} \to \beta$ and $\text{S.E.}(\hat{\beta}) \to 0$.

mrepic1123
  • 489
  • 6
  • What about if $\beta = 0?$ – Dave Jan 20 '24 at 22:15
  • 1
    If the standard error of the regression approaches zero and all OLS assumptions are met, then both the numerator and the denominator approach zero. The t-stat should then tell us that the null hypothesis cannot be rejected ... so I suspect that the numerator tends to zero faster than the denominator ... and the t-stat approaches zero. However, this is just heuristic reasoning ... a formal proof is needed. – markowitz Jan 21 '24 at 19:41
  • NB: Your title does not reflect the question. In the question, the error variance shrinks to zero, not the residual variance. The latter is a property of any set of data -- it depends on what the $Y_i$ happen to be and on how $\beta$ and $c$ are estimated -- while the former, as stated, is a parameter in the model, $\sigma^2,$ and governs the data-generation process. – whuber Jan 23 '24 at 18:51

2 Answers

8

This is a little tricky because we need to consider what $\sigma\to 0$ might mean. One possible interpretation is that a set of values $x_i$ of the explanatory variable is fixed, along with the parameters $\beta$ and $c.$ Do we treat the errors as random variables, or as also being fixed? Here I analyze the latter case, both because it is straightforward and instructive and because it requires new formulas (the errors are no longer random variables, which precludes applying standard results expressed in terms of their means and variances).


Let's get the algebra out of the way first.

Plan of analysis

If the errors $\varepsilon_i$ are treated as fixed (so that varying $\sigma$ -- which we allow to be zero or even negative -- merely rescales them), then the $(x,y)$ data can be expressed as a collection of ordered pairs

$$(x_i, y_i) = (x_i,\ \alpha + \beta x_i + \sigma\varepsilon_i),\quad i = 1,2,\ldots, n$$

(where I have written "$\alpha$" for "$c$" to keep the notation consistent). In this context the only variable is $\sigma,$ and we may consider the t-statistic in the question as a function of $\sigma$ alone, written $t(\sigma).$ The immediate objective is to obtain a formula for $t(\sigma)$ that we can analyze, first by finding a formula for its numerator and then one for its denominator.

Preliminary simplifications

Without any loss of generality we may proceed as usual and re-express the explanatory variable by centering it at zero and fixing its scale. It's simplest (using vector notation $x = (x_1,x_2,\cdots, x_n)$ and $\,\cdot\,$ for the vector dot product) to set

$$1 = |x|^2 = x\cdot x = \sum_{i=1}^n x_i^2.$$

The numerator

It is well known -- and extensively documented here on CV -- that the slope estimate is

$$\hat\beta = x\cdot (y - \bar y) = \beta + \sigma x\cdot (\varepsilon - \bar\varepsilon)$$

with $\bar y$ the mean of the $y$ values and $\bar\varepsilon$ the mean of the errors $\varepsilon_i.$

The denominator

The residuals consequently are

$$e_i = y_i - \hat y_i = \sigma(\varepsilon_i - \bar\varepsilon - x\cdot(\varepsilon - \bar \varepsilon)x_i).$$

Writing $z_i = \varepsilon_i - \bar\varepsilon - x\cdot(\varepsilon - \bar\varepsilon)x_i$ for the coefficient of $\sigma,$ the usual (unbiased least squares) estimator of the standard error of $\hat\beta$ (assuming $n\gt 2$) is

$$\widehat{\operatorname{SE}}(\hat\beta)^2 = \frac{1}{n-2}|e|^2 = \frac{|z|^2}{n-2}\,\sigma^2.$$
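As a sanity check, both formulas are easy to verify numerically. Here is a minimal Python sketch; the setup and all variable names are illustrative choices of mine, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha, beta, sigma = 10, 1.0, 2.0, 0.5

# Center x and scale it to unit length, per the preliminary simplifications.
x = rng.normal(size=n)
x -= x.mean()
x /= np.linalg.norm(x)

eps = rng.normal(size=n)            # fixed "errors", treated as constants
y = alpha + beta * x + sigma * eps

# Numerator: beta_hat = x.(y - ybar) = beta + sigma * x.(eps - epsbar).
beta_hat = x @ (y - y.mean())
print(np.isclose(beta_hat, beta + sigma * (x @ (eps - eps.mean()))))  # True

# Denominator: residuals e_i = sigma * z_i, and SE^2 = |z|^2 sigma^2 / (n - 2).
z = (eps - eps.mean()) - (x @ (eps - eps.mean())) * x
e = y - y.mean() - beta_hat * x     # fitted residuals (recall xbar = 0)
print(np.allclose(e, sigma * z))                                      # True
print(np.isclose(e @ e / (n - 2), (z @ z) / (n - 2) * sigma**2))      # True
```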


Analysis of the Question

With that algebra done, consider testing $H_0:\beta = \beta_0$ against the alternative $H_A:\beta \ne \beta_0$ with the usual t test. It has $n-2$ degrees of freedom (which does not change when $\sigma$ is varied) and the t statistic is

$$t(\sigma) = \frac{\hat\beta - \beta_0}{\sqrt{\widehat{\operatorname{SE}}(\hat\beta)^2}} = \sqrt{n-2}\left(\frac{\beta - \beta_0 + \sigma x\cdot(\varepsilon-\bar\varepsilon)}{|z|}\right)\frac{1}{\sigma} = A + \frac{B}{\sigma},$$

with $A$ proportional to $x\cdot(\varepsilon-\bar\varepsilon)$ and $B$ proportional to $\beta-\beta_0.$
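The decomposition $t(\sigma) = A + B/\sigma$ can likewise be checked numerically; a short sketch under the same illustrative setup (the value of $\beta_0$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha, beta, beta0 = 10, 1.0, 2.0, 1.5   # beta0: arbitrary hypothesized slope

x = rng.normal(size=n)
x -= x.mean()
x /= np.linalg.norm(x)
eps = rng.normal(size=n)
z = (eps - eps.mean()) - (x @ (eps - eps.mean())) * x

A = np.sqrt(n - 2) * (x @ (eps - eps.mean())) / np.linalg.norm(z)
B = np.sqrt(n - 2) * (beta - beta0) / np.linalg.norm(z)

for sigma in (1.0, 0.1, 0.01):
    y = alpha + beta * x + sigma * eps
    beta_hat = x @ (y - y.mean())
    e = y - y.mean() - beta_hat * x
    t_direct = (beta_hat - beta0) / np.sqrt(e @ e / (n - 2))
    print(f"{sigma:>6}  {t_direct:14.4f}  {A + B / sigma:14.4f}")  # columns agree
```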

Answer

This has three possible behaviors (illustrated numerically in the sketch after this list):

  1. For testing any hypothesis where $\beta\ne \beta_0,$ the t statistic is asymptotic to $B\sigma^{-1}$ with $B\ne 0.$ This guarantees that as $\sigma$ shrinks, the t statistic eventually becomes significant, no matter what level of significance is chosen.

  2. If we happen to postulate the correct value $\beta_0=\beta,$ then $t(\sigma)$ is constant:

    • The constant is zero when $A=0;$ that is, the vectors (not random variables!) $x$ and $\varepsilon - \bar\varepsilon$ are orthogonal. The test will never reject $H_0.$

    • Otherwise, if $|A|$ is sufficiently great, $H_0$ can be rejected (and it is rejected no matter what value $\sigma$ might have). This occurs because the errors $\varepsilon_i$ are not truly orthogonal to the $x_i,$ creating an irreducible discrepancy between the data pairs and the ideal linear model.
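These behaviors are easy to reproduce with the same kind of numerical sketch (again an illustrative setup of my own, not part of the derivation): with $\beta_0\ne\beta$ the statistic blows up like $1/\sigma,$ while with $\beta_0=\beta$ it stays constant at $A.$

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha, beta = 10, 1.0, 2.0
x = rng.normal(size=n)
x -= x.mean()
x /= np.linalg.norm(x)
eps = rng.normal(size=n)

def t_stat(sigma, beta0):
    y = alpha + beta * x + sigma * eps
    beta_hat = x @ (y - y.mean())
    e = y - y.mean() - beta_hat * x
    return (beta_hat - beta0) / np.sqrt(e @ e / (n - 2))

# Left column: beta0 != beta, so |t| grows like 1/sigma.
# Right column: beta0 == beta, so t is the constant A.
for sigma in (1.0, 0.1, 0.01, 0.001):
    print(f"{sigma:>7}  {t_stat(sigma, beta - 0.5):14.2f}  {t_stat(sigma, beta):10.4f}")
```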

whuber
  • 322,774
  • If the errors are considered RVs, and $E[\epsilon|X]=0$, is something like your answer "2a" the relevant one? – markowitz Jan 24 '24 at 08:02
  • @markowitz When errors are analyzed as RVs, the standard analysis applies and shows the t statistic has a Student t distribution under the null hypothesis and a non-central t distribution under the alternate hypothesis. – whuber Jan 24 '24 at 16:16
  • I know, and probably the asker does too, but this does not answer the question. Indeed, I suppose the asker has the standard analysis in mind. – markowitz Jan 24 '24 at 16:36
  • Moreover, if the errors and explanatory variables are non-random, then even the response variable is not random. In such a situation, the role of hypothesis testing is not clear to me. – markowitz Jan 24 '24 at 16:44
  • @markowitz "Randomness" is primarily a modeling construct: a stance taken by the analyst to understand and interpret data. Plenty of regressions are performed with good effect on deterministic, non-random data. – whuber Jan 24 '24 at 17:18
  • Regression is a concept with a debated meaning. Sometimes the explanatory variables are considered non-random, but I have never seen regression and testing used with non-random errors and thus without any randomness at all. – markowitz Jan 25 '24 at 09:03
  • @whuber I would like to understand what you did, and it's a bit hard... sorry. I have a few, probably basic, questions (1) I guess you use symbol e for estimated error terms, and the Greek epsilon for the true (unknown) errors? (2) The mean of the true epsilons does not have to be 0, I guess, unlike the mean of the estimated e terms. Is that correct? (3) the estimated e terms are uncorrelated with x, but the epsilons could be correlated. Is that correct? – BenP Jan 25 '24 at 15:17
  • @BenP No need to guess: I name and define both the $\varepsilon_i$ and $e_i.$ (2) is correct and is one of the main complications. Even when we model the errors as iid Normal variables, there is still no chance that their mean is $0;$ whereas by construction the mean of the errors $e_i$ is zero. The only possible meaning of "uncorrelated" in (3) in this formulation is that $(x-\bar x)\cdot(e-\bar e)=0$ (true by construction) and $(x-\bar x)\cdot(\varepsilon-\bar\varepsilon)=0,$ which is not necessarily true (and indeed has zero probability with the iid Normal assumption). – whuber Jan 25 '24 at 15:21
  • 1
    @whuber Thanks, so these guesses 2 and 3 were okay :-) I will continue reading your nice exposition... so maybe more questions will follow. Best regards. – BenP Jan 25 '24 at 15:36
0

I think we have the same problem here: what happens if we test $H_0: \mu=0$ when $\sigma$ approaches zero?

Let's assume that the true value of $\mu$ in the population is $1$, and the true value of $\sigma$ is "small", say, $0.1$. Since we know the value of $\sigma$, we can use the $z$-test instead of the $t$-test, but that doesn't change much. Also assume we have a sample of, say, 4 cases from this population, with mean value 0.8 (or some other value close to the true value 1 of $\mu$, because $\sigma$ is so small). We can now calculate the value of the test statistic $z$ under $H_0$:

$\Large z = \frac{0.8 - 0}{\frac{0.1}{\sqrt4}} = 16$

If we did not know the true value of $\sigma$, we would estimate it by the sample standard deviation, and that would be a small value too, because the true $\sigma$ happens to be so small. So again we end up with a "huge" $t$ value.

No matter how close the true value of $\mu$ under the alternative hypothesis (which we assumed to be 1 above) is to $0$ (the value under $H_0$), the situation will be similar if we choose $\sigma$ arbitrarily small: the smaller $\sigma$ is, the more extreme (positive or negative) our $t$ value will be, for any true value of $\mu$ under the alternative hypothesis.
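A quick numerical illustration of this point (holding the sample mean fixed at 0.8 for simplicity; in reality it would itself drift toward the true $\mu$ as $\sigma$ shrinks):

```python
import math

n, mu_hat, mu0 = 4, 0.8, 0.0
for sigma in (0.1, 0.01, 0.001):
    z = (mu_hat - mu0) / (sigma / math.sqrt(n))
    print(sigma, z)   # 16.0, 160.0, 1600.0
```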

BenP
  • 1,124
  • It seems to me that your argument is poorly focused. The relevant case is where the value under the null hypothesis is exactly the true one. Otherwise no indeterminate form appears and the result is trivial. – markowitz Jan 23 '24 at 13:59
  • 1
    Sorry markowitz, I did not notice that this was what you meant... then my answer is indeed irrelevant. – BenP Jan 23 '24 at 14:57