6

If we have the studentized residuals

$$\frac{y_i - \hat{y_i}}{S \sqrt{1 - \frac{1}{n} - \frac{(x_i - \bar{x})^2}{S_{xx}}}}$$

given the assumption that the errors $e_i$ are iid $N(0, \sigma^2)$, does the studentized residual have a $t$-distribution with $n-2$ degrees of freedom?

I tried looking this up, and most sources seem to say yes, but I'm confused about how to prove it. I attempted adding the normal distributions coming from the regression coefficients, but I can't seem to get it to work.

sedrick
  • 135

2 Answers

7

$\newcommand{\e}{\varepsilon}$$\newcommand{\0}{\mathbf 0}$$\newcommand{\E}{\text E}$$\newcommand{\V}{\text{Var}}$I'll start by working with this in matrix form. Let $y = X\beta + \e$ be our model with $\e \sim \mathcal N(\0, \sigma^2 I)$ and $X \in \mathbb R^{n\times p}$ full rank. Then $\hat y = Hy$ where $H = X(X^TX)^{-1}X^T$ is the hat matrix. I'll use $\e$ for the actual unobserved error and $e = y - \hat y$ for the residuals.

Note that $$ \E(e) = \E(y - \hat y) = X\beta - HX\beta = X\beta - X(X^TX)^{-1}X^TX\beta = \0 $$ so $e$ has mean $\0$. Additionally, $$ \V(e) = \V\left[(I - H)y\right] = (I-H)\V(y)(I-H)^T = \sigma^2(I - H) $$ since $I - H$ is symmetric and idempotent. Because $e = (I-H)y$ is a linear transformation of a Gaussian, $e$ is Gaussian too, thus $$ e \sim \mathcal N(\0, \sigma^2 (I-H)). $$ The covariance matrix is positive semidefinite rather than positive definite since $e$ is supported only on the orthogonal complement of the column space of $X$, but when we consider just $e_i$ it'll behave fine.
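As a quick numerical sanity check, here is a minimal numpy sketch (the design matrix is made up for illustration) that verifies the two facts the displays above rely on: $(I - H)X = \0$, which gives the zero mean, and $I - H$ symmetric and idempotent, which gives the covariance.

```python
import numpy as np

# Minimal sketch with a made-up design matrix: check (I - H) X = 0 (so E(e) = 0)
# and that I - H is symmetric and idempotent (so Var(e) = sigma^2 (I - H)).
rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # intercept + 2 predictors
H = X @ np.linalg.solve(X.T @ X, X.T)                           # hat matrix
M = np.eye(n) - H                                               # residual maker I - H

print(np.allclose(M @ X, 0))   # True: (I - H) annihilates the column space of X
print(np.allclose(M, M.T))     # True: symmetric
print(np.allclose(M @ M, M))   # True: idempotent
```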

A $t_k$ distribution is defined as $$ \frac{\mathcal N(0, 1)}{\sqrt{\chi^2_k / k}} $$ with the numerator and denominator independent. Define $$ t_i = \frac{e_i}{\hat\sigma_{(i)}\sqrt{1 - h_i}} $$ where $$ \hat\sigma_{(i)}^2 = \frac{1}{n - p - 1}e_{(i)}^Te_{(i)} $$ is the error variance estimate computed from the model with observation $i$ dropped (the $n - p - 1$ reflects that this fit has sample size $n-1$). Doing this means I'm considering the externally studentized residuals, and I'll actually get a $t$ distribution at the end. See the Wikipedia article on studentized residuals for more.
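In case the definition is easier to read as a computation, here is a minimal numpy sketch (the function name and structure are mine, not part of the answer) that computes $t_i$ exactly as defined, refitting the model without observation $i$ to get $\hat\sigma_{(i)}^2$.

```python
import numpy as np

def external_studentized(y, X):
    """Externally studentized residuals, computed literally from the definition:
    refit the model with observation i dropped to get sigma_hat_(i)^2."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)            # hat matrix
    h = np.diag(H)
    e = y - H @ y                                    # full-model residuals
    t = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        beta_i = np.linalg.solve(Xi.T @ Xi, Xi.T @ yi)
        resid_i = yi - Xi @ beta_i
        sigma2_i = resid_i @ resid_i / (n - p - 1)   # leave-one-out variance estimate
        t[i] = e[i] / np.sqrt(sigma2_i * (1 - h[i]))
    return t
```

In practice you would use the algebraic shortcut or a library routine rather than refitting $n$ times; this is just the definition written out.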


The numerator is $e_i \sim \mathcal N(0, \sigma^2 (1 - h_i))$ where $h_i$ is the $i$th element of $\text{diag}(H)$. This means $$ \frac{e_i}{\sigma\sqrt{1 - h_i}} \sim \mathcal N(0,1). $$

Next, consider $\hat\sigma_{(i)}^2$. We have $$ y_{(i)}^Ty_{(i)} = y_{(i)}^T(I_{n-1} - H_{(i)} + H_{(i)})y_{(i)} = y_{(i)}^T(I-H_{(i)})y_{(i)} + y_{(i)}^T H_{(i)} y_{(i)} $$ with $H_{(i)}$ and $I-H_{(i)}$ idempotent and $\text{rank}(I-H_{(i)}) = n-p-1$. Note also that $(I-H_{(i)})y_{(i)} = (I-H_{(i)})\e_{(i)}$ since $(I-H_{(i)})X_{(i)}\beta = \0$, so the first quadratic form is really a quadratic form in the zero-mean Gaussian $\e_{(i)}$, and by Cochran's theorem $$ y_{(i)}^T(I-H_{(i)})y_{(i)} / \sigma^2 = e_{(i)}^Te_{(i)} / \sigma^2 \sim \chi^2_{n-p-1}. $$ All together this means $$ t_i = \frac{e_i}{\hat\sigma_{(i)}\sqrt{1 - h_i}} = \frac{\frac{e_i}{\sigma\sqrt{1 - h_i}}}{\sqrt{\frac{e_{(i)}^Te_{(i)}}{\sigma^2(n-p-1)}}} $$

is the ratio of a $\mathcal N(0,1)$ random variable to an independent $\sqrt{\chi^2_{n-p-1} / (n-p-1)}$. The independence comes from observation $i$ not appearing in $\hat\sigma_{(i)}$: writing $e_i = (1 - h_i)\,(y_i - \hat y_{(i),i})$, where $\hat y_{(i),i}$ is the prediction for observation $i$ from the leave-one-out fit, the numerator depends on the other observations only through $\hat\beta_{(i)}$, and both $y_i$ and $\hat\beta_{(i)}$ are independent of $\hat\sigma_{(i)}^2$. So that means $$ t_i \sim t_{n-p-1}. $$ I would not be guaranteed independence if I didn't use $\hat\sigma_{(i)}$; if you actually want to use the internally studentized residuals, which use the same $\hat\sigma^2 = \frac 1{n-p}e^Te$ for every $t_i$, then you'll get a more complicated distribution.
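Here is a quick simulation check of this result, with a made-up design and parameters of my own: draw many error vectors, compute the externally studentized residual for one fixed observation each time, and compare the empirical distribution to $t_{n-p-1}$.

```python
import numpy as np
from scipy import stats

# Monte Carlo check (simulation setup is illustrative only): for a fixed design
# and a fixed observation i, the externally studentized residual should look
# like a draw from t_{n - p - 1} across repeated error vectors.
rng = np.random.default_rng(1)
n, p, sigma, i = 15, 2, 2.0, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one predictor
beta = np.array([1.0, -0.5])
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
keep = np.arange(n) != i

t_vals = []
for _ in range(5_000):
    y = X @ beta + sigma * rng.normal(size=n)
    e = y - H @ y                                        # full-model residuals
    Xi, yi = X[keep], y[keep]                            # drop observation i
    resid_i = yi - Xi @ np.linalg.solve(Xi.T @ Xi, Xi.T @ yi)
    s2_i = resid_i @ resid_i / (n - p - 1)               # sigma_hat_(i)^2
    t_vals.append(e[i] / np.sqrt(s2_i * (1 - h[i])))

# Kolmogorov-Smirnov test against t_{n-p-1}; the p-value should not be small.
print(stats.kstest(t_vals, stats.t(df=n - p - 1).cdf))
```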

Finally, in your particular case, as the Wikipedia article says, we get $$ 1 - h_i = 1 - \frac 1n - \frac{(x_i - \bar x)^2}{S_{xx}} $$ so we're done.


$\newcommand{\1}{\mathbf 1}$Here's a derivation of that. If we're doing simple linear regression then we'll have $X = (\1 \mid x)$ where $x \in \mathbb R^n$ is the non-intercept univariate predictor; $X$ being full rank is equivalent to $x$ not being constant. This means $$ H = X(X^TX)^{-1}X^T = (\1 \mid x)\left(\begin{array}{cc}n & x^T\1 \\ x^T\1 & x^Tx\end{array}\right)^{-1}{\1^T\choose x^T}. $$ We can use the formula for the explicit inverse of a $2\times 2$ matrix to find $$ (X^TX)^{-1} = \frac{1}{nx^Tx - (x^T\1)^2}\left(\begin{array}{cc}x^Tx & -x^T\1 \\ -x^T\1 & n\end{array}\right) $$ so all together we can do the multiplication to get $$ H = \frac{1}{n x^Tx - (\1^T x)^2}\left(x^Tx\cdot \1\1^T - x^T\1 \cdot (\1 x^T + x \1^T) + n xx^T\right). $$ This means $$ h_i = \frac{x^Tx - 2x^T\1\cdot x_i + nx_i^2}{n x^Tx - (\1^T x)^2}. $$ For the numerator, I can use the fact that $\1^Tx = n \bar x$ to rewrite it as $$ x^Tx - 2nx_i\bar x + n x_i^2 = x^Tx + n(x_i^2 - 2 x_i\bar x + \bar x^2 - \bar x^2) \\ = x^Tx - n\bar x^2 + n(x_i - \bar x)^2 $$ and noting $S_{xx} = x^Tx - n \bar x^2$ I have $$ h_i = \frac{S_{xx} + n(x_i - \bar x)^2}{nS_{xx}} = \frac 1n + \frac{(x_i - \bar x)^2}{S_{xx}}. $$ This means $$ 1 - h_i = 1 - \frac 1n - \frac{(x_i - \bar x)^2}{S_{xx}} $$ as desired.

$\square$
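And a quick numeric check of the leverage formula just derived, with an arbitrary predictor vector of my own choosing: it simply compares $\text{diag}(H)$ to $\frac 1n + (x_i - \bar x)^2 / S_{xx}$.

```python
import numpy as np

# Check that diag(H) matches 1/n + (x_i - xbar)^2 / S_xx in simple linear
# regression (x is made up for illustration).
rng = np.random.default_rng(2)
n = 12
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.solve(X.T @ X, X.T)

Sxx = np.sum((x - x.mean()) ** 2)
h_formula = 1 / n + (x - x.mean()) ** 2 / Sxx
print(np.allclose(np.diag(H), h_formula))   # True
```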

jld
  • 20,228
4

jld's answer (+1) describes the construction of a $t$ random variable, but does not mention why independence is violated for the internally studentized residuals, so I figured I would chime in.

The numerator $$ \frac{e_i}{\sigma\sqrt{1 - h_i}} \sim \mathcal N(0,1) $$ and the chi-squared random variable in the denominator $$ e^Te / \sigma^2 \sim \chi^2_{n-k-1} $$ of the internally studentized residuals (here the model has $k+1$ parameters, so $n-k-1$ is $n-p$ in jld's notation) are not independent, because there exist integrable functions $f$ and $g$ such that $$ E[f(e_i)g(e^Te)] \neq E[f(e_i)]E[g(e^Te)]. $$

Pick $f(x) = x^2$ and $g$ as the identity mapping. Then the left hand side of the display above is

\begin{align*} E[e_i^2 e^Te] &= \sum_{j \neq i } E[ e_i^2 e_j^2] + E\left[ e_i^4 \right] \\ &= \sum_{j \neq i } \left( E[ e_i^2] E[ e_j^2] + 2\,\text{Cov}(e_i, e_j)^2 \right) + 3\sigma^4(1-h_{ii})^2 \\ &= \sigma^4(1-h_{ii})\sum_{j \neq i} (1 - h_{jj}) + 2\sigma^4\sum_{j \neq i} h_{ij}^2 + 3\sigma^4(1-h_{ii})^2 \\ &= \sigma^4(1-h_{ii})\left[ \sum_{j \neq i} (1 - h_{jj}) + 2h_{ii} + 3(1-h_{ii}) \right]\\ &= \sigma^4(1-h_{ii})\left[ \sum_{j } (1 - h_{jj}) + 2 \right] \\ &= \sigma^4(1-h_{ii})\left[ \text{trace}(I - H) + 2 \right] \\ &= \sigma^4(1-h_{ii})\left[ \text{rank}(I - H) + 2 \right] \\ &= \sigma^4(1-h_{ii})\left[(n - k - 1) + 2 \right] , \end{align*} where the second line uses Isserlis' theorem for the mean-zero jointly Gaussian $e_i, e_j$ (namely $E[e_i^2 e_j^2] = E[e_i^2]E[e_j^2] + 2\,\text{Cov}(e_i, e_j)^2$ and $E[e_i^4] = 3\,\text{Var}(e_i)^2$), the third uses $\text{Var}(e_j) = \sigma^2(1 - h_{jj})$ and $\text{Cov}(e_i, e_j) = -\sigma^2 h_{ij}$, and the fourth uses $\sum_{j \neq i} h_{ij}^2 = h_{ii}(1 - h_{ii})$, which follows from $H$ being symmetric and idempotent. But the right hand side is

$$ E[e_i^2]E[e^Te] = \sigma^4(1 - h_{ii})(n-k-1) $$ because $e^Te \sim \sigma^2 \chi^2_{n-k-1}$.

What's interesting, though, is that they aren't correlated: $$ \text{Cov}\left(\frac{e_i}{\sigma\sqrt{1 - h_i}}, \frac{e^T e}{\sigma^2}\right) \propto E[e_i e^T e] = E\left[ \sum_{j \neq i} e_j^2 e_i + e_i^3 \right] = 0. $$
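A small simulation, with a design and numbers of my own, illustrates both points at once: the mixed moment $E[e_i^2\, e^Te]$ differs from the product of moments, yet the correlation between $e_i$ and $e^Te$ is essentially zero.

```python
import numpy as np

# Simulation sketch (made-up design): e = (I - H) eps does not depend on beta,
# so just draw error vectors, project them, and compare the moments.
rng = np.random.default_rng(3)
n, k, sigma, i = 10, 1, 1.5, 0
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + k = 1 predictor
H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H                                        # residual maker

E = sigma * rng.normal(size=(200_000, n)) @ M            # each row is a residual vector
ei, sse = E[:, i], np.sum(E**2, axis=1)

print(np.mean(ei**2 * sse), np.mean(ei**2) * np.mean(sse))  # dependent: these differ
print(np.corrcoef(ei, sse)[0, 1])                           # uncorrelated: near zero
```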

Taylor
  • 20,630