
Consider the linear regression model $$ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \epsilon_i $$ or equivalently in matrix form $$ \mathbf{Y} = \mathbf{X}\beta + \epsilon$$ where $\mathbb{E}[\epsilon|X] = 0$ and $\mathbf{X}$ is an $n \times (k + 1)$ matrix. It can be shown by the Central Limit Theorem that the OLS estimates satisfy $$ \sqrt{n}(\hat{\beta} - \beta) \to^d \mathcal{N}(0, \Sigma) $$ where $\Sigma = \mathbb{E}[XX']^{-1} \mathbb{E}[XX'\epsilon^2] \mathbb{E}[XX']^{-1}$ is a $(k+1) \times (k+1)$ matrix (here $X$ denotes the random vector of regressors for a single observation, including the constant, not the matrix of observations).


Suppose we are interested in inference only on $\beta_1$. Is there a way to analytically derive the asymptotic variance of $\hat{\beta}_1$ alone? More precisely, I am after the expression for $\sigma^2_{\beta_1}$ that satisfies $$ \sqrt{n}(\hat{\beta}_1 - \beta_1) \to^d \mathcal{N}(0, \sigma^2_{\beta_1}). $$ I think it should be the $(2,2)$ entry of $\Sigma$, though I'm not sure how to separate it out analytically given the matrix inverses. Computationally, we could just take the relevant entry $\hat{\Sigma}_{2,2}$ from the full estimated variance matrix, but this would be very inefficient (especially if $k$ is large) since we only need a single entry. I was trying some ideas with the Frisch–Waugh–Lovell theorem but not quite getting anywhere.
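For concreteness, here is a minimal numpy sketch of the brute-force approach I have in mind (the function name and the HC0-type plug-in for $\mathbb{E}[XX'\epsilon^2]$ are just my own choices for illustration):

```python
import numpy as np

def full_sandwich_entry(X, y, j=1):
    """Estimate the full robust covariance of sqrt(n)*(beta_hat - beta)
    and return only its (j, j) entry. X is n x (k+1) with a leading
    column of ones; j=1 corresponds to beta_1 (zero-based indexing)."""
    n = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)           # (X'X)^{-1}
    beta_hat = XtX_inv @ X.T @ y               # OLS estimates
    resid = y - X @ beta_hat                   # residuals epsilon_hat_i
    meat = (X * resid[:, None] ** 2).T @ X     # X' diag(e_i^2) X
    Sigma_hat = n * XtX_inv @ meat @ XtX_inv   # full (k+1) x (k+1) sandwich
    return Sigma_hat[j, j]                     # the only entry we actually need
```

This builds the whole $(k+1)\times(k+1)$ matrix just to read off a single number, which is the waste I would like to avoid.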

Any ideas?

Adam
  • Unless you are assuming $X$ is stochastic, your expression for $\Sigma$ is unnecessarily complex, and in any case you have the transposes wrong; $\sigma^2_{\epsilon}(X'X)^{-1}$ is correct. – jbowman Jan 21 '23 at 16:50
  • @jbowman, I agree about the transposes, but as to the complexity, the result should be the heteroskedasticity-robust one (and hence not so much related to whether or not regressors are stochastic), and since heteroskedasticity is quite pervasive in applied work I would not call it unnecessary to consider this expression. – Christoph Hanck Jan 21 '23 at 17:40
  • @Adam, I would try it via partitioned inverses, as referenced e.g. here: https://stats.stackexchange.com/questions/258461/proof-that-f-statistic-follows-f-distribution/258476#258476 It is not clear whether you will obtain a "clean" expression. – Christoph Hanck Jan 21 '23 at 17:42
  • @ChristophHanck Thanks! I'll take a look at that - maybe it simplifies. – Adam Jan 22 '23 at 04:40
  • @jbowman The notation is a bit careless in my original post; I meant for the $X$ in the expression for $\Sigma$ to refer to the random variable $X$, while the $\mathbf{X}$ right above is the matrix of observations. Edited post. – Adam Jan 22 '23 at 04:42
  • I cannot see that there's any content to the question, so perhaps I'm misinterpreting it. Doesn't the asymptotic convergence in distribution to a multivariate Normal already tell you what the asymptotic convergence of any one of the estimates is? – whuber Feb 17 '23 at 22:59

1 Answer


We may "need a single entry" only, but all the sample will participate in computing it. Write your model as $$Y_i = \beta_1 x_{i1} + \beta_0 + + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \epsilon_i$$ and partition the $n \times k$ regressore matrix as $$\mathbf X = \left [\mathbf x_1\quad Z \right],$$ where $Z$ contains all other regressors including the constant. Let $D = {\rm diag} \{\hat \epsilon^2_i\}$, a $n \times n$ diagonal matrix. Then, in practice, $$\widehat \Sigma = n\left [\begin{matrix} \mathbf x_1'\mathbf x_1 & \mathbf x_1'Z \\ Z'\mathbf x_1 & Z'Z \end{matrix}\right]^{-1}\left [\begin{matrix} \mathbf x_1'D\mathbf x_1 & \mathbf x_1'DZ \\ Z'D\mathbf x_1 & Z'DZ \end{matrix}\right]\left [\begin{matrix} \mathbf x_1'\mathbf x_1 & \mathbf x_1'Z \\ Z'\mathbf x_1 & Z'Z \end{matrix}\right]^{-1}.$$

The upper-left block is $1 \times 1$, and it is exactly the quantity you want.

Apply whichever blockwise matrix-inversion formula is most convenient to obtain the upper-left element of $\widehat \Sigma$ in closed form, and then check whether computing only that element actually improves computational efficiency.
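To sketch how that blockwise inversion works out (using the annihilator matrix $M_Z = I_n - Z(Z'Z)^{-1}Z'$, which is not introduced above; do verify the algebra): the first row of the inverted "bread" matrix is $(\mathbf x_1'M_Z\mathbf x_1)^{-1}\left[1 \;\; -\mathbf x_1'Z(Z'Z)^{-1}\right]$, and multiplying it into $\mathbf X' = [\mathbf x_1 \;\; Z]'$ gives $(\mathbf x_1'M_Z\mathbf x_1)^{-1}\,\mathbf x_1'M_Z$. Writing $\tilde{\mathbf x}_1 = M_Z\mathbf x_1$ for $\mathbf x_1$ residualized on all the other regressors (the Frisch–Waugh–Lovell step), the upper-left element collapses to
$$\widehat\Sigma_{11} \;=\; n\,\frac{\tilde{\mathbf x}_1'D\,\tilde{\mathbf x}_1}{\left(\tilde{\mathbf x}_1'\tilde{\mathbf x}_1\right)^2} \;=\; n\,\frac{\sum_{i} \hat\epsilon_i^2\,\tilde x_{i1}^2}{\left(\sum_i \tilde x_{i1}^2\right)^2},$$
so beyond the original residuals $\hat\epsilon_i$, only the single auxiliary regression of $\mathbf x_1$ on $Z$ is needed.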

NOTE 1: You need to correct the expression for $\Sigma$ in your post (the expected value should be under the inverse sign).

NOTE 2: The $n$ in front of the expression for $\widehat \Sigma$ appears because we are computing the variance of $\sqrt{n}(\hat{\beta} - \beta)$. If we want an approximation to the finite-sample variance of $\hat \beta$ itself, we drop it.
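As a quick numerical sanity check (a minimal numpy sketch on simulated data; the variable names and data-generating process are arbitrary), the single-entry formula can be compared with the corresponding entry of the full sandwich:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 5
Z = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # constant + other regressors
x1 = rng.normal(size=n) + 0.5 * Z[:, 1]                          # regressor of interest
X = np.column_stack([Z[:, :1], x1, Z[:, 1:]])                    # columns: const, x1, rest
y = X @ rng.normal(size=k + 1) + rng.normal(size=n) * (1 + 0.5 * np.abs(x1))  # heteroskedastic errors

# Full sandwich, then read off the entry for beta_1 (the (2,2) entry, index [1, 1])
XtX_inv = np.linalg.inv(X.T @ X)
e = y - X @ (XtX_inv @ X.T @ y)                                   # full-regression residuals
Sigma_hat = n * XtX_inv @ (X * e[:, None] ** 2).T @ X @ XtX_inv
full_entry = Sigma_hat[1, 1]

# Single entry via x1 residualized on all other regressors (FWL), same residuals e in D
others = np.delete(X, 1, axis=1)
x1_tilde = x1 - others @ np.linalg.lstsq(others, x1, rcond=None)[0]
single_entry = n * (e**2 * x1_tilde**2).sum() / (x1_tilde @ x1_tilde) ** 2

print(np.isclose(full_entry, single_entry))  # agree up to floating point
# Dropping the factor n gives the finite-sample variance approximation from NOTE 2.
```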