The formula for the optimal weighting matrix when you perform regression with more instrumental variables than endogenous predictors is the following:

$W_{opt} = (\frac{1}{N}Z'Z)^{-1} $

This tells us that we only have to look at the variance-covariance matrix of the instruments.

But doesn't it make more sense to give more weight to the strongest instruments (in other words, those that correlate more strongly with the endogenous predictors)?

Thanks in advance!

Kasper

1 Answer

Under the assumption of homoskedasticity, $\Omega = \sigma^2 I$, the covariance matrix of the moment conditions becomes $$S = \frac{1}{n} E(Z'\Omega Z) = \sigma^2 \frac{1}{n} E(Z'Z)$$ and you are right that we could simply ignore $\sigma^2$ in the weighting matrix and set $W = \left( \frac{1}{n}Z'Z \right)^{-1}$, because the GMM estimator does not change for weighting matrices that differ only by a multiplicative constant. In fact, if the sample is large enough, the GMM estimator is consistent for any(!) weighting matrix, provided it is positive definite. Hence, in theory, you can choose an arbitrary weighting matrix and still obtain consistent point estimates. However, this concerns only the point estimate $\widehat{\beta}$, not its variance. It also holds only for large enough samples, because in small samples the weighting matrix can change the point estimates.
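
The two invariance claims above can be checked numerically. What follows is only a minimal sketch on simulated data: the data-generating process, the true coefficient of 2, and the helper names `linear_gmm` and `simulate` are all assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    """Linear IV-GMM estimator: (X'Z W Z'X)^{-1} X'Z W Z'y."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def simulate(n=5000, seed=0):
    """Hypothetical design: one endogenous regressor, three instruments."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, 3))
    v = rng.normal(size=n)
    u = 0.8 * v + rng.normal(size=n)        # u correlated with v, so x is endogenous
    x = Z @ np.array([1.0, 0.5, 0.1]) + v   # instruments of decreasing strength
    y = 2.0 * x + u                         # true coefficient is 2 (assumed)
    return y, x.reshape(-1, 1), Z

y, X, Z = simulate()
n = len(y)
W = np.linalg.inv(Z.T @ Z / n)

b_w = linear_gmm(y, X, Z, W)                 # W = (Z'Z/n)^{-1}
b_scaled = linear_gmm(y, X, Z, 5.0 * W)      # same W times a constant
b_identity = linear_gmm(y, X, Z, np.eye(3))  # an arbitrary positive definite W

# 2SLS computed directly, for comparison with b_w
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

print(b_w, b_scaled, b_identity, b_2sls)
```

In this sketch `b_w` and `b_scaled` coincide up to floating-point error, `b_w` reproduces 2SLS, and `b_identity` differs slightly in the finite sample while still landing close to the assumed true value.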

So how can we justify our choice of $W$ in practice?
We choose the weighting matrix so that it minimizes the asymptotic variance of the estimator. For this you need $\sigma^2$, which you can estimate from the residuals $\widehat{u}$ of an initial IV estimation: $\widehat{\sigma}^2 = \frac{1}{n}\widehat{u}'\widehat{u}$. The variance of our estimator in this case is minimized by setting: $$\widehat{W} = \widehat{S}^{-1} = \left( \widehat{\sigma}^2 \frac{1}{n}Z'Z \right)^{-1}$$
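
To see why $S^{-1}$ is the variance-minimizing choice, it may help to write out the standard asymptotic variance of the linear GMM estimator for a generic $W$. The symbol $Q$ is notation introduced here for illustration; it does not appear elsewhere in this answer. $$\text{Avar}(\widehat{\beta}_W) = (Q'WQ)^{-1}\, Q'WSWQ\, (Q'WQ)^{-1}, \qquad Q = \frac{1}{n}E(Z'X)$$ Setting $W = S^{-1}$ collapses the sandwich to its lower bound, and under homoskedasticity this becomes $$\text{Avar}(\widehat{\beta}_{S^{-1}}) = (Q'S^{-1}Q)^{-1} = \sigma^2 \left( Q' \left[ \frac{1}{n}E(Z'Z) \right]^{-1} Q \right)^{-1}$$ Note that the covariance between instruments and regressors enters through $Q$, while $W$ only needs to account for the variance of the moment conditions.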

Because $\widehat{W}$ is the inverse of $\widehat{S}$, high-variance (noisier) moment conditions receive a smaller weight than low-variance ones. This way we construct a more efficient estimator. This is where your question comes into play: in large samples, the relative strength of the instruments matters for the variance of the estimator but not for the consistency of the point estimates.
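
The efficiency gain can be illustrated with a small Monte Carlo experiment. This is a rough sketch under assumed settings (two instruments with very different variances, 500 replications of size 400, a true coefficient of 2); none of these numbers or the helper names `linear_gmm` and `one_draw` come from the original answer.

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    """Linear IV-GMM estimator: (X'Z W Z'X)^{-1} X'Z W Z'y."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def one_draw(rng, n=400, beta=2.0):
    """One simulated sample; returns (identity-weighted, optimally weighted) estimates."""
    z1 = rng.normal(size=n)            # instrument with variance 1
    z2 = 5.0 * rng.normal(size=n)      # instrument with variance 25 (noisier moment)
    Z = np.column_stack([z1, z2])
    v = rng.normal(size=n)
    u = 0.8 * v + rng.normal(size=n)   # homoskedastic error, correlated with v
    x = 1.0 * z1 + 0.05 * z2 + v       # endogenous regressor
    y = beta * x + u
    X = x.reshape(-1, 1)
    W_opt = np.linalg.inv(Z.T @ Z / n) # sigma^2 omitted: it is only a scale factor
    b_identity = linear_gmm(y, X, Z, np.eye(2))[0]
    b_opt = linear_gmm(y, X, Z, W_opt)[0]
    return b_identity, b_opt

rng = np.random.default_rng(1)
draws = np.array([one_draw(rng) for _ in range(500)])
var_identity, var_opt = draws.var(axis=0)
print("sampling variance, W = I:         ", var_identity)
print("sampling variance, W = (Z'Z/n)^-1:", var_opt)
```

Both estimators centre on the assumed true value, but in this design the identity-weighted one puts too much weight on the noisy moment and ends up with a clearly larger sampling variance.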

Andy
  • Hey Andy, thanks for your answer. Unfortunately, after some heavy thinking, I still don't get it. Is it possible to say in three words how an instrument that is not correlated at all with the endogenous predictor gets a weight close to zero? – Kasper Mar 11 '14 at 11:43
  • I should have been more precise. In small samples the choice of the weighting matrix matters because it influences the point estimates. In large samples the GMM estimator remains consistent for any choice of $W$. In this case you choose the weight such that it minimizes the asymptotic variance. These notes explain it very nicely (http://www.soderbom.net/lec2n_final.pdf), p. 21 onwards. I just found them, and I'll update my answer this evening to clarify it. – Andy Mar 11 '14 at 15:51
  • I shortened the answer and (hopefully) got rid of unnecessary technicalities. The points about why the choice of the weights doesn't matter for point estimation in large samples and the motivation for giving more weight to stronger instruments in the variance estimation should be clearer now. Let me know if something requires further explanation and I'd be happy to edit the post again. – Andy Mar 11 '14 at 21:33
  • I guess you're still not convinced? – Andy Mar 13 '14 at 14:04
  • Hey Andy, I am still unsure about everything, but hopefully the pieces will come together over the next few days. I'll keep you posted! – Kasper Mar 14 '14 at 11:42
  • What is confusing me is the optimal weighting matrix if you have one instrument for each endogenous variable and heteroscedasticity. Is the following correct: each weighting matrix will give you the same estimates, as the system is exactly identified. But you have to correct the variance due to heteroscedasticity (with a White correction, for example). In case you have an over-identified system, you should not only correct the variance, but also calculate a new weighting matrix and calculate the estimates again. The new estimates should also have a smaller variance. Thanks for your help! – Kasper Mar 15 '14 at 16:40
  • I think I get it now, thanks a lot for your time and answers!! – Kasper Mar 17 '14 at 10:49
  • Hey Andy, only now do I truly get it. I did not realize that you start with the covariance of the sample moments; I read over it. Now I do. You start with the variance of the sample moments, and in the case of homoscedasticity you end up with the covariance matrix of the instruments. Anyway, thanks again! – Kasper Jun 19 '14 at 08:04
  • That's right, Kasper! Sorry it didn't come out clearly enough in the answer, but I'm happy that it was eventually useful. – Andy Jun 19 '14 at 09:29