The Bartlett kernel weight at lag $j$ is the proportion of 'windows' of length $m$ covering a point $t$ that also cover $t+j$.
The basic idea in these estimators is actually in the part you left out. Suppose we want to estimate $$V=\frac{1}{T}\mathrm{var}[\sum_t (x_t-\mu)].$$
We know (since $\sum_t (x_t-\mu)$ has mean zero)
$$V = E\left[\frac{1}{T} \Big(\sum_t (x_t-\mu)\Big)^2\right]= \frac{1}{T}\sum_{s,t} E[(x_t-\mu)(x_s-\mu)]$$
So what about
$$V_0 = \frac{1}{T} (\sum_t (x_t-\mu))^2= \frac{1}{T}\sum_{s,t} (x_t-\mu)(x_s-\mu)$$
That looks like a natural estimator. Well, actually, it isn't even an estimator, because it depends on the unknown $\mu$; it also has high variance; and if you plug in $\hat\mu=\bar X$ it becomes a terrible estimator, because it's then identically zero.
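A quick numerical check of the 'identically zero' claim (a minimal Python sketch; the simulated series is arbitrary, since the result is pure algebra: $\sum_t (x_t-\bar X)=0$):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)        # any series works; the result is algebraic

# V_0 with the plug-in mean: (sum of deviations from the mean)^2 / T
v0_hat = (x - x.mean()).sum() ** 2 / len(x)
print(v0_hat)                   # zero, up to floating-point rounding
```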
However, if you just used
$$V_1= \frac{1}{T}\sum_{s,t} I_{s=t}(x_t-\hat\mu)(x_s-\hat\mu) $$
it would be a good estimator if the $x_t$ were independent (it's the usual estimator up to $T-1$ vs $T$). There isn't much bias from using $\hat\mu$ instead of $\mu$.
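To see it's the familiar estimator, here's a one-comparison sketch against NumPy's variance (which also uses the $1/T$ divisor by default):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)

# only the s = t terms survive the indicator, so V_1 is the usual 1/T variance
v1 = ((x - x.mean()) ** 2).sum() / len(x)
print(np.isclose(v1, np.var(x)))   # True: np.var defaults to the 1/T divisor
```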
On the other hand, if you used the same $V_1$ in the more realistic setting where the series isn't independent (or at least uncorrelated), you'd get bias from all the $(s,t)$ pairs that you left out.
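Here's a small simulation of that bias; the AR(1) model and the coefficient $0.7$ are just my choices for illustration. For a stationary AR(1) with unit innovation variance, the marginal variance is $1/(1-\phi^2)\approx 1.96$, while the long-run target $V$ tends to $1/(1-\phi)^2\approx 11.1$, and $V_1$ homes in on the wrong one:

```python
import numpy as np

rng = np.random.default_rng(1)
phi, T, reps = 0.7, 500, 500

v1_draws = []
for _ in range(reps):
    x = np.empty(T)
    x[0] = rng.normal() / np.sqrt(1 - phi**2)   # start in stationarity
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.normal()
    v1_draws.append(((x - x.mean()) ** 2).sum() / T)

print(np.mean(v1_draws))          # about 1.96: the marginal variance
print(1 / (1 - phi) ** 2)         # about 11.1: the long-run variance V
```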
So, we'll try
$$V_w= \frac{1}{T}\sum_{s,t} w_{st}(x_t-\hat\mu)(x_s-\hat\mu) $$
where the weights $w_{st}$ are something less dumb than either $w_{st}=1$ always, or $w_{tt}=1$ and $w_{st}=0$ otherwise.
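As a reference implementation, here's a direct Python sketch of $V_w$, assuming (as for all the kernels below) that the weight depends only on the lag $|s-t|$. It's the literal $O(T^2)$ double sum, written for clarity rather than speed:

```python
import numpy as np

def v_w(x, weight):
    """V_w = (1/T) * sum over (s, t) of w_{st} (x_t - xbar)(x_s - xbar),
    where w_{st} = weight(|s - t|) and weight acts elementwise on arrays."""
    T = len(x)
    d = x - x.mean()
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    return (weight(lags) * np.outer(d, d)).sum() / T

# weight = lambda L: (L == 0) recovers V_1; weight = lambda L: np.ones_like(L)
# recovers the identically-zero estimator from plugging the mean into V_0
```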
$V_w$ has two sorts of bias. It has bias from $w_{st}$ being too small, so that it misses some of the true correlation; I'll call that truncation bias. It also has bias from $w_{st}$ being too big, so that the impact of substituting $\hat\mu$ for $\mu$ is big; I'll call that centering bias, and it depends approximately just on the average of the weights $w_{st}$. We want to balance truncation bias against centering bias.
If we assume that the correlation between $x_t$ and $x_s$ tends to go down as $|s-t|$ increases, then for a fixed centering bias the truncation bias is reduced by pushing all the weight towards small $|s-t|$. That is, you want $w_{st}=1$ for small $|s-t|$ and then a pretty quick switch to $w_{st}=0$ for larger $|s-t|$. That's also good from a mean squared error viewpoint.
Unfortunately, the estimator with $w_{st}=I_{|s-t|<m}$ need not give non-negative estimated variances, which is inconvenient. If you want guaranteed non-negative estimated variances, you need $w_{st}$ to decrease more smoothly with $|s-t|$.
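You can watch that happen. With the truncated weights on short white-noise series (the sample size $T=30$ and cutoff $m=10$ are just choices that make it visible), a sizeable fraction of the estimates come out negative:

```python
import numpy as np

rng = np.random.default_rng(2)

def v_truncated(x, m):
    # w_{st} = 1 for |s - t| < m, 0 otherwise
    T = len(x)
    d = x - x.mean()
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    return ((lags < m) * np.outer(d, d)).sum() / T

estimates = [v_truncated(rng.normal(size=30), m=10) for _ in range(1000)]
print(np.mean(np.array(estimates) < 0))   # a noticeable fraction is negative
```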
One way to make sure the estimator is non-negative is to write it as a sum of squared things. You can take
$$V_{\text{box}}=\frac{1}{T} \sum_t \frac{1}{2m+1}\left(\sum_{j=-m}^m (x_{t+j}-\hat\mu) \right)^2$$
That's a quadratic in the $x_t$, so it's of the form
$$V_w= \frac{1}{T}\sum_{s,t} w_{st}(x_t-\hat\mu)(x_s-\hat\mu)$$
and it's pretty clear that $w_{st}$ is decreasing in $|s-t|$ -- in fact, it's zero for $|s-t|>2m$. If you did the algebra, which I'm not going to do, you'd find that the weights $w_{st}$ decrease linearly in $|s-t|$ until they reach zero. It's the Bartlett kernel!
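If you'd rather let the computer do the algebra, this sketch builds the quadratic-form matrix implied by $V_{\text{box}}$ and checks that, away from the ends of the series, its entries follow the triangle $w_{st} = \max(2m+1-|s-t|,\,0)/(2m+1)$:

```python
import numpy as np

T, m = 50, 4
width = 2 * m + 1

# V_box is (1/T) * sum over windows of (window sum)^2 / (2m+1), so its
# quadratic-form matrix is a sum of outer products of window indicators
W = np.zeros((T, T))
for c in range(m, T - m):           # centers whose window fits in the sample
    ind = np.zeros(T)
    ind[c - m : c + m + 1] = 1.0
    W += np.outer(ind, ind) / width

s = T // 2                          # a point well away from the boundary
for j in range(width + 2):
    assert np.isclose(W[s, s + j], max(width - j, 0) / width)
print("interior weights match the Bartlett triangle")
```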
So, the Bartlett kernel is a set of weights that gives pretty good variance estimation (consistent if $m$ increases at the right rate) and guarantees you don't get embarrassed by a negative variance. The cost is a little more bias and variability than the step-function kernels.
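For completeness, here's the same recipe written over lags rather than the double sum; this is the familiar Newey--West form of the Bartlett estimator, and the bandwidth $b$ is up to you (the textbook prescription has $b$ growing like $T^{1/3}$, which is not part of this argument):

```python
import numpy as np

def v_bartlett(x, b):
    """Long-run variance with Bartlett weights 1 - j/b on lags j < b
    (the Newey-West estimator); guaranteed non-negative."""
    T = len(x)
    d = x - x.mean()
    v = (d ** 2).sum() / T                       # lag-0 term
    for j in range(1, b):
        gamma_j = (d[j:] * d[:-j]).sum() / T     # lag-j sample autocovariance
        v += 2 * (1 - j / b) * gamma_j
    return v

# On the AR(1) example above, v_bartlett(x, b) with b growing slowly with T
# moves from the marginal variance toward the long-run variance V
```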
It's also possible to show that you can't do better; you can't get the truncation bias lower for a given average weight without risking negative variances.
This was all for $x_t-\mu$ but it generalises to everything else by the usual Taylor series arguments.