I am currently reading Whitney Newey and Kenneth West's paper "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix".
For a multiple linear regression model: \begin{equation} y_t = X_t \beta + e_t, \qquad t=1,\cdots, T \end{equation} where $y_t$ and $X_t$ are observed, $\beta$ is a non-random but unobservable coefficient vector, and $e_t$ is the noise term.
According to the paper, the long-run covariance of the noise, $S_T$, can be estimated from the sample residuals $\hat{e}_t := y_t - X_t \hat{\beta}$ as \begin{equation} \hat{S}_T = \hat{\Omega}_0 + \sum_{j=1}^m w_j\left[\hat{\Omega}_j + \hat{\Omega}_j^\top\right], \end{equation} where \begin{equation} \hat{\Omega}_j := \frac{1}{T}\sum_{t=j+1}^T\hat{e}_t\hat{e}_{t-j}^\top. \end{equation} Can anyone explain the derivation of $\hat{S}_T$ to me? Why can $\hat{S}_T$ be expressed as a sum of covariance matrices at different lags?
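To make the notation concrete, here is a minimal sketch of how I would compute $\hat{S}_T$ in Python. The Bartlett weights $w_j = 1 - j/(m+1)$ are an assumption taken from the paper; the function name `newey_west` and the simulated residuals are my own placeholders:

```python
import numpy as np

def newey_west(e_hat, m):
    """HAC estimate S_hat from a (T x k) array of residuals e_hat,
    truncation lag m, Bartlett weights w_j = 1 - j/(m+1)."""
    T, k = e_hat.shape

    # Omega_j = (1/T) * sum_{t=j+1}^{T} e_t e_{t-j}^T
    def omega(j):
        return e_hat[j:].T @ e_hat[:T - j] / T

    S = omega(0)
    for j in range(1, m + 1):
        w = 1.0 - j / (m + 1)
        Oj = omega(j)
        S += w * (Oj + Oj.T)  # sum of the lag-j matrix and its transpose
    return S

rng = np.random.default_rng(0)
e = rng.standard_normal((200, 2))  # placeholder residuals
S = newey_west(e, m=4)
print(np.allclose(S, S.T))  # symmetric by construction
```

Adding $\hat{\Omega}_j + \hat{\Omega}_j^\top$ (rather than a product) keeps each term, and hence $\hat{S}_T$, symmetric, which is what allows the Bartlett weights to guarantee positive semi-definiteness.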
My first guess is that $e_t$ might follow an MA(m) process, \begin{equation} e_t = \varepsilon_t + \phi_1\varepsilon_{t-1} + \cdots + \phi_m\varepsilon_{t-m}, \end{equation} where $\varepsilon_t$ is white noise, but traditionally for an MA(m) process we assume the innovations are uncorrelated, i.e. $\mathbb{E}\{\varepsilon_{t} \varepsilon_{t-j}^\top\} = 0$ for nonzero lag $j$.
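To check that intuition numerically, a small simulation (an assumed univariate MA(2) with hypothetical coefficients $\phi_1 = 0.6$, $\phi_2 = 0.3$) shows the sample autocovariances are nonzero up to lag $m$ and die out beyond it:

```python
import numpy as np

rng = np.random.default_rng(1)
T, m = 100_000, 2
phi = [0.6, 0.3]                    # hypothetical MA(2) coefficients
eps = rng.standard_normal(T + m)    # white-noise innovations

# e_t = eps_t + phi_1 * eps_{t-1} + phi_2 * eps_{t-2}
e = eps[m:] + phi[0] * eps[m - 1:-1] + phi[1] * eps[:-2]

def gamma(j):
    # sample autocovariance at lag j
    return np.mean(e[j:] * e[:T - j])

# nonzero for j <= m, approximately zero for j > m
print([round(gamma(j), 2) for j in range(5)])
```

The theoretical values here are $\gamma_0 = 1 + \phi_1^2 + \phi_2^2 = 1.45$, $\gamma_1 = \phi_1 + \phi_1\phi_2 = 0.78$, $\gamma_2 = \phi_2 = 0.3$, and $\gamma_j = 0$ for $j > 2$, so only lags $0,\dots,m$ contribute, consistent with the truncated sum in $\hat{S}_T$.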
If instead we assume that $\mathbb{E}\{\hat{e}_{t} \hat{e}_{t-j}^\top\} \neq 0$ for all $j$, then $e_t$ looks more like an AR(m) process, and the covariance matrix would be given by \begin{equation} \mathbb{E}\{e_te_t^\top\} = \begin{bmatrix} 1 & \phi_1 & \cdots & \phi_m\end{bmatrix} \begin{bmatrix} \hat{\Omega}_0 & \hat{\Omega}_1 & \cdots & \hat{\Omega}_m \\ \hat{\Omega}_1 & \hat{\Omega}_0 & \cdots & \hat{\Omega}_{m-1}\\ \vdots & \vdots & \ddots & \vdots \\ \hat{\Omega}_m & \hat{\Omega}_{m-1} & \cdots & \hat{\Omega}_0 \end{bmatrix}\begin{bmatrix} 1 \\ \phi_1 \\ \vdots \\ \phi_m\end{bmatrix} \end{equation}