Consider the linear regression model,
$$ \boldsymbol{y}=\boldsymbol{X\beta}+\boldsymbol{\epsilon}, $$
where $\boldsymbol{y}$ is an $n$-vector of responses, $\boldsymbol{X}$ is an $n\times p$ matrix of covariates (here treated as nonstochastic and assumed to contain a column of ones), $\boldsymbol{\beta}$ is a $p$-vector of unknown parameters, and $\boldsymbol{\epsilon}$ is an $n$-vector of random errors. Take as given the classical linear model assumptions, namely that $\boldsymbol{X}$ is full rank and ${\boldsymbol{\epsilon}\sim N(\boldsymbol{0},\sigma^2\boldsymbol{I})}$.
Let ${\hat{\boldsymbol{\beta}}=(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{y}}$ denote the ordinary least squares (OLS) estimator of $\boldsymbol{\beta}$. Furthermore, let ${\hat{\boldsymbol{y}}=\boldsymbol{X}\hat{\boldsymbol{\beta}}=\boldsymbol{H y}}$ be the OLS fitted response vector, where ${\boldsymbol{H}=\boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'}$, and let ${\boldsymbol{e}=\boldsymbol{y}-\hat{\boldsymbol{y}}}$ be the OLS residual vector. Note that it can easily be shown that ${\boldsymbol{e}=\boldsymbol{M y}=\boldsymbol{M\epsilon}}$, where ${\boldsymbol{M}=\boldsymbol{I}-\boldsymbol{H}}$ is the $n\times n$ annihilator matrix. Finally, let $\circ$ denote the Hadamard (elementwise) product, so that ${\boldsymbol{e}\circ\boldsymbol{e}}$ is the $n$-vector of squared OLS residuals.
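As a numerical sanity check on these identities, here is a short NumPy sketch (the small design matrix, coefficients, and $\sigma$ are arbitrary choices for illustration only) confirming that ${\boldsymbol{e}=\boldsymbol{M y}=\boldsymbol{M\epsilon}}$ and that $\boldsymbol{M}$ is symmetric, idempotent, and of rank ${n-p}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative design: n = 10 observations, p = 3 columns
# (a column of ones plus two random covariates).
n, p = 10, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, -0.5, 0.3])   # arbitrary true coefficients
sigma = 2.0                         # arbitrary error s.d.
eps = sigma * rng.normal(size=n)
y = X @ beta + eps

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix X(X'X)^{-1}X'
M = np.eye(n) - H                        # annihilator matrix
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat                     # OLS residuals

assert np.allclose(e, M @ y)             # e = My
assert np.allclose(e, M @ eps)           # e = M(epsilon), since MX = 0
assert np.allclose(M, M.T)               # M symmetric
assert np.allclose(M @ M, M)             # M idempotent
assert np.linalg.matrix_rank(M) == n - p # rank n - p, hence singular
```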
My question is, given the above, is it possible to derive the joint probability distribution of ${\boldsymbol{e}\circ\boldsymbol{e}}$?
The following is my own progress toward answering this question:
- Using the moment-generating-function technique, it is easy to show that ${\boldsymbol{e}\sim N(\boldsymbol{0},\sigma^2\boldsymbol{M})}$. Note, however, that because $\boldsymbol{M}$ is singular ($\boldsymbol{M}^{-1}$ does not exist; indeed ${\mathrm{rank}(\boldsymbol{M})=n-p}$), this is a degenerate multivariate normal distribution, and the joint probability density function therefore does not exist.
- Using the relationship between the standard normal distribution and the chi-square distribution (the chi-square being a special case of the Gamma distribution), together with the moment-generating-function technique, I have been able to show that the marginal distributions of the squared OLS residuals are ${e_i^2 \sim \mathrm{Gamma}\left(\alpha=\dfrac{1}{2},\beta=\dfrac{1}{2\sigma^2 m_{ii}}\right)}$, ${i=1,2,\ldots,n}$, where ${m_{ii}}$ is the $i$th diagonal element of $\boldsymbol{M}$. (Intuitively, since ${e_i\sim N(0,\sigma^2 m_{ii})}$, we have ${e_i^2/(\sigma^2 m_{ii})\sim\chi^2_1=\mathrm{Gamma}(1/2,1/2)}$, and rescaling by ${\sigma^2 m_{ii}}$ divides the rate parameter by that factor.) This follows the shape-rate parametrisation of the Gamma distribution, i.e. where a random variable ${U\sim\mathrm{Gamma}(\alpha,\beta)}$ has probability density function ${f_U(u)=\dfrac{\beta^\alpha}{\Gamma(\alpha)}u^{\alpha-1}e^{-\beta u}}$ for ${u>0}$.
- Using first principles, I have been able to show that the variance-covariance matrix of the OLS squared residual vector is ${\mathrm{Cov}(\boldsymbol{e}\circ\boldsymbol{e})=\mathrm{E}\left[\left(\boldsymbol{e}\circ\boldsymbol{e}-\sigma^2\mathrm{diag}(\boldsymbol{M})\right)\left(\boldsymbol{e}\circ\boldsymbol{e}-\sigma^2\mathrm{diag}(\boldsymbol{M})\right)'\right]=2\sigma^4(\boldsymbol{M}\circ\boldsymbol{M})}$ (this only holds under the full classical assumptions, including normality). Elementwise, ${\mathrm{Cov}(e_i^2,e_j^2)=2\sigma^4 m_{ij}^2}$, ${i,j\in\left\{1,2,\ldots,n\right\}}$. Since ${(\boldsymbol{M}\circ\boldsymbol{M})}$ is not, in general, singular, it appears that the joint probability density function of ${\boldsymbol{e}\circ\boldsymbol{e}}$ should exist, i.e. the distribution is not degenerate.
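The marginal Gamma result above is easy to check by Monte Carlo: under the shape-rate parametrisation, ${\mathrm{Gamma}\left(\frac{1}{2},\frac{1}{2\sigma^2 m_{ii}}\right)}$ has mean ${\sigma^2 m_{ii}}$ and variance ${2\sigma^4 m_{ii}^2}$, so the simulated moments of $e_i^2$ should match these. A NumPy sketch (the linear-trend design and $\sigma$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative fixed design: intercept plus linear trend (X nonstochastic).
n, p = 6, 2
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
m = np.diag(M)                      # the m_ii
sigma = 1.5                         # arbitrary error s.d.

# Simulate e = M(epsilon) with epsilon ~ N(0, sigma^2 I), then square.
reps = 200_000
eps = sigma * rng.normal(size=(reps, n))
e2 = (eps @ M) ** 2                 # M symmetric; each row is a draw of e∘e

# Gamma(1/2, 1/(2 sigma^2 m_ii)) moments vs Monte Carlo moments.
assert np.allclose(e2.mean(axis=0), sigma**2 * m, rtol=0.02)
assert np.allclose(e2.var(axis=0), 2 * sigma**4 * m**2, rtol=0.05)
```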
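Likewise, the covariance result ${\mathrm{Cov}(\boldsymbol{e}\circ\boldsymbol{e})=2\sigma^4(\boldsymbol{M}\circ\boldsymbol{M})}$ can be verified by simulation: the sample covariance matrix of many draws of ${\boldsymbol{e}\circ\boldsymbol{e}}$ should approach ${2\sigma^4(\boldsymbol{M}\circ\boldsymbol{M})}$. A sketch, taking $\sigma=1$ (so ${\sigma^4=1}$) and an arbitrary 5-point simple-regression design:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative design: intercept plus one covariate; sigma = 1.
n = 5
X = np.column_stack([np.ones(n), np.array([0., 1., 2., 3., 5.])])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

reps = 500_000
eps = rng.normal(size=(reps, n))    # epsilon ~ N(0, I)
e2 = (eps @ M) ** 2                 # M symmetric; rows are draws of e∘e

# Monte Carlo Cov(e∘e) vs the theoretical 2 sigma^4 (M ∘ M).
emp_cov = np.cov(e2, rowvar=False)
assert np.allclose(emp_cov, 2 * (M * M), atol=0.08)
```

(In NumPy, `M * M` is the elementwise (Hadamard) product ${\boldsymbol{M}\circ\boldsymbol{M}}$, not the matrix product.)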
However, I cannot seem to progress further toward arriving at the joint distribution of ${\boldsymbol{e}\circ\boldsymbol{e}}$. It seems possible that it is a multivariate Gamma distribution, but there are a number of different ways of constructing a multivariate Gamma distribution, none of which seem conducive to the present case where the individual random variables have the same shape parameter, different rate parameters, and are positively correlated.
It has been suggested to me that one could apply the transformation technique to the known joint distribution of $\boldsymbol{e}$ using the quadratic transformation ${\boldsymbol{g}(\boldsymbol{e})=\boldsymbol{e}\circ\boldsymbol{e}}$, but this seems technically invalid, since it would require the joint probability density function of $\boldsymbol{e}$, which (as noted above) does not exist. Hence I am stuck.
N.B. I've corrected the covariance expression. The result was correct but the expectation expression was incomplete.
– Thomas Farrar May 16 '22 at 22:59