Consider the linear regression model,
$$ \boldsymbol{y}=\boldsymbol{X\beta}+\boldsymbol{\epsilon}, $$
where $\boldsymbol{y}$ is an $n$-vector of responses, $\boldsymbol{X}$ is an $n\times p$ matrix of covariates (here treated as nonstochastic and assumed to contain a column of ones), $\boldsymbol{\beta}$ is a $p$-vector of unknown parameters, and $\boldsymbol{\epsilon}$ is an $n$-vector of random errors. Take as given the classical linear model assumptions, namely that $\boldsymbol{X}$ is full rank and ${\boldsymbol{\epsilon}\sim N(\boldsymbol{0},\sigma^2\boldsymbol{I})}$.
Let ${\hat{\boldsymbol{\beta}}=(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{y}}$ denote the ordinary least squares (OLS) estimator of $\boldsymbol{\beta}$. Furthermore, let ${\hat{\boldsymbol{y}}=\boldsymbol{X}\hat{\boldsymbol{\beta}}=\boldsymbol{H y}}$ be the OLS fitted response vector, where ${\boldsymbol{H}=\boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'}$, and let ${\boldsymbol{e}=\boldsymbol{y}-\hat{\boldsymbol{y}}}$ be the OLS residual vector. Note that it can easily be shown that ${\boldsymbol{e}=\boldsymbol{M y}=\boldsymbol{M\epsilon}}$, where ${\boldsymbol{M}=\boldsymbol{I}-\boldsymbol{H}}$ is the $n\times n$ annihilator matrix. Finally, let $\circ$ denote the Hadamard (elementwise) product, so that ${\boldsymbol{e}\circ\boldsymbol{e}}$ is the $n$-vector of squared OLS residuals.
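As a numerical sanity check on these identities, here is a short NumPy sketch (the small design matrix, coefficients, and $\sigma$ are arbitrary choices for illustration only) confirming that ${\boldsymbol{e}=\boldsymbol{M y}=\boldsymbol{M\epsilon}}$ and that $\boldsymbol{M}$ is symmetric, idempotent, and of rank ${n-p}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative design: n = 10 observations, p = 3 columns
# (a column of ones plus two random covariates).
n, p = 10, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, -0.5, 0.3])   # arbitrary true coefficients
sigma = 2.0                         # arbitrary error s.d.
eps = sigma * rng.normal(size=n)
y = X @ beta + eps

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix X(X'X)^{-1}X'
M = np.eye(n) - H                        # annihilator matrix
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat                     # OLS residuals

assert np.allclose(e, M @ y)             # e = My
assert np.allclose(e, M @ eps)           # e = M(epsilon), since MX = 0
assert np.allclose(M, M.T)               # M symmetric
assert np.allclose(M @ M, M)             # M idempotent
assert np.linalg.matrix_rank(M) == n - p # rank n - p, hence singular
```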
My question is, given the above, is it possible to derive the joint probability distribution of ${\boldsymbol{e}\circ\boldsymbol{e}}$?
The following is my own progress toward answering this question:
- Using the moment-generating-function technique, it is easy to show that ${\boldsymbol{e}\sim N(\boldsymbol{0},\sigma^2\boldsymbol{M})}$. Note, however, that because $\boldsymbol{M}$ is singular ($\boldsymbol{M}^{-1}$ does not exist; indeed ${\mathrm{rank}(\boldsymbol{M})=n-p}$), this is a degenerate multivariate normal distribution, and the joint probability density function therefore does not exist.
- Using the relationship between the standard normal distribution and the chi-square distribution (the chi-square being a special case of the Gamma distribution), together with the moment-generating-function technique, I have been able to show that the marginal distributions of the squared OLS residuals are ${e_i^2 \sim \mathrm{Gamma}\left(\alpha=\dfrac{1}{2},\beta=\dfrac{1}{2\sigma^2 m_{ii}}\right)}$, ${i=1,2,\ldots,n}$, where ${m_{ii}}$ is the $i$th diagonal element of $\boldsymbol{M}$. (Intuitively, since ${e_i\sim N(0,\sigma^2 m_{ii})}$, we have ${e_i^2/(\sigma^2 m_{ii})\sim\chi^2_1=\mathrm{Gamma}(1/2,1/2)}$, and rescaling by ${\sigma^2 m_{ii}}$ divides the rate parameter by that factor.) This follows the shape-rate parametrisation of the Gamma distribution, i.e. where a random variable ${U\sim\mathrm{Gamma}(\alpha,\beta)}$ has probability density function ${f_U(u)=\dfrac{\beta^\alpha}{\Gamma(\alpha)}u^{\alpha-1}e^{-\beta u}}$ for ${u>0}$.
- Using first principles, I have been able to show that the variance-covariance matrix of the OLS squared residual vector is ${\mathrm{Cov}(\boldsymbol{e}\circ\boldsymbol{e})=\mathrm{E}\left[\left(\boldsymbol{e}\circ\boldsymbol{e}-\sigma^2\mathrm{diag}(\boldsymbol{M})\right)\left(\boldsymbol{e}\circ\boldsymbol{e}-\sigma^2\mathrm{diag}(\boldsymbol{M})\right)'\right]=2\sigma^4(\boldsymbol{M}\circ\boldsymbol{M})}$ (this only holds under the full classical assumptions, including normality). Elementwise, ${\mathrm{Cov}(e_i^2,e_j^2)=2\sigma^4 m_{ij}^2}$, ${i,j\in\left\{1,2,\ldots,n\right\}}$. Since ${(\boldsymbol{M}\circ\boldsymbol{M})}$ is not, in general, singular, it appears that the joint probability density function of ${\boldsymbol{e}\circ\boldsymbol{e}}$ should exist, i.e. the distribution is not degenerate.
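The marginal Gamma result above is easy to check by Monte Carlo: under the shape-rate parametrisation, ${\mathrm{Gamma}\left(\frac{1}{2},\frac{1}{2\sigma^2 m_{ii}}\right)}$ has mean ${\sigma^2 m_{ii}}$ and variance ${2\sigma^4 m_{ii}^2}$, so the simulated moments of $e_i^2$ should match these. A NumPy sketch (the linear-trend design and $\sigma$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative fixed design: intercept plus linear trend (X nonstochastic).
n, p = 6, 2
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
m = np.diag(M)                      # the m_ii
sigma = 1.5                         # arbitrary error s.d.

# Simulate e = M(epsilon) with epsilon ~ N(0, sigma^2 I), then square.
reps = 200_000
eps = sigma * rng.normal(size=(reps, n))
e2 = (eps @ M) ** 2                 # M symmetric; each row is a draw of e∘e

# Gamma(1/2, 1/(2 sigma^2 m_ii)) moments vs Monte Carlo moments.
assert np.allclose(e2.mean(axis=0), sigma**2 * m, rtol=0.02)
assert np.allclose(e2.var(axis=0), 2 * sigma**4 * m**2, rtol=0.05)
```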
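Likewise, the covariance result ${\mathrm{Cov}(\boldsymbol{e}\circ\boldsymbol{e})=2\sigma^4(\boldsymbol{M}\circ\boldsymbol{M})}$ can be verified by simulation: the sample covariance matrix of many draws of ${\boldsymbol{e}\circ\boldsymbol{e}}$ should approach ${2\sigma^4(\boldsymbol{M}\circ\boldsymbol{M})}$. A sketch, taking $\sigma=1$ (so ${\sigma^4=1}$) and an arbitrary 5-point simple-regression design:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative design: intercept plus one covariate; sigma = 1.
n = 5
X = np.column_stack([np.ones(n), np.array([0., 1., 2., 3., 5.])])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

reps = 500_000
eps = rng.normal(size=(reps, n))    # epsilon ~ N(0, I)
e2 = (eps @ M) ** 2                 # M symmetric; rows are draws of e∘e

# Monte Carlo Cov(e∘e) vs the theoretical 2 sigma^4 (M ∘ M).
emp_cov = np.cov(e2, rowvar=False)
assert np.allclose(emp_cov, 2 * (M * M), atol=0.08)
```

(In NumPy, `M * M` is the elementwise (Hadamard) product ${\boldsymbol{M}\circ\boldsymbol{M}}$, not the matrix product.)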
However, I cannot seem to progress further toward arriving at the joint distribution of ${\boldsymbol{e}\circ\boldsymbol{e}}$. It seems possible that it is a multivariate Gamma distribution, but there are a number of different ways of constructing a multivariate Gamma distribution, none of which seem conducive to the present case where the individual random variables have the same shape parameter, different rate parameters, and are positively correlated.
It has been suggested to me that one could apply the transformation technique to the known joint distribution of $\boldsymbol{e}$ using the quadratic transformation ${\boldsymbol{g}(\boldsymbol{e})=\boldsymbol{e}\circ\boldsymbol{e}}$, but this seems technically invalid, since it would require the joint probability density function of $\boldsymbol{e}$, which (as noted above) does not exist. Hence I am stuck.
N.B. I've corrected the covariance expression. The result was correct but the expectation expression was incomplete.
– Thomas Farrar May 16 '22 at 22:59