
Everyone! I'm reading the paper "On test marginal versus conditional" and I'm a little confused about one place on page 9, where the author uses a "maximum likelihood projection" to obtain the covariance matrix $\Sigma_1$ that minimizes the KL divergence between the two distributions $\mathcal N(0, \Sigma_0)$ and $\mathcal N(0, \Sigma_1)$, where $\Sigma_0$ is a known $3 \times 3$ matrix.

The only constraint on $\Sigma_1$ is that it lies in the model $\mathcal M_1: \rho_{12} = \rho_{13}\rho_{23}$. I am confused about how applying the "maximum likelihood projection" leads to the stated form of $\Sigma_1$. Here are the two matrices $\Sigma_0$ and $\Sigma_1$: \begin{align*} \Sigma_0 = \begin{bmatrix} \sigma_{11} & 0 & \rho_{13} \sqrt{\sigma_{11} \sigma_{33}} \\ 0 & \sigma_{22} & \rho_{23} \sqrt{\sigma_{22} \sigma_{33}} \\ \rho_{13} \sqrt{\sigma_{11} \sigma_{33}} & \rho_{23} \sqrt{\sigma_{22} \sigma_{33}} & \sigma_{33} \end{bmatrix} \\ \Sigma_1 = \begin{bmatrix} \sigma_{11} & \rho_{13}\rho_{23}\sqrt{\sigma_{11}\sigma_{22}} & \rho_{13} \sqrt{\sigma_{11} \sigma_{33}} \\ \rho_{13}\rho_{23}\sqrt{\sigma_{11}\sigma_{22}} & \sigma_{22} & \rho_{23} \sqrt{\sigma_{22} \sigma_{33}} \\ \rho_{13} \sqrt{\sigma_{11} \sigma_{33}} & \rho_{23} \sqrt{\sigma_{22} \sigma_{33}} & \sigma_{33} \end{bmatrix} \end{align*} The stated form seems to make sense, and the two matrices are very similar, but does anyone know how to show that $\Sigma_1$ is indeed the closest matrix to $\Sigma_0$, as measured by KL divergence, under the constraint $\mathcal M_1$?
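For reference, the standard formula (not taken from the paper) for the KL divergence between the two zero-mean normals is \begin{align*} \mathrm{KL}\big(\mathcal N(0, \Sigma_0) \,\|\, \mathcal N(0, \Sigma_1)\big) = \tfrac{1}{2}\left[ \operatorname{tr}(\Sigma_1^{-1} \Sigma_0) - 3 + \log \frac{\det \Sigma_1}{\det \Sigma_0} \right], \end{align*} so only the $\operatorname{tr}(\Sigma_1^{-1} \Sigma_0)$ and $\log \det \Sigma_1$ terms depend on $\Sigma_1$; this is where the simplified objective in Edit 1 below comes from.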

Edit 1: I tried to simplify the problem; it then reads \begin{align*} \Sigma_1^* = \arg \min_{\Sigma_1 \in \mathcal M_1} \log \det \Sigma_1 + \operatorname{tr}(\Sigma_1^{-1} \Sigma_0). \end{align*} But the answer is still elusive from this form, and even Lagrange multipliers do not offer much help.
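As a sanity check (my own numerical sketch with numpy/scipy, not from the paper), one can parameterize $\Sigma_1 \in \mathcal M_1$ by unconstrained variables, minimize the objective from Edit 1 numerically, and compare the minimizer against the displayed closed form of $\Sigma_1$ for one arbitrary choice of $\Sigma_0$:

```python
# Numerical sanity check: minimize log det(Sigma_1) + tr(Sigma_1^{-1} Sigma_0)
# over Sigma_1 in M_1 and compare with the claimed closed form.
import numpy as np
from scipy.optimize import minimize

def cov(s11, s22, s33, r12, r13, r23):
    """Build a 3x3 covariance matrix from variances and correlations."""
    d = np.sqrt(np.array([s11, s22, s33]))
    R = np.array([[1.0, r12, r13],
                  [r12, 1.0, r23],
                  [r13, r23, 1.0]])
    return np.outer(d, d) * R

# An arbitrary Sigma_0 with rho_12 = 0, as in the question.
s11, s22, s33, r13, r23 = 2.0, 1.5, 3.0, 0.4, -0.3
Sigma0 = cov(s11, s22, s33, 0.0, r13, r23)

def sigma1(theta):
    """Map unconstrained parameters to a covariance matrix in M_1."""
    t11, t22, t33 = np.exp(theta[:3])       # variances kept positive
    q13, q23 = np.tanh(theta[3:])           # correlations kept in (-1, 1)
    return cov(t11, t22, t33, q13 * q23, q13, q23)   # enforces rho_12 = rho_13 * rho_23

def objective(theta):
    """log det(Sigma_1) + tr(Sigma_1^{-1} Sigma_0), as in Edit 1."""
    S1 = sigma1(theta)
    _, logdet = np.linalg.slogdet(S1)
    return logdet + np.trace(np.linalg.solve(S1, Sigma0))

res = minimize(objective, np.zeros(5), method="Nelder-Mead",
               options={"xatol": 1e-12, "fatol": 1e-12,
                        "maxiter": 50000, "maxfev": 50000})

# Claimed closed form: same variances and rho_13, rho_23 as Sigma_0,
# with the (1,2) correlation set to rho_13 * rho_23.
Sigma1_claimed = cov(s11, s22, s33, r13 * r23, r13, r23)

# Should be close to zero if the claimed Sigma_1 is indeed the minimizer.
print("max abs difference:", np.abs(sigma1(res.x) - Sigma1_claimed).max())
```

A note on the parameterization: with $|\rho_{13}|, |\rho_{23}| < 1$ and $\rho_{12} = \rho_{13}\rho_{23}$, the correlation matrix has determinant $(1-\rho_{13}^2)(1-\rho_{23}^2) > 0$ and positive leading minors, so every point in the search space is a genuine covariance matrix in $\mathcal M_1$.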

  • Start with writing down an algebraic expression for the KL divergence. You can simplify it substantially using the multivariate version of the approach I describe at https://stats.stackexchange.com/questions/415435: that is, work out the effects of rescaling the variables. The $\sigma_{ii}$ ought to disappear from the problem as a result. After applying the constraint, that will leave you with a function of $(\rho_{13},\rho_{23})$ to minimize. – whuber May 16 '22 at 13:43
  • Finally, I found a useful page that can solve this: https://web.stanford.edu/class/archive/stats/stats200/stats200.1172/Lecture16.pdf – 0o0o0o0 May 25 '22 at 18:47

0 Answers