Hi everyone! I'm reading the paper "On testing marginal versus conditional independence" and I'm a little confused about one step on page 9, where the authors use a "maximum likelihood projection" to obtain the covariance matrix $\Sigma_1$ that minimizes the KL divergence between the two distributions $\mathcal N(0, \Sigma_0)$ and $\mathcal N(0, \Sigma_1)$, where $\Sigma_0$ is a known $3 \times 3$ matrix.
The only constraint on $\Sigma_1$ is the model $\mathcal M_1: \rho_{12} = \rho_{13}\rho_{23}$. I don't see how applying the "maximum likelihood projection" jumps to the form of $\Sigma_1$ below. Here are the two matrices $\Sigma_0$ and $\Sigma_1$: \begin{align*} \Sigma_0 = \begin{bmatrix} \sigma_{11} & 0 & \rho_{13} \sqrt{\sigma_{11} \sigma_{33}} \\ 0 & \sigma_{22} & \rho_{23} \sqrt{\sigma_{22} \sigma_{33}} \\ \rho_{13} \sqrt{\sigma_{11} \sigma_{33}} & \rho_{23} \sqrt{\sigma_{22} \sigma_{33}} & \sigma_{33} \end{bmatrix} \\ \Sigma_1 = \begin{bmatrix} \sigma_{11} & \rho_{13}\rho_{23}\sqrt{\sigma_{11}\sigma_{22}} & \rho_{13} \sqrt{\sigma_{11} \sigma_{33}} \\ \rho_{13}\rho_{23}\sqrt{\sigma_{11}\sigma_{22}} & \sigma_{22} & \rho_{23} \sqrt{\sigma_{22} \sigma_{33}} \\ \rho_{13} \sqrt{\sigma_{11} \sigma_{33}} & \rho_{23} \sqrt{\sigma_{22} \sigma_{33}} & \sigma_{33} \end{bmatrix} \end{align*} The result looks plausible, and the two matrices are strikingly similar, but does anyone know how to show that $\Sigma_1$ is indeed the closest matrix to $\Sigma_0$ in KL divergence subject to the constraint $\mathcal M_1$?
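For reference, the objective being minimized here follows from the standard closed form of the KL divergence between two zero-mean Gaussians (with $d = 3$ in this problem): \begin{align*} \mathrm{KL}\big(\mathcal N(0,\Sigma_0)\,\big\|\,\mathcal N(0,\Sigma_1)\big) = \frac{1}{2}\left[\operatorname{tr}\!\left(\Sigma_1^{-1}\Sigma_0\right) - d + \log\frac{\det\Sigma_1}{\det\Sigma_0}\right]. \end{align*} Since $\Sigma_0$ is fixed, only the terms $\operatorname{tr}(\Sigma_1^{-1}\Sigma_0) + \log\det\Sigma_1$ depend on $\Sigma_1$.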
Edit 1: I tried to simplify the problem; it then reads: \begin{align*} \Sigma_1^* = \arg \min_{\Sigma_1 \in \mathcal M_1} \; \log \det \Sigma_1 + \operatorname{tr}(\Sigma_1^{-1} \Sigma_0). \end{align*} But the answer is still elusive in this form; even Lagrange multipliers do not offer much help.
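Edit 2: As a numerical sanity check (my own sketch, not from the paper), I parameterized $\Sigma_1 \in \mathcal M_1$ by its variances and by $\rho_{13}, \rho_{23}$ (with $\rho_{12} = \rho_{13}\rho_{23}$ enforced by construction), minimized the objective above with `scipy`, and compared against the claimed closed-form $\Sigma_1$. The test values for $\Sigma_0$ are arbitrary:

```python
import numpy as np
from scipy.optimize import minimize

# Arbitrary test values for Sigma_0 (which has rho_12 = 0, as in the question).
s11, s22, s33 = 2.0, 1.5, 3.0
r13, r23 = 0.4, -0.3

def build(s11, s22, s33, r12, r13, r23):
    """Covariance matrix from variances and correlations."""
    D = np.diag(np.sqrt([s11, s22, s33]))
    R = np.array([[1.0, r12, r13],
                  [r12, 1.0, r23],
                  [r13, r23, 1.0]])
    return D @ R @ D

Sigma0 = build(s11, s22, s33, 0.0, r13, r23)

def objective(theta):
    # theta = (log s11, log s22, log s33, z13, z23);
    # tanh keeps correlations in (-1, 1), and rho_12 = rho_13 * rho_23
    # guarantees positive definiteness: det R = (1 - rho_13^2)(1 - rho_23^2) > 0.
    a, b, c, z13, z23 = theta
    t13, t23 = np.tanh(z13), np.tanh(z23)
    S1 = build(np.exp(a), np.exp(b), np.exp(c), t13 * t23, t13, t23)
    _, logdet = np.linalg.slogdet(S1)
    return logdet + np.trace(np.linalg.solve(S1, Sigma0))

res = minimize(objective, x0=np.zeros(5), method="Nelder-Mead",
               options={"xatol": 1e-10, "fatol": 1e-12, "maxiter": 20000})

# Claimed minimizer: same variances and rho_13, rho_23 as Sigma_0,
# with rho_12 = rho_13 * rho_23.
Sigma1_claim = build(s11, s22, s33, r13 * r23, r13, r23)
t13, t23 = np.tanh(res.x[3]), np.tanh(res.x[4])
Sigma1_num = build(np.exp(res.x[0]), np.exp(res.x[1]), np.exp(res.x[2]),
                   t13 * t23, t13, t23)
print(np.max(np.abs(Sigma1_num - Sigma1_claim)))  # near zero
```

Numerically the minimizer does match the claimed $\Sigma_1$ (the variances and $\rho_{13}, \rho_{23}$ of $\Sigma_0$ are preserved), but I would still like an analytic argument for why.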