
Suppose we have three variables $y\in\{0,1\}$, $x_1\in\mathbb{R}$, and $x_2\in\mathbb{R}$. I want to produce data with the following generative process, which corresponds to a mixture of Gaussians (MoG):

$$ \begin{align} y &\sim \text{Ber}(p),\\ \mathbf{x} \mid y &\sim \mathcal{N}(\mu_y, \Sigma_y), \end{align} $$

where $\Sigma_y$ is diagonal. That is, $x_1$ and $x_2$ are conditionally independent given $y$. Furthermore, I want to ensure that marginally $\mathbf{x}$ has a covariance matrix given by $\Sigma_\mathbf{x}$, with a predetermined covariance between $x_1$ and $x_2$.

So, to ensure conditional independence, we need to sample from diagonal covariance matrices in the second step, but it is not obvious how to also obtain the prescribed marginal covariance between $x_1$ and $x_2$.

How should I systematically choose $\mu_y$ and $\Sigma_y$ so that the marginal dependence is exactly the one I set? Moreover, I want to standardise the generated data. Does this change anything?
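
For concreteness, a minimal sketch of the sampling step (in Python/NumPy, with placeholder means and diagonal covariances; choosing them well is exactly the question) could look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                                    # Bernoulli parameter
mu = [np.array([0.0, 0.0]),                # placeholder component means
      np.array([1.0, 1.0])]
Sigma = [np.diag([1.0, 1.0]),              # diagonal => x1, x2 independent given y
         np.diag([1.0, 1.0])]

n = 100_000
y = rng.binomial(1, p, size=n)             # y ~ Ber(p)
x = np.empty((n, 2))
for k in (0, 1):                           # x | y = k ~ N(mu_k, Sigma_k)
    idx = y == k
    x[idx] = rng.multivariate_normal(mu[k], Sigma[k], size=idx.sum())

print(np.cov(x, rowvar=False))             # marginal covariance of x
```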

Sergio
  • As I understand it, the two vectors $\boldsymbol{\mu}_y$ and the two covariances $\boldsymbol{\Sigma}_y$ for $y=0$, $1$ are unknown, and you want a prescribed covariance (and maybe also a prescribed expectation). The law of total expectation should give the constraints. – Yves Sep 22 '23 at 14:17
  • https://stats.stackexchange.com/a/362683/7224 – Xi'an Sep 23 '23 at 09:14

1 Answer


The law of total (co)variance reads $$ \text{Cov}(\mathbf{x}) = \mathbb{E}[\text{Cov}(\mathbf{x} \, \vert \, y)] + \text{Cov}[\mathbb{E}(\mathbf{x} \,\vert\, y)]. $$ With $q := 1 -p$, the first part (a.k.a. the within-group covariance) is $$ \mathbb{E}[\text{Cov}(\mathbf{x} \, \vert \, y)] = q \boldsymbol{\Sigma}_0 + p \boldsymbol{\Sigma}_1. $$ Using $\boldsymbol{\mu} := q \boldsymbol{\mu}_0 + p\boldsymbol{\mu}_1$, the second part (a.k.a. the between-group covariance) is $$ \text{Cov}[\mathbb{E}(\mathbf{x} \,\vert\, y)] = q \left[\boldsymbol{\mu}_0 - \boldsymbol{\mu} \right] \left[\boldsymbol{\mu}_0 - \boldsymbol{\mu} \right]^\top + p \left[\boldsymbol{\mu}_1 - \boldsymbol{\mu} \right] \left[\boldsymbol{\mu}_1 - \boldsymbol{\mu} \right]^\top = qp \left[\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0 \right] \left[\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0 \right]^\top, $$ where the second equality follows by simple algebra (since $\boldsymbol{\mu}_0 - \boldsymbol{\mu} = p\left[\boldsymbol{\mu}_0 - \boldsymbol{\mu}_1\right]$ and $\boldsymbol{\mu}_1 - \boldsymbol{\mu} = q\left[\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0\right]$). This shows that $\text{Cov}[\mathbb{E}(\mathbf{x} \,\vert\, y)]$ is a rank-one matrix.

Problem: Given the diagonal conditional covariances $\boldsymbol{\Sigma}_0$ and $\boldsymbol{\Sigma}_1$ and a target covariance matrix $\boldsymbol{\Sigma}$, we want to find $\boldsymbol{\mu}_0$ and $\boldsymbol{\mu}_1$ such that $\text{Cov}(\mathbf{x}) = \boldsymbol{\Sigma}$.

The matrix $\boldsymbol{\Sigma}$ should be such that the difference $\boldsymbol{\Delta} := \boldsymbol{\Sigma} - q \boldsymbol{\Sigma}_0 - p \boldsymbol{\Sigma}_1$ is positive semi-definite (hence has non-negative diagonal elements). Moreover, the difference should be a rank-one matrix. If these two conditions are fulfilled, then $\boldsymbol{\Delta} = \boldsymbol{\delta}\boldsymbol{\delta}^\top$ for some vector $\boldsymbol{\delta}$. By choosing any $\boldsymbol{\mu}$ and then $$ \boldsymbol{\mu}_0 := \boldsymbol{\mu} + \alpha_0 \, \boldsymbol{\delta}, \qquad \boldsymbol{\mu}_1 := \boldsymbol{\mu} + \alpha_1 \, \boldsymbol{\delta}, $$ one gets the wanted covariance $\boldsymbol{\Sigma}$ for a suitable choice of $\alpha_0$ and $\alpha_1$. We then have $\boldsymbol{\mu}_1 -\boldsymbol{\mu}_0 = [\alpha_1 - \alpha_0]\boldsymbol{\delta}$, so the constraints are $qp [\alpha_1 - \alpha_0]^2 = 1$ (matching the between-group covariance to $\boldsymbol{\Delta}$) and $q\alpha_0 + p \alpha_1 =0$ (keeping the mixture mean equal to $\boldsymbol{\mu}$); we can take $\alpha_0 := \sqrt{p/q}$ and $\alpha_1:= - \sqrt{q/p}$.
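
For a concrete illustration, here is a small NumPy sketch of this two-component construction (the values of $p$, $\boldsymbol{\Sigma}_0$, $\boldsymbol{\Sigma}_1$ and the target $\boldsymbol{\Sigma}$ are illustrative, with $\boldsymbol{\Sigma}$ built so that the rank-one condition holds by construction):

```python
import numpy as np

p, q = 0.3, 0.7                            # P(y=1), P(y=0); illustrative values
mu = np.zeros(2)                           # target marginal mean (arbitrary)
Sigma0 = np.eye(2)                         # diagonal conditional covariances
Sigma1 = np.eye(2)
d_vec = np.array([1.0, 2.0])               # any vector; defines a rank-one Delta
Delta = np.outer(d_vec, d_vec)
Sigma = q * Sigma0 + p * Sigma1 + Delta    # target marginal covariance

# Recover delta as the rank-one factor of Delta (leading eigenpair)
w, U = np.linalg.eigh(Delta)
delta = np.sqrt(w[-1]) * U[:, -1]

alpha0, alpha1 = np.sqrt(p / q), -np.sqrt(q / p)
mu0 = mu + alpha0 * delta
mu1 = mu + alpha1 * delta

# Law of total covariance reproduces Sigma, and the mixture mean equals mu
within = q * Sigma0 + p * Sigma1
between = q * np.outer(mu0 - mu, mu0 - mu) + p * np.outer(mu1 - mu, mu1 - mu)
print(np.allclose(within + between, Sigma))   # True
print(np.allclose(q * mu0 + p * mu1, mu))     # True
```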

The result is not restricted to the bivariate case of the OP. However, with a mixture of two $d$-dimensional Gaussian distributions, the rank-one condition becomes more difficult to fulfil. We can then consider a mixture of $m >2$ Gaussian distributions, the constraint then being that the between-group covariance $\boldsymbol{\Delta}$ has rank $\leqslant m-1$. Note also that by taking the covariance matrices of the components to be zero, we obtain a mixture of Dirac distributions. In order to get an arbitrary mean $\boldsymbol{\mu}$ and an arbitrary covariance $\boldsymbol{\Sigma}$ for a mixture of $d$-dimensional Gaussians with given weights, we need to use $d + 1$ distributions.

EDIT For a generalisation, consider the case where $\mathbf{x}$ has length $d$ and a mixture of $d + 1$ Gaussian distributions with covariance matrices $\boldsymbol{\Sigma}_i$ and with a vector $\mathbf{p}$ of $d+1$ weights $p_i >0$ with $\sum_{i=1}^{d+1}p_i = 1$. We claim that, provided the matrix $\boldsymbol{\Delta}:= \boldsymbol{\Sigma} - \sum_{i=1}^{d+1} p_i \boldsymbol{\Sigma}_i$ is positive semi-definite, we can find $d+1$ mean vectors $\boldsymbol{\mu}_i$ so that the mixture $\sum_i p_i \,\texttt{Norm}(\boldsymbol{\mu}_i,\, \boldsymbol{\Sigma}_i)$ has the given mean $\boldsymbol{\mu}$ and the given covariance $\boldsymbol{\Sigma}$.

Since $\boldsymbol{\Delta}$ is positive semi-definite, we can write it as $\boldsymbol{\Delta} = \mathbf{V}\mathbf{V}^\top$, where $\mathbf{V}$ is a $d \times d$ matrix; to this end, the eigendecomposition of $\boldsymbol{\Delta}$ can be used, or a Cholesky decomposition. Let us temporarily admit that we can find a $d \times (d +1)$ matrix $\mathbf{A}$ such that

$$ \tag{1} \left\{ \begin{array}{c c} \mathbf{A} \text{diag}(\mathbf{p}) \mathbf{A}^\top &= \mathbf{I}_d,\\ \mathbf{A} \mathbf{p} &= \mathbf{0}_d,\rule{0pt}{1.2em} \end{array} \right. $$ where $\mathbf{I}_d$ and $\mathbf{0}_d$ are the identity matrix and the vector of zeros. Then with $\boldsymbol{\alpha}_i$ being the $i$-th column of the $d \times (d + 1)$ matrix $\mathbf{A}$, take $$ \boldsymbol{\mu}_i := \boldsymbol{\mu} + \mathbf{V} \boldsymbol{\alpha}_i \qquad i=1, \, \dots,\, d+1. $$ Let us check that we get the wanted covariance and the wanted mean. Firstly $$ \sum_{i=1}^{d+1} p_i \left[\boldsymbol{\mu}_i - \boldsymbol{\mu} \right] \left[\boldsymbol{\mu}_i - \boldsymbol{\mu} \right]^\top = \sum_{i=1}^{d+1} p_i \mathbf{V} \boldsymbol{\alpha}_i \boldsymbol{\alpha}_i^\top \mathbf{V}^\top = \mathbf{V} \left\{ \sum_{i=1}^{d+1} p_i \boldsymbol{\alpha}_i \boldsymbol{\alpha}_i^\top \right\} \mathbf{V}^\top = \mathbf{V} \mathbf{V}^\top = \boldsymbol{\Delta}, $$ since the matrix between the curly brackets is $\mathbf{A} \text{diag}(\mathbf{p}) \mathbf{A}^\top = \mathbf{I}_d$ from (1). Secondly $$ \sum_{i=1}^{d+1} p_i \left[\boldsymbol{\mu}_i - \boldsymbol{\mu} \right] = \sum_{i=1}^{d+1} p_i \mathbf{V} \boldsymbol{\alpha}_i = \mathbf{V} \left\{ \sum_{i=1}^{d+1} p_i \boldsymbol{\alpha}_i \right\} = \mathbf{V} \mathbf{A} \mathbf{p} = \mathbf{0} $$ because of the second condition in (1). So $\sum_{i=1}^{d+1} p_i \boldsymbol{\mu}_i = \boldsymbol{\mu}$ as wanted.

Now let us show how the matrix $\mathbf{A}$ in (1) can be found. Separating the first $d$ columns and the last one, write $\mathbf{A} =: [\mathbf{A}_1 \vert \mathbf{A}_2]$, and let $\mathbf{D}_1 := \text{diag}(\mathbf{p}_{1:d})$ and $\mathbf{D}_2 := p_{d+1}$. We want $$ \left\{ \begin{array}{c c} \mathbf{A}_1 \mathbf{D}_1 \mathbf{A}_1^\top + \mathbf{A}_2 \mathbf{D}_2 \mathbf{A}_2^\top &= \mathbf{I}_d,\\ \mathbf{A}_1 \mathbf{p}_{1:d} + \mathbf{A}_2 p_{d+1} &= \mathbf{0}_d.\rule{0pt}{1.2em} \end{array} \right. $$ The second equation gives $\mathbf{A}_2 = -\mathbf{A}_1 \mathbf{p}_{1:d} /p_{d+1}$, and the first one then becomes $$ \mathbf{A}_1 \left\{ \mathbf{D}_1 + \frac{1}{p_{d+1}} \mathbf{p}_{1:d}\mathbf{p}_{1:d}^\top \right\} \mathbf{A}_1^\top = \mathbf{I}_d, $$ so we can find $\mathbf{A}_1$ from the Cholesky decomposition of the matrix between the curly brackets: if that matrix is $\mathbf{L}\mathbf{L}^\top$ with $\mathbf{L}$ lower triangular, we can take $\mathbf{A}_1 := \mathbf{L}^{-1}$.
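
As a sanity check, here is a small NumPy sketch of this construction (the dimension, weights, conditional covariances and target $\boldsymbol{\Sigma}$ below are illustrative assumptions chosen so that $\boldsymbol{\Delta}$ is positive definite). It builds $\mathbf{A}$ via the Cholesky route above, verifies (1), and then checks the mixture moments:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
p_vec = np.full(d + 1, 1.0 / (d + 1))              # mixture weights, sum to 1
Sigmas = [np.diag(rng.uniform(0.1, 0.5, d)) for _ in range(d + 1)]
mu = np.zeros(d)                                    # target mean
Sigma = 2.0 * np.eye(d)                             # target marginal covariance

Delta = Sigma - sum(pi * Si for pi, Si in zip(p_vec, Sigmas))
V = np.linalg.cholesky(Delta)                       # Delta = V V^T

# Build A = [A1 | A2] with A diag(p) A^T = I_d and A p = 0_d
p1, pd1 = p_vec[:d], p_vec[d]
M = np.diag(p1) + np.outer(p1, p1) / pd1            # matrix between the curly brackets
L = np.linalg.cholesky(M)
A1 = np.linalg.inv(L)                               # A1 M A1^T = I_d
A2 = (-A1 @ p1 / pd1).reshape(d, 1)
A = np.hstack([A1, A2])
print(np.allclose(A @ np.diag(p_vec) @ A.T, np.eye(d)))   # True
print(np.allclose(A @ p_vec, np.zeros(d)))                # True

# Component means and a check of the mixture mean and covariance
mus = [mu + V @ A[:, i] for i in range(d + 1)]
mix_mean = sum(pi * mi for pi, mi in zip(p_vec, mus))
mix_cov = sum(pi * (Si + np.outer(mi - mu, mi - mu))
              for pi, Si, mi in zip(p_vec, Sigmas, mus))
print(np.allclose(mix_mean, mu), np.allclose(mix_cov, Sigma))   # True True
```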

Yves
  • Thank you for your answer. I have been digesting it for a couple of days and tried to do a simple example by myself and failed. Let me show you. Suppose $\mu=[0,0], \Sigma=\begin{pmatrix}3 & 2\sigma_{0,1}\\ 2\sigma_{0,1} & 3\end{pmatrix}$ and $\Sigma_0=\Sigma_1=I$. Then according to your answer, we should be able to derive the mean vectors that produce $\Sigma$. If $\delta = [\delta_0, \delta_1]$, we have $\delta\delta^\top = \begin{pmatrix} \delta_0^2 & \delta_0\delta_1\\ \delta_0\delta_1 & \delta_1^2\end{pmatrix}$. – Sergio Sep 27 '23 at 13:24
  • Given your formulas, we would then have $\delta\delta^\top = \Delta = \begin{pmatrix} 2 & 2\sigma_{0,1}\\ 2\sigma_{0,1} & 2\end{pmatrix}$ so that $\delta_0=\delta_1=\sqrt{2}$, but $\delta_0\delta_1=2\sigma_{0,1}$, which is not true in the general case where $\sigma_{0,1}\neq 1$. Could you please point to my misunderstanding, if there is any? – Sergio Sep 27 '23 at 13:29
  • For the solution to exist, $\boldsymbol{\Delta}$ must be of rank one, which is actually the case only when $\sigma_{0,1} =1$. If you want to escape this rank-one condition you need to use a mixture of $3$ normal distributions. The eigendecomposition of $\boldsymbol{\Delta}$ provides the group means: written as $\boldsymbol{\Delta} = \mathbf{P}\mathbf{D}\mathbf{P}^\top$ where $\mathbf{P}$ is orthogonal, it gives $\boldsymbol{\Delta}$ as the sum of the $d_i \mathbf{p}_i\mathbf{p}_i^\top$. – Yves Sep 27 '23 at 13:49
  • Ohh thank you. This is super interesting. Does that imply that, if I am only interested in choosing $\sigma_{0,1}$, then a choice of $\Sigma$, $\Sigma_0$ and $\Sigma_1$, with $\Sigma_0=\Sigma_1$, that guarantees the rank-one condition is the following: $\Sigma=\begin{pmatrix} 2\sigma_{0,1} & \sigma_{0,1}\\ \sigma_{0,1} & 2\sigma_{0,1}\end{pmatrix}$ and $\Sigma_0=\Sigma_1=\begin{pmatrix} \sigma_{0,1} & 0\\ 0 & \sigma_{0,1}\end{pmatrix}$? Then we would have $\Delta=\begin{pmatrix}\sigma_{0,1} &\sigma_{0,1}\\ \sigma_{0,1}&\sigma_{0,1}\end{pmatrix}$, so that $\delta_0=\delta_1=\sqrt{\sigma_{0,1}}$. – Sergio Sep 27 '23 at 15:19
  • Yes! You can also start from non-diagonal matrices $\boldsymbol{\Sigma}_0$ and $\boldsymbol{\Sigma}_1$ corresponding to a negative correlation and get a marginal $\boldsymbol{\Sigma}$ with positive correlation, illustrating Simpson's paradox. – Yves Sep 27 '23 at 15:43