Let $\mathbf X = [X_1, X_2, \cdots, X_n]$ denote a random vector (or vector random variable) where the $X_i$ are scalar random variables with finite means $m_i$ and finite variances $\sigma_i^2$. The mean $E[\mathbf X]$ of the vector random variable is just $[m_1, m_2, \cdots, m_n]$, the vector of means. That is,
\begin{align}
E[\mathbf X] &= E[X_1, X_2, \cdots, X_n]\\
&= \left[ E[X_1], E[X_2], \cdots, E[X_n]\right]\\
&= [m_1, m_2, \cdots, m_n]\\
&= \mathbf m;
\end{align}
in words, the expectation of a random vector is the vector of expectations.
On the other hand, the variance of a random vector is not the vector of variances; it is the $n\times n$ covariance matrix of the $n$ random variables constituting the random vector. The reason for this is as follows. The variance of a scalar random variable $X$ is defined as $$\operatorname{Var}(X) = E[(X-m)^2] = E[(X-m)(X-m)]$$ but its generalization to $$\operatorname{Var}(\mathbf X) = E[(\mathbf X-\mathbf m)^2] = E[(\mathbf X-\mathbf m)(\mathbf X-\mathbf m)]$$ makes no sense; the product of two row vectors is undefined! Changing the definition to $E[(\mathbf X-\mathbf m)(\mathbf X-\mathbf m)^T]$ (a valid product of a $1\times n$ row vector and an $n\times 1$ column vector) gives the scalar $E\big[\sum_{i=1}^n (X_i-m_i)^2\big] = \sum_{i=1}^n \sigma_i^2$, which is tidy but not very informative except in the special case when the $X_i$ are uncorrelated (or independent) random variables. On the other hand, defining the variance of the random vector $\mathbf X$ as
$$\operatorname{Var}(\mathbf X) = E[(\mathbf X-\mathbf m)^T(\mathbf X-\mathbf m)]$$
makes $\operatorname{Var}(\mathbf X)$ the $n\times n$ covariance matrix $\mathbf C$ of the $n$ random variables. The $(s,t)$-th element of $\mathbf C$ is $$\mathbf C_{s,t} = E[(X_s-m_s)(X_t-m_t)] = \operatorname{Cov}(X_s, X_t).$$ Note that the $(t,t)$-th entry is $\mathbf C_{t,t} = \operatorname{Cov}(X_t, X_t) = \operatorname{Var}(X_t) = \sigma_t^2$ and thus must be nonnegative.
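For concreteness, here is a minimal NumPy sketch of this definition (the dimension, the mixing matrix used to generate correlated samples, and the variable names are arbitrary illustrative choices, not from the text): averaging the outer products $(\mathbf x - \mathbf m)^T(\mathbf x - \mathbf m)$ over samples of a row vector gives exactly the usual $n\times n$ covariance estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw N samples of a 3-dimensional random (row) vector X with correlated components.
N, n = 100_000, 3
A = rng.normal(size=(n, n))                     # mixing matrix, induces correlation
X = rng.normal(size=(N, n)) @ A.T + np.array([1.0, -2.0, 0.5])

m = X.mean(axis=0)                              # sample estimate of the mean vector m
D = X - m                                       # each row is a centered sample (x - m)

# Var(X) = E[(X - m)^T (X - m)]: average the n x n outer products over the samples.
C = D.T @ D / N

# This is exactly the (biased) sample covariance matrix.
print(np.allclose(C, np.cov(X, rowvar=False, bias=True)))   # True
```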
Now, the law of total variance tells us that
$$\operatorname{Var}(\mathbf X) = E\big[\operatorname{Var}(\mathbf X\mid Y)\big] + \operatorname{Var}\big(E[\mathbf X\mid Y]\big).$$
Suppose that $Y$ is a discrete random variable taking on $k$ distinct values $y_1, y_2, \cdots, y_k$ with probabilities $p_1, p_2, \cdots, p_k$, and suppose that conditioned on $Y = y_i$, $\mathbf X$ has mean vector $\mathbf m^{(i)}$ and covariance matrix $\mathbf C^{(i)}$. That is, $\operatorname{Var}(\mathbf X\mid Y = y_i) = \mathbf C^{(i)}$, or equivalently, the matrix random variable $\operatorname{Var}(\mathbf X\mid Y)$ takes on value $\mathbf C^{(i)}$ with probability $p_i$, and so
$$E\big[\operatorname{Var}(\mathbf X\mid Y)\big] = \sum_{i=1}^k p_i\mathbf C^{(i)}.$$
Similarly, $E[\mathbf X\mid Y = y_i] = \mathbf m^{(i)}$; that is, the vector random variable $E[\mathbf X\mid Y]$ takes on value $\mathbf m^{(i)}$ with probability $p_i$.
The mean of this random variable is
$$E\big[E[\mathbf X\mid Y]\big] = \sum_{i=1}^k p_iE[\mathbf X\mid Y = y_i] = \sum_{i=1}^k p_i \mathbf m^{(i)}.$$
What is the variance of this vector random variable? Well, as discussed above, the variance of a vector random variable is the covariance matrix of its $n$ component random variables. The $s$-th and $t$-th components of $E[\mathbf X\mid Y]$ jointly take on the values $$(\mathbf m_s^{(1)},\mathbf m_t^{(1)}), (\mathbf m_s^{(2)},\mathbf m_t^{(2)}), \cdots, (\mathbf m_s^{(k)},\mathbf m_t^{(k)})$$ with probabilities $p_1, p_2, \cdots, p_k$ respectively. Hence, using $\operatorname{Cov}(A,B) = E[AB] - E[A]E[B]$, their covariance is
$$\operatorname{Var}\big(E[\mathbf X\mid Y]\big)_{s,t} = \sum_{i=1}^k p_i\mathbf m_s^{(i)}\mathbf m_t^{(i)} - \left(\sum_{i=1}^k p_i\mathbf m_s^{(i)}\right)\left(\sum_{j=1}^k p_j\mathbf m_t^{(j)}\right).$$
Now with $\mathbf M$ denoting the $k\times n$ matrix with rows $\mathbf m^{(1)}, \mathbf m^{(2)}, \cdots, \mathbf m^{(k)}$, $\mathbf P$ the $k\times k$ diagonal matrix $\operatorname{diag}[p_1, p_2, \cdots, p_k]$, and $\mathbf p = [p_1, p_2, \cdots, p_k]^T$ the column vector of probabilities, we recognize the first sum in the expression for $\operatorname{Var}\big(E[\mathbf X\mid Y]\big)$ as the $n\times n$ matrix $\mathbf M^T\mathbf{PM}$ and the second term as $\mathbf M^T\mathbf p\mathbf p^T\mathbf M = (\mathbf p^T\mathbf M)^T(\mathbf p^T\mathbf M)$, the outer product of the overall mean row vector $\mathbf p^T\mathbf M = \sum_{i=1}^k p_i\mathbf m^{(i)}$ with itself.
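A quick numerical check of this identity may be reassuring. The sketch below (with arbitrary $k$, $n$, and randomly generated $\mathbf M$ and $\mathbf p$, all my own choices) compares the entrywise formula for $\operatorname{Var}\big(E[\mathbf X\mid Y]\big)$ against the matrix form $\mathbf M^T\mathbf{PM} - \mathbf M^T\mathbf p\mathbf p^T\mathbf M$.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 4, 3
M = rng.normal(size=(k, n))            # row i is the conditional mean vector m^(i)
p = rng.dirichlet(np.ones(k))          # probabilities p_1, ..., p_k (they sum to 1)
P = np.diag(p)

# Entrywise: sum_i p_i m_s^(i) m_t^(i)  -  (sum_i p_i m_s^(i)) (sum_j p_j m_t^(j))
entrywise = np.einsum('i,is,it->st', p, M, M) - np.outer(p @ M, p @ M)

# Matrix form: M^T P M  -  M^T p p^T M
matrix_form = M.T @ P @ M - np.outer(M.T @ p, M.T @ p)

print(np.allclose(entrywise, matrix_form))      # True
```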
Putting it all together, we have
\begin{align}
\operatorname{Var}(\mathbf X) &= E\big[\operatorname{Var}(\mathbf X\mid Y)\big] + \operatorname{Var}\big(E[\mathbf X\mid Y]\big)\\
&= \sum_{i=1}^k p_i\mathbf C^{(i)} + \mathbf M^T\mathbf{PM} - \mathbf M^T\mathbf p\mathbf p^T\mathbf M.
\end{align}
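As a final sanity check, one can simulate a small Gaussian mixture (the particular means, covariances, probabilities, and sample size below are arbitrary choices of mine, not from the text) and compare the empirical covariance matrix of $\mathbf X$ with the right-hand side above.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n, N = 3, 2, 500_000

# Arbitrary mixture parameters: rows of M are the conditional means m^(i),
# Cs[i] is the conditional covariance C^(i), and p holds the probabilities p_i.
M = np.array([[0.0, 0.0], [3.0, -1.0], [-2.0, 2.0]])
Cs = []
for _ in range(k):
    A = rng.normal(size=(n, n))
    Cs.append(A @ A.T + np.eye(n))              # a random positive-definite covariance
p = np.array([0.5, 0.3, 0.2])
P = np.diag(p)

# Draw Y, then X | Y = i  ~  N(m^(i), C^(i)).
Y = rng.choice(k, size=N, p=p)
X = np.empty((N, n))
for i in range(k):
    idx = (Y == i)
    X[idx] = rng.multivariate_normal(M[i], Cs[i], size=idx.sum())

empirical = np.cov(X, rowvar=False, bias=True)
predicted = sum(p[i] * Cs[i] for i in range(k)) + M.T @ P @ M - np.outer(M.T @ p, M.T @ p)

print(np.round(empirical, 2))
print(np.round(predicted, 2))                   # agree up to Monte Carlo error
```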