Let $\mathbf X = [X_1, X_2, \cdots, X_n]$ denote a random vector (or vector random variable) where the $X_i$ are scalar random variables with finite means $m_i$ and finite variances $\sigma_i^2$. The mean $E[\mathbf X]$ of the vector random variable is just $[m_1, m_2, \cdots, m_n]$, the vector of means. That is,
\begin{align}
E[\mathbf X] &= E[X_1, X_2, \cdots, X_n]\\
&= \left[ E[X_1], E[X_2], \cdots, E[X_n]\right]\\
&= [m_1, m_2, \cdots, m_n]\\
&= \mathbf m;
\end{align}
in words, the expectation of a random vector is the vector of expectations.
On the other hand, the variance of a random vector is not the vector of variances; it is the $n\times n$ covariance matrix of the $n$ random variables constituting the random vector. The reason for this is as follows. The variance of a scalar random variable $X$ is defined as $$\operatorname{Var}(X) = E[(X-m)^2] = E[(X-m)(X-m)]$$ but its generalization to $$\operatorname{Var}(\mathbf X) = E[(\mathbf X-\mathbf m)^2] = E[(\mathbf X-\mathbf m)(\mathbf X-\mathbf m)]$$ makes no sense; the product of two row vectors is undefined! Changing the definition to $E[(\mathbf X-\mathbf m)(\mathbf X-\mathbf m)^T]$ (a valid product of a $1\times n$ row vector and an $n\times 1$ column vector) gives the scalar $E\big[\sum_{i=1}^n (X_i-m_i)^2\big] = \sum_{i=1}^n \sigma_i^2$, which is tidy but not very informative except in the special case when the $X_i$ are uncorrelated (or independent) random variables. On the other hand, defining the variance of the random vector $\mathbf X$ as
$$\operatorname{Var}(\mathbf X) = E[(\mathbf X-\mathbf m)^T(\mathbf X-\mathbf m)]$$
makes $\operatorname{Var}(\mathbf X)$ the $n\times n$ covariance matrix $\mathbf C$ of the $n$ random variables. The $(s,t)$-th element of $\mathbf C$ is $$\mathbf C_{s,t} = E[(X_s-m_s)(X_t-m_t)] = \operatorname{Cov}(X_s, X_t).$$ Note that the $(t,t)$-th entry is $\mathbf C_{t,t} = \operatorname{Cov}(X_t, X_t) = \operatorname{Var}(X_t) = \sigma_t^2$ and thus must be nonnegative.
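For concreteness, here is a minimal NumPy sketch of this definition (the dimension, the mixing matrix used to generate correlated samples, and the variable names are arbitrary illustrative choices, not from the text): averaging the outer products $(\mathbf x - \mathbf m)^T(\mathbf x - \mathbf m)$ over samples of a row vector gives exactly the usual $n\times n$ covariance estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw N samples of a 3-dimensional random (row) vector X with correlated components.
N, n = 100_000, 3
A = rng.normal(size=(n, n))                     # mixing matrix, induces correlation
X = rng.normal(size=(N, n)) @ A.T + np.array([1.0, -2.0, 0.5])

m = X.mean(axis=0)                              # sample estimate of the mean vector m
D = X - m                                       # each row is a centered sample (x - m)

# Var(X) = E[(X - m)^T (X - m)]: average the n x n outer products over the samples.
C = D.T @ D / N

# This is exactly the (biased) sample covariance matrix.
print(np.allclose(C, np.cov(X, rowvar=False, bias=True)))   # True
```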
Now, the law of total variance tells us that
$$\operatorname{Var}(\mathbf X) = E\big[\operatorname{Var}(\mathbf X\mid Y)\big] + \operatorname{Var}\big(E[\mathbf X\mid Y]\big).$$
Suppose that $Y$ is a discrete random variable taking on $k$ distinct values $y_1, y_2, \cdots, y_k$ with probabilities $p_1, p_2, \cdots, p_k$, and suppose that conditioned on $Y = y_i$, $\mathbf X$ has mean vector $\mathbf m^{(i)}$ and covariance matrix $\mathbf C^{(i)}$. That is, $\operatorname{Var}(\mathbf X\mid Y = y_i) = \mathbf C^{(i)}$, or equivalently, the matrix random variable $\operatorname{Var}(\mathbf X\mid Y)$ takes on value $\mathbf C^{(i)}$ with probability $p_i$, and so
$$E\big[\operatorname{Var}(\mathbf X\mid Y)\big] = \sum_{i=1}^k p_i\mathbf C^{(i)}.$$
Similarly, $E[\mathbf X\mid Y = y_i] = \mathbf m^{(i)}$; that is, the vector random variable $E[\mathbf X\mid Y]$ takes on value $\mathbf m^{(i)}$ with probability $p_i$.
The mean of this random variable is
$$E\big[E[\mathbf X\mid Y]\big] = \sum_{i=1}^k p_iE[\mathbf X\mid Y = y_i] = \sum_{i=1}^k p_i \mathbf m^{(i)}.$$
What is the variance of this vector random variable? Well, as discussed above, the variance of a vector random variable is the covariance matrix of its $n$ component random variables. The $s$-th and $t$-th components of $E[\mathbf X\mid Y]$ jointly take on the values $$(\mathbf m_s^{(1)},\mathbf m_t^{(1)}), (\mathbf m_s^{(2)},\mathbf m_t^{(2)}), \cdots, (\mathbf m_s^{(k)},\mathbf m_t^{(k)})$$ with probabilities $p_1, p_2, \cdots, p_k$ respectively. Hence, using $\operatorname{Cov}(A,B) = E[AB] - E[A]E[B]$, their covariance is
$$\operatorname{Var}\big(E[\mathbf X\mid Y]\big)_{s,t} = \sum_{i=1}^k p_i\mathbf m_s^{(i)}\mathbf m_t^{(i)} - \left(\sum_{i=1}^k p_i\mathbf m_s^{(i)}\right)\left(\sum_{j=1}^k p_j\mathbf m_t^{(j)}\right).$$
Now with $\mathbf M$ denoting the $k\times n$ matrix with rows $\mathbf m^{(1)}, \mathbf m^{(2)}, \cdots, \mathbf m^{(k)}$, $\mathbf P$ the $k\times k$ diagonal matrix $\operatorname{diag}[p_1, p_2, \cdots, p_k]$, and $\mathbf p = [p_1, p_2, \cdots, p_k]^T$ the column vector of probabilities, we recognize the first sum in the expression for $\operatorname{Var}\big(E[\mathbf X\mid Y]\big)$ as the $n\times n$ matrix $\mathbf M^T\mathbf{PM}$ and the second term as $\mathbf M^T\mathbf p\mathbf p^T\mathbf M = (\mathbf p^T\mathbf M)^T(\mathbf p^T\mathbf M)$, the outer product of the overall mean row vector $\mathbf p^T\mathbf M = \sum_{i=1}^k p_i\mathbf m^{(i)}$ with itself.
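A quick numerical check of this identity may be reassuring. The sketch below (with arbitrary $k$, $n$, and randomly generated $\mathbf M$ and $\mathbf p$, all my own choices) compares the entrywise formula for $\operatorname{Var}\big(E[\mathbf X\mid Y]\big)$ against the matrix form $\mathbf M^T\mathbf{PM} - \mathbf M^T\mathbf p\mathbf p^T\mathbf M$.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 4, 3
M = rng.normal(size=(k, n))            # row i is the conditional mean vector m^(i)
p = rng.dirichlet(np.ones(k))          # probabilities p_1, ..., p_k (they sum to 1)
P = np.diag(p)

# Entrywise: sum_i p_i m_s^(i) m_t^(i)  -  (sum_i p_i m_s^(i)) (sum_j p_j m_t^(j))
entrywise = np.einsum('i,is,it->st', p, M, M) - np.outer(p @ M, p @ M)

# Matrix form: M^T P M  -  M^T p p^T M
matrix_form = M.T @ P @ M - np.outer(M.T @ p, M.T @ p)

print(np.allclose(entrywise, matrix_form))      # True
```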
Putting it all together, we have
\begin{align}
\operatorname{Var}(\mathbf X) &= E\big[\operatorname{Var}(\mathbf X\mid Y)\big] + \operatorname{Var}\big(E[\mathbf X\mid Y]\big)\\
&= \sum_{i=1}^k p_i\mathbf C^{(i)} + \mathbf M^T\mathbf{PM} - \mathbf M^T\mathbf p\mathbf p^T\mathbf M.
\end{align}
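As a final sanity check, one can simulate a small Gaussian mixture (the particular means, covariances, probabilities, and sample size below are arbitrary choices of mine, not from the text) and compare the empirical covariance matrix of $\mathbf X$ with the right-hand side above.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n, N = 3, 2, 500_000

# Arbitrary mixture parameters: rows of M are the conditional means m^(i),
# Cs[i] is the conditional covariance C^(i), and p holds the probabilities p_i.
M = np.array([[0.0, 0.0], [3.0, -1.0], [-2.0, 2.0]])
Cs = []
for _ in range(k):
    A = rng.normal(size=(n, n))
    Cs.append(A @ A.T + np.eye(n))              # a random positive-definite covariance
p = np.array([0.5, 0.3, 0.2])
P = np.diag(p)

# Draw Y, then X | Y = i  ~  N(m^(i), C^(i)).
Y = rng.choice(k, size=N, p=p)
X = np.empty((N, n))
for i in range(k):
    idx = (Y == i)
    X[idx] = rng.multivariate_normal(M[i], Cs[i], size=idx.sum())

empirical = np.cov(X, rowvar=False, bias=True)
predicted = sum(p[i] * Cs[i] for i in range(k)) + M.T @ P @ M - np.outer(M.T @ p, M.T @ p)

print(np.round(empirical, 2))
print(np.round(predicted, 2))                   # agree up to Monte Carlo error
```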