
$X$ is a random variable sampled from a mixture of uniform distributions. In other words: $$X \sim \sum_{i=1}^N w_i \cdot \mathbb{U}(x_i, x_{i+1}),$$ where $\mathbb{U}(x_i, x_{i+1})$ denotes a random variable that follows a uniform distribution on $[x_i, x_{i+1}]$. For feasibility, we need $w_i \geq 0 \ \forall i, \ \sum_{i=1}^N w_i = 1$.

In an optimization problem my variables are $w_i$ for $i=1,\ldots,N$, and I would like to upper bound the variance of $X$. According to Wikipedia, the variance of $X$ is: $$\mathrm{Var}(X) = \sum_{i=1}^N w_i(\sigma_i^2+ \mu_i^2 - \mu^2) $$ where $\sigma_i^2$ and $\mu_i$ are the variance and mean of $\mathbb{U}(x_i, x_{i+1})$, respectively (which are parameters), and $\mu$ is the mean of the mixture, which is $$\mu = \sum_{i=1}^N w_i \frac{x_i + x_{i+1}}{2}.$$

Thus, if my derivation is not wrong: $$ \mathrm{Var}(X) = \sum_{i=1}^N w_i\left(\sigma_i^2+ \mu_i^2 - \left(\sum_{j=1}^N w_j \frac{x_j + x_{j+1}}{2}\right)^2\right) $$ which is unwieldy, and upper-bounding this function appears to be non-convex (edit: I want to constrain $\mathrm{Var}(X) \leq \mathrm{constant}$).

My question is: is there any trick, or any convex approximation of this variance, that would allow me to include an upper bound on the variance as a constraint?
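For concreteness, the variance formula above can be sanity-checked numerically. Below is a Python sketch (the breakpoints $x_i$ and weights $w_i$ are made-up illustrative values, not part of the problem data) that compares the closed-form mixture variance against a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical breakpoints and weights, for illustration only.
x = np.array([0.0, 1.0, 3.0, 6.0])   # N = 3 intervals [x_i, x_{i+1}]
w = np.array([0.5, 0.3, 0.2])        # mixture weights, sum to 1

# Component means and variances of U(x_i, x_{i+1}).
mu_i = (x[:-1] + x[1:]) / 2
var_i = (x[1:] - x[:-1]) ** 2 / 12

# Mixture variance per the formula in the question:
#   Var(X) = sum_i w_i * (sigma_i^2 + mu_i^2) - mu^2
mu = w @ mu_i
var_formula = w @ (var_i + mu_i ** 2) - mu ** 2

# Monte Carlo check: pick a component by weight, then sample uniformly from it.
idx = rng.choice(len(w), size=1_000_000, p=w)
samples = rng.uniform(x[idx], x[idx + 1])
var_mc = samples.var()

print(var_formula, var_mc)  # the two values should agree closely
```

Note that the sampling step draws one component per observation (a true mixture), not a weighted sum of independent uniforms; the two constructions have different variances.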

independentvariable
  • Just for completeness: in terms of only $w_i$ and $x_i$ the variance is given by $$\frac13\sum_{i=1}^Nw_i(x_i^2+x_ix_{i+1}+x_{i+1}^2)-\frac14\left(\sum_{i=1}^Nw_i(x_i+x_{i+1})\right)^2.$$ Unfortunately, using Cauchy–Schwarz on the last term yields a lower bound. – TheSimpliFire Jul 30 '20 at 06:39
  • @TheSimpliFire thanks! How do you not have $w_i^3$ terms? – independentvariable Jul 30 '20 at 13:02
  • You can remove the $w_i^3$ terms by taking $\mu$ out of the summation and using that $\sum_i w_i = 1$. The Wikipedia link you provided makes the same step. – Kevin Dalmeijer Jul 30 '20 at 19:10
  • The Wikipedia formula for the variance looks wrong to me. Assuming the component random variables are independent, shouldn't the variance of $X$ be $\sum_i w_i^2 \sigma_i^2$? – prubin Jul 30 '20 at 21:58
  • @prubin can you please elaborate on your argument? I also feel their derivation is wrong. – independentvariable Jul 30 '20 at 23:35
  • @prubin Your formula is incorrect even if $n = 2, w_1 = w_2 = 1/2, \mu_1 = \mu_2 = 0, \sigma_1^2 = 1, \sigma_2^2 = 1$. In that case, variance = 1, but your formula gives variance = 1/2. – Mark L. Stone Jul 31 '20 at 11:33
  • @independentvariable Are you looking to find an upper bound on Var($X$), or are you trying to include the constraint Var($X$) $\le d$ into an optimization problem? The first one is a convex problem, as shown in Mark's answer. The second one is probably more difficult. – Kevin Dalmeijer Jul 31 '20 at 14:48
  • @independentvariable Sorry, I was misled by your first formula. What I wrote was the variance of the weighted sum of the uniform variables. A mixture distribution is not a weighted sum of independent variables, though. Rather, you pick one of the variables randomly based on the weights and then get an observation of that variable. – prubin Jul 31 '20 at 15:51
  • @KevinDalmeijer second one! I want to constrain the variance... – independentvariable Jul 31 '20 at 16:39
  • @KevinDalmeijer It turns out to be easy to incorporate that non-convex constraint via a trivial post-processing step on the convex QP solution. See the Edit to my answer. – Mark L. Stone Jul 31 '20 at 18:06

1 Answer


In order to find the best upper bound on the variance, for given input values of $\mu_i$ and $\sigma_i^2$, you should globally maximize the variance with respect to the $w_i$, subject to the constraints $w_i \ge 0, \Sigma w_i = 1$.

This can be formulated as a convex QP (Quadratic Programming problem), i.e., maximizing a concave quadratic subject to linear constraints. Hence it is easy to solve, unless $n$ is gigantic, which hardly seems likely for any reasonable mixture distribution. I leave it to the OP as an exercise to determine whether the KKT conditions yield a closed-form solution.

The convex QP takes the form:

maximize $(\Sigma_{i=1}^n w_i (\sigma_i^2 +\mu_i^2)) - \mu^2$ with respect to $\mu, w_i$

subject to $\Sigma_{i=1}^n w_i \mu_i = \mu, w_i \ge 0 \forall i, \Sigma_{i=1}^n w_i = 1$.

If all $\mu_i$ are equal to each other, this would be a Linear Programming problem over a compact feasible region. Therefore the optimum would be at a vertex of that region, and in this case, that vertex would be $w_i = 1$ for the $i$ corresponding to the largest $\sigma_i^2$, and all other $w_j = 0$.
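The QP above can be solved with any QP solver; as a minimal sketch (using `scipy.optimize.minimize` with SLSQP, and made-up values of $\mu_i$ and $\sigma_i^2$), one can eliminate $\mu$ by substitution and maximize directly over the simplex:

```python
import numpy as np
from scipy.optimize import minimize

def max_mixture_variance(mu_i, var_i):
    """Maximize sum_i w_i*(sigma_i^2 + mu_i^2) - (sum_i w_i*mu_i)^2
    over the simplex {w >= 0, sum_i w_i = 1}.  The negated objective
    is convex, so a local solver suffices."""
    mu_i = np.asarray(mu_i, dtype=float)
    var_i = np.asarray(var_i, dtype=float)
    c = var_i + mu_i ** 2
    n = len(c)

    def neg_var(w):  # scipy minimizes, so negate the concave objective
        return -(c @ w - (mu_i @ w) ** 2)

    res = minimize(
        neg_var,
        x0=np.full(n, 1.0 / n),                  # start at the simplex center
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return -res.fun, res.x

# Equal-means case from the answer: the optimum should put all weight
# on the component with the largest variance.  (Illustrative numbers.)
best, w_opt = max_mixture_variance(mu_i=[2.0, 2.0, 2.0], var_i=[1.0, 3.0, 2.0])
```

With equal means, the quadratic term is constant on the simplex, so the problem degenerates to the LP described above and the solver should return $w = (0, 1, 0)$ with maximum variance $3$.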

Edit: In response to the edit to the question: "I want to constrain Var(X) $\le$ constant."

If the naive approach of adding the constraint Var(X) $\le $ constant to my above convex QP formulation were performed, that would add a non-convex quadratic constraint, making the problem a non-convex Quadratically-Constrained Quadratic Program (QCQP), which requires a global optimizer, such as Gurobi 9.x or BARON to solve to global optimality.

However, there is an easier, faster method: solve the (pre-Edit) convex QP formulation. Then the maximum variance, accounting for the constraint Var(X) $\le$ constant, equals

min(optimal objective value of the convex QP formulation, constant).
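The Edit gives the optimal *value*; if one also wants a weight vector $w$ that attains the cap when the QP optimum exceeds it, continuity suggests a simple bisection along the segment between the QP maximizer and any known low-variance point. This is a sketch under that assumption (`w_star`, `w_low`, and the data are illustrative, not from the thread):

```python
import numpy as np

def variance(w, mu_i, var_i):
    """Mixture variance: sum_i w_i*(sigma_i^2 + mu_i^2) - (sum_i w_i*mu_i)^2."""
    w, mu_i, var_i = (np.asarray(a, dtype=float) for a in (w, mu_i, var_i))
    return w @ (var_i + mu_i ** 2) - (w @ mu_i) ** 2

def clip_variance(w_star, w_low, mu_i, var_i, cap, tol=1e-9):
    """Bisect along w(t) = (1-t)*w_star + t*w_low until Var(w(t)) = cap.
    Assumes Var(w_star) > cap >= Var(w_low); Var is continuous in t,
    so the intermediate value theorem guarantees a crossing."""
    w_star = np.asarray(w_star, dtype=float)
    w_low = np.asarray(w_low, dtype=float)
    lo, hi = 0.0, 1.0     # t = 0 is w_star (too much variance), t = 1 is w_low
    while hi - lo > tol:
        t = (lo + hi) / 2
        if variance((1 - t) * w_star + t * w_low, mu_i, var_i) > cap:
            lo = t        # still above the cap: move toward w_low
        else:
            hi = t
    return (1 - hi) * w_star + hi * w_low

# Illustrative: two zero-mean components with variances 4 and 1, cap of 2.
mu_i, var_i = [0.0, 0.0], [4.0, 1.0]
w = clip_variance(w_star=[1.0, 0.0], w_low=[0.0, 1.0],
                  mu_i=mu_i, var_i=var_i, cap=2.0)
```

Note the caveat: if no $w$ on the simplex satisfies Var($X$) $\le$ constant (i.e., even the minimum achievable variance exceeds the cap), the constrained problem is infeasible and no such `w_low` exists.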

Mark L. Stone
  • Many thanks for your reply! Before going in the direction of an upper bound, I want to make sure that the Wikipedia formula is correct. I think user prubin's formula is also given here, without any common-mean assumption: http://theanalysisofdata.com/probability/3_13.html – independentvariable Jul 31 '20 at 09:52
  • @prubin is usually correct, but not on this. Your link, http://theanalysisofdata.com, is just flat-out wrong. Don't believe me? Try a simulation. Handy formula: variance = mean of conditional variance + variance of conditional mean. The second of these terms is affected by whether or not the $\mu_i$ are all equal. The linked (and prubin's) formula is incorrect even if $n = 2, w_1 = w_2 = 1/2, \mu_1 = \mu_2 = 0, \sigma_1^2 = 1, \sigma_2^2 = 1$. In that case, variance = 1, but the linked (and prubin's) formula gives variance = 1/2. – Mark L. Stone Jul 31 '20 at 11:37
  • Yes, Wikipedia's formulas are correct. You can verify by simulation if you don't trust your analytical skills. I made use of Wikipedia's formulas in my QP formulation, but I cleverly didn't substitute everything into the objective; rather, I added a linear constraint for $\mu$. – Mark L. Stone Jul 31 '20 at 11:45
  • I don't completely understand the easier, faster method. Also, I'm a bit sceptical because the constraint Var($X$) $\le$ constant has a non-convex feasible region in $w$, which seems difficult to avoid. – Kevin Dalmeijer Jul 31 '20 at 18:28
  • If the optimal objective value of the convex QP is $\le$ "constant", the max variance constraint is non-binding. If "constant" $<$ the optimal objective value of the convex QP, then by continuity of the objective function with respect to $w_i$, the optimal objective value (i.e., max variance) with the max variance constraint is "constant". QED. Continuity is the key to sneaking around the non-convexity. – Mark L. Stone Jul 31 '20 at 20:31