How can I maximize the sum of norm.dist values using solver?

Question

EDIT #2 5/17/2020: I re-phrased my question once more. My original question is still at the end. Thanks for the feedback.

I want to know if it's possible to maximize the sum of cumulative distribution functions for independent normal distributions in Excel using OpenSolver.

$$\text{maximize } \Phi(\sum_i x_ip_i) + \Phi(\sum_i x_iq_i)$$ $$\text{ s.t.} \sum_ix_i=3 $$ $$x_i=1 \text{ or } 0$$

where $\Phi(\cdot)$ is the standard normal cdf (or NORM.DIST in Excel).

ORIGINAL:

Here is a snapshot of the workbook: https://i.stack.imgur.com/QqIKC.jpg

Here are the formulas: https://i.stack.imgur.com/4x6X0.jpg

I'd like to maximize a cell that is the sum of other cells that are not linear. I have Z scores for categories I and II (I hardcoded in a mean of 30, and stdev of 15 for category I and 40 and 20 for category II).

I want to maximize the sum of their normdist values in Row 13 (G13 and H13). Clearly, solver won't find a linear solution since the normdist function isn't linear. I have heard there was a way to get around this by doing piecewise-linear approximation.

Other notes:

Columns B and C contains raw data - no formulas
Column E is made up of 1s and 0s that are the variable cells determined by Solver
G2:H6 will return the raw data found in columns B and C if column E displays a 1
The sum of E2:E6 must equal 3
This is very scaled down from the real problem I'm trying to solve, so no brute force methods will suffice.

Welcome to OR.SE. Is there any mathematical formulation on your problem? — A.Omidi, Apr 16 '20 at 10:01
It would help if you put the images directly into your post, so that we don't have to click on external links to view them. Also, as @A.Omidi said, please provide a mathematical (algebraic) formulation of your problem, using MathJax. Thanks! — LarrySnyder610, Apr 16 '20 at 12:43

score 5 · Answer 1 · answered Apr 16 '20 at 16:13

You can approximate any nonlinear function via a piecewise-linear function using binary variables (or, if your solver supports them, type 2 special ordered sets). I have an old blog post explaining how to do this with SOS2. As to how accurate it is, your mileage will vary.

The version of Solver in Excel is fairly limited. You might want to consider the OpenSolver or SolverStudio add-ins for Excel (both open source), which can handle larger models and tend to be more powerful. At least some of the solvers they support can handle SOS2 constraints, but I don't know if their interfaces have provisions for specifying SOS2. If not, you could definitely fall back to using binary variables.

LarrySnyder610 · Answer 2 · 2020-05-18T01:03:03.463

1

NOTE: This answer was written to respond to an earlier version of the question, which used a different form of the objective function. This answer is incorrect for the new version but I’m leaving it here in case it’s useful to anyone.

I think this problem can be reformulated in a much simpler way. You are trying to

$$\text{maximize} \quad \sum_i \left[ \Phi\left(\frac{x_ip_i-30}{15}\right) + \Phi\left(\frac{x_iq_i-40}{20}\right)\right] $$

where $\Phi(\cdot)$ is the standard normal cdf (or NORM.DIST in Excel).

Why not just calculate these terms for each $i$ and then use the result as the coefficients for $x_i$? What I mean is, let

$$a_i = \Phi\left(\frac{p_i-30}{15}\right)+ \Phi\left(\frac{q_i-40}{20}\right).$$

Then your objective function is just

$$\text{maximize} \quad \sum_i a_ix_i$$

and the constraints remain as you formulated them originally.

Note that this approach only works because the $x_i$ are binary. If they were continuous (or general integer, for that matter) it would not work.

edited May 18 '20 at 01:03

answered Apr 30 '20 at 14:25

LarrySnyder610

13,141
3
41
105

Hi Larry, I am indeed trying to evaluate the Φ of the sum of each i's values for $p_i$ and $q_i$, not the sum of individual Φ functions. Therefore calculating Φ can't come until we've selected 3 $x_i$ values. And then on top of that we want to maximize the sum of two Φ functions. – Jack Ries May 08 '20 at 03:04
Your formulation says you want to max $\sum_i \Phi(x_i)$ but it sounds like you're saying you actually want to max $\Phi\left(\sum_i x_i\right)$? (I'm omitting coefficients.) – LarrySnyder610 May 08 '20 at 12:56
I think I mis-read your original question of "Why not just calculate these terms for each i and then use the result as the coefficients for $x_i$?" It's because this is a simplified model for a much larger problem where pre-calculating the terms would be unfeasible for how many combinations would need calculating. I'm hoping I can avoid that route. – Jack Ries May 11 '20 at 03:13
As your question is currently written, my suggestion does not require you to pre-calculate terms for combinations of variables, only one per variable. – LarrySnyder610 May 11 '20 at 03:40
Or do you mean that the $x_i$ themselves are really more complicated terms, but you simplified them to $x_i$ for your question? – LarrySnyder610 May 11 '20 at 03:41
Thanks for the quick response Larry. I apologize but I'm following. Perhaps we can use a small dataset from the example I linked in the question (https://imgur.com/a/ISABp). i=5, I = $p_i$, II = $q_i$, Bin = $x_i$, columns G and H are the calculations of $x_ip_i$ and $x_iq_i$. – Jack Ries May 12 '20 at 03:35
I can't really make much sense of the spreadsheet screenshot. Can you just clarify: are you trying to max $\sum_i \Phi(x_i)$ or max $\Phi\left(\sum_i x_i\right)$? – LarrySnyder610 May 12 '20 at 17:59
I'm moreso trying to maximize $$ \sum_i \left[ \Phi\left(x_ip_i\right) + \Phi\left(x_iq_i\right)\right] $$ – Jack Ries May 13 '20 at 22:08
But since $x_i \in {0,1}$, $\Phi(x_ip_i) + \Phi(x_iq_i) = x_i\left(\Phi(p_i) + \Phi(q_i)\right) + (1-x_i)\times 2\Phi(0)$. So I'm suggesting setting $a_i = \Phi(p_i) + \Phi(q_i)$ and then minimizing $\sum_i (a_ix_i - 2\Phi(0))(1-x_i)$. – LarrySnyder610 May 15 '20 at 03:03
(I forgot about the $x_i=0$ case in my answer, but I can fix that if you agree with my logic in the comment above.) – LarrySnyder610 May 15 '20 at 03:04
Ah ha. Sorry about that. You were right, then. I realize now that I want to maximize $\Phi (\sum_i x_ip_i) + \Phi (\sum_i x_iq_i)$ – Jack Ries May 16 '20 at 20:20
OK. Why don’t you edit the original question then so there’s no further confusion. – LarrySnyder610 May 17 '20 at 02:40
Hi Larry, are you able to help solve the re-formatted question? Is it possible to optimize the norm dist of a summation? – Jack Ries May 28 '20 at 03:55
@JackRies Not really. The only other thing I can suggest is that if $p_i$ and $q_i$ are $\ge 0$, then the arguments to $\Phi$ are $\ge 0$, and $\Phi$ is concave in that region. Therefore you are maximizing a concave function (equivalently, minimizing a convex function), so convex integer programming might be useful. – LarrySnyder610 May 29 '20 at 02:30

How can I maximize the sum of norm.dist values using solver?

2 Answers2