8

I have a linear optimization problem and am looking for a suitable solution approach that can fulfill my requirements. Here is a description of the optimization problem:

There are n variables x1, x2, .., xn relevant for the optimization. Each of them takes a value between 0 and 100, and the sum of the variables is always 100%. The variables are used as multiplication coefficients for their respective vectors, each with m entries.

There are different functions for determining the cost value, depending on the optimization goal. One of them is shown below as pseudocode:

cost_function(x1, x2, .., xn)
    i = 0
    minimize_value = 0
    while i < m
        e = x1 * E1_i + x2 * E2_i + .. + xn * En_i
        v = (s1 * E1_i + s2 * E2_i + .. + sn * En_i) / e
        minimize_value += min(-C_i, e) * (P_i - v)
        i = i + 1
    return minimize_value

where E1, .., En, C and P are vectors with m entries each, and the remaining symbols (x1, .., xn and s1, .., sn) are scalar values.
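
For concreteness, here is a minimal NumPy sketch of this cost function. The array names and shapes (E as an n x m matrix whose rows are the vectors E1, .., En, and s, C, P as arrays) are assumptions for illustration, not part of the original data:

    import numpy as np

    def cost_function(x, E, s, C, P):
        # x: length-n weights summing to 1; E: (n, m) matrix with rows E1, .., En
        # s: length-n scalars; C, P: length-m vectors (assumed layout)
        e = x @ E                 # e_i = x1*E1_i + .. + xn*En_i
        v = (s @ E) / e           # assumes every e_i is nonzero
        return np.sum(np.minimum(-C, e) * (P - v))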

The diagram shows the state space with three variables x1, x2 and x3, where darker is better:

[figure: heat map of the state space over x1, x2 and x3]

Currently, I use Simulated Annealing for this optimization problem, but a state s' is only accepted as the new state if it is better than s, because it would be counterproductive to accept worse states with some probability in a linear optimization problem. From a scientific point of view, the use of Simulated Annealing is probably not appropriate for this application.

Would linear programming still be recommended in this case? Or is there an alternative to Simulated Annealing for linear optimization problems?


EDIT:

A minimum working example of the problem can be found here

The result of the variables should be: [0.17023975 0.19532928 0.64296507]
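
For reference, a minimal sketch of how the constraints (each weight in [0, 1], weights summing to 1) can be passed to a generic solver such as SciPy's SLSQP, which also comes up in the comments below. The function cost(x) is an assumed Python implementation of the objective above:

    import numpy as np
    from scipy.optimize import minimize

    n = 3
    x0 = np.full(n, 1.0 / n)   # feasible starting point: uniform weights

    # cost(x) is assumed to be a Python implementation of the objective above
    res = minimize(
        cost, x0, method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda x: np.sum(x) - 1.0}],
    )
    print(res.x, res.fun)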

Emma
  • Are s0...sn and C constants? Is v in line 9 computed using the i-th component of each E vector (as written, v would be a vector, unless the si were vectors and * meant inner product)? Are the P_i constants? – prubin May 23 '20 at 17:26
  • @Emma, what exactly do you mean by linear programming? What you mentioned is a metaheuristic and is quite different from exact methods like LP. If you are interested in using LP/MIP, you should try writing your problem in those formats and ask your question again if you have any issues. :) – A.Omidi May 24 '20 at 09:11
  • @prubin Yes, s0..sn are constant scalars and C is a constant vector. The operator * in line 9 means the multiplication of a scalar by a vector. P_i are constants. Sorry for the confusion. – Emma May 24 '20 at 09:36
  • 1
    Do you have a minimum working example of the problem implemented in a way you could upload? – Richard May 27 '20 at 20:05
  • @Richard Thanks for your interest. I have now added a link to the minimum working example. – Emma May 28 '20 at 12:37
  • @Emma: I'm not sure I understand this: if C_ges > E_ges. Is it possible to rewrite the cost function without the conditional? This kind of discontinuity will cause you pain. Where are you getting your cost functions from/what do they mean? – Richard May 28 '20 at 17:36
  • @Richard I have tried to simplify the function as much as possible. But now a min-function is included, which selects the element with the smaller decimal number. That probably doesn't help, right? – Emma May 28 '20 at 19:17
  • 1
    @Emma: The minimum is actually quite helpful, because it allows you to rewrite min(C_i, e) * (P_i - v) as e * (P_i - v) with the constraint that e<=C_i. This gets you close to having a quadratic program, which you can solve efficiently and exactly. However, looking at the values of C[i] in your example and the constraint that x>=0, it will always be the case that min(C_i,e)=C_i. Are you sure this objective function is correct? – Richard May 29 '20 at 06:34
  • @Richard It should be -C_i, you're right. I have edited it, but apart from that I get the same results for the simplification as for the old objective function, so it should be correct. Does this help us? – Emma May 29 '20 at 14:18

2 Answers

14

First, the problem is not a linear optimization problem, at least not for the objective function shown (which is nonlinear due to the conditional portion in lines 10-13 and particularly the division by E_ges_i in line 13). Simulated annealing might be fine as a heuristic approach, but given the nonlinear objective, only accepting improving steps might or might not be appropriate. If the objective is unimodal, I think skipping steps that make the objective worse might be okay. Then again, if the heat map is unimodal, constrained gradient descent might work just as well or better.
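
To illustrate the constrained gradient descent idea, here is a minimal sketch (not code from this answer): take a finite-difference gradient step and project the iterate back onto the probability simplex with the standard sort-based projection. The step size, iteration count and the name cost are illustrative assumptions:

    import numpy as np

    def project_to_simplex(y):
        # Euclidean projection onto {x : x >= 0, sum(x) = 1} (sort-based method)
        u = np.sort(y)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, len(y) + 1) > css - 1.0)[0][-1]
        theta = (css[rho] - 1.0) / (rho + 1.0)
        return np.maximum(y - theta, 0.0)

    def projected_gradient_descent(cost, x0, step=1e-3, iters=500, h=1e-6):
        x = project_to_simplex(np.asarray(x0, dtype=float))
        for _ in range(iters):
            f0 = cost(x)
            # forward-difference gradient estimate; the perturbed points may leave
            # the simplex by h, which is acceptable for a rough estimate
            grad = np.array([(cost(x + h * np.eye(len(x))[i]) - f0) / h
                             for i in range(len(x))])
            x = project_to_simplex(x - step * grad)
        return x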

Sticking to the objective function shown, another possibility would be to treat the problem as a mixed integer linear program, using a piecewise linear approximation of the objective function. Since an approximation is involved, you might want to try gradient descent or another local search from the model's solution to see if you can improve it, or alternatively refine the approximation near the model solution and solve the revised model (ad nauseam).

ADDENDUM: I just hacked a little R code, using the objective function posted on GitHub. The algorithm I used is the Nelder-Mead polytope algorithm, as described in [1]. The Nelder-Mead algorithm, published in 1965, was a modification of an earlier algorithm (by Spendley, Hext and Himsworth in 1962, according to [1]) and has likely been improved upon in turn since the '60s. The attraction of the Nelder-Mead algorithm (or its relatives) here is that it starts with a simplex as the search region and progressively shrinks to smaller and smaller simplices, all of which are subsets of the original. So if we use N-M starting with the unit simplex, the constraints are always satisfied.

I did not put much care into the coding, did not code all the possible adjustments (such as random restarts), and picked the expansion, contraction and reflection parameters out of thin air. That said, it converged (or at least stopped) after seven iterations on the test problem, with a solution of (0.0126960, 0.2461336, 0.7411704) and an objective value of -4675913 -- not optimal, but I think not too shabby (particularly since I did not code restarts or other more recent tweaks to Nelder-Mead). It's not hard to code, and it does not require derivatives, just function evaluations.

[1] P. Gill, W. Murray and M. Wright, Practical Optimization (Academic Press, 1981).
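
As a rough Python analogue of the Nelder-Mead approach above (not the R code described in the addendum), one can eliminate one weight via xn = 1 - (x1 + .. + x(n-1)), start SciPy's Nelder-Mead from the vertices of the reduced unit simplex, and penalize negative weights. The function cost(x) is again an assumed Python implementation of the objective:

    import numpy as np
    from scipy.optimize import minimize

    n = 3

    def reduced_cost(y):
        x = np.append(y, 1.0 - np.sum(y))   # the sum-to-1 condition holds by construction
        if np.any(x < 0):
            return 1e12                     # penalize points outside the simplex
        return cost(x)

    # vertices of the reduced unit simplex: the origin plus the unit vectors
    init_simplex = np.vstack([np.zeros(n - 1), np.eye(n - 1)])
    res = minimize(
        reduced_cost, x0=np.full(n - 1, 1.0 / n), method="Nelder-Mead",
        options={"initial_simplex": init_simplex, "xatol": 1e-9, "fatol": 1e-9},
    )
    x_best = np.append(res.x, 1.0 - np.sum(res.x))
    print(x_best, res.fun)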

ADDENDUM 2: I updated my R code to use the simplified version of the cost function from the GitHub repository (which is a bit better behaved when an argument is zero). I also switched from the Nelder-Mead algorithm as presented in Gill, Murray and Wright to the version on the Wikipedia page, and adjusted the parameter values to the ones they recommend. I have to qualify my earlier comment about Nelder-Mead automatically maintaining feasibility. The condition that the weights sum to 1 is automatically maintained; nonnegativity of the weights requires occasionally shrinking a proposed step, which is easily handled.

With the modified code and the parameter values from the Wikipedia page, I get a final solution of (0, 0.2885719, 0.7114281) with an objective value of -4,683,095.

ADDENDUM 3: Hopefully this will be my final addendum. :-) I also tried a genetic algorithm (in R). The solution from a GA is inherently random, but with the parameters and random seed that I chose I got a final solution of (0.001182725, 0.2869361, 0.7118812) with objective value -4,683,175, which is slightly better than both what I got with Nelder-Mead and what LocalSolver reported.

My R code for both Nelder-Mead and the GA (using the genalg R library) is available in an R notebook.

prubin
  • Many thanks for this helpful and interesting answer. Today I solved the optimization problem with a solver based on Sequential Least Squares Programming (SLSQP) and got the same results for all objective functions as with Simulated Annealing. Do you also consider the SLSQP approach to be appropriate? – Emma May 25 '20 at 15:28
  • I'm not familiar with SLSQP, so I can't comment on it. – prubin May 26 '20 at 16:33
  • 1
    It should be alright, SLSQP works quite well for local optimization of nonlinear constrained problems. Nonetheless, I'd also consider @prubin's proposal of using constrained gradient descent, as it should be faster than solving a sequence of QPs. I might be biased: I tend to prefer, among all approaches that work, the one that's simpler or less restrictive. Emma, here and here you can find info about SLSQP (how it works, its requirements, and when it can be used). – dhasson May 27 '20 at 01:05
  • @dhasson Thank you for the additional information, I have created a bounty that I would like to give away if you help me with the constrained gradient descent implementation. – Emma May 27 '20 at 18:20
  • 1
    @Emma you're welcome. Do you refer to gradient descent pseudocode (as the function in your question) or implementation in a particular language? – dhasson May 27 '20 at 18:41
  • @dhasson An implementation in Python would be ideal, but also a pseudo code would be very helpful. There is now also a minimal working example. – Emma May 28 '20 at 12:38
  • 1
    @LocalSolver The comment above doesn't belong on prubin's answer. It can be a comment on the question (rather than on an answer), but since you are essentially answering the question, it's really better if you post it as a new answer, not a comment. Please delete your comment and repost as a comment on the question, or as a separate answer. – LarrySnyder610 May 29 '20 at 02:27
  • 1
    FYI. Gill, Murray, Wright "Practical Optimization" has recently been reprinted by SIAM https://my.siam.org/Store/Product/viewproduct/?ProductId=31205265 . – Mark L. Stone May 29 '20 at 12:25
  • @prubin Thank you for the advice to use the Nelder-Mead algorithm. With my Python implementation using constrNMPy I could achieve an objective value of -4682972 with (0.00353171, 0.28471466, 0.71175363). Would it then be okay to write in my scientific thesis that the constraints are linear, but since the objective function is nonlinear (for the reasons you explained), algorithms for nonlinear problems based on mathematical optimization were tested, and of these the Nelder-Mead algorithm proved to be the most effective? – Emma May 29 '20 at 15:47
  • @prubin Great that you were able to further improve the results, and thank you very much for your invested time :) If you could write a small paragraph about how you would justify the use of the Nelder-Mead algorithm in a scientific work against more elementary approaches such as a mixed integer linear program and gradient descent, I would like to give you the bounty. – Emma May 30 '20 at 08:18
  • I think Nelder-Mead would generally be considered more basic/less sophisticated than gradient descent. I don't see how integer programming factors in here, since the variables are not integer-valued. The virtue of Nelder-Mead, to me, is that it's easier to implement than gradient descent. In both, you have to be careful to pick feasible step sizes, but in gradient descent you also have to project the gradient onto the hyperplane to preserve feasibility (whereas Nelder-Mead automatically stays in the domain hyperplane). – prubin May 30 '20 at 13:35
  • As far as what would work in a scientific paper, that depends on the nature of the paper (and where it is going), whether a near-optimal but not optimal solution is good enough, and maybe some other factors. One more thing: I'm happy to share my R code, although it does not seem to do much better than constrNMPy did. – prubin May 30 '20 at 13:37
  • @prubin Thank you very much for all the help. I would be very happy to receive your R code. For this I invited you to the repository so you can upload the file there if it is too large to be posted here. – Emma May 30 '20 at 16:56
  • 1
    @Emma I just updated my answer with results from a GA (even better) and a link to my R code. If you have questions about it, the home page for site where the R code exists has my contact info on it. – prubin May 31 '20 at 20:42
  • 1
    You can get a solution with cost -4683181 using LocalSolver, just running it a little longer (a few seconds instead of 1 second). Our answer below has been updated accordingly. – Hexaly Jun 01 '20 at 14:30
  • Wow, this question is a real show! – Kuifje Jun 01 '20 at 14:32
6

If you want to implement an algorithm on your own, then we suggest a randomized, derivative-free search, even simpler than a Nelder-Mead approach. Given a feasible solution (respecting the sum equal to 1), randomly move the values of the variables by an epsilon while keeping the constraint satisfied. If the new solution is better, keep it; otherwise, discard it. Start with this simple approach. To go further, tune how you choose the epsilons, accept worse solutions along the search to diversify it (as done in simulated annealing), and restart the search.
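
A minimal Python sketch of this idea (the epsilon, the iteration budget and the name cost are illustrative assumptions): shift a small amount of weight from one randomly chosen variable to another, so that the sum stays equal to 1, and keep the move only if it improves the cost.

    import numpy as np

    def random_local_search(cost, n, iters=100_000, eps=0.01, seed=0):
        rng = np.random.default_rng(seed)
        x = np.full(n, 1.0 / n)          # feasible start: uniform weights
        best = cost(x)
        for _ in range(iters):
            i, j = rng.choice(n, size=2, replace=False)
            delta = min(rng.uniform(0.0, eps), x[i])   # keep x[i] nonnegative
            cand = x.copy()
            cand[i] -= delta                 # moving weight from i to j keeps
            cand[j] += delta                 # the sum equal to 1
            c = cost(cand)
            if c < best:                     # keep only improving moves
                x, best = cand, c
        return x, best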

Hexaly (ex LocalSolver), our global optimization solver, combines several optimization techniques under the hood. Here, essentially the approach above allows Hexaly to perform very well on your problem. Thanks to the small number of dimensions (only 3 variables), there is no need to use derivatives (even approximate ones) to guide and speed up the search. In the same way, there is no need for surrogate modeling of the cost function, because it is extremely fast to evaluate (about 10,000 calls per second).

Disclaimer: Hexaly is commercial software. You can try it for free for 1 month. In addition, Hexaly is free for basic research and teaching.

Please find below the results obtained by Hexaly using your cost function as an external function:

function model() {
    X[0..2] <- float(0,1);
    constraint sum[i in 0..2](X[i]) == 1;
    func <- doubleExternalFunction(cost);
    obj <- call(func, X);
    minimize obj;
}

Having declared the cost function, Hexaly solves the problem as is. Here "solve" means that Hexaly will try to find the best feasible solution to the problem. You can also specify lower and upper bounds for the cost function so that Hexaly computes an optimality gap and then possibly proves the optimality of the solution found.

You can write your model using the Hexaly modeling language (namely LSP) or using the Python, Java, C#, or C++ APIs. Here is the link to download the LSP file: https://www.hexaly.com/misc/emma.lsp. Having installed Hexaly, you can run it using the command "hexaly emma.lsp" in a console. The best solution found by Hexaly after a few seconds on a basic laptop is:

cost = -4683181.09020784, X0 = 0.00106356929433748, X1 = 0.287235884100486, X2 = 0.711702039130129

The sum over the X equals 1.00000149252495, slightly above 1, because Hexaly uses a tolerance when satisfying constraints. If you want the sum over the X to be guaranteed to stay below 1, you can write "< 1" in the above model instead of "== 1". In this case, you will find the following solution:

cost = -4683175.50600612, X0 = 0.00111513425966878, X1 = 0.286966585180356, X2 = 0.711915927974678

Now, the sum over the X is equal to 0.999997647414703.


Hexaly
  • 5
    I think it would be constructive if downvoters explained why they downvoted: is this solution not mathematically relevant? Or is it too much of a "black box" approach, where the black box is commercial software? – Kuifje May 28 '20 at 07:23
  • 1
    LocalSolver, it's interesting that your program found a better solution than @prubin's and Emma's approaches. I wonder how running time and function evaluations compare on the same machine (as it executed 10k+ iterations)? – dhasson May 29 '20 at 02:40
  • 3
    @dhasson The results were obtained on a basic laptop. You can reproduce them on your machine if you are interested. Here the cost function is extremely fast to evaluate (about 10,000 evaluations per second). Then a randomized derivative-free search, with all the machinery to diversify and escape local optima, will perform very well. In our experience, it keeps working well with slower functions, with a few evaluations per second. If the function is much slower (a few seconds or more per evaluation), then surrogate modeling of the cost function becomes interesting, to avoid calling it too often. – Hexaly May 29 '20 at 09:24
  • 2
    @LocalSolver: In my opinion, your paragraph "If you want to implement an algorithm on your own..." is probably what the OP was expecting, as you explain an algorithmic strategy. I would suggest putting this paragraph at the very top, and then illustrating your point with the example implemented in LocalSolver. I think the answer, as provided, is of good quality! – Kuifje May 29 '20 at 10:06
  • @Kuifje Thanks a lot for your constructive comment. We have corrected the post accordingly. Hopefully it will be more helpful this way. – Hexaly May 29 '20 at 10:23