5

Consider the simple problem of maximum likelihood estimation of the variance of a mean-zero normal distribution. The expression to be minimised is: $$N \log{v}+\frac{1}{v}\sum_{n=1}^N{b_n^2},$$ where $v$ is the unknown variance and $b_1,\dots,b_N$ are the known observations.
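Setting the derivative with respect to $v$ to zero gives the closed form $v^* = \frac{1}{N}\sum_{n=1}^N b_n^2$. As a quick sanity check, here is a SciPy sketch (simulated data, purely illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
b = rng.normal(0.0, 1.3, size=1000)   # simulated observations; true variance 1.69
N = len(b)
S = np.sum(b**2)

# Negative log-likelihood up to additive constants: N*log(v) + S/v
nll = lambda v: N * np.log(v) + S / v

res = minimize_scalar(nll, bounds=(1e-6, 10.0), method="bounded")
print(res.x, S / N)   # numerical minimiser agrees with the closed form v* = S/N
```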

Obviously, this problem is trivial, but similar expressions appear in non-trivial problems, such as estimating the variances of shocks in a linear-Gaussian state space model. In those problems, substituting in the optimal $v$ analytically is much less feasible.

Simplifying even further, the question is the following:

Is there a global optimizer (ideally supported by YALMIP) capable of finding the global minimum ($1$, attained at $v=1$) of the function $$\log{v}+\frac{1}{v}$$ for $v\in [c,d]$, where $c\in(0,1)$ and $d\in(1,2)$ to ensure convexity? (Of course, any local optimizer such as fmincon will happily find the optimum. In this trivial problem we know it is the global optimum, but in less trivial problems we do not.)
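For reference: $f(v)=\log v + 1/v$ has $f'(v)=1/v-1/v^2$, so the minimiser is $v=1$ with value $1$, and $f''(v)=(2-v)/v^3>0$ for $v<2$, which is where the $d<2$ bound comes from. A quick local check with SciPy, just to fix ideas:

```python
import math
from scipy.optimize import minimize_scalar

f = lambda v: math.log(v) + 1.0 / v
c, d = 0.5, 1.5   # example interval with c in (0,1), d in (1,2)

res = minimize_scalar(f, bounds=(c, d), method="bounded")
print(res.x, res.fun)   # approximately 1.0 and 1.0, as expected
```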

Perhaps there is a trick for rewriting the problem?


Update: I have moved the multivariate version of this question here.


For the record, here is YALMIP's output on the original simple problem:

>> v=sdpvar(1,1);
>> optimize([v>=0.9,v<=1.1],log(v)+1/v,sdpsettings('solver','mosek','verbose',1))
Warning: Solver not applicable (mosek does not support exponential cones)

Of course Mosek does support exponential cones, so this error message is wrong! I suspect the real issue is convexity detection. There's no problem with my install; yalmiptest produces:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|                                     Test|    Status|    Solver|
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|                     Core functionalities|   Success|          |
|                  Linear programming (LP)|   Success|     MOSEK|
|               Quadratic programming (QP)|   Success|     MOSEK|
|     Second-order cone programming (SOCP)|   Success|     MOSEK|
|           Semidefinite programming (SDP)|   Success|     MOSEK|
|               Geometric programming (GP)|   Success|     MOSEK|
|              Nonlinear programming (NLP)|   Success|   FMINCON|
|                    Nonlinear SDP (NLSDP)|   Success|   FMINCON|
|       Exponential cone programming (ECP)|   Success|     MOSEK|
|                  Mixed-integer LP (MILP)|   Success|     MOSEK|
|                  Mixed-integer QP (MIQP)|   Success|     MOSEK|
|              Mixed-integer SOCP (MISOCP)|   Success|     MOSEK|
|   Global nonconvex quadratic programming|   Success|    BMIBNB|
|             Global nonconvex programming|   Success|    BMIBNB|
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  • SCIP can do it, and is supported by YALMIP, from what I can see. You just need to reformulate the problem so that the nonlinear objective appears as a constraint. In your case you would have min z, such that z >= log(v) + 1/v. – J. Dionisio Jan 24 '24 at 14:52
  • I didn't manage to get SCIP to do this via YALMIP, but I'll consider using it directly. – cfp Jan 25 '24 at 12:27
  • SCIP also has some interfaces of its own, if it makes it easier. PySCIPOpt (Python), RusSCIP (Rust), JSCIPOpt (Java), among others. You can find the several repositories here: https://github.com/scipopt – J. Dionisio Jan 25 '24 at 12:55
  • @j-dionisio Might SCIP be able to handle the non-trivial problem I describe in my reply to Henrik below? – cfp Jan 25 '24 at 12:57
  • I'm not into probability much, so I can only say that it depends on how you are going to estimate s1, s2,..., I'd say. You can find in the following link the type of problems that SCIP can solve https://www.scipopt.org/doc/html/WHATPROBLEMS.php – J. Dionisio Jan 25 '24 at 13:03
  • f.*f.*exp(-u) is neither convex nor concave, if expressed correctly, log(sum(detA2.*detD)) would be Disciplined Concave, but certainly not convex. So your YALMIP code for "But this does not seem to work in practice:" is nowhere close to being a convex problem which Mosek could handle. – Mark L. Stone Jan 26 '24 at 15:29
  • @MarkL.Stone Isn't the log sum exp term convex, not concave? Standard logsumexp is convex I thought. Do you have a suggestion for a better reformulation of the second term? – cfp Jan 26 '24 at 21:08
  • Your 2nd term as written is not inside the log. But even if it were, it has one variable squared times the exponential of another variable. That is neither convex nor concave. – Mark L. Stone Jan 26 '24 at 23:07
  • @MarkL.Stone You previously said the first term was concave. I was saying it was convex. Independently, I was asking you for a better reformulation of the second term from the equation, which does not include the f variable. – cfp Jan 27 '24 at 00:30
  • @MarkL.Stone The multivariate version of this question is now here: https://or.stackexchange.com/questions/11588/global-optimizers-handling-minimization-of-an-expression-arising-from-the-likeli I would appreciate your thoughts! – cfp Jan 31 '24 at 08:40
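For what it's worth, the epigraph reformulation suggested in the comments (min z such that z >= log(v) + 1/v) is easy to sketch. Here it is with SciPy's SLSQP, just to illustrate the shape of the problem a global solver such as SCIP would receive (this is a local solve, not a global certificate):

```python
import math
from scipy.optimize import NonlinearConstraint, minimize

c, d = 0.5, 1.5
# Epigraph form with x = (v, z): minimise z subject to z - (log(v) + 1/v) >= 0
con = NonlinearConstraint(lambda x: x[1] - (math.log(x[0]) + 1.0 / x[0]),
                          0.0, math.inf)
res = minimize(lambda x: x[1], x0=[1.2, 2.0],
               bounds=[(c, d), (None, None)], constraints=[con])
print(res.x)   # approximately (1.0, 1.0): v = 1 with objective value 1
```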

2 Answers

5

The logarithm is not a convex function, so even though the expression as a whole is convex on some subinterval $v \in [c,d]$, the expression is not in disciplined convex form, which might be why YALMIP is failing (the error message looks like a software bug).

If you want to cast it as a disciplined convex program - which I would always recommend if possible - you will have to either transform the objective function somehow (any monotonically increasing mapping will preserve the global optimum) or make a variable substitution. One example of the latter is to substitute $v := \exp(u)$, such that

$$N \log{v}+\frac{1}{v}\sum_{n=1}^N{b_n^2} = N u + \exp(-u)\sum_{n=1}^N{b_n^2},$$

which is a disciplined convex expression in $u$. Will this work for your less trivial problems, or how exactly do they differ?
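To illustrate the substitution on the original variance problem, here is a SciPy sketch with simulated data; any unconstrained local method suffices here because the transformed objective is convex in $u$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
b = rng.normal(0.0, 1.3, size=1000)   # simulated observations
N, S = len(b), np.sum(b**2)

# After substituting v = exp(u), the objective becomes N*u + S*exp(-u),
# which is convex in u (sum of a linear term and a convex exponential).
g = lambda u: N * u + S * np.exp(-u)
res = minimize_scalar(g)              # unconstrained 1-D minimisation
v_hat = np.exp(res.x)                 # map back to the original variable
print(v_hat, S / N)                   # recovered v matches the closed form S/N
```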

  • For an example of a non-trivial problem, suppose e~N(0,I) and x=A e, where A=A0+A1 s1+A2 s2+...+An sn. We observe x (not e) and we wish to estimate s1, s2, ..., sn via maximum likelihood; A0, A1, ..., An are known. So the objective is log(det(A A'))+x' inv(A A') x, or equivalently log(det(A A'))+e' e subject to x=A e. Reformulating in terms of the matrix logarithm of A A' seems unlikely to work nicely! – cfp Jan 25 '24 at 12:35
  • I would not dismiss the idea so quickly. If A A' = V = exp(U), then you have log(det(A A')) = log(det(exp(U))) = Tr(U). Without having thought it through, I find it plausible that this reformulation could turn out nicely. You just need a way to link A and U. – Henrik Alsing Friberg Jan 26 '24 at 11:16
  • I have edited the question to give the multivariate problem in sufficient generality. (The comment above was actually slightly too general.) Any help on the multivariate problem would be appreciated! – cfp Jan 26 '24 at 12:40
  • The multivariate version of this question is now here: https://or.stackexchange.com/questions/11588/global-optimizers-handling-minimization-of-an-expression-arising-from-the-likeli – cfp Jan 29 '24 at 18:12
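The identity log(det(exp(U))) = Tr(U) from the comments above is easy to verify numerically; a small NumPy/SciPy check with a random symmetric $U$:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
U = (M + M.T) / 2                     # symmetric U, so V = expm(U) is SPD
V = expm(U)

sign, logdet = np.linalg.slogdet(V)   # numerically stable log-determinant
print(logdet, np.trace(U))            # equal: log det(exp(U)) = tr(U)
```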
0

There are various ways we do this in practice. For instance, assuming we treat $v$ as a variable (and not as a function), $1/v$ is piecewise convex/concave, so it can be relaxed using an outer approximation and a secant. The logarithm is concave, so we can use a secant. We then do branch-and-bound to find the global minimum of the expression, tightening the relaxations in every new node. We can also use various acceleration techniques such as adding cuts, domain reduction, and so on. If other expressions are present, we will sometimes combine terms/functions to get more favourable forms.

If $v$ is a function, we can use a factorable decomposition, where we set $c_1: w_1=v$, $c_2: w_2=1/v$ and then, e.g., minimise $\log(w_1)+w_2$ subject to $c_1,c_2$.

If we have more information, the solver can choose to exploit more things, e.g. if $v>0$ (or $w_2>0$ for a function) we don't need to represent the concave part of the fraction, or we can use a nonlinear relaxation for it. Since the function is convex on that domain, it is its own relaxation.
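To make the secant idea concrete, here is a minimal sketch (my own illustration, not any particular solver's internals): on $[c,d]$ the chord of the concave $\log$ under-estimates it, and $1/v$ is convex for $v>0$, so it is its own relaxation; minimising the relaxed sum gives a valid lower bound for the node.

```python
import math
from scipy.optimize import minimize_scalar

c, d = 0.5, 1.5
f = lambda v: math.log(v) + 1.0 / v

# Chord (secant) of log on [c, d]: log is concave, so the chord lies below it.
slope = (math.log(d) - math.log(c)) / (d - c)
relaxed = lambda v: math.log(c) + slope * (v - c) + 1.0 / v  # convex under-estimator

lb = minimize_scalar(relaxed, bounds=(c, d), method="bounded").fun
ub = min(f(c), f(d))   # any feasible evaluation gives an upper bound
print(lb, ub)          # lb <= true minimum (= 1) <= ub
```

Branch-and-bound then splits $[c,d]$ and repeats, shrinking the gap between the bounds until they meet within tolerance.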

Any global solver you use will most likely do one of these things (or something else in the solver's toolbox), some of them better than others.

Nikos Kazazakis