5

I'm using the OR-Tools CP-SAT solver on a list of $n$ boolean variables $x_i$. I'm trying to maximize the minimal distance between two true variables in this list, as illustrated by the following figure.

+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
           <-------------> <---------> <----------------->
                  4             3               5
                            (minimum)

In other words, mathematically, I'm trying to maximize the expression : $$\min(j-i \mid 0 < i < j < n, x_i = x_j = 1)$$

At the moment, I'm using this algorithm, basically brute-forcing all the possible intervals:

minimalDistance = model.NewIntVar(0, n);
for (k = 1; k < n; ++k) {
    isIntervalAtLeastOfGivenSize[k] = model.NewBoolVar();
    model.AddEquivalence(
        isIntervalAtLeastOfGivenSize[k],
        minimalDistance >= k
    );
}

for (i = 0; i < n; ++i) { for (j = i + 1; j < n; ++j) { model.AddImplication( x[i] and isIntervalAtLeastOfGivenSize[j - i + 1], x[j] = false ) } }

model.maximize(minimalDistance) model.solve()

It works, but I have a feeling that it's not the best approach: it adds a lot of constraints, and it doesn't scale well when $n$ gets bigger. Is there a better way to do it?

Laurent Perron
  • 2,690
  • 1
  • 6
  • 18
Blackhole
  • 235
  • 1
  • 6
  • Would you say, is the problem to find the sort of distance that being maximized? or there is another problem that its results should be sorted? I mean the list is pre-defined and we would like to sort that!? If it is the first one, how the number of ones is determined? – A.Omidi Nov 06 '22 at 13:12
  • 1
    @A.Omidi This problem is part of a larger model; the $x_i$ are determined by the solver, with the maximization I'm asking about here and additional constraints. – Blackhole Nov 06 '22 at 18:53

3 Answers3

6

You can maximize $z$ subject to linear big-M constraints $$z - (j-i) \le M_{ij} (2 - x_i - x_j),$$ where $M_{ij} = n-(j-i)$. Each such constraint enforces the logical implication $(x_i \land x_j) \implies z \le j - i$.

If you also know a lower bound $\sum_i x_i \ge k$ for some $k > 1$, you can impose a valid constraint $$z \le \left\lfloor\frac{n-1}{k-1}\right\rfloor$$ that dominates some of the other constraints.

RobPratt
  • 32,006
  • 1
  • 44
  • 84
  • 2
    Since Blackhole is using the CP-SAT solver, it might be easier to enter the constraints as $(x_i \land x_j) \implies z \le j - i$ for all $i<j.$ – prubin Nov 02 '22 at 18:30
  • Thanks for your answer, Rob. Yep, I'm using a CP solver, not a LP solver. Your proposal removes the need for my intermediate isIntervalAtLeastOfGivenSize variables, but unless I misunderstand, it still adds n²/2 constraints as well. Would it makes the problem easier for the solver nevertheless? (sorry, I'm still a novice in OR). – Blackhole Nov 02 '22 at 18:37
  • Yes, there are $O(n^2)$ constraints. Not sure how it will perform, but it is worth trying. – RobPratt Nov 02 '22 at 18:39
  • 1
    The constraint will be quadratic when encoded. Rob's suggestion with Paul's syntax seems the most promising approach. – Laurent Perron Nov 03 '22 at 09:37
  • I've just tried it, it works well! The performance gain is not obvious, but it's definitely a cleaner solution than mine, so it's adopted. Thanks for the help! I'll wait a bit to see if anyone else has another answer to propose, and accept yours after a few days if not. – Blackhole Nov 03 '22 at 16:59
  • Glad to help. Do you also know a lower bound like $\sum_i x_i \ge k$? – RobPratt Nov 03 '22 at 17:27
  • Sadly, no, I don't. It would indeed have been a good way to reduce the number of constraints otherwise. – Blackhole Nov 03 '22 at 20:45
  • @LaurentPerron, RobPratt: do you have any opinion on mtanneau's suggestion? While there are $O(n)$ constraints, there are also $n$ integer variables, each with $n$ possible values. As I was saying, I don't know much about the internal working of the solver; is one solution inherently better than the other? – Blackhole Nov 05 '22 at 18:31
  • 1
    The formulation by @mtanneau looks promising. You could optionally relax integrality of $s_i$ and relax some equalities to $\le$ inequalities (because you are maximizing). – RobPratt Nov 05 '22 at 18:44
  • 1
    I like differential encodings. Which one is a better can only be validated by experiments. – Laurent Perron Nov 06 '22 at 08:46
5

I have no idea (i) whether the following can be encoded in a CP solver and (ii) how efficient it would be. On the pro side, it only requires a linear number of variables and constraints.

Say you have $n$ boolean variables $x_{0}, ..., x_{n-1}$. You can introduce an integer-valued counter $s_{0}, ..., s_{n-1}$ such that $$ s_{0} = \left\{ \begin{array}{ll} n & \text{ if } x_{0} = 0\\ 0 & \text{ if } x_{0} = 1 \end{array} \right. \qquad s_{i} = \left\{ \begin{array}{ll} n & \text{ if } x_{i} = 0 \land s_{i-1}=n\\ s_{i-1} + 1 & \text{ if } x_{i} = 0\land s_{i-1} \neq n\\ 0 & \text{ if } x_{i} = 1 \end{array} \right. $$ This counter starts from $n$, is incremented as long as $x$ remains zero and if it's not equal to $n$, and is reset to $0$ every time $x$ takes value one.

From there, we can just maximize $z$, subject to the additional constraints :

$$x_i = 1 \Rightarrow z \leq s_{i-i}$$

For the example you gave in the original question, this would yield something like

  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
x | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
s | n | n | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 0 | 1 | 2 | 3 | 4 | 0 | 1 | 2 | 3 |
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
            ↑               ↑           ↑                   ↑
          z ≤ n           z ≤ 3       z ≤ 2               z ≤ 4
        (trivial)                   (stronger)
Blackhole
  • 235
  • 1
  • 6
mtanneau
  • 4,123
  • 11
  • 24
  • Right, I read too fast. Thanks for the edit! – mtanneau Nov 05 '22 at 14:31
  • The counter one scales better. As for the one suggested by Dr. Rob, Hi Blackhole, what are the x values? I mean how many are true for an array of size 14? I am using Gurobi – Sutanu Majumdar Nov 05 '22 at 15:30
  • @Sutanu Yes, it seems to perform better. The typical data varies widely, but I'll say the average would be around 25% of true value in the list. – Blackhole Nov 05 '22 at 18:17
1

Probably I'm not understanding correctly the problem, but to me the solution looks like

(n - X) / (X - 1)

with X being the number of 1 (True values) in the array of lenght n

Suppose you have an array of 9 variables. 3 True, 6 False: the above formula gives you:

(9 - 3) / (3 - 1) = 3

In fact you can arrange your 9 variable this way:

1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1
Gino
  • 11
  • 2
  • Thanks for your answer, Gino. I can't choose the value of my variables freely, there are other constraints beside the ones I'm talking about here. – Blackhole Nov 03 '22 at 14:46