Maximize the minimal distance between true variables in a list

Question

I'm using the OR-Tools CP-SAT solver on a list of $n$ boolean variables $x_i$. I'm trying to maximize the minimal distance between two true variables in this list, as illustrated by the following figure.

+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
           <-------------> <---------> <----------------->
                  4             3               5
                            (minimum)

In other words, mathematically, I'm trying to maximize the expression : $$\min(j-i \mid 0 < i < j < n, x_i = x_j = 1)$$

At the moment, I'm using this algorithm, basically brute-forcing all the possible intervals:

minimalDistance = model.NewIntVar(0, n);
for (k = 1; k < n; ++k) {
    isIntervalAtLeastOfGivenSize[k] = model.NewBoolVar();
    model.AddEquivalence(
        isIntervalAtLeastOfGivenSize[k],
        minimalDistance >= k
    );
}
for (i = 0; i < n; ++i) {
    for (j = i + 1; j < n; ++j) {
        model.AddImplication(
            x[i] and isIntervalAtLeastOfGivenSize[j - i + 1],
            x[j] = false
        )
    }
}
model.maximize(minimalDistance)
model.solve()

It works, but I have a feeling that it's not the best approach: it adds a lot of constraints, and it doesn't scale well when $n$ gets bigger. Is there a better way to do it?

Would you say, is the problem to find the sort of distance that being maximized? or there is another problem that its results should be sorted? I mean the list is pre-defined and we would like to sort that!? If it is the first one, how the number of ones is determined? — A.Omidi, Nov 06 '22 at 13:12
@A.Omidi This problem is part of a larger model; the $x_i$ are determined by the solver, with the maximization I'm asking about here and additional constraints. — Blackhole, Nov 06 '22 at 18:53

RobPratt · Answer 1 · 2022-11-02T18:55:02.820

6

You can maximize $z$ subject to linear big-M constraints $$z - (j-i) \le M_{ij} (2 - x_i - x_j),$$ where $M_{ij} = n-(j-i)$. Each such constraint enforces the logical implication $(x_i \land x_j) \implies z \le j - i$.

If you also know a lower bound $\sum_i x_i \ge k$ for some $k > 1$, you can impose a valid constraint $$z \le \left\lfloor\frac{n-1}{k-1}\right\rfloor$$ that dominates some of the other constraints.

edited Nov 02 '22 at 18:55

answered Nov 02 '22 at 18:24

RobPratt

32,006
1
44
84

2

Since Blackhole is using the CP-SAT solver, it might be easier to enter the constraints as $(x_i \land x_j) \implies z \le j - i$ for all $i<j.$ – prubin Nov 02 '22 at 18:30
Thanks for your answer, Rob. Yep, I'm using a CP solver, not a LP solver. Your proposal removes the need for my intermediate isIntervalAtLeastOfGivenSize variables, but unless I misunderstand, it still adds n²/2 constraints as well. Would it makes the problem easier for the solver nevertheless? (sorry, I'm still a novice in OR). – Blackhole Nov 02 '22 at 18:37
Yes, there are $O(n^2)$ constraints. Not sure how it will perform, but it is worth trying. – RobPratt Nov 02 '22 at 18:39
1

The constraint will be quadratic when encoded. Rob's suggestion with Paul's syntax seems the most promising approach. – Laurent Perron Nov 03 '22 at 09:37
I've just tried it, it works well! The performance gain is not obvious, but it's definitely a cleaner solution than mine, so it's adopted. Thanks for the help! I'll wait a bit to see if anyone else has another answer to propose, and accept yours after a few days if not. – Blackhole Nov 03 '22 at 16:59
Glad to help. Do you also know a lower bound like $\sum_i x_i \ge k$? – RobPratt Nov 03 '22 at 17:27
Sadly, no, I don't. It would indeed have been a good way to reduce the number of constraints otherwise. – Blackhole Nov 03 '22 at 20:45
@LaurentPerron, RobPratt: do you have any opinion on mtanneau's suggestion? While there are $O(n)$ constraints, there are also $n$ integer variables, each with $n$ possible values. As I was saying, I don't know much about the internal working of the solver; is one solution inherently better than the other? – Blackhole Nov 05 '22 at 18:31
1

The formulation by @mtanneau looks promising. You could optionally relax integrality of $s_i$ and relax some equalities to $\le$ inequalities (because you are maximizing). – RobPratt Nov 05 '22 at 18:44
1

I like differential encodings. Which one is a better can only be validated by experiments. – Laurent Perron Nov 06 '22 at 08:46

score 5 · Accepted Answer · edited Nov 05 '22 at 12:56

I have no idea (i) whether the following can be encoded in a CP solver and (ii) how efficient it would be. On the pro side, it only requires a linear number of variables and constraints.

Say you have $n$ boolean variables $x_{0}, ..., x_{n-1}$. You can introduce an integer-valued counter $s_{0}, ..., s_{n-1}$ such that $$ s_{0} = \left\{ \begin{array}{ll} n & \text{ if } x_{0} = 0\\ 0 & \text{ if } x_{0} = 1 \end{array} \right. \qquad s_{i} = \left\{ \begin{array}{ll} n & \text{ if } x_{i} = 0 \land s_{i-1}=n\\ s_{i-1} + 1 & \text{ if } x_{i} = 0\land s_{i-1} \neq n\\ 0 & \text{ if } x_{i} = 1 \end{array} \right. $$ This counter starts from $n$, is incremented as long as $x$ remains zero and if it's not equal to $n$, and is reset to $0$ every time $x$ takes value one.

From there, we can just maximize $z$, subject to the additional constraints :

$$x_i = 1 \Rightarrow z \leq s_{i-i}$$

For the example you gave in the original question, this would yield something like

  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
x | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
s | n | n | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 0 | 1 | 2 | 3 | 4 | 0 | 1 | 2 | 3 |
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
            ↑               ↑           ↑                   ↑
          z ≤ n           z ≤ 3       z ≤ 2               z ≤ 4
        (trivial)                   (stronger)

The counter one scales better. As for the one suggested by Dr. Rob, Hi Blackhole, what are the x values? I mean how many are true for an array of size 14? I am using Gurobi — Sutanu Majumdar, Nov 05 '22 at 15:30
@Sutanu Yes, it seems to perform better. The typical data varies widely, but I'll say the average would be around 25% of true value in the list. — Blackhole, Nov 05 '22 at 18:17

Gino · Answer 3 · 2022-11-03T14:04:30.060

1

Probably I'm not understanding correctly the problem, but to me the solution looks like

(n - X) / (X - 1)

with X being the number of 1 (True values) in the array of lenght n

Suppose you have an array of 9 variables. 3 True, 6 False: the above formula gives you:

(9 - 3) / (3 - 1) = 3

In fact you can arrange your 9 variable this way:

1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1

edited Nov 03 '22 at 14:04

answered Nov 03 '22 at 13:56

Gino

11
2

Thanks for your answer, Gino. I can't choose the value of my variables freely, there are other constraints beside the ones I'm talking about here. – Blackhole Nov 03 '22 at 14:46

Maximize the minimal distance between true variables in a list

3 Answers3