3

I have a big LP program with around $91,000$ variables and $2,900,000$ constraints. They are all binary variables, but I want to try also relaxing the problem putting $0 \le x \le 1$ bounds. I am not very knowledgeable on the subject because I took only one university course on operations research about 30 years ago and didn't use it afterwards, so I forgot nearly everything.

The problem is aimed at trying to prove the union closed sets conjecture for a given fixed number of sets. I have formulated it implementing this answer and this answer.

The formulation is:

$$\max \sum_{i=1}^n a_{1,i} \tag1\label1$$

subject to:

$$\sum_{i=1}^n a_{k,i} \ge \lfloor n/2 \rfloor + 1, \space 2 \le k \le q \tag2\label2$$ $$a_{k,\ell} \le a_{k,i} + 1 - z_{i,j,\ell}, \space 1 \le k \le q, 1 \le i \lt j \le n, 1 \le \ell \le n \tag3\label3$$ $$a_{k,\ell} \le a_{k,j} + 1 - z_{i,j, \ell}, \space 1 \le k \le q, 1 \le i \lt j \le n, 1 \le \ell \le n \tag4\label4$$ $$a_{k,\ell} \ge a_{k,i} + a_{k,j} + z_{i,j,\ell} - 2, \space 1 \le k \le q, 1 \le i \lt j \le n, 1 \le \ell \le n \tag5\label5$$ $$\sum_{\ell=1}^n z_{i,j,\ell} = 1, \space 1 \le i \lt j \le n \tag6\label6$$ $$x_{k,i,j} \le a_{k,i} + a_{k,j} \le 2-x_{k,i,j}, \space 1 \le k \le q, 1 \le i \lt j \le n \tag7\label7$$ $$\sum_{k=1}^q x_{k,i,j} \ge 1, \space 1 \le k \le q, 1 \le i \lt j \le n \tag8\label8$$

with all binary values. Some of the $\eqref{3}$, $\eqref{4}$, $\eqref{5}$, can be eliminated or simplified when $\ell = i$ or $\ell = j$. There are many alternatives for $\eqref{1}$ and $\eqref{2}$, for example we can order the occurrences of elements from element $1$ to $q$ and then again search the maximum occurrence of element $1$.

This is actually the dual formulation of the union closed sets conjecture, because we are considering intersection-closed families of sets (the intersection of any two sets must still belong to the family), and if the conjecture is true there must exist at least one element in at most $\lfloor n/2 \rfloor$ sets.

The matrix with elements $a_{i,j}$ represent the family of sets: a column is one set, while a row represents an element (well, there can be two or more identical rows, in which case we can think they represent a single element). There are $n$ columns (the given fixed number of sets) and $q$ rows (the maximum number of elements).

Therefore $\eqref{1}$ checks the occurrences of an element (element 1) that we expect to be at most $\lfloor n/2 \rfloor$. Constraint $\eqref{2}$ forces all the other elements to be in more than $\lfloor n/2 \rfloor$ sets. $\eqref{3}$, $\eqref{4}$, $\eqref{5}$ and $\eqref{6}$ require that the family be intersection closed (the intersection of any two sets [columns] must still belong to the family [matrix]) and finally $\eqref{7}$ and $\eqref{8}$ forces all sets to be different. Note that we can also use these last two constraints, needed by the problem, that enforce all sets to be different, for the purpose of replacing $=$ with $\ge$ in $\eqref{6}$.

I have written a smaller version of the program ($n=11$, $q=4$) with LP file format and submitted it successfully to the NEOS server. At the moment I am producing the LP files through a specific C program for my problem.

However, the "big" case, with $n=53$ and $q=13$, has an LP file of size $130$ MByte. NEOS server accepts a maximum of $16$ MByte.

The conjecture has been proven for $n \lt 53$, so $n = 53$ is the lowest number of sets for which a proof is missing, as of today. Note that there is a theorem that limits the maximum number of elements of a counterexample to $13$ for $n = 53$ (the minimal counterexample must have $n \ge 4q+1$, where $n$ is the number of sets and $q$ the number of elements).

Any suggestion for a file format which is both concise in size and quick to learn? Also, are $91,000$ variables and $2,900,000$ constraints too much for the NEOS server?

I have run cases $n=23, q=6$ and $n=29, q=7$ at NEOS with Gurobi, but it timed out after 8 hours.

Any other suggestion for attempting the $n=53$ and $q=13$ case?

Fabius Wiesner
  • 393
  • 1
  • 7
  • 1
    It might be worth describing your problem and formulation so that somebody might be able to recommend a smaller formulation. – RobPratt Nov 25 '22 at 20:57
  • The other file formats accepted by NEOS solvers actually tend to be more verbose than LP format. – prubin Nov 25 '22 at 21:04
  • You can check if MPS or nl formats produce much smaller size. Else check if you reduce variables (and so constraints) like if some combinations are not possible at all then no need to create variables/constraints around them. – Sutanu Majumdar Nov 25 '22 at 21:30
  • 1
    As an alternative to NEOS, you could try using a solver installed locally. The open-source HiGHS solver seems to perform well. Not generally as fast as CPLEX or Gurobi, but it might be sufficient for your purpose. HiGHS accepts .lp file format as an input. https://highs.dev/ – Solver Max Nov 25 '22 at 21:30
  • I have added the formulation of the problem to the question. – Fabius Wiesner Nov 25 '22 at 23:24
  • An interesting observation is that this alternative formulation where the number of elements is fixed, while the number of sets can vary, gives almost the same product between number of constraints and number of variables: $8192$ variables $\times$ $31964205$ constraints versus $91640$ variables $\times$ $2851094$ constraints here. The ratio of the two products is $\approx 1.0022$. – Fabius Wiesner Nov 26 '22 at 00:12
  • @BillyJoe, can you combine cons 3,4 & 5 into a single constraint $a_{k,l}+3z_{i,j,l} <= 4$? – Sutanu Majumdar Nov 26 '22 at 05:35
  • Similarly cons 7, 8 can be combined and $x_{k,i,j}$ eliminated. Like cons (7) can be changed to $a_{k,i} +a_{k,j} <= 1$ and con(8) as $\sum_{l=1}^k a_{k,i} +a_{k,j} >= 1$, eliminating $x_{k,i,j} $ – Sutanu Majumdar Nov 26 '22 at 05:54
  • @BillyJoe, A) If you have a valid license on some solvers like CPLEX, Gurobi, etc, you can write everything on the solver side, then let it perform in an efficient manner to reduce the problem size and solve it. B) If not, do you try writing the problem in an open-source algebraic modeling language, like Pyomo, etc, then write the problem by using their internal functions to create either LP or MPS format? AFAIK, writing an LP or MPS by yourself function needs some tricks to prevent increasing the size of the file. – A.Omidi Nov 26 '22 at 09:17
  • Your requirement that the columns be distinct motivates an alternative formulation with $2^{13}=8192$ binary variables $\lambda_c$, one for each binary column vector $c\in{0,1}^{13}$. You then have a cardinality constraint $\sum_c \lambda_c = n$, and you can also express $(2)$ in terms of $\lambda_c$. Distinctness is implicit, so you wouldn't need any additional variables or constraints for that. For requirements related to pairs of selected columns, you could generate constraints dynamically as they are violated. – RobPratt Nov 26 '22 at 15:33
  • @RobPratt is that alternative formulation this one? The one in this answer seems to work better in practice, even without the binary restriction or 0 to 1 bounds, the problem is the time grows exponentially. Regarding the dynamic constraints, can you give some references for further reading? – Fabius Wiesner Nov 26 '22 at 16:19
  • 1
    Yes, that is a similar idea but not exactly the same formulation. For more details about dynamically generated constraints, search for row generation or lazy constraints. – RobPratt Nov 26 '22 at 16:42
  • 1
    Maybe use an AMPL or GAMS script to generate the model. You can submit that to NEOS. However, it would not surprise me if NEOS would put some limits on the size of problems you are allowed to solve. – Erwin Kalvelagen Nov 29 '22 at 01:12

1 Answers1

-1

May be combine cons 3,4 & 5 into a single constraint $a_{k,l}+3z_{i,j,l}<=4$ and
Similarly cons 7, 8 can be combined and $x_{k,i,j}$ eliminated. Like cons (7) can be changed to $a_{k,i}+a_{k,j}<=1$ and con(8) as $\sum_{l=1}^k a_{k,i}+a_{k,j}>=1$

Sutanu Majumdar
  • 3,461
  • 1
  • 2
  • 11