25

I am currently interested in obtaining (or constructing) and studying 3-CNF formulae which are unsatisfiable, and are of minimum size. That is, they must consist of as few clauses (m = 8 preferably) as possible, and as few distinct variables (n = 4 or more) as possible, such that removing at least one clause will render the formula satisfiable.

More formally, any qualifying 3-CNF formula F must satisfy the following conditions:

  1. F is unsatisfiable
  2. F has a minimum amount (4+) of distinct variables (or their negation)
  3. F has a minimum amount of clauses (8+)
  4. every proper subset of F is satisfiable (allowing removal of any arbitrary clause or clauses).
  5. F has no 2 clauses that are reducible to a 2-CNF clause e.g. (i, j, k) & (i, j, ~k) is NOT allowed ( they reduce to (i,j) )

For example, with n=4, there exists many minimal 8-clause 3-CNF formulae that are unsatisfiable. For one, by looking at the 4-hypercube and trying to cover it with edges (2-faces), one can construct the following unsatisfiable formula:

1. (~A,  B,  D)
2. (~B,  C,  D)
3. ( A, ~C   D)
4. ( A, ~B, ~D)
5. ( B, ~C, ~D)
6. (~A,  C, ~D)
7. ( A,  B,  C)
8. (~A, ~B, ~C)

This qualifies as a minimum unsatisfiable 3-CNF formula because:

  1. It is unsatisfiable:

    • Clauses 1-3 are equivalent to: D or A=B=C
    • Clauses 4-6 are equivalent to: ~D or A=B=C
    • They imply A=B=C, but by clauses 7 and 8, this is a contradiction.
  2. There are only 4 distinct variables.

  3. There are only 8 clauses.
  4. Removing any clause renders it satisfiable.
  5. No 2 clauses are 'reducible' to a 2-CNF clause.

So I guess my overall questions here are, in order of importance to me:

  1. What are some other small minimum formulae which meet the above conditions? (i.e. for say, 4,5,6 variables and 8,9,10 clauses)

  2. Is there some sort of database or "atlas" of such minimum formulae?

  3. What nonrandom algorithms exist for constructing them outright, if any?

  4. What are some insights into these formulae's characteristics? Can they be counted or estimated, given n (# variables) and m (# clauses)?

Thank-you in advance for your replies. I welcome any answer or comment.

MAF
  • 259
  • 3
  • 5
  • Each 3-CNF clause disallows 1/8-th of the possible solutions. So clearly you always need at least 8 clauses, or more if the sets of disallowed solutions overlap. Since your condition 5 forbids non-overlapping sets of disallowed solutions for n=3, you need more than 8 clauses for this case: note that your example does not obey condition 5. – András Salamon Sep 07 '11 at 15:03
  • Yes, you are correct on all points András. 8 clauses are a required minimum for an unsatisfiable 3-CNF formula, and so condition 5 may be bit too restrictive for my purposes in finding/constructing qualifying formulae. I realize that for n=3, condition 5 must be necessarily violated, but was included for illustrative purposes only. I am strictly interested in qualifying formulae of size n=4+ (i.e. 4 or more variables, but not too many more.). Perhaps I will scratch condition 5. – MAF Sep 08 '11 at 00:52
  • I think that your “example” with n=3 is confusing rather than illustrative, because (as András pointed out in his comment) it is not really an example of what you are asking about in this question. The example with n=4 is perfectly fine and illustrative. Why don’t you just remove the example with n=3? – Tsuyoshi Ito Sep 08 '11 at 01:10
  • Good point, Tsuyoshi. Done. – MAF Sep 08 '11 at 01:19
  • I had overlooked the word “optional” in condition 5. “Allowed but not preferred” makes your problem ambiguous. If you prefer condition 5, I suggest that you just require it for the time being. Doing so makes the problem well-defined and I hope that the resulting problem is interesting to many people. – Tsuyoshi Ito Sep 08 '11 at 01:27
  • OK. To make things simple, I will completely remove condition 5, where I stipulated that no two 3-CNF clauses be reducible to a 2-CNF clause. After all, I am not too sure how important that condition is to what phenomena I am hoping to see in these minimal formulae. – MAF Sep 08 '11 at 03:24
  • 1
    @MAF: removing condition 5 results in a trivial example: start with the unsatisfiable instance containing clauses ${x}$ and ${x'}$, then replace each clause $C$ with two clauses $C \cup {v}$ and $C \cup {v'}$ for a fresh variable $v$, and continue until all clauses have 3 literals. This yields a 7 variable unsatisfiable formula with 8 clauses. This is just chopping up the solution space into 8 disjoint pieces, which is usually not an interesting code. – András Salamon Sep 08 '11 at 08:52
  • Unfortunately, removing condition 5 made the question much less interesting to me (although I am not sure if András’s example uses the minimum number of clauses or not). – Tsuyoshi Ito Sep 08 '11 at 16:59
  • So yes, it seems as if completely removing the 'reduction' condition (#5) results in allowing too many trivial formulae to qualify as candidates, thus ultimately defeating the spirit of the question. I will therefore reintroduce it as a requirement. I was ultimately just hoping somebody had some type of collection of these small, unsatisfiable cores. Certainly I do not think it should be too hard to generate some potential candidates using a randomized algorithm, and then run them through a SAT solver to see if they are indeed unsatisfiable. – MAF Sep 09 '11 at 05:51
  • Best thing is to get a pen and paper and write them up! :) – Tayfun Pay Sep 13 '11 at 00:03
  • @Tayfun: Yeah, I have tried that already to some extent. But recently I was hoping to do some kind of statistical analysis on a somewhat larger corpus of minimal formulae to answer some basic questions, eg. How many conflicts (pos & neg appearances of variables) tend to occur in a formula? etc. Perhaps these types of basic science questions are just that, too basic, but I was hoping somebody that had done some similar research would chime in and offer some resources. Anyhow, when this question reaches the top of my priority queue, I will definitely look into it more and do more research on it. – MAF Sep 15 '11 at 01:05
  • @MAF: I have studied the MAXIMUM side of this very extensively and I do have a CO-RP-LIKE algorithm for some variant of k-SAT. I am hoping to turn that in to a paper. However, today I realized that the Minimum Un-satisfiable formulas you defined are the formulas that cannot be solved by my algorithm and thus I get false positives. Do you have an email? I would like to collaborate. – Tayfun Pay Sep 17 '11 at 17:42
  • The core is False, then a&~a, then (a|b)&(~a|b)&(a|~b)&(~a|~b) then what you pointed out and this continues on for n-CNF. The only unsatisfiable case is when every possible full-literal clause is included. However, when non-full literal clauses e.g. 3-CNF where n>3 are present, then deciding if every clause is present is a hard problem (nothing more than an alternative perspective on deciding SATisfiability) – Gregory Morse Jan 26 '24 at 19:28

3 Answers3

14

Take the formula in your example, remove the clause $\lnot A \lor \lnot B \lor \lnot C$ and add the following $2$ clauses:

$\lnot A \lor \lnot B \lor \lnot E$
$\lnot B \lor \lnot C \lor E$

You will get a minimal unsatisfiable formula with $n=5$, $m=9$ obeying condition 5.

In general, you may randomly pick a clause $l_1 \lor l_2 \lor l_3$ and split it in $2$ clauses:

$l_1 \lor l_2 \lor v$
$l_2 \lor l_3 \lor \lnot v$

where $v$ is a new variable. Each time you do so, both $n$ and $m$ are incremented by $1$. Repeating this process allows you to "stretch" the initial unsatisfiable core, and to obtain minimal unsatisfiable formulas (obeying condition 5) whose $r=\frac{m}{n}$ tends to $1$ as $n$ grows (which is pretty rare, as formulas with $r=1$ are satisfiable with high probability).

Giorgio Camerani
  • 6,842
  • 1
  • 34
  • 64
  • Thanks for the reply, Walter. The procedure you describe is very helpful indeed for generating even more slightly larger min unsat formulae of 'similar' structure, that is, once you have a core set that you find has interesting properties. – MAF Oct 02 '11 at 04:05
  • @MAF: You are very welcome. Thank you for posting such an interesting question. – Giorgio Camerani Oct 02 '11 at 09:24
  • Notice you go from 8 full (3-literal) clauses in 3-CNF to 8-1+2=9 3-CNF clauses, and if you expand out the first 7 original formulas by adding ~E and E, you have 14 formulas. Then ~A and ~B and ~C and (E or ~E) are added along with already added terms from the newly introduced ones. So 4 variables with all 4-literal clauses, is only UNSAT if it contains all 16 clauses. This seems to be the general case, though something has me worried about pure literals, as they dont effect satisfiability. Actually pure literals have no effect, its true. – Gregory Morse Jan 26 '24 at 20:05
3

Constructing unsatisfiable 3SAT problems isn't that hard if you just build up to whatever you're looking for. For example let's start with a base requirement.

  1. An equation equal to A & !A is unsatisfiable. It's actually the only thing that's unsatisfiable. So we're trying to write a system of equations that forces this relationship.

Next we'll examine 2SAT to attempt to get to that essential 1SAT unsatisfiable.

  1. Only an equation equal to (A v B) & (A v !B) is equal to A. Of course now that we have more variables there's more ways to do this. For example, instead of the first clause, we could have (A v C) & (!C v B). By the property of resolution, this is equal to A v B. So now we can mess around with two SAT and create all sorts of variables that are unsatisfiable. Touching back on where we started, to get an unsatisfiable equation we need an ultimate equivalence of (A v B) & (A v !B) & (!A v C) & (!A v !C) Remember to stay focused on the fact that we're trying to get A & !A and you're good.

That brings us to 3SAT or coming up with equations that equate to the other equations.

  1. The tool we're going to use again should seem familiar, it's resolution again. So let's say we want (A v B). To get there we need the pattern (A v B v C) & (A v B v !C) of course now that we have three variables we can get more sophisticated. We can generate two variable clauses, new three variable clauses, or even extend into four variable clauses. It just depends on how many variables are in both clauses. If both variables other than the variable negating itself are the same, then it resolves to a 2 variable clause (such as (A v B v C) & (!A v B v C) = (B v C)). If one of them is different but the other is the same, then it resolves to a 3 variable clause (such as (A v B v C) & (!A v B v D) = (B v C v D). Finally, and most maliciously, if both variables are different it resolves to a 4 variable clause (here it's (A v B v C) & (!A v D v E) = (B v C v D v E)). So now the number of clauses it's equal to can grow. However once you're in a 4 variable equation, resolving and equating two of the variables will remove the variable you're resolving on. So a 4 variable clause in conjunction with a 3 variable clause (or another 4 variable clause) can reduce to a 3 variable clause (example: ((A v B v C v D) & (!A v B v C) = (B v C v D)). Using this techniques you can build paths away from the answer of whether it's satisfiable or not, as long as you converge back into the earlier two cases.

A quick gotcha is (A v B v C) & (!A v !B v C) equates to true. Watch out for resolving two variables simultaneously because this is B v !B v C. Oops.

1

I believe condition number 5 is not really held in your example and cannot be held ever.
Let the following clauses be equivalent:

( p, q) = (~A,B,D)(A,~B,~D)

Which will allow us to map the clauses of A, B, C, and D to new variables p, q, r, and s as the following truth table:

A B C D | p q r s
-----------------
0 0 0 0 | 0 1 0 0
0 0 0 1 | 0 1 0 1
0 0 1 0 | 0 1 1 0
0 0 1 1 | 0 1 1 1
-----------------
0 1 0 0 | 1 0 0 0
0 1 0 1 | 0 0 0 0
0 1 1 0 | 1 0 0 1
0 1 1 1 | 0 0 0 1
-----------------
1 0 0 0 | 0 0 1 0
1 0 0 1 | 1 0 1 0
1 0 1 0 | 0 0 1 1
1 0 1 1 | 1 0 1 1
-----------------
1 1 0 0 | 1 1 0 0
1 1 0 1 | 1 1 0 1
1 1 1 0 | 1 1 1 0
1 1 1 1 | 1 1 1 1
-----------------

And now we can express the clauses of A, B, C, and D in terms of p, q, r, and s:

1. (~A,  B,  D) = ( p, q,~r, s)( p, q,~r,~s)
2. (~B,  C,  D) = (~p, q, r, s)(~p,~q, r, s)
3. ( A, ~C   D) = ( p,~q,~r, s)(~p, q, r,~s)
4. ( A, ~B, ~D) = ( p, q, r, s)( p, q, r,~s)
5. ( B, ~C, ~D) = ( p,~q,~r,~s)(~p, q,~r,~s)
6. (~A,  C, ~D) = (~p, q,~r, s)(~p,~q, r,~s)
7. ( A,  B,  C) = ( p,~q, r, s)( p,~q, r,~s)
8. (~A, ~B, ~C) = (~p,~q,~r, s)(~p,~q,~r,~s)

Since all clauses are shown and associated with an A, B, C, and D clauses. Then we can claim that the p,q,r, and s clauses can be reduced to:

( p, q, r)
( p, q,~r)
( p,~q, r)
( p,~q,~r)
(~p, q, r)
(~p, q,~r)
(~p,~q, r)
(~p,~q,~r)

Which is obviously violate the condition number 5.

What I want to point out is that even the example is not explicitly shows that there are 2 clauses that can be reduced to 2-CNF, but implicitly is has(e.g. (~A,B,D) and (A,~B,~D)), you might not be able to express the 2-CNF with the given variables but when you introduce different mapping for the problem you will be able to express them.