59

What theoretical explanations are there for the practical success of SAT solvers, and can someone give a "wikipedia-style" overview and explanation tying them all together?

By analogy, the smoothed analysis of the simplex algorithm does a great job explaining why it works so well in practice, despite the fact that it takes exponential time in the worst case and is NP-mighty.

I have heard a little bit about things like backdoors, structure of the clause graph, and phase transitions, but (1) I don't see how these all fit together to give the larger picture (if they do), and (2) I don't know whether these really explain why SAT solvers work so well, for example, on industrial instances. Also, when it comes to things like structure of the clause graph: why is it that current solvers are able to take advantage of certain clause graph structures?

I only find the results about phase transitions partially satisfying in this regard, at least in my currently limited understanding. The phase transition literature is about instances of random k-SAT, but does that really explain anything about real-world instances? I don't expect real-world instances of SAT to look like random instances; should I? Is there a reason to think phase transitions tell us something, even intuitively, about real-world instances even if they don't look like random instances?

Related questions that help, but don't completely answer my question, particularly the request for tying things together into a coherent picture:

Joshua Grochow
  • 5
    This is not an answer, but I do not think many people still believe that graph structures/backdoors can explain the performance of SAT solvers. They seem more relevant to harder problems such as #SAT, QBF or knowledge compilation, where you really need to find clever factorizations, as you have to somehow explore the whole structure. For your question, I am tempted to answer "nobody really knows and this is an active area of research". But I need to gather references to show what people are trying, and there may be someone with a broader view than me who can give a better answer. – holf Mar 30 '17 at 22:30
  • @holf: Thanks for the comment - I already learned something! If it's an active area of research, any pointers to the latest and greatest would be great as part of an answer. I don't expect one single, final, complete answer, but am hoping to get several answers that shed some light on the situation. – Joshua Grochow Mar 30 '17 at 22:35
  • 2
    @Joshua: The simplex method is even PSPACE-mighty (Fearnley and Savani, STOC 15). – Rahul Savani Mar 30 '17 at 22:41
  • 1
    I agree with @holf. Here are my few cents: SAT solvers can usually recover faster than CSP solvers because all of their variables have a domain size of two. This is not to say that no CSP solver can beat a SAT solver; it can, if it has very sophisticated dynamic variable (as well as value) ordering heuristics coupled with a good encoding of the problem. Also, when you turn a "real life instance" into a SAT instance, you rarely end up at the phase-transition point because of the introduction of so many auxiliary variables. – Tayfun Pay Mar 30 '17 at 22:49
  • @TayfunPay: Okay, I should re-clarify: I don't expect that anyone knows the "one single correct" answer to this question, rather I'm hoping for answers that start to piece together a larger picture. I don't find the argument about "rarely [ending] up at the phase-transition point" particularly convincing because, as I said in the OQ, what does the phase-transition point have to do with anything if your instance wasn't random to begin with... – Joshua Grochow Mar 30 '17 at 23:11
  • 1
    Well, the phase-transition point has a lot to do with it, because if your instance ends up there, then you are pretty much doomed: it will probably take exponential time to solve. And what I stated in the previous comment was purely experience- as well as intuition-based. – Tayfun Pay Mar 31 '17 at 00:24
  • 3
    @TayfunPay: I wasn't questioning your experience - in fact, I 100% believe that "real-life" problems translate into SAT instances that aren't near the phase transition. But I don't think that explains the easiness of such instances, as those instances aren't random, so the phase transition (in theory) should have little to say about them (either hardness or easiness). – Joshua Grochow Mar 31 '17 at 01:41
  • 1
    Let me throw this out there... If the "real life instances" do not really fall in or around the phase-transition point, i.e. 1) the region where most instances take exponential time to solve for any given SAT solver, and 2) the region where a poly-time algorithm would imply P=NP, then what is the point of trying to construct SAT solvers that perform better and better in this region? – Tayfun Pay Mar 31 '17 at 01:55
  • @TayfunPay: is it known that a poly time algorithm for instances with density near the phase transition implies P=NP? – Joshua Grochow Mar 31 '17 at 01:59
  • 1
    @JoshuaGrochow If you can solve all of the instances that take exponential time to solve by most SAT solvers (there will be some hard instances that are solved easily by some SAT solvers but not others at the p.t.p., because of the different variable ordering heuristics or the filtering algorithm or the modeling technique that they employ) in polynomial time, then the instances you are left with are either over- or under-constrained, and plain vanilla SAT solvers can easily solve those instances in polynomial time. – Tayfun Pay Mar 31 '17 at 02:22
  • 1
    I do not think all of the users have access to the chat so I'll write here. What you are vaguely referring to is the second phase transition point. At the sptp, some instances are still hard because they get "transformed" into an instance at the ptp while the SAT or the CSP solver is attempting to solve them. And the density of the sptp depends on the variable ordering heuristic and/or the filtering algorithm. So if you can solve the instances at the ptp in poly-time then you can surely solve the instances at the sptp in poly-time. – Tayfun Pay Mar 31 '17 at 06:55
  • 1
    @TayfunPay: Interesting! The sptp seems to just push my question back a little further though: do we have any theoretical understanding that doesn't rely on the hardness of random instances that would explain why instances which eventually get transformed to those near the ptp are hard for modern solvers? If the ptp or sptp really does explain some of the success/failure of modern SAT solvers on "real-world" instances, I'd like to understand why, given that real-world instances (and the things they get transformed to in the course of an algorithm) probably aren't random-looking... – Joshua Grochow Mar 31 '17 at 15:25
  • 1
    There are these two fantastic videos using some interesting empirical methods to really dig into this question, beginning with this one: https://youtu.be/ZrXipcqpuxc – Samuel Schlesinger Apr 03 '17 at 17:59
  • 3
    This has already been mentioned elsewhere, but it is important to note that clause-variable density and phase transitions are relevant only for random k-SAT and have nothing to do with the hardness of instances coming from industrial or combinatorial problems. Thus, most of the discussion above is beside the point. Also, it is worth noting that regarding random k-SAT there is no real easy-hard-easy pattern --- for the proof systems where we have lower bounds for randomly generated k-CNF formulas, the formulas remain exponentially hard for arbitrarily large constant densities above the threshold. – Jakob Nordstrom Jul 14 '17 at 18:23

6 Answers

27

I am assuming that you are referring to CDCL SAT solvers on benchmark data sets like those used in the SAT Competition. These programs are based on many heuristics and lots of optimization. There were some very good introductions to how they work at the Theoretical Foundations of Applied SAT Solving workshop at Banff in 2014 (videos are available online).

These algorithms are based on the DPLL backtracking algorithm, which tries to find a satisfying assignment by setting values to variables and backtracking when it finds a conflict. People have looked at how much impact these heuristics have.
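For concreteness, here is a minimal, purely illustrative DPLL-style sketch in Python (it is not taken from any of the works cited in this thread, and the data representation and the naive branching rule are just placeholders): it alternates unit propagation with branching and backtracks on conflicts, which is the skeleton that the heuristics discussed below plug into.

```python
# Minimal DPLL sketch (illustrative only): clauses are sets of non-zero ints,
# where the literal -v stands for "not v". Real solvers add watched literals,
# activity-based branching, clause learning, restarts, etc.

def unit_propagate(clauses, assignment):
    """Repeatedly assign forced (unit) literals; return (None, None) on conflict."""
    clauses = [set(c) for c in clauses]
    changed = True
    while changed:
        changed = False
        remaining = []
        for clause in clauses:
            if any(lit in assignment for lit in clause):          # clause already satisfied
                continue
            clause = {lit for lit in clause if -lit not in assignment}
            if not clause:                                        # empty clause: conflict
                return None, None
            if len(clause) == 1:                                  # unit clause: forced literal
                assignment = assignment | {next(iter(clause))}
                changed = True
            remaining.append(clause)
        clauses = remaining
    return clauses, assignment

def dpll(clauses, assignment=frozenset()):
    clauses, assignment = unit_propagate(clauses, set(assignment))
    if clauses is None:
        return None                              # conflict under the current assignment
    if not clauses:
        return assignment                        # every clause satisfied
    var = abs(next(iter(next(iter(clauses)))))   # naive choice of branching variable
    for lit in (var, -var):                      # try both polarities, backtrack on failure
        model = dpll(clauses, assignment | {lit})
        if model is not None:
            return model
    return None

# (x1 or x2) and (not x1 or x2) and (not x2 or x3)
print(dpll([{1, 2}, {-1, 2}, {-2, 3}]))          # e.g. {1, 2, 3}
```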

It seems that the efficiency of these SAT solvers on the benchmarks comes mainly from two heuristics (and their variants):

  1. The VSIDS heuristic for selecting which variable to branch on next.
  2. CDCL: the conflict-driven clause learning heuristic, which learns a new clause from each conflict.

It is well-known that DPLL proofs correspond to proofs in resolution. Without CDCL, the only resolution proofs we can get are tree resolution proofs, which are much weaker than general resolution proofs.
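As a reminder of the standard definitions (nothing here is specific to any particular paper), the resolution rule derives, from two clauses that clash on a single variable, their resolvent:

```latex
% Resolution rule: resolve C \lor x with D \lor \lnot x on the variable x.
\frac{C \lor x \qquad\quad D \lor \lnot x}{C \lor D}
% A DPLL refutation read bottom-up gives a tree-like resolution proof;
% clause learning lets derived clauses be reused, which is what opens the door
% to (DAG-like) general resolution.
```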

There are results showing that with CDCL we can get any general resolution proof. However, there are caveats: they need many artificial restarts, artificial branching, and/or particular preprocessing, so it is not clear how close these are to what these programs do in practice (see, e.g., the paper by Beame et al. discussed in the comments below for more details).


CDCL is essentially cutting branches from the search space. There are various ways of deriving a new learnt clause from a conflict. Ideally we would add a minimal set of clauses that implies the conflict, but in practice such a set can be large and expensive to compute. Top SAT solvers often delete learned clauses regularly, and that helps in practice.


The other heuristic, VSIDS, essentially maintains a score for each variable. Every time there is a conflict, all scores are multiplied by a constant $\alpha < 1$, and a constant (say $1$) is added to the scores of the variables that were "involved" in the conflict. To see what this means, consider the indicator $F(v,i)$, which is $1$ if variable $v$ was "involved" in the $i$th conflict and $0$ otherwise. Let $0<\alpha<1$ be a fixed constant. The score of variable $v$ after the $n$th conflict is then: $$\sum_{i \le n} F(v,i)\,\alpha^{\,n-i}$$
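Equivalently, writing $s_n(v)$ for the score of $v$ after the $n$th conflict, the decay-then-bump update and its closed form are (this is just a restatement of the description above, with bump constant $1$):

```latex
% Decay-then-bump update after conflict n, and its closed form.
s_n(v) = \alpha \, s_{n-1}(v) + F(v,n), \qquad s_0(v) = 0
\quad\Longrightarrow\quad
s_n(v) = \sum_{i \le n} F(v,i)\,\alpha^{\,n-i}.
```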

Intuitively, one can say that this tries to emphasize variables that were consistently involved in recent conflicts. You can also think of it as a simplistic but extremely cheap way of predicting which variables will be involved in the next conflict, so VSIDS branches first on those variables. One can argue that the algorithm is essentially a fail-fast algorithm: find conflicts fast. Finding conflicts fast means setting fewer variables before a conflict occurs, which means blocking large subtrees of the search tree. But this is mostly intuition; afaik no one has formalized it very carefully and tested it on SAT data sets. Running a SAT solver on one of these data sets is not cheap, let alone comparing it with the optimal decisions (the smallest extension of the current assignment that would violate one of the clauses). VSIDS also depends on which variables we bump at each conflict, and there are various ways to define when a variable is involved in a conflict.
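A minimal sketch of the bookkeeping described above (purely illustrative: the class name and the decay constant are made up for the example; production solvers obtain the same ordering more cheaply by growing a bump increment instead of decaying every score):

```python
# Exponentially decaying VSIDS-style activity scores (illustrative sketch).
# After the latest conflict n, the score of v is the sum over conflicts i <= n
# of alpha**(n - i) * [v was involved in conflict i], matching the formula above.

class VsidsScores:
    def __init__(self, variables, alpha=0.95):
        self.alpha = alpha                        # decay factor, 0 < alpha < 1
        self.score = {v: 0.0 for v in variables}

    def on_conflict(self, involved_vars):
        for v in self.score:                      # decay every score...
            self.score[v] *= self.alpha
        for v in involved_vars:                   # ...and bump the involved ones
            self.score[v] += 1.0
        # (Real implementations instead grow the bump increment by 1/alpha and
        #  rescale occasionally, which yields the same ordering at lower cost.)

    def pick_branch_variable(self, unassigned):
        return max(unassigned, key=self.score.get)

# Toy usage: variable 3 was involved in both recent conflicts, so it is picked.
h = VsidsScores(variables=[1, 2, 3])
h.on_conflict([2, 3])
h.on_conflict([3])
print(h.pick_branch_variable({1, 2, 3}))          # 3
```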


There are results that show that particular implementations of these ideas correspond to a time-weighted centrality of vertices in dynamic graphs.

There are also suggestions that, excluding adversarial instances (like those based on NP-hard problems and crypto primitives) and random instances (which CDCL SAT solvers are not good at), the remaining instances come from very well structured domains like software and hardware verification, and that these structures are somehow exploited by CDCL SAT solvers. Lots of ideas have been mentioned (backdoors, frozen variables, etc.), but afaik they are mostly ideas without strong theoretical or experimental evidence to back them up. I think one would first have to rigorously define the property, show that the instances on which these algorithms work well have the property, and then show that the algorithms actually exploit it. What happens in practice is that people are mostly interested in faster algorithms, and if someone comes up with an algorithm that was somehow inspired by one of these ideas and beats other algorithms, they go with the claim.


Some people keep insisting that the clause ratio and thresholds are the only game in town. That is definitely false, as anyone who is slightly familiar with how industrial SAT solvers work or has any knowledge of proof complexity would know. There are lots of things that determine whether a SAT solver works well on an instance in practice, and the clause ratio is just one of the things that might be involved. I think the survey by Jakob Nordström linked in the comments below is a good starting point for learning about the connections between proof complexity and SAT solvers and for getting perspective.

Interestingly, even the threshold phenomenon is more complicated than most people think. Moshe Vardi stated in his talk "Phase transitions and computational complexity" that the median running time of GRASP remains exponential for random 3SAT formulas above the threshold, but that the exponent decreases (afaik, it is not clear how fast it decreases).


Why are we studying SAT solvers (as complexity theorists)? I think the answer is the same as for other algorithms: 1. to compare them, 2. to find their limitations, 3. to design better ones, and 4. to answer fundamental questions of complexity theory.

When modelling a heuristic we often replace the heuristic with nondeterminism. The question then becomes: is this a "fair" replacement? Here, by fair I mean how much the model helps us answer the questions above.

When we model a SAT solver as a proof system we are partly showing its limitations, because the algorithm will be inefficient for statements that have lower bounds in the proof system. But there is still a gap between what the algorithm actually finds and the optimal proof in the proof system. So we need to show the reverse as well, i.e., that the algorithm can find proofs that are as good as those in the proof system. We are not close to answering that question, but the amount of heuristic that is replaced by nondeterminism determines how close the model is to the proof system. I don't expect that we can completely drop the replacement of heuristics with nondeterminism, otherwise we would get automatizability results, which have consequences for open problems in crypto, etc. But the more nondeterminism the model has, the less convincing it is in explaining the behaviour of the SAT solver.

So the question when looking at a model becomes: how much does the model help explain why SAT solver A is better than SAT solver B? How helpful is it in developing better SAT solvers? Does the SAT solver find proofs in practice that are close to optimal proofs in the model? ... We also need to model the practical instances.

The intuition that CDCL SAT solvers "exploit the structure of practical instances" (whatever that structure is) is, I think, generally accepted. The real question is to give a convincing explanation of what that means and to demonstrate that it is indeed true.

See also Jakob Nordstrom's own answer for more recent developments.

Kaveh
  • 1
    Does this hold for SAT instances derived from integer factoring? – Mohammad Al-Turkistany Mar 31 '17 at 11:20
  • Analysis of VSIDS here. – Martin Berger Mar 31 '17 at 12:29
  • @Martin, yep, I know about that paper, I was involved in it (you would be amazed how many times we got rejected by reviewers saying this is obvious). It is more or less what I have written here and what we presented at Theoretical Foundations of SAT Solving workshop at Banff. – Kaveh Mar 31 '17 at 13:06
  • 1
    Yes, but it is still a heuristic. It might very well do worse on some instances that can be easily solved by a different heuristic using the same SAT solver. E.g., look at their subsequent paper. They show that on average their new CHB heuristic performs better than VSIDS. However, if you look carefully at Table 1, you'll see that on certain sets of instances CHB actually performed worse on average... Therefore, I do not believe that talking about heuristics is a healthy way to go about answering this question. – Tayfun Pay Mar 31 '17 at 13:58
  • @Kaveh I'm painfully aware of the "it's obvious / well-known" rejection ... Do you know what the proof-complexity state-of-the-art is w.r.t. CDCL? I know that there are well-known examples where DPLL performs badly. – Martin Berger Mar 31 '17 at 16:52
  • 1
    @Martin, without CDCL we get only very restricted forms of resolution (I don't recall exactly what off the top of my head). There are results by Paul Beame and others which show that with CDCL and restarts you can essentially get any general resolution proof (however, the selection of restart points and branches is kind of artificial); see Beame's paper for details. – Kaveh Mar 31 '17 at 20:06
  • 3
    @Martin, see also Jakob Nordstrom's survey: http://www.csc.kth.se/~jakobn/project-proofcplx/docs/InterplayProofCplxSAT.pdf – Kaveh Mar 31 '17 at 20:19
  • @Mohammad, (not sure what you are referring to, but) generally these algorithms do not work that well on adversarial instances like those from crypto. The website of the SAT solving competition has lots of data on how various SAT solvers perform on various categories of instances: http://www.satcompetition.org – Kaveh Apr 01 '17 at 04:44
  • @Kaveh, OK, so I will stick my neck out. Regarding the paper https://arxiv.org/abs/1506.08905 I agree that the abstract sounds truly great, and that these are the kind of connections one would hope to find. But I do not agree that the paper proper actually backs up the claims made in the abstract. As to conference submissions and rejections, this is of course a favourite topic, but I would perhaps add the comment here that sometimes it is amazing how many times authors fail to take concrete and relevant reviewer comments and suggestions into consideration before resubmitting... – Jakob Nordstrom Jul 14 '17 at 08:00
  • 1
    Regarding proof complexity and CDCL, it was shown by Pipatsrisawat and Darwiche http://www.sciencedirect.com/science/article/pii/S0004370210001669 and independently by Atserias, Fichte and Thurley http://jair.org/papers/paper3152.html that CDCL viewed as a proof system can polynomially simulate resolution (the papers state different results, but the proofs in both papers can be used to obtain the results in the other paper). An important difference from previous papers in this line of research is that there is nothing artificial in these models of CDCL. [To be continued...] – Jakob Nordstrom Jul 14 '17 at 08:13
  • 1
    To be concrete: unit propagation is not ignored, no decisions are made past conflicts, no artificial clause learning scheme is used, and there is no unnatural preprocessing step. Also, contrary to what many people seem to believe, the restart policy is not unnaturally frequent or unnatural in any other way --- the results would hold if you plug in any standard restart policy. [To be continued...] – Jakob Nordstrom Jul 14 '17 at 08:14
  • 1
    From a practical point of view, one weakness of [PD11] and [AFT11] is that the polynomial blow-up in the simulation is pretty bad --- by a power of 4 or 5 or so --- which means that it does not have too much explanatory power from an applied point of view, where you are mostly looking for linear running times (and hence proof sizes). More seriously, the simulation theorem critically relies on the CDCL solver never, ever deleting a single learned clause. This is quite unrealistic, since solvers will typically erase 90-95% of learned clauses. But the papers are still extremely nice. – Jakob Nordstrom Jul 14 '17 at 08:22
  • My understanding from Paul's paper regarding restarts was that the process essentially restarts after learning one clause of the proof, which I find very artificial. :) – Kaveh Jul 14 '17 at 12:40
  • @Kaveh I don't know Beame's papers that well by heart. Regarding [PD11] and [AFT11] it is true that the proofs are written so that the solvers restart after every conflict, but (as noted in the papers) this is just for convenience. The only condition needed is that restarts are frequent enough, and a standard policy such as, e.g., Luby restarts satisfies this "frequent enough" criterion. In fact, solvers such as Lingeling or Glucose tend to restart more often than Luby, as far as I understand, since you don't want the long stretches of conflicts in between restarts that Luby gives you. – Jakob Nordstrom Jul 14 '17 at 17:07
  • @Kaveh Also, it is worth noting that restarting after every conflict is not that artificial. From what I have understood, most often (though not always) proof search quality goes up with restart frequency, pretty much all the way up to restarts after every conflict. (I don't have a formal reference for this, though --- applied SAT experts like, e.g., Armin Biere, Marijn Heule, or Laurent Simon could elaborate or correct.) However, restarts don't come for free, so even if the proof gets better you lose in running time if you restart all too often. But restarts are very frequent in practice. – Jakob Nordstrom Jul 14 '17 at 18:31
  • I kind of disagree; e.g., if you restart too quickly you will not learn anything useful. Let me express the issue in another way: we want to see that 1. proofs of the solver correspond to proofs in the proof system, and 2. proofs that the solver finds in practice are close to the (optimal) proofs in the proof system. 1 is usually uncontroversial; 2 is where the concern comes from. If our model of the solver replaces too much of the heuristics with nondeterministic decisions, then of course some decisions will lead to that optimal proof, but would an actual run of the solver find such a proof? – Kaveh Jul 14 '17 at 23:45
  • @Kaveh: I have never seen any experimental data supporting the general conclusion that the quality of learned clauses goes down if you restart too often. I would be very interested if you have references (even anecdotal) for this. You are absolutely right, of course, that a weak aspect of the simulation results in [PD11] & [AFT11] is that all variable decisions are made by an oracle. But this would seem to be inherent. A constructive decision heuristic would automatize resolution, make the parameterized complexity hierarchy crash, and hence be a major, major breakthrough in complexity theory. – Jakob Nordstrom Jul 16 '17 at 07:07
  • @Kaveh: Let me try to clarify the points I was trying to make: (1) [PD11] & [AFT11] do not need artificial restarts, learning or preprocessing. The branching is part of the proof system but again anything else would be a major breakthrough. (2) Even though [PD11] & [AFT11] results don't need it, you could hack your favourite solver (say, MiniSat, Glucose, or Lingeling) to restart after every conflict, and the performance wouldn't be all that bad AFAIK (though I haven't made systematic experiments). But it is better in practice to restart a bit less frequently to improve running time. – Jakob Nordstrom Jul 16 '17 at 07:12
  • Thanks Jakob. I understand your assessment that the restarts in those models are not really artificial. For restarts, in my comment I was referring to the extreme case of restarts where you might not even reach a conflict. Re nondeterminism, it doesn't need to be fully deterministic; we can have nondeterminism in our model of the solver. The question is that the more replacement of heuristics with nondeterminism we have, the less believable the model becomes. – Kaveh Jul 16 '17 at 13:29
  • And it is not just branch selection where a heuristic is employed: what clause to learn from a conflict, what clauses to delete, ... are also important heuristics in state-of-the-art solvers. – Kaveh Jul 16 '17 at 13:29
  • @Kaveh: Don't think I know of any paper where the solver restarts without reaching a conflict first. Doesn't make too much sense to me, since the behaviour after restart will be exactly the same as if this restart never happened. The learning scheme is definitely also very important, as you say, but [PD11] & [AFT11] hold for any adversarially chosen scheme as long as it is assertive (a completely standard assumption). What clauses to delete is a great question, but the models we are discussing here don't ever do erasures (I don't know any other paper than our SAT '16 paper considering this). – Jakob Nordstrom Jul 16 '17 at 18:46
  • @Jakob, yes, but in practice faster restarts can make sense, especially if you don't have nondeterminism to decide which branch to explore first. Re the clause learning heuristic, are you saying that all state-of-the-art SAT solvers are essentially equivalent? If so that would be an interesting result; however, it would show that the models are not good enough to capture what people in practice care about. – Kaveh Jul 17 '17 at 00:23
  • 1
    @Kaveh: Let me be a bit more formal. The reason restarting before reaching even one conflict doesn't make sense is that then your VSIDS scores and phases haven't changed, and so after restarting you pick exactly the same variables and set them in exactly the same way, exploring the same branch. Regarding clause learning I am saying that [AFT11] & [PD11] capture any reasonable learning scheme people would contemplate using. But in practice it is true that pretty much all CDCL solvers learn the 1UIP clause, typically with some clause minimization. The learning scheme part is very standardized. – Jakob Nordstrom Jul 17 '17 at 10:42
  • I just realized that one point perhaps worth making is that one should think of state-of-the-art solvers as essentially deterministic. Thus, it is not the case that restarts will make solvers go to some other, randomly chosen, part of the search space. In fact, the currently most popular heuristics are such that the solver will go to the same place again. But this discussion is getting a bit on the long side --- apologies for this. I should probably try to fold the above into an edited version of my answer at some point... – Jakob Nordstrom Jul 17 '17 at 12:11
24

I am typing this quite quickly due to severe time constraints (and didn't even get to responding earlier for the same reason), but I thought I would try to at least chip in with my two cents.

I think this is a truly great question, and have spent a nontrivial amount of time over the last few years looking into this. (Full disclosure: I have received a big part of my current funding precisely in order to try to find answers to questions of this type, and then potentially to convert deeper insights into SAT into more efficient SAT solvers.)

If one would have to give a one-sentence answer, then I think

nobody really knows and this is an active area of research

is pretty much as good as it gets. Except that there is a lot more room for more activity, especially from the theory side.

Some proposed explanations (not mutually exclusive), which have already been discussed in other answers and comments, are

  • (a) backdoors,
  • (b) parameterized complexity considerations,
  • (c) graph structure of the CNF problem,
  • (d) proof complexity considerations, and
  • (e) phase transitions.

Starting from the end (e), there seems to be quite some confusion regarding phase transitions. The short answer here is that there is no reason whatsoever to believe that the ratio of clauses to variables is relevant for applied problems or theoretical combinatorial problems (a.k.a. crafted instances). But for some reason it is a not too uncommon misconception in the applied part of the SAT community that the clauses-to-variables ratio should somehow be a generally relevant measure. The clause-to-variable ratio is very relevant for random k-SAT, but not for other models.
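For readers who have not seen the random model: "random k-SAT at a given clause-to-variable ratio" refers to formulas sampled roughly as in the sketch below (illustrative code, not taken from any cited paper). The ≈4.2 threshold quoted in the comments above is specific to random 3-SAT sampled this way and says nothing about formulas produced in any other manner.

```python
import random

def random_ksat(n_vars, ratio, k=3, seed=0):
    """Sample a random k-CNF with m = round(ratio * n_vars) clauses.

    Each clause picks k distinct variables uniformly at random and negates
    each one with probability 1/2. This is the model the phase-transition
    results are about; industrial instances are *not* sampled this way.
    """
    rng = random.Random(seed)
    clauses = []
    for _ in range(round(ratio * n_vars)):
        variables = rng.sample(range(1, n_vars + 1), k)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses

# Near ratio ~4.2, random 3-SAT is empirically hardest; well below the threshold
# instances are almost surely satisfiable, well above almost surely unsatisfiable.
formula = random_ksat(n_vars=200, ratio=4.2)
print(len(formula), "clauses over 200 variables")
```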

My feeling is that backdoors (a) have been a popular explanation, but I personally have not really seen convincing evidence that it explains what goes on in practice.

Parameterized complexity (b) provides a beautiful theory about some aspects of SAT, and a very attractive hypothesis is that SAT instances are easy because they tend to be "close to some island of tractability." I think this hypothesis opens up many exciting directions of research. As noted in some of the answers, there are a lot of connections between (a) and (b). However, so far I don't really see any evidence that parameterized complexity correlates too much with what is going on in practice. In particular, it seems that instances that are tractable can be very, very hard in practice, and instances without any small backdoors can still be very easy.

The explanation that seems most believable to me for industrial instances is (c), namely that the (graph) structure of the CNF formulas in question should be correlated with practical SAT performance. The idea here is that the variables and clauses of industrial instances can be clustered into well-connected communities with few connections in between, and that SAT solvers somehow exploit this structure. Unfortunately, it seems quite hard to pin this down more rigorously, and equally unfortunately this area suffers from a fair amount of hype. The proposed explanations I have seen so far in papers are quite unsatisfactory, and the models seem to be easy to shoot down. The problem would seem to be that if one really wants to do this thoroughly, then the math gets really hard (because it is a hard problem) and it also gets extremely messy (because you need your model to be close enough to reality to get relevant results). In particular, the papers I have seen explaining that the performance of the VSIDS (variable state independent decaying sum) heuristic for variable choices works well because it explores communities in the graph representation of the instances are quite unconvincing, although the hypothesis as such is still very attractive.
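To make "graph structure" slightly more concrete: one representation used in this line of work is the variable interaction graph, whose vertices are the variables and whose edges join variables that occur together in some clause, with community quality then measured by, e.g., modularity. The following is only a rough sketch of that computation, assuming the networkx package; the precise graph models and measures in the literature vary, and this is in no way a definitive formalization.

```python
from itertools import combinations
import networkx as nx
from networkx.algorithms import community

def variable_interaction_graph(clauses):
    """Vertices are variables; an edge joins two variables sharing a clause."""
    G = nx.Graph()
    for clause in clauses:
        variables = sorted({abs(lit) for lit in clause})
        G.add_nodes_from(variables)
        G.add_edges_from(combinations(variables, 2))
    return G

def community_quality(clauses):
    """Greedily detect communities and return (#communities, modularity)."""
    G = variable_interaction_graph(clauses)
    parts = community.greedy_modularity_communities(G)
    return len(parts), community.modularity(G, parts)

# Two densely connected "clusters" of variables, {1,2,3} and {4,5,6},
# joined by a single bridging clause.
clauses = [[1, 2, -3], [2, 3], [-1, 3], [4, -5], [5, 6], [-4, 6], [3, -4]]
print(community_quality(clauses))
```

The (still informal) hypothesis (c) is then that industrial instances exhibit high modularity, i.e., clear community structure, which random-like instances lack.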

One line of research that I have personally pursued is whether practical SAT performance somehow correlates with proof complexity measures of the CNF formulas in question. Unfortunately, the short answer seems to be that there really is no clear and compelling connection. It might still be that there are nontrivial correlations (this is something that we are currently investigating in different ways), but it seems theory is too nice and clean and pretty and reality is too messy for there to be a really good match. (Regarding the paper Relating Proof Complexity Measures and Practical Hardness of SAT by Järvisalo, Matsliah, Nordström, and Živný in CP '12 it has turned out that more detailed experiments provide a much more complex picture with less clear conclusions --- we hope to get to a full journal version reporting on this any decade now, but it is complicated, though still hopefully interesting.)

Another, related, line of work in proof complexity is to model state-of-the-art SAT solvers as proof systems and prove theorems in these models to deduce properties of the corresponding solvers. This is a bit of a minefield, though, in that small and seemingly innocuous design choices on the theoretical model side can lead to the results being pretty much completely irrelevant from a practical point of view. On the other hand, if one wants a theoretical model that is close enough to reality to give relevant results, then this model gets extremely messy. (This is because SAT solver performance depends on the global history of everything that has happened so far in nontrivial ways, and this means the model cannot be modular in the way we usually set up our proof systems --- whether a particular derivation step is "correct" at a certain point in time cannot be locally verified, but depends on the whole proof so far.)

Two papers that really should be mentioned as exceptions to this, though, are [Pipatsrisawat and Darwiche 2011] and [Atserias, Fichte and Thurley 2011], where it is shown that conflict-driven clause learning SAT solvers modelled in a natural way have the potential to polynomially simulate full, general resolution. There is a fairly long list of papers preceding [PD11] and [AFT11] that essentially claim the same result, but they all have serious issues with the modelling. (It is true that [PD11] and [AFT11] also need some assumptions to work, but they are pretty much the minimal ones you would expect, unless you are asking for papers that would also show that the parameterized complexity hierarchy collapses.)

Again, I am writing all of this very quickly, but if there is substantial interest for anything of the above I could try to elaborate (although it might take a while to return to this again --- please feel free to ping me if there is anything you would want me to comment on). As a quick way of providing references, let me do some shameless self-plugs (although the shame is decreased somewhat when I see that some comments have also cited some of these references):

Tutorial-style talk On the Interplay Between Proof Complexity and SAT Solving given at the International Summer School on Satisfiability, Satisfiability Modulo Theories, and Automated Reasoning in 2016 with lots of full references at the end of the slides: http://www.csc.kth.se/~jakobn/research/TalkInterplaySummerSchool2016.pdf

Slightly more recent, and briefer, survey talk Understanding Conflict-Driven SAT Solving Through the Lens of Proof Complexity from early 2017 (also with full references at the end): http://www.csc.kth.se/~jakobn/research/TalkProofComplexityLensCDCL1702.pdf

Survey of connections between proof complexity and SAT solving: http://www.csc.kth.se/~jakobn/research/InterplayProofCplxSAT.pdf [Bibliographic reference: Jakob Nordström. On the Interplay Between Proof Complexity and SAT Solving. ACM SIGLOG News, volume 2, number 3, pages 19-44, July 2015. (Lightly edited version with some typos fixed.)]

SAT '16 paper with CDCL modelled faithfully as a proof system: http://www.csc.kth.se/~jakobn/research/Trade-offsTimeMemoryModelCDCL_SAT.pdf [Bibliographic reference: Jan Elffers, Jan Johannsen, Massimo Lauria, Thomas Magnard, Jakob Nordström, and Marc Vinyals. Trade-offs Between Time and Memory in a Tighter Model of CDCL SAT Solvers. In Proceedings of the 19th International Conference on Theory and Applications of Satisfiability Testing (SAT '16), Lecture Notes in Computer Science, volume 9710, pages 160-176, July 2016.]

Jakob Nordstrom
19

I'm not an expert in this area, but I think the random SAT / phase transition stuff is more or less completely unrelated to the industrial/practical applications stuff.

E.g., the very good solvers for random instances (such as https://www.gableske.net/dimetheus) are, I believe, based on statistical physics methods (belief propagation etc.), whereas the very good 'general' solvers (such as http://fmv.jku.at/lingeling/) use unrelated techniques (more like what Kaveh was talking about).

But, as I said, maybe don't take my word for it; this is coming from a definite non-expert.

Ryan O'Donnell
  • 2
    Yes. Random SAT and industrial SAT are completely different games, and the methods used are different. And, in addition, if you want to solve really hard combinatorial instances, then yet other techniques are more successful (for instance, if the problem is hard enough, then doing clause learning does not really pay off, except perhaps very locally).

    But somehow it seems like a fairly common misconception (at least on the applied side of the SAT community) that the ratio of clauses to variables should somehow be relevant for any CNF SAT instance and not just for random instances.

    – Jakob Nordstrom Jul 12 '17 at 12:24
18

Let me add my two cents of understanding to this, even though I've never actually worked in the area.

You're asking one of two questions, "what are all the known approaches to proving theoretical efficiency of some SAT solver for some type of instances" and "why are SAT solvers efficient in reality".

For the former question, I'll just direct you to the research of Stefan Szeider. He seems to me to be currently the most active researcher on the topic of backdoors and FPT parameterizations of SAT (which includes both structural approaches such as treewidth-type measures and so-called backdoor sets, as well as some mixture of the two).

For the latter question, to be honest, that exact question has been debated at every SAT-solving workshop I've attended (including recent years), so it's not a settled matter. But my impression is as follows.

First of all, you need to quantify or restrict what you mean by "in reality". It's not true that SAT solvers are universally good solvers for any problem you throw them at (of course), and between different problems you end up needing different strategies. For example, there are several recent successes where mathematical conjectures have been verified or tightened by an enormous computer search aided by a SAT solver. In this case, apparently, a lot of the time the clever CDCL-style improvements and heuristics that modern SAT solvers normally employ don't really buy you too much power, and the game boils down to a clever way of splitting your source problem into parts, followed by (essentially) a brute-force branching algorithm with a small constant factor in the running time.

I may be overselling this point slightly, but my point is that when SAT solvers attacked, for example, the Erdős Discrepancy Conjecture, they were successful for a different reason than when they solve industrial instances derived from circuit testing.
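The "split the problem into parts, then branch" strategy mentioned above is often referred to as cube-and-conquer. A toy sketch of the idea follows, assuming the PySAT package (python-sat) for the underlying solver; the splitting variables are chosen by hand here, whereas real tools choose them with look-ahead heuristics.

```python
from itertools import product
from pysat.solvers import Glucose3          # pip install python-sat

def cube_and_conquer(clauses, split_vars):
    """Split on a few variables and solve each resulting "cube" independently.

    Each cube is handed to the solver as a set of assumptions, so the 2^k
    subproblems are independent and could be farmed out to separate machines.
    """
    solver = Glucose3(bootstrap_with=clauses)
    try:
        for signs in product((1, -1), repeat=len(split_vars)):
            cube = [s * v for s, v in zip(signs, split_vars)]
            if solver.solve(assumptions=cube):
                return solver.get_model()    # satisfying assignment in this cube
        return None                          # every cube refuted: formula is UNSAT
    finally:
        solver.delete()

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3) and (x2 or x3)
clauses = [[1, 2], [-1, 3], [-2, -3], [2, 3]]
print(cube_and_conquer(clauses, split_vars=[1, 2]))   # e.g. [1, -2, 3]
```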

So we're down to asking why CDCL-based solvers work so well on industrial instances of, for example, circuit verification (circuit equivalence testing). I think that the most powerful view (or the consensus view) here is that the search strategy of a CDCL solver gels very well with the structure of these instances. This means, for example, that the circuits consist of relatively complex parts (call them clusters), interconnected by relatively speaking fewer and simpler connections, possibly in a hierarchical manner; and that the CDCL solvers with all their heuristics are very good at (de facto) exploiting this and "projecting" these clusters onto the shared variables, in whatever way or order is most useful for verifying the circuit at hand. But it also seems to still be the consensus view that this is an insufficient explanation (e.g., we probably cannot theoretically explain the efficiency of CDCL SAT solvers in this domain purely by referring to the graph structure of the instances). So as of yet, I don't think we have a full answer.

So do the tractable parameterizations go some way towards explaining the latter? To be honest, I don't know. I think there is probably a powerful effect in play that the real-world instances are not worst case, nor are they (probably) truly average-case according to any instance distribution we are able to handle. But it's probably still worth asking.

  • 2
    Magnus, you definitely have more experience in this area than anybody else that attempted to answer this question thus far. When I stated "my two cents", I was referring to the fact that I have only studied one specific group of NP-Complete problems in my dissertation and how CSP and SAT solvers try to resolve the numerous instances of these problems. I also have roughly a year of experience using CSP and SAT solvers in the work place, but once again, that is not even close to your 10+ years of experience in this field. Your "two cents" are probably worth "two golden nuggets". One question. – Tayfun Pay Apr 02 '17 at 16:38
  • 1
    You state the following in your answer: "It's not true that SAT solvers are universally good solvers for any problem you throw them at....". Were you able to look at the ratio of clauses to variables, c=m/n, for these instances? In other words, were they at the hard region, c=~4.2, or not? Because what I have experienced is that you end up with a lot of variables when you reduce a CSP instance to a SAT instance, and it is usually for that reason, and not because the instance is actually at the hard region, that the SAT solver takes a longer time to solve it. – Tayfun Pay Apr 02 '17 at 16:40
  • 1
    On the other hand, if you know that these instances ended up at the hard region of SAT, c=~4.2, then may I know what was this real-life problem? I would be very interested in knowing what real-life problems actually end up at the hard region of SAT when they are reduced to it. Thank you – Tayfun Pay Apr 02 '17 at 16:41
  • 2
    Honestly, I have little to no experience of practical SAT solving. All my actual work has been on the pure theory side of that question. But be that as it may, regarding your question: My impression is that the results for random k-SAT and clause density (which you mention) really only apply if your input instance is literally a uniformly random formula.

    Also note that the bound ~4.2 only applies if the formula is 3-SAT, as opposed to a mixed-length CNF formula.

    – Magnus Wahlström Apr 05 '17 at 15:39
  • 1
    And by "problems not well solved by SAT solvers," I mostly mean problems that are presumed to be genuinely intractable, such as breaking a good crypto.

    However, there are also formulas that no CDCL solver will be able to solve efficiently due to proof-theoretic lower bounds, such as pigeon-hole principle formulas. I have seen at least one talk with the message "CDCL SAT solvers failed me", where a bit of digging reveals that their problem encoding hides a pigeon-hole-like aspect (e.g., containing some variation of the assignment problem). Unfortunately, I can't recall the details.

    – Magnus Wahlström Apr 05 '17 at 15:48
  • 1
    Perhaps you can help me better understand this. Maybe I am confused. If you fix the number of variables to, say, $n=10$, we can have up to $8{10 \choose 3}$ many different clauses. Then enumerate all possible instances for each count of clauses, $m$, going from 1 through $8{10 \choose 3}$. Then input all of these instances into a SAT solver, say a plain vanilla DPLL, and count the number of back-tracks for each instance at each $m$. Then if you look at the mean, better yet the median, number of back-tracks at each $m$, wouldn't it be the case that the peak will be around $m=42$? – Tayfun Pay Apr 06 '17 at 15:13
  • 1
    We are just randomly picking instances at each $m$ from all possible instances. If most of the instances at around density 4.2 were not hard, then everybody would have needed to have gotten so lucky to be able to always select hard instances at around density 4.2. Therefore, why isn't it reasonable to look at the density of the real life instances to see if they fall around density 4.2? – Tayfun Pay Apr 06 '17 at 15:23
17

There is a paper "Relating Proof Complexity Measures and Practical Hardness of SAT" by Matti Järvisalo, Arie Matsliah, Jakob Nordström, and Stanislav Živný in CP '12 that attempts to link the hardness or easiness of solving certain formulas by CDCL solvers with resolution proof complexity measures, in particular resolution proof space. The results are somewhat mixed, but it is a step in the right direction I think.

Jan Johannsen
  • 3
    Unfortunately, the short answer seems to be that there really is no clear and compelling connection to proof complexity measures. It might still be that there are nontrivial correlations, but it seems that theory is too clean and reality is too messy for there to be a really good match.

    Regarding the "Relating Proof Complexity Measures" paper it has turned out that more detailed experiments provide a much more complex picture with less clear conclusions --- we hope to get to a full journal version reporting on this any decade now, but it is complicated, though still hopefully interesting.

    – Jakob Nordstrom Jul 12 '17 at 12:41
2

Since the original question title asks for theoretical explanations, let me point to a paper that might provide a kind of non-standard theoretical explanation.

The main result of the paper is that one can design a polynomial-time errorless heuristic with exponentially small failure rate for all paddable languages (including SAT), if the errors are counted by a modified accounting scheme. This scheme is somewhat different from the usual one (which is the rate of errors among $n$-bit strings), but it is still closely related to it.

See the details in the paper (sorry for self-promotion, I am one of the authors):

"A new algorithm design technique for hard problems," Theoretical Computer Science, Vol. 821, June 2020, pp. 45-56

If it is behind a pay wall, then here is a free version.

Andras Farago
  • How may being able to redistribute the failing inputs towards higher-order inputs, making them visible much later, shed any light? Isn't it like hiding dust under the carpet? For that matter, we might as well push away the right answers and have the failures dominate. For example, let's say I have an algorithm $A$ that fails on half its binary inputs in the usual counting scheme. Say it admits to fail on any string starting with $0$ and correctly answers on any string starting with $1$. I might insert some bijective polynomial-time transform between the inputs and $A$ so that now ... – Claude Chaunier Jul 18 '21 at 09:56
  • ... the $2^n$ strings of length $n$ are transformed into one string starting with $0$ and $2^n-1$ strings starting with $1$, with no repeats nor holes as $n$ goes through $\mathbb{N}$. Now $A$ will only admit to fail once over $2^n$ inputs. Or I might do the opposite and have only one string starting with $1$ over the $2^n$ inputs of length $n$. Now $A$ would have an exponentially low success-rate under your transformed scheme. In any case that's more like obfuscating our knowledge about $A$ than shedding any light on it, don't you think? – Claude Chaunier Jul 18 '21 at 09:56
  • @ClaudeChaunier Thank you for your comment, it helps clarify the method. As I see it, the problem with your argument is this: the algorithm does not just re-distribute the unanswered inputs (when it says "don't know"). It also guarantees that whenever an answer is provided, it must be correct. Without this, we could indeed just push the unanswered inputs "out of sight" while keeping the answered inputs "near in sight." But how do we know the correct answer to all of the answered inputs? Doing this correctly is the essence and the nontrivial part of the method. – Andras Farago Jul 18 '21 at 15:55