15

I am considering the language of all satisfiable propositional logic formulae, SAT (to ensure that this has a finite alphabet, we would encode propositional letters in some suitable way [edit: the replies pointed out that the answer to the question may not be robust under varying encodings, so one needs to be more specific -- see my conclusions below]). My simple question is

Is SAT a context-free language?

My first guess was that today's (early 2017) answer should be "Nobody knows, since this relates to unresolved questions in complexity theory." However, this is not really true (see answer below), though not completely false either. Here is a short summary of things we know (starting with some obvious things).

  1. SAT is not regular (because even the syntax of propositional logic is not regular, due to matching parentheses)
  2. SAT is context-sensitive (it is not hard to give an LBA for it)
  3. SAT is NP-complete (Cook/Levin), and in particular decided by nondeterministic TMs in polynomial time.
  4. SAT can also be recognized by one-way nondeterministic stack automata (1-NSA) (see W. C. Rounds, Complexity of recognition in intermediate Level languages, Switching and Automata Theory, 1973, 145-158 http://dx.doi.org/10.1109/SWAT.1973.5)
  5. The word problem for context-free languages has its own complexity class $\textbf{CFL}$ (see https://complexityzoo.uwaterloo.ca/Complexity_Zoo:C#cfl)
  6. $\textbf{CFL}\subseteq\textbf{LOGCFL}\subseteq\textbf{AC}^{\textbf{1}}$, where $\textbf{LOGCFL}$ is the class of problems logspace reducible to $\textbf{CFL}$ (see https://complexityzoo.uwaterloo.ca/Complexity_Zoo:L#logcfl). It is known that $\textbf{NL}\subseteq\textbf{LOGCFL}$.
  7. It is not known whether $\textbf{NL}\subsetneq\textbf{NP}$ or $\textbf{NL}=\textbf{NP}$ (in fact, even $\textbf{NC}^{\textbf{1}}\subsetneq\textbf{PH}$ is open; I think I got this from S. Arora, B. Barak: Computational Complexity: A Modern Approach; Cambridge University Press 2009). Hence, there cannot be any $\textbf{NP}$-complete problem that is known to not be in $\textbf{LOGCFL}$. Hence, it must be unknown if SAT is in $\textbf{LOGCFL}$.

However, this last point still leaves the possibility that SAT is known to not be in $\textbf{CFL}$. In general, I could not find much about the relationship of $\textbf{CFL}$ to the $\textbf{NC}$ hierarchy that might help to clarify the epistemic status of my question.

Remark (after seeing some initial answers): I am not expecting the formula to be in conjunctive normal form (this will not make a difference to the essence of the answer, and usually arguments still apply since a CNF is also a formula. But the claim that the constant-number-of-variables version of the problem is regular fails, since one needs parentheses for syntax.).

Conclusion: Contrary to my complexity-theory inspired guess, one can show directly that SAT is not context-free. The situation therefore is:

  1. It is known that SAT is not context-free (in other words: SAT is not in $\textbf{CFL}$), under the assumption that one uses a "direct" encoding of formulae where propositional variables are identified by binary numbers (and some further symbols are used for operators and separators).
  2. It is not know if SAT is in $\textbf{LOGCFL}$, but "most experts think" that it is not, since this would imply $\textbf{P}=\textbf{NP}$. This also means that it is unknown if other "reasonable" encodings of SAT are context-free (assuming that we would consider logspace an acceptable encoding effort for an NP-hard problem).

Note that these two points do not imply $\textbf{CFL}\subsetneq\textbf{LOGCFL}$. This can be shown directly by showing that there are languages in $\textbf{L}$ (hence in $\textbf{LOGCFL}$) that are not context-free (e.g., $a^nb^nc^n$).

mak
  • 504
  • 2
  • 14
  • If it were, then P = NP. – Ryan Dougherty Jan 17 '17 at 15:33
  • 1
    If SAT were context free then dynamic programing (the CYK algorithm) would give a polynomial time algorithm for testing membership in SAT, giving P=NP. Even P=NP wouldn't mean that SAT were Context-Free.

    This encoding of variables seems like it might be more important than you're giving it credit for. I haven't worked out the details but if you added unary or binary "subscripts" to the variables I think you'd have trouble distinguishing (x and y) from (x and not x) for large enough indices.

    – AdamF Jan 17 '17 at 15:35
  • You have to be precise about the representation before claiming P=NP conclusions. For example, factoring a number N is polynomial-time in N (the interesting question is concerning the number of bits needed to write N in binary, or about log N). – Aryeh Jan 17 '17 at 18:17
  • I was aware of the P=NP conclusion and that the answer therefore was not expected to be "yes" (sorry for being a little provocative in how I phrased this ;-). I was still wondering if this is really known or merely something that "most experts believe" (answers now clearly indicate that the former is true; I will select one in due course). – mak Jan 17 '17 at 22:56
  • Wouldn't UNSAT be more likely to be a CFL because all UNSAT formulae have refutation proofs in proof systems (e.g. Resolution or Extended Resolution or Frege) that at least look like they could be context-free? (2) UNSAT and SAT look more like they would be context-free graph languages (CNFs can be viewed as bipartite graphs) which in general are known to be NP-complete to parse, so if they were context-free as graph grammars it wouldn't immediately imply that P=NP.
  • – botsina Nov 04 '23 at 07:18