9

Do there exists permutations $\pi_1,\pi_2$ and polynomial size (in $|w|=n$) context free grammar that describe the finite language $\{w \pi_1(w) \pi_2(w)\}$ over alphabet $\{0,1\}$?

UPDATE: For one permutation $\pi$ it is possible. $\pi$ is the reversal or relatively minor modification of the reversal.

jerr18
  • 223
  • 1
  • 9
  • 5
    Also asked on math.stackexchange. What he means is: Is there a sequence of permutations $\pi_1^n,\pi_2^n \in S_n$ such that the languages $L_n = {w \pi_1(w) \pi_2(w) : w\in {0,1}^n}$ have polynomial size CFGs? – Yuval Filmus Feb 16 '11 at 17:40
  • 2
    http://math.stackexchange.com/questions/22361/do-there-exists-permutations-pi-1-pi-2-and-polynomial-size-cfg-that-describe – Tsuyoshi Ito Feb 16 '11 at 21:56
  • 1
    Do we know if there is a CFG for $L = \bigcup_n L_n$? – Kaveh Feb 17 '11 at 00:30
  • 4
    @Kaveh: The answer is no, for any sequence of perms. If your language $L$ were context-free, then it has a pumping length $p$. Apply the pumping lemma for CFGs to the string in L associated to $w = 0^p 1^p$. The pumping lemma for CFGs also let's us say that, if the OQ has a positive answer, then the CFG for $L_n$ must use at least $\Omega(n / \log n)$ variables, since we need $3n$ to be less than the pumping length, so that our CFG for $L_n$ does not accept any strings of length $> 3n$. I do not yet see how to use this to rule out a positive answer to the OQ, but it might be possible. – Joshua Grochow Feb 17 '11 at 01:49
  • 1
    @Kaveh: (Also, if the sequence of perms can be chosen arbitrarily, then your language $L$ need not even be computable... The OQ seems to be inherently non-uniform.) – Joshua Grochow Feb 17 '11 at 01:50
  • think this is too tersely stated without details. the "update" is not clear. what is possible? what is a "reversal" of what? by reversal do you mean a permutation reversed? – vzn Jul 29 '15 at 01:47

1 Answers1

13

Chomsky normal form

A CFG is in CNF (Chomsky normal form) if the only productions are of the form $A \rightarrow a$ and $A \rightarrow BC$; a grammar can be brought to CNF with only quadratic blowup.

For a grammar $G$ in CNF, we have the nice Subword Lemma: If $G$ generates a word $w$, then for each $\ell \leq w$, there is a subword $x$ of $w$ of length $\ell/2 \leq |x| < \ell$ which is generated by some non-terminal of $G$. Proof: Descend the (binary) syntax tree, always going to the child which generates the longer subword. If you started with a subword of size at least $\ell$, you can't have gone below $\ell/2$.

Solution

Without loss of generality, we can assume that a grammar for $L_n$ (such a language with specific $\pi_1,\pi_2 \in S_n$) is in Chomsky Normal Form. The language $L_n$ consists of the words $w(x) = x\pi_1(x)\pi_2(x)$ for all $x \in \{0,1\}^n$.

Using the Subword Lemma, for each $w(x)$ we can find a substring $s(x)$ of length $$\frac{n}{2} \leq |s(x)| < n$$ generated by some symbol $A(x)$ and occurring at position $p(x)$.

Suppose that $p(x) = p(y)$ and $A(x) = A(y)$. Since $|s(x)|<n$, the subword $s(x)$ cannot intersect both the $x$ part and the $\pi_2(x)$ part of $w(x)$; we can assume it is disjoint from the $x$ part. Thus $w(x)$ is of the form $x \alpha s(x) \beta$. This implies that $A(x)$ generates exactly one string, namely $s(x)$. Therefore $s(x) = s(y)$.

Now $s(y)$ intersects either $\pi_1(y)$ or $\pi_2(y)$ in at least $n/4$ places, and thus determines at least $n/4$ bits of $y$. Therefore at most $2^{3n/4}$ strings $y \in \{0,1\}^n$ can have $p(x) = p(y)$ and $A(x) = A(y)$. Since there are at most $3n$ possibilities for $p(y)$, we get that there are at least $$\frac{2^{n/4}}{3n}$$ different non-terminals in the grammar.

Comment: The same proof works if $\pi_1,\pi_2 \in S_{\{0,1\}^n}$, i.e. are arbitrary permutations on the set of all $n$-bit words. Given $n/4$ bits of $\pi_i(y)$, there are exactly $2^{3n/4}$ preimages $y$.

More examples

Using the same method, one can prove that the language where each character appears exactly twice requires exponential-size CFG in the size of the alphabet. We can replace "twice" with any subset of $\mathbb{N}$ other than four trivial ones (ignoring $0$, either containing none of $\mathbb{N}_{\geq 1}$ or all of it).

I would appreciate a reference for this proof method.

Yuval Filmus
  • 14,445
  • 1
  • 48
  • 92
  • 2
    Yuval, could you please copy the solution here also. – Kaveh Feb 17 '11 at 03:53
  • Thank you Yuval. If your method is correct and novel I would be glad to read an article investigating more general cases with positive or negative results on polysize CFGs for finite or infinite languages. – jerr18 Feb 17 '11 at 12:49
  • 3
    There's this writeup: http://www.cs.toronto.edu/~yuvalf/cfg.pdf. – Yuval Filmus Feb 17 '11 at 13:33
  • I suppose by the exception $N_{\geq 1}$ you mean "at least one occurrence of terminal". Would this mean you can generate all permutations by intersecting with $\Sigma^{|\Sigma|}$? – jerr18 Mar 07 '11 at 14:47
  • 1
    See related question http://cstheory.stackexchange.com/q/5014 where Yuval posted an answer with a published reference. – András Salamon Jul 29 '15 at 10:34