10

Let a number $n$ be given. Consider the following language $L_n = \{ \; ww \; \vert \; w \in \{0,1\}^{n} \; \}$.

In words, $L_n$ is the set of copy strings of length $2n$.

Consider the following state complexity function $s$ such that $s(n)$ is the number of states in the smallest Pushdown Automata that recognizes $L_n$.

Question: Can you formally prove any meaningful lower bound for $s(n)$?

My Conjecture: $s(n) = 2^{\Theta(n)}$.

Known Upperbound: $s(n) \leq \mathrm{poly}(n) \cdot 2^{\frac{n}{2}}$.

Rules:

(1) The stack alphabet must be binary.

(2) The input tape is one-way and can't stop on any input character.

Michael Wehar
  • 5,127
  • 1
  • 24
  • 54
  • I currently don't have any meaningful lower bound. It seems to me you might be able to prove a lower bound for the number of variables you need for a CFG that recognize the language. Although, I'm not even totally sure of this. – Michael Wehar Jul 23 '15 at 04:36
  • 1
    My intuition is that as you push characters from the input tape to the stack, you run into a problem. If you ever want to retrieve these bits later on, you have to throw away all the bits that you pushed above it. In other words, it appears that the stack doesn't help you because the more you push to it, the more you're forced to forget later on. – Michael Wehar Jul 23 '15 at 04:39
  • 1
    Remark: For DFA's (automata without a stack), you can prove an exponential state complexity lower bound. – Michael Wehar Jul 23 '15 at 04:45
  • 1
    Can you show a reasonable lower bound for the simpler problem of ${0^k1^l0^k1^l}$? – András Salamon Jul 23 '15 at 07:53
  • I think he wants to know if the state complexity is sublinear or not. It's a similar sort of problem because you can push $0^k$ and $1^l$, but when you want to read $0^k$ on the stack you have to pop and throw away $1^l$. Thanks for sharing. :) – Michael Wehar Jul 23 '15 at 15:02
  • Now I see your point. I delete my comments. – Joseph Stack Jul 24 '15 at 15:31
  • 1
    A more precise upper bound seems to be $(n+3)2^{n/2}$ states. – András Salamon Jul 25 '15 at 12:29

1 Answers1

10

The technique described by Yuval:

Do there exists polynomial size CFG that describe this finite language?

( you may also read: Lower bounds on the size of CFGs for specific finite languages )

allows to show very easily an exponential lower bound for CFGs. Let $G$ a grammar in Chomsky Normal Form for $L_n$. For every word $w\in \{0,1\}^n$ there exists at least one non-terminal $A(w)$ accepting a subword $s(w)$ of $ww$ having length between $n/2$ and $n$. Let $p(w)$ be a position in $ww$ where this subword occurs. There are at least $n/2$ bits common to all words $w,w'$ such that $A(w)=A(w')$ and $p(w)=p(w')$. Consequently, there can be at most $2^{n/2}$ words that have the same $A(w)$ and $p(w)$. Hence there are at least $2^{\Theta(n)}$ non-terminals.

Furthermore, the PDA can be converted into a CFG in CNF, of polynomial size so this also gives the $2^{\Theta(n)}$ bound on the state complexity of $L_n$.

Joseph Stack
  • 1,095
  • 6
  • 13
  • Awesome, thanks again! I see now and will think about it to confirm. :) – Michael Wehar Jul 24 '15 at 15:38
  • Here is another example: $S \rightarrow AB, A \rightarrow 0, B \rightarrow CD, C \rightarrow 000, D \rightarrow C A, B \rightarrow EF, E \rightarrow 001, F \rightarrow C G, G \rightarrow 1$. The grammar accepts the strings 00000000 and 00010001. Also, the non-terminal $C$ derives a string of length between $\frac{n}{2}$ and $n$ and is used at the same position in deriving both words. – Michael Wehar Jul 25 '15 at 14:46
  • Chat: http://chat.stackexchange.com/rooms/26239/state-complexity-of-copy-language – Michael Wehar Jul 25 '15 at 15:15
  • 2
    Right again: changing length from $[n,2n]$ to $[n/2,n]$ introduced another issue. I had to refine the argument in a way similar to Yuval's (counting overlaps). Now I believe it is correct, at last. I edited the answer and deleted my comments – Joseph Stack Jul 28 '15 at 08:46
  • @JosephStack Awesome! Now, I believe you and I think it is good. I greatly appreciate all your help. I will delete some of my comments as well. – Michael Wehar Jul 28 '15 at 22:03
  • 1
    In case it helps anyone else: note that once $(A(w),p(w))$ has been chosen, $A(w)$ can only ever yield the single subword of $ww$ that begins at position $p(w)$. So this actually shows that any correct CNF-CFG for this language must have many nonterminals that each generate a single long subword. – András Salamon Aug 04 '15 at 10:07
  • @AndrásSalamon Hey, sorry that I still haven't responded to your email about improving the lower bound. I will soon though. Hope all is well. :) – Michael Wehar Aug 19 '15 at 14:31
  • 2
    See also Theorem 7 in my paper: http://www.cs.toronto.edu/~yuvalf/CFG-LB.pdf. – Yuval Filmus Nov 15 '15 at 23:21
  • @YuvalFilmus Thanks for the link! There was some offline discussion and we got the exact same lower bound that you have. We should have just contacted you because it seems you already knew this... – Michael Wehar Nov 16 '15 at 03:08
  • 1
    @YuvalFilmus It's worth also noting that Andras spent a bit of time trying to get the upper and lower bounds to match. My friend Pepe and I defined a general class of finite languages and applied the technique to them. We never wrote anything up though. If you ever have any related problems, we would be more than willing to collaborate. Thanks again. – Michael Wehar Nov 16 '15 at 03:13