29

Fix an integer $n$ and alphabet $\Sigma=\{0,1\}$. Define $DFA(n)$ to be the collection of all finite-state automata on $n$ states with starting state 1. We are considering all DFAs (not just connected, minimal, or non-degenerate ones); thus, $|DFA(n)| = n^{2n}2^n$.

Now consider two strings $x,y\in\Sigma^*$ and define $K(x,y)$ to be the number of elements of $DFA(n)$ that accept both $x$ and $y$.

Question: What is the complexity of computing $K(x,y)$?

This question has implications for machine learning.

Edit: Now that there's a bounty on this question, I suppose a bit more precision in the formulation is in order. For $n\ge1$, let $DFA(n)$ be the collection of $n^{2n}2^n$ automata, as defined above. For $x,y\in\{0,1\}^*$, define $K_n(x,y)$ to be the number of automata in $DFA(n)$ that accept both $x$ and $y$. Question: can $K_n(x,y)$ be computed in time $poly(n,|x|,|y|)$?

Aryeh
  • 10,494
  • 1
  • 27
  • 51
  • Are the counted automata supposed to accept only $x$ and $y$? – Raphael Jul 26 '11 at 18:02
  • 2
    If you fix a DFA without fixing the final states, then either it maps x and y to the same state, in which case the only constraint is that the state has to be final, or it maps them to two different states, in which case the only constraint is that they both have to be final. Thus, I would reword your problem as "how many DFAs map x and y to different states?". – a3nm Jul 26 '11 at 18:20
  • 3
    Aryeh, can you explain the count $n^{2n} 2^n$? I cannot get the $2^n$ factor. Added: Oops, I forgot to specify the final states. Anyway, for the sake of others, here's how the count goes. For each state, specify where to go on inputs $0$ and $1$; that accounts for $n^{2n}$. Specify the set of final states; that's $2^n$. – Srivatsan Narayanan Jul 26 '11 at 20:12
  • A good place to find such answers: http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/s/Shallit:Jeffrey.html – Vijay D Jul 26 '11 at 20:33
  • @Raphael, no. I'm pretty sure Aryeh wants both $x$ and $y$ accepted, and he doesn't care about what happens to the other strings. – Lev Reyzin Jul 26 '11 at 21:01
  • 2
    Indeed, I don't care what happens to strings other than $x$ and $y$. I guess one needs a certain amount of points to start a bounty? – Aryeh Jul 27 '11 at 11:08
  • @Aryeh: No one can start a bounty on this question yet. The question has to be at least 48 hours old. More here: http://cstheory.stackexchange.com/privileges/set-bounties – Aaron Sterling Jul 27 '11 at 14:00
  • A very naive answer is that this problem is in #P , since computing if a given $DFA$ accepts both $x$ and $y$ is in P (actually it is in $O(n)$). From looking a bit at this problem, I would suggest that the difficulty arises mostly from checking how $x$ and $y$ interact, i.e. how can you alter the DFA while keeping the invariant of accepting both strings. – chazisop Jul 27 '11 at 15:38
  • What size does the smallest automaton $A^$ that accepts $x$ and $y$ (amongst others) have? If you got that, is it a minor of all the automata you want to count? If so, there probably is a closed expression in terms of $A^$. – Raphael Jul 28 '11 at 20:12
  • 4
    The smallest automaton that accepts $x$ and $y$ has a single state, so I don't think it's terribly informative... – Aryeh Jul 29 '11 at 13:08
  • 3
    Here is an idea: we only need to know the number of $n$-state DFAs which end up in the same state on $x$ and $y$. Let this number be $m$ and $M$ be the total number of DFAs, i.e. $M=n^{2n}2^{n}$ . Then the answer is $\frac{1}{2}m + \frac{1}{4}(M-m)$, this gives bounds. To compute $m$ another idea is that we can forget about the shared initial segment of $x$ and $y$ and also assume that w.l.o.g. $x=0^a$ and $b=1^b$. We only to count the number of binary DAGs with $l$ states and height at most $\max{a,b}$ that $0^a$ and $1^b$ end up in the same place and from that it is easy to compute $m$. – Kaveh Jul 30 '11 at 11:04
  • @Kaveh: I absolutely agree with the first part of your comment but I do not get the second part. – domotorp Jul 30 '11 at 14:13
  • @domotorp: it is not a complete solution for computing $m$, but Which one is not clear? 1. removing the shared initial part, 2. wlog $x=0^a$ and $y=1^b$. (more) – Kaveh Jul 30 '11 at 14:46
  • I think the unclear part is the last part. If we know the number of binary DAGs with height at most $\max{a,b}$ and $s$ states and $l$ leaves (missed it above) then we will know the number of $n$ state DFAs that extend them, and then we can sum over the values of $s$ and $l$ to get $m$. – Kaveh Jul 30 '11 at 14:50
  • Kaveh is right about $K_n(x,y)=\frac12m+\frac14(M-m)$ -- which means that $\frac14|DFA(n)|\le K_n(x,y)\le\frac12|DFA(n)|$. I don't quite follow the second part, after discarding the initial segment. – Aryeh Jul 30 '11 at 19:31

2 Answers2

1

So the question is pretty brief but very interesting. I suppose that the input is $n$ in unary, and $x$ and $y$ in binary (or we have problems, as pointed out by Kai's answer).

First of all, if you are interested in knowing $K(x,y)$ approximately, then you can just generate a few random DFA's and this will give you (whp) a good approximation. (I wonder if this complexity class has a name.)

Then knowing $K(x,y)$ precisely seems like a tough problem. As pointed out in the comments by a3_nm and Kaveh, the question is equivalent to determining the number of automata for which $x$ and $y$ go to the same state. I will denote the probability that they go to the same state by $p$.

Update: Some of the things I wrote here were not true, now I fixed them.

It is easy to see that $p \ge 1/n$. We have equality, if $x$ is all 0's and $y$ is all zero except for its last bit, which is a 1. Are there other cases? I don't know. If for example $x$ is the empty string and $y=00$, then $p= \frac{n+1}{(n-1)n}$.

To simplify the problem, I even started to think about what happens if $x$ and $y$ are unary. If both are at least $n$ and their difference is divisible by $n!$, then $p=1$. Is there a simple formula for the unary version?

domotorp
  • 13,931
  • 37
  • 93
  • I've clarified the problem -- a $poly(n,|x|,|y|)$ algorithm is desired (or a reduction from some known hard problem). The sampling approximation is employed in the paper where this kernel is introduced:http://portal.acm.org/citation.cfm?id=1577108 – Aryeh Jul 30 '11 at 19:34
  • 2
    As for the unary version: there are only polynomially many $n$-state unary automata, so I would bet that there is a poly-time algorithm for computing $K_n(x,y)$ for this case. – Aryeh Jul 30 '11 at 19:38
  • Indeed, you are absolutely right that the unary version is computable. I still wonder how simple the formula is for a given x and y. – domotorp Jul 30 '11 at 20:20
  • The reduction you have used is buggy: x and y may be accepted by the same automata and end in completely different states, in fact, they may share only the starting state in their paths, which is true for all strings. – amnn Jul 26 '15 at 01:28
  • @amnn: It has been three years since I wrote this, but doesn't the third para of my answer explain why I deal only with ending in the same state? – domotorp Jul 26 '15 at 13:15
  • @domotorp, The third paragraph introduces the reduction, but I don't think it justifies it any way (it merely states an equivalence that is not proven). – amnn Jul 27 '15 at 22:28
  • @amnn: Are you suggesting that there is something wrong with the first part of Kaveh's comment (upvoted many times) where he claims that $K_n(x,y)=m/2+(M-m)/4$? – domotorp Jul 28 '15 at 08:27
  • Well consider a DFA M with states q1, q2, q3, q4. q1 is the start state, q2, q3 are final states, and q4 is a sink state. There is a transition from q1 to q2 on 0 and from q1 to q3 on 1. All other transitions go to q4. Now M recognizes the language {0, 1} but it does not follow that running M on 0 and running M on 1 result in reaching the same state. – amnn Jul 28 '15 at 22:12
  • @amnn: I think that you should finally read Kaveh's comment that I'm referring to and then we can continue our discussion. – domotorp Jul 29 '15 at 08:10
  • If you mean the comment that starts "Here is an idea", then I have read it. It also states the reduction without justification. – amnn Jul 29 '15 at 16:27
  • @amnn: I think it's proof is pretty straightforward. If $x$ and $y$ go to the same state, then there is 50% chance of them being both accepted, while if they go to different states, there is 25% chance. As you can see from the comments, many people agreed with this, so maybe you have misunderstood something (or all of us were wrong - one can never be sure...). – domotorp Jul 29 '15 at 19:28
  • What you say is correct, but for a reduction to be valid, it must work in both ways. I.e. x and y are both accepted by a machine M if and only if x and y reach the same state. Your argument satisfies the <= direction, but not the => which I refuted with my earlier counter-example. Both directions are needed here because otherwise you could be under-counting (x and y reaching the same state is only one way they could both be accepted by the same machine). – amnn Jul 29 '15 at 20:28
  • @amnn: But we have expressed $K_n(x,y)$ with $m$ and $M$, so in my answer I only deal with computing $m$, i.e., when they go to the same state. – domotorp Jul 30 '15 at 07:03
0

I may very well be missing the point but you stated that $n$ is fixed, so all DFAs of that size could be considered precomputed and stored in an easily simulatable format. Compute $K$ as follows:

On input $x$, $y$ where $x,y\in\Sigma^*$

  1. store $x$ and $y$
  2. initialize counter $c$ to $0$
  3. for each of your $n^{2n} 2^n$ DFAs
  4. a. simulate it on both words (this step is $\mathcal{O}(|xy|)$)

    b. increment $c$ if both simulation runs are accepting

  5. output $c$

Altogether, the computation has linear complexity. The answer is quite different for $K(n,x,y)$.

Kai
  • 854
  • 6
  • 12
  • 3
    Clearly trying all machines will work. Aryeh wants to know if there's, perhaps, a polynomial time algorithm or otherwise some hardness result. – Lev Reyzin Jul 30 '11 at 16:02
  • Strictly speaking this is polynomial time in the input, if n is not part of the input, that is what Kai was saying. But the question is clearly different. – domotorp Jul 30 '11 at 16:25
  • 4
    Oh I see. I don't think that's what he means by "fix $n$." I think the natural interpretation of the problem is one that doesn't trivialize it. – Lev Reyzin Jul 30 '11 at 16:59
  • 1
    Right, thanks for pointing out the loophole, Kai. It's been fixed :) – Aryeh Jul 30 '11 at 19:40