20

The question is simple and direct: For a fixed $n$, how many (different) languages are accepted by a DFA of size $n$ (i.e. $n$ states)? I will formally state this:

Define a DFA as $(Q,\Sigma,\delta,q_0,F)$, where everything is as usual and $\delta:Q\times\Sigma\to Q$ is a (possibly partial) function. We need to establish this since sometimes only total functions are considered valid.

For every $n\geq 1$, define the (equivalence) relation $\sim_n$ on the set of all DFAs as: $\mathcal{A}\sim_n\mathcal{B}$ if $|\mathcal{A}|=|\mathcal{B}|=n$ and $L(\mathcal{A})=L(\mathcal{B})$.

The question is, then: for a given $n$, what is the index of $\sim_n$? That is, what is the size of the set $\{L(\mathcal{A})\mid\mathcal{A}\textrm{ is a DFA of size }n\}$?

Even when $\delta$ is a total function, it doesn't seem to be an easy count (for me, at least). The graph might not be connected, and the states in the connected component containing the initial state might all be accepting, so, for example, there are many graphs of size $n$ accepting $\Sigma^*$. Same with other trivial combinations for the empty language and other languages whose minimal DFA has fewer than $n$ states.

(A naïve) recursion doesn't seem to work either. If we take a DFA of size $k$ and add a new state, then, if we want to keep determinism and make the new graph connected (to try to avoid trivial cases), we have to remove a transition to connect the new state, but in that case we may lose the original language.

Any thoughts?

Note. I updated the question again, with a formal statement and without the previous distracting elements.

Janoma
  • 1,406
  • 11
  • 27
  • Just to clarify: Do you mean "how many different languages can one define using $n$ states?", where a language is defined using $n$ states if there is a DFA with $n$ states that accepts it.

    Also, for the regular expressions, the regex "a*aaaaaa" surely has > 1 concatenations, but the DFA needs only one state (two if you need a separate sink), no?

    – Evgenij Thorstensen Aug 22 '11 at 12:48
  • Apologies: For the regex example, it should be "aaaaa*", as that does allow any number. – Evgenij Thorstensen Aug 22 '11 at 12:57
  • The definition of $c(r)$ appears very related to the notion of "dot-depth", except that concept is normally applied to star-free languages (probably for the reasons that @Evgenij Thorstensen outlined). – mhum Aug 22 '11 at 14:11
  • @EvgenijThorstensen Well, yes, the question is about how many different languages. With respect to your example, I hadn't considered that, but you're definitely correct. I will take that part out of the question, since it's just a distraction. There should be some notion of "minimal regular expression" involved. I will think about it. – Janoma Aug 22 '11 at 14:13
  • @mhum Yep, apparently the problem is in the stars. Will think (more) about it. – Janoma Aug 22 '11 at 14:14
  • Each regular language has a unique minimal DFA that accepts it. I think your question could simply be: "What fraction of DFA of size $n$ are minimal?" – Lev Reyzin Aug 22 '11 at 14:26
  • @lev-reyzin Well, that's an interesting question, but not what I want. I want to know how many languages can be accepted by a DFA of size $n$, minimal or not. – Janoma Aug 22 '11 at 14:32
  • Oh I see; I misread it, sorry. So you want to know number the number of equivalence classes (via their language) of DFA of size n? – Lev Reyzin Aug 22 '11 at 14:38
  • Assuming the relations are $\mathcal{A}\sim_n\mathcal{B}$ if $|\mathcal{A}|=|\mathcal{B}|=n$ and $L(\mathcal{A})=L(\mathcal{B})$, then yes, I want to know the number of equivalence classes of $\sim_n$ for every $n$. That's an elegant way of stating it, but I don't think it makes it easier :-) – Janoma Aug 22 '11 at 15:15
  • 1
    Trivial observation: $n+1$ states can be used to define at least $2^n$ different languages. – Evgenij Thorstensen Aug 22 '11 at 15:41
  • 2
    We can get a little bit more, so $2^{\Omega(n)}$ seems OK. But the number of n-state automata is around $n^{cn}2^n=2^{cn\log n+n} = 2^{\Theta(n\log n)}$ (assuming $|\Sigma|=c$). Can we get $2^{\omega(n)}$? – Kaveh Aug 22 '11 at 21:56
  • See also the related question https://cstheory.stackexchange.com/q/37980/109 – András Salamon Apr 19 '17 at 14:32

1 Answers1

20

I think that this question has been studied previously. Mike Domaratzki wrote a survey on research in this area: "Enumeration of Formal Languages", Bull. EATCS, vol. 89 (June 2006), 113-133: http://www.eatcs.org/images/bulletin/beatcs89.pdf

Hermann Gruber
  • 6,470
  • 1
  • 35
  • 58
  • 4
    The paper does address precisely the question asked, from page 120 and on. It's the function $g_k(n)$, and the paper gives rather tight bounds (close to what Kaveh mentions above) on it, although I have not inhaled all the details. – Evgenij Thorstensen Aug 22 '11 at 23:58
  • 1
    Indeed. What we want, then is $g_k(n)$, or, by the relation given, $f_k(n)$, which is the number of pairwise non-isomorphic minimal DFAs with $n$ states over a $k$-letter alphabet. I haven't looked at it in detail either, but it appears that only bounds are known, not exact quantities. – Janoma Aug 23 '11 at 11:31
  • 6
    And, from the same author, we have the article On the Number of Distinct Languages Accepted by Finite Automata with n States, which even gives explicit computations for $g_1(n)$ ($1\leq n\leq 10$), $g_2(n)$ ($1\leq n\leq 6$) and $g_3(n)$ ($1\leq n\leq 4$). – Janoma Aug 23 '11 at 11:35