Can a nondeterministic finite automata (NDFA) be efficiently converted to a deterministic finite automata (DFA) in subexponential space/time?

Question

Twenty years ago, I built an regular expression package that included conversions from regular expressions to a finite state machine (DFA) and supported a host of closed regular expression operations (Kleene star, concatenation, reverse, set operations, etc). I was unsure about the worst case performance of my package.

A DFA has the same expressive power as an NDFA, because an n-state NDFA can be trivially converted to a DFA having 2^n states. However, are there any lower upper bound guarantees for such a conversion that do not require an exponential explosion in state?

I was unable to come up with examples mal-behaving regular expressions or NDFAs, but I didn't spend much time thinking about it. I am guessing a regular expression like ((((e|A|B|C)*(e|D|E|F))*(e|G|H|I))*(e|J|K|L|M))* which mixes a lot of alternations and Kleene stars would have a linearly sized NDFA but an expansive DFA.

Are there any restrictions on the class of NFAs that you would want to accept as input? Some restrictions lead to better upper bounds. — András Salamon, Aug 17 '10 at 01:52
Yes there are restrictions. The NFAs are constructed directly from regular expressions by treating them as generalized transition graphs. http://www.seas.upenn.edu/~cit596/notes/dave/regexp-nfa4.html — Wesner Moise, Aug 18 '10 at 16:37

score 16 · Accepted Answer · answered Aug 17 '10 at 01:21

16

It is known that for every pair of naturals numbers n,a such that n <= a <= 2^n, there exists a minimal NDFA with n states whose corresponding equivalent minimal dfa has a states (over a four letter alphabet).

See the paper here: Deterministic blow-ups of minimal nondeterministic finite automata over a fixed alphabet.

Abstract of the paper:

We show that for all integers n and α such that n ≤ α ≤ 2^n, there exists a minimal nondeterministic finite automaton of n states with a four-letter input alphabet whose equivalent minimal deterministic finite automaton has exactly a states. It follows that in the case of a four-letter alphabet, there are no "magic numbers", i.e., the holes in the hierarchy. This improves a similar result obtained by Geffert for a growing alphabet of size n + 2 (Proc. 7th DCFS, Como, Italy, 23-37).

So, I suppose the answer to your question is, no.

answered Aug 17 '10 at 01:21

Aryabhata

1,855
3
21
29

the question is asking for an "algorithm" running in subexponential time and space for converting the NFA. – Marcos Villagra Aug 17 '10 at 01:39
@Marcos: If your output is exponential, you cannot possibly have an algorithm that runs in sub-exponential time. – Aryabhata Aug 17 '10 at 01:40
1

This is a general result. If there are known restrictions on the class of input NFAs, then it might be possible to do better. – András Salamon Aug 17 '10 at 01:50
@Andras: Agreed, but given that this is probably programming related (which will support Kleen * etc), I doubt if the set of input NFA will be restricted to a proper subset. – Aryabhata Aug 17 '10 at 01:55
5

This result was recently strengthened to use a three letter alphabet, and the constructions are a bit simpler: http://portal.acm.org/citation.cfm?id=1575787.1575813&coll=Portal&dl=GUIDE&CFID=98498868&CFTOKEN=60579493 – Aug 18 '10 at 05:11

score 14 · Answer 2 · answered Aug 17 '10 at 17:14

14

The classic example for a language with an exponential separation between DFA size and NFA size is the following finite language: binary strings of length exactly 2n in which the first half is not equal to the second half. A NFA would guess an index i in which the first and second half disagree. A lower bound for a DFA follows from communication complexity, for example.

answered Aug 17 '10 at 17:14

Noam

9,369
47
58

Another maybe simpler example is "the n-th character before the end of the word is an $a$". Can be done with $n+1$ states with an NFA that guesses when the end of the word occurs, whereas a DFA clearly needs to have $2^n$ states as each of the possible states of the $n$ last characters define a different set of continuations that make the automaton accept. – a3nm Nov 09 '21 at 18:33

Alexandre Passos · Answer 3 · 2010-08-17T01:34:06.180

8

The minimum DFA corresponding to an NFA has 2^n states in the worst case, so you can't garantee anything. Without having a constructive example, the reasoning is that in an NFA you can be at any arbitrary subset of states after reading a certain input string, and each such subset might behave differently when observing one character. Suppose a language with two characters in the alphabet (a and b), and an NFA N with n states that starts with an accepting state at s_0. Now enumerate all subsets of states of N, and build the transition table such that observing "a" from subset S_i takes you to subset S_i+1 and observing b takes you to subset S_i-1 (this is doable for some enumerations, I think). Now this automata has n states and accepts sequences of m a's and n b's such that m-n = 0 mod 2^|N|, and cannot be expressed with a DFA that has less than 2^|N| states (since it might need to cycle through all subsets of states of the NFA N).

edited Aug 17 '10 at 01:34

answered Aug 17 '10 at 01:28

Alexandre Passos

1,459
12
20

Can this be turned into an argument that says "if (some bad thing) is avoided in the NFA, then the DFA has subexponential number of states"? – András Salamon Aug 17 '10 at 01:54
1

@András, yes. "If nondeterminism is avoided in the NFA, then the DFA has subexponential number of states". – P Shved Aug 17 '10 at 07:12
2

Pavel, yes, obviously. Is there any non-trivial property that can be recognized efficiently which also guarantees subexponential blowup? – András Salamon Aug 17 '10 at 17:21

Can a nondeterministic finite automata (NDFA) be efficiently converted to a deterministic finite automata (DFA) in subexponential space/time?

3 Answers3