Multi-language DFA minimisation

Question

I'm interested in a slight generalisation of DFA. As usual we have state-set $Q$, finite alphabet $\Sigma$, a $\Sigma^*$-action defined on $Q$ by $\delta : Q\times\Sigma\rightarrow Q$, and initial state $q_0$; but instead of the usual terminal set, we take a family $(T_i)_{i\in 1..n}$ of subsets of $Q$. A Multi-language DFA $M$ is then the tuple

$(Q, \Sigma, \delta, q_0, (T_i))$

and $L \subseteq \Sigma^*$ is recognised by $M$ iff $L = \{s\in\Sigma^*|q_0s\in T_i\}$ for some $i\in 1..n$ . Define $(L_i(M))_{i\in 1..n}$ to be the family of languages recognised by M, if you like.

Okay, now for my question: given a family of regular languages $(L_i)_{i\in 1..n}$ , I want to find the minimal Multi-language DFA $M$ as described above such that $L_i = L_i(M)$ for all $i\in 1..n$ , that is, such that $|Q|$ is minimised over all such machines. My question is, are there any known efficient ways of doing this, perhaps analogous to standard DFA minimisation theory? Conversely, is there any evidence that this problem might be hard?

It seems to me that the standard partition-refinement based algorithm, modified only to start by partitioning the initial set of states by whether they're accepting/nonaccepting for each of the given subsets $T_i$ rather than for a single set $T$, should just work immediately. Why wouldn't it? It only splits pairs of states when they must be split, so it still generates the coarsest possible refinement of the states. — David Eppstein, Feb 15 '13 at 03:06
The proof of the comment by @DavidEppstein is easy if you define the equivalence relation $x\sim y$ iff $x\sim_{T_i}y$ for every $i$, where $x\sim_{T_i} y$ is the Myhill-Nerode equivalence relation. You can then proceed along the same lines as the standard minimization algorithm. — Shaull, Feb 15 '13 at 08:15
dont quite understand. is the answer to this problem finding the minimal DFA of a union of DFAs with same "setup" except different end states, each DFA for $1..n$? also the defn of recognition of $L={...}$ does not seem to make sense exactly, it seems to mixup strings and state sets. — vzn, Feb 15 '13 at 15:58
The points made by DavidEppstein and Shaull look compelling, I'll find some time to go over the Myhill-Nerode theorem when I have time to convince myself that the quotient still yields the minimal automaton. In hindsight it seems too obvious. — gdmclellan, Feb 16 '13 at 00:50
@vzn: definitely don't want to union together the languages of the original automaton; and the $T_i$ may overlap. A multi-language DFA with languages $A$ and $B$ should be able to report, for example, that $s\in A$, but $s\notin B$. As for the notation used in defining recognition of a language, the notation is defined by extending $\delta$ to a $\Sigma^$-action on $Q$ by the following rules for all $q\in Q, \sigma\in\Sigma, s\in\Sigma^$: $q\sigma = \delta(q, \sigma), q(s\sigma) = (qs)\sigma$. — gdmclellan, Feb 16 '13 at 00:50
ok, seems in this theory, concatenation of alphabet symbols is usually a word, not equivalent to a state. equivalently let $\hat{\delta}$ be the extended transition function defined over a string and $\hat{\delta}(q_0 s) \in T_i$ for some $i$ right? also, if the problem is to determine which $T_i$(s) the string "ends up in", thats different than just determining if the string "ends up in" any $T_i$. my idea above is that there is a union of the $L_i$ and a minimal machine for that union. how is this different than what is being asked for? — vzn, Feb 16 '13 at 02:20
@vzn: I could've gone off the deep end and used Eilenberg's division calculus to write $L = q_0^{-1}T_i$ for some $i \in 1..n$. I think using "multiplicative" notation here for concatenation over $\Sigma^$ (which is its operation as a monoid anyway) and also to denote a string acting on a state in $Q$ is coherent. Anyway, you'll notice that for recognition of $L$, I require that "$L = {s\in\Sigma^|q_0s\in T_i}$ for some $i\in 1..n$", as opposed "$L = {s\in\Sigma^*|q_0s\in T_i$ for some $i\in 1..n}$". — gdmclellan, Feb 16 '13 at 23:52
cant quite follow the difference yet. maybe if you cooked up an example that showed the two expression are not equal (but that might take awhile). ok, think my sketch would come up with something "small" that solves the problem but admittedly not provably "minimal". it might help to ponder the cases where it doesnt give the minimal answer. yes, think maybe this can be reduced to some other known minimization problem, but not sure what right now. are multilanguage DFAs considered anywhere in the literature to your knowledge or is it your construction? — vzn, Feb 17 '13 at 02:51
fyi it feels similar to NFA minimization which is Pspace complete... — vzn, Feb 17 '13 at 03:01

score 14 · Accepted Answer · answered Aug 27 '13 at 16:18

Short answer. Given a finite family of regular languages $\mathcal{L} = (L_i)_{1 \leqslant i \leqslant n}$, there is a unique minimal deterministic complete multi-automaton recognizing this family.

Details. The case $n = 1$ corresponds to the standard construction and the general case is not much different in spirit. Given a language $L$ and a word $u$, let $u^{-1}L = \{ v \in A^* \mid uv \in L \}$. Define an equivalence relation $\sim$ on $A^*$ by setting $$ u \sim v \iff \text{for each }L \in \mathcal{L},\ u^{-1}L = v^{-1}L $$ Since the $L_i$ are regular, this congruence has finite index. Further, it is easy to see that each $L_i$ is saturated by $\sim$ and that for each $a \in A$, $u \sim v$ implies $ua \sim va$. Let us denote by $1$ the empty word and by $[u]$ the $\sim$-class of a word $u$. Let $\mathcal{A}_\mathcal{L} = (Q, [1], \cdot, (F_i)_{1 \leqslant i \leqslant n})$ be the deterministic multi-automaton defined as follows:

$Q = \{ [u] \mid u \in A^*\}$,
$[u] \cdot a = [ua]$,
$F_i = \{ [u] \mid u \in L_i\}$.

By construction, $[1] \cdot u \in F_i$ if and only if $u \in L_i$ and hence $\mathcal{A}_\mathcal{L}$ accepts the family $\mathcal{L}$. It remains to prove that $\mathcal{A}_\mathcal{L}$ is minimal. It is actually minimal in a strong algebraic sense (which implies that it has the minimal number of states). Let $\mathcal{A} = (Q, q_-, \cdot, (F_i)_{1 \leqslant i \leqslant n})$ and $\mathcal{A}' = (Q', q'_-, \cdot, (F'_i)_{1 \leqslant i \leqslant n})$ be two multi-automata. A morphism $f: \mathcal{A} \to \mathcal{A}'$ is a surjective map from $Q$ onto $Q'$ such that

$f(q_-) = q'_-$,
for $1 \leqslant i \leqslant n$, $f^{-1}(F'_i) = F_i$,
for all $u \in A^*$ and $q \in Q$, $f(q \cdot u) = f(q) \cdot u$.

Then for any accessible deterministic multi-automaton $\mathcal{A}$ accepting $\mathcal{L}$, there is a morphism from $\mathcal{A}$ onto $\mathcal{A}_\mathcal{L}$. To prove this, one first verifies that if $q_- \cdot u_1 = q_- \cdot u_2 = q$, then $u_1 \sim u_2$. Now $f$ is defined by $f(q) = [u]$ where $u$ is any word such that $q_- \cdot u = q$. Then one can show that $f$ satisfies the three required properties.

The end is a bit sketchy, let me know if you need more details.

Multi-language DFA minimisation

1 Answers1