30

I was wondering whether there is a ``better'' algorithm (I will explain in what sense) for starting from a DFA $\mathcal{A}$ and constructing a regular expression $r$ such that $L(\mathcal{A})=L(r)$ than the one in the book by Hopcroft and Ullman (1979). There, the sets $R_{ij}^k$ represent the sets of strings that take the DFA from state $q_i$ to state $q_j$ without going through any state numbered higher than $k$. This construction, although obviously correct and very useful, is rather technical.
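For concreteness, the $R_{ij}^k$ construction can be rendered in a few lines of Python. This is a hypothetical sketch, not taken from the book; the representation of regular expressions as plain strings (with None for the empty set, "" for the empty word, and '|' for union) is my own choice.

```python
# Hypothetical sketch of the R^k_ij construction; not from Hopcroft-Ullman.
# Regexes are plain strings: None = empty set, "" = empty word, '|' = union.

def dfa_to_regex(n, delta, start, finals):
    """delta maps (state, symbol) -> state; states are numbered 0..n-1."""
    def union(r, s):
        if r is None: return s
        if s is None: return r
        return f"({r}|{s})"

    def concat(r, s):
        if r is None or s is None: return None
        return r + s

    def star(r):
        return "" if r in (None, "") else f"({r})*"

    # R^0_ij: direct transitions, plus the empty word on the diagonal.
    R = [[None] * n for _ in range(n)]
    for (i, a), j in delta.items():
        R[i][j] = union(R[i][j], a)
    for i in range(n):
        R[i][i] = union(R[i][i], "")

    # R^{k+1}_ij = R^k_ij + R^k_ik (R^k_kk)* R^k_kj
    for k in range(n):
        loop = star(R[k][k])
        R = [[union(R[i][j], concat(concat(R[i][k], loop), R[k][j]))
              for j in range(n)] for i in range(n)]

    # The language is the union over all accepting states.
    result = None
    for f in finals:
        result = union(result, R[start][f])
    return result

# Example: 2-state DFA over {a,b} accepting exactly the words ending in 'a'.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}
print(dfa_to_regex(2, delta, 0, [1]))
```

The resulting expressions are correct but far from minimal, which is exactly the technicality the question alludes to.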

I'm writing a monograph about algebraic automata theory and I don't want to distract my audience with too many technical details (at least not with details that are irrelevant for the results I want to show), but I do want to include a proof of the equivalence between DFA and regular expressions for the sake of completeness. For the record, I'm using Glushkov automata to go from a regular expression to a DFA. It seemed more intuitive than $\varepsilon$-transitions, which I didn't define at all (again, because I don't need them).

What other algorithms are known to go from a DFA to a regular expression? I value simplicity over efficiency (that's ``better'' for me in this case), but that is not a requirement.

Thanks in advance for your help!

Janoma
  • It's not a different algorithm, but the $R^k_{ij}$ algorithm can be expressed algebraically, using the $k$th power of a matrix of regular expressions in the appropriate algebra. Perhaps you'll find this more elegant/concise. I'm looking for a reference. – Max Dec 01 '11 at 07:44
  • The $R^k_{ij}$ algorithm is essentially a variant of the Floyd-Warshall algorithm for the all-pairs shortest paths problem, so you may find the presentation in terms of matrix multiplication by searching for these keywords. – Jan Johannsen Dec 01 '11 at 08:31
  • I concur. It is basically the Floyd-Warshall algorithm. It can also be derived using standard dynamic programming techniques (just like Floyd-Warshall can). – david Dec 01 '11 at 10:29
  • I am sure I answered a question like this before, but I can't find it. – Raphael Dec 01 '11 at 11:18
  • @Max could you find a reference? I'm interested in the matrix representation; it should actually be more appealing to algebraists. – Janoma Dec 01 '11 at 13:56
  • Do you know of http://boole.stanford.edu/pub/am4.pdf ? – Radu GRIGore Dec 01 '11 at 21:44
  • @RaduGRIGore I will take a closer look at it at some point, but the method with Arden's Lemma seems the best for what I want. – Janoma Dec 03 '11 at 14:41
  • @Janoma: Radu GRIGore's reference was essentially what I was thinking of ... it seems to not exactly be matrix multiplication, but in fact Gaussian elimination. – Max Dec 05 '11 at 17:40

3 Answers

19

Two more constructions: Brzozowski-McCluskey aka state elimination [1], and Gaussian elimination in a system of equations using Arden's Lemma. The best source on these is probably Jacques Sakarovitch's book [2].

[1] J. Brzozowski, E. McCluskey Jr., Signal flow graph techniques for sequential circuit state diagrams, IEEE Transactions on Electronic Computers EC-12 (1963) 67–76.

[2] J. Sakarovitch, Elements of Automata Theory. Cambridge University Press, 2009.
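For readers who want to see the second method operationally, here is a hedged Python sketch of Gaussian elimination with Arden's Lemma. It is not taken from [2]; the string representation of regexes (None for the empty set, "" for the empty word, '|' for union) and the function names are my own choices.

```python
# Hypothetical sketch: solving a DFA's language equations with Arden's Lemma.
# Regexes are plain strings: None = empty set, "" = empty word, '|' = union.

def union(r, s):
    if r is None: return s
    if s is None: return r
    return f"({r}|{s})"

def concat(r, s):
    if r is None or s is None: return None
    return r + s

def arden_solve(coeffs, consts):
    """Solve the system X_i = sum_j coeffs[i][j] X_j + consts[i].

    Returns the least solution for X_0, using Arden's Lemma:
    X = A X + B has least solution X = A* B.
    """
    n = len(coeffs)
    for i in range(n - 1, -1, -1):
        # Arden's Lemma removes the self-loop of X_i.
        a = coeffs[i][i]
        if a is not None:
            star = f"({a})*"
            consts[i] = concat(star, consts[i])
            coeffs[i] = [None if j == i else concat(star, coeffs[i][j])
                         for j in range(n)]
        # Substitute the solved X_i into the remaining equations.
        for k in range(i):
            c = coeffs[k][i]
            if c is None:
                continue
            coeffs[k][i] = None
            consts[k] = union(consts[k], concat(c, consts[i]))
            for j in range(n):
                coeffs[k][j] = union(coeffs[k][j], concat(c, coeffs[i][j]))
    return consts[0]

# Example DFA over {a,b} accepting the words ending in 'a'
# (state 0 = start, state 1 = accepting):
#   X0 = b X0 + a X1
#   X1 = b X0 + a X1 + eps
print(arden_solve([["b", "a"], ["b", "a"]], [None, ""]))
```

Each state contributes one equation, with the empty word as the constant term exactly for accepting states; the elimination order is free, which is what the heuristics in the comments below exploit.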

Sylvain
  • I find the approach of solving equations using Arden's Lemma the simplest and easiest to explain; that's why I present it that way in an introductory theory class. – Jan Johannsen Dec 01 '11 at 08:27
  • The method of a system of equations sounds brilliant. Unfortunately, the library of my university doesn't have the book you mention (Sakarovitch), but I'm going to look elsewhere. – Janoma Dec 01 '11 at 13:53
  • The comparison of constructions is also found in Sakarovitch's paper "The Language, the Expression and the (small) Automaton", CIAA 2005, LNCS 3845, Springer (2006) 15-30. See http://www.infres.enst.fr/~jsaka/PUB/Files/LESA.pdf – Hermann Gruber Dec 04 '11 at 00:37
  • Also, note that the order in which the states are processed can greatly affect the size of the resulting regular expression. This always holds, whether you do it with Arden's Lemma, McNaughton-Yamada, state elimination, or another variant. Several simple heuristics for choosing a good elimination ordering are available. – Hermann Gruber Dec 04 '11 at 00:49
16

Kozen's book "Automata & Computability" mentions an elegant generalization of this Floyd-Warshall-style algorithm. Since you mentioned appealing to algebraists, you might find it useful. You'll find it on pages 58-59 of that text. (I think Google Books has a preview.)

Basically, you can define a Kleene algebra on matrices whose entries are from a Kleene algebra. Addition/union of matrices is coordinate-wise addition. Multiplication/concatenation of matrices is just like regular matrix multiplication. Kleene star for $2 \times 2$ matrices is defined as:

$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^* = \begin{bmatrix} (a+bd^*c)^* & (a+bd^*c)^*bd^* \\ (d+ca^*b)^*ca^* & (d+ca^*b)^* \end{bmatrix} $

You can see that if the left-hand matrix is the transition matrix of a 2-state DFA, then the $i,j$-entry of the right-hand matrix describes the set of paths (of any length) from state $i$ to state $j$.

Then Kleene star of larger matrices is defined recursively: divide the $n \times n$ matrix into 4 quadrants/submatrices $a,b,c,d$, of dimensions $m\times m$, $m\times (n-m)$, $(n-m) \times m$, and $(n-m) \times (n-m)$, and apply the $2 \times 2$ rule above with the submatrices in place of "scalar" entries. (Analogously to how regular matrix multiplication can be defined recursively from the $2 \times 2$ rule.)

So suppose you have an $n$-state NFA with transition matrix $T$. Then an equivalent regular expression is $\sum_{f \in F} (T^*)_{s,f}$, where $s$ is the start state and $F$ is the set of accepting states. $T^*$ can be evaluated recursively using the definition above.
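To make the recursion concrete, here is a hedged Python sketch of the matrix star. The function names and the encoding of regexes as strings (None for the empty set, "" for the empty word, '|' for union) are my own choices, not Kozen's.

```python
# Hypothetical sketch of the Kleene algebra of matrices of regexes.
# Entries are strings: None = empty set, "" = empty word, '|' = union.

def radd(r, s):                       # union of two regexes
    if r is None: return s
    if s is None: return r
    return f"({r}|{s})"

def rmul(r, s):                       # concatenation of two regexes
    if r is None or s is None: return None
    return r + s

def rstar(r):                         # Kleene star; (empty set)* = empty word
    return "" if r in (None, "") else f"({r})*"

def madd(A, B):                       # coordinate-wise union
    return [[radd(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def mmul(A, B):                       # ordinary matrix product over (radd, rmul)
    C = []
    for row in A:
        entry_row = []
        for j in range(len(B[0])):
            acc = None
            for t, x in enumerate(row):
                acc = radd(acc, rmul(x, B[t][j]))
            entry_row.append(acc)
        C.append(entry_row)
    return C

def mstar(M):                         # recursive matrix Kleene star
    n = len(M)
    if n == 1:
        return [[rstar(M[0][0])]]
    m = n // 2                        # split into quadrants a, b, c, d
    a = [row[:m] for row in M[:m]]
    b = [row[m:] for row in M[:m]]
    c = [row[:m] for row in M[m:]]
    d = [row[m:] for row in M[m:]]
    ds, astar = mstar(d), mstar(a)
    f = mstar(madd(a, mmul(mmul(b, ds), c)))     # (a + b d* c)*
    g = mstar(madd(d, mmul(mmul(c, astar), b)))  # (d + c a* b)*
    fb = mmul(mmul(f, b), ds)                    # (a + b d* c)* b d*
    gc = mmul(mmul(g, c), astar)                 # (d + c a* b)* c a*
    return ([f[i] + fb[i] for i in range(m)] +
            [gc[i] + g[i] for i in range(n - m)])

# 2-state DFA over {a,b} accepting the words ending in 'a':
T = [["b", "a"],
     ["b", "a"]]
print(mstar(T)[0][1])   # regex for the paths from state 0 to state 1
```

Here the split point is $m = n/2$; choosing $m = 1$ instead reproduces the $R_{ij}^k$ recursion, as noted below.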

Kozen claims that the case where you evaluate the matrix-star recursively using $m=1$ corresponds to the $R_{ij}^k$ algorithm.

Another derivation of the Kleene algebra structures over matrices appears in "A Completeness Theorem for Kleene Algebras and the Algebra of Regular Events" by Kozen.

mikero
14

By far the nicest procedure I have seen is the one mentioned by Sylvain. In particular, it seems to yield more concise expressions than others.

I wrote this document explaining the method for students last summer. It directly relates to a specific lecture; the reference mentioned there is a typical definition of regular expressions. A proof of Arden's Lemma is included, but a proof of the correctness of the method is missing. Since I learned of it in a lecture, I sadly don't have a reference.

Raphael
  • I also prefer that proof. I find it elegant and easy to explain. Even Arden's Lemma is not hard. I think this will be the method I will include in my document. – Janoma Dec 03 '11 at 14:38
  • @Raphael I saw the example on cs.stackexchange and noticed a difference in notation: you used both $+$ and $\cup$ in that document, but only $\cup$ in the one above. I think both mean the same thing; for consistency of notation you could consider changing that. Anyway, that document saved my day :). Thanks for writing documents like this and releasing them under an open license. – user3162 Feb 28 '13 at 00:28