30

I was wondering whether there is a ``better'' algorithm (I will explain in what sense) for starting from a DFA $\mathcal{A}$ and constructing a regular expression $r$ such that $L(\mathcal{A})=L(r)$ than the one in the book by Hopcroft and Ullman (1979). There, the sets $R_{ij}^k$ represent the sets of strings that take the DFA from state $q_i$ to state $q_j$ without going through any state numbered higher than $k$. This construction, although obviously correct and very useful, is rather technical.
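For concreteness, the $R_{ij}^k$ construction can be rendered in a few lines of Python. This is a hypothetical sketch, not taken from the book; the representation of regular expressions as plain strings (with None for the empty set, "" for the empty word, and '|' for union) is my own choice.

```python
# Hypothetical sketch of the R^k_ij construction; not from Hopcroft-Ullman.
# Regexes are plain strings: None = empty set, "" = empty word, '|' = union.

def dfa_to_regex(n, delta, start, finals):
    """delta maps (state, symbol) -> state; states are numbered 0..n-1."""
    def union(r, s):
        if r is None: return s
        if s is None: return r
        return f"({r}|{s})"

    def concat(r, s):
        if r is None or s is None: return None
        return r + s

    def star(r):
        return "" if r in (None, "") else f"({r})*"

    # R^0_ij: direct transitions, plus the empty word on the diagonal.
    R = [[None] * n for _ in range(n)]
    for (i, a), j in delta.items():
        R[i][j] = union(R[i][j], a)
    for i in range(n):
        R[i][i] = union(R[i][i], "")

    # R^{k+1}_ij = R^k_ij + R^k_ik (R^k_kk)* R^k_kj
    for k in range(n):
        loop = star(R[k][k])
        R = [[union(R[i][j], concat(concat(R[i][k], loop), R[k][j]))
              for j in range(n)] for i in range(n)]

    # The language is the union over all accepting states.
    result = None
    for f in finals:
        result = union(result, R[start][f])
    return result

# Example: 2-state DFA over {a,b} accepting exactly the words ending in 'a'.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}
print(dfa_to_regex(2, delta, 0, [1]))
```

The resulting expressions are correct but far from minimal, which is exactly the technicality the question alludes to.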

I'm writing a monograph about algebraic automata theory and I don't want to distract my audience with too many technical details (at least not with details that are irrelevant for the results I want to show), but I do want to include a proof of the equivalence between DFA and regular expressions for the sake of completeness. For the record, I'm using Glushkov automata to go from a regular expression to a DFA. It seemed more intuitive than $\varepsilon$-transitions, which I didn't define at all (again, because I don't need them).

What other algorithms are known to go from a DFA to a regular expression? I value simplicity over efficiency (that's ``better'' for me in this case), but that is not a requirement.

Thanks in advance for your help!

Janoma
  • It's not a different algorithm, but the $R^k_{ij}$ algorithm can be expressed algebraically, using the $k$th power of a matrix of regular expressions in the appropriate algebra. Perhaps you'll find this more elegant/concise. I'm looking for a reference. – Max Dec 01 '11 at 07:44
  • The $R^k_{ij}$ algorithm is essentially a variant of the Floyd-Warshall algorithm for the all-pairs shortest paths problem, so you may find the presentation in terms of matrix multiplication by searching for these keywords. – Jan Johannsen Dec 01 '11 at 08:31
  • I concur. It is basically the Floyd-Warshall algorithm. It can also be derived using standard dynamic programming techniques (just like Floyd-Warshall can). – david Dec 01 '11 at 10:29
  • I am sure I answered a question like this before, but I can't find it. – Raphael Dec 01 '11 at 11:18
  • @Max could you find a reference? I'm interested in the matrix representation; it should actually be more appealing to algebraists. – Janoma Dec 01 '11 at 13:56
  • Do you know of http://boole.stanford.edu/pub/am4.pdf ? – Radu GRIGore Dec 01 '11 at 21:44
  • @RaduGRIGore I will take a closer look at it at some point, but the method with Arden's Lemma seems the best for what I want. – Janoma Dec 03 '11 at 14:41
  • @Janoma: Radu GRIGore's reference was essentially what I was thinking of ... it seems to not exactly be matrix multiplication, but in fact Gaussian elimination. – Max Dec 05 '11 at 17:40

3 Answers

19

Two more constructions: Brzozowski-McCluskey aka state elimination [1], and Gaussian elimination in a system of equations using Arden's Lemma. The best source on these is probably Jacques Sakarovitch's book [2].

[1] J. Brzozowski, E. McCluskey Jr., Signal flow graph techniques for sequential circuit state diagrams, IEEE Transactions on Electronic Computers EC-12 (1963) 67–76.

[2] J. Sakarovitch, Elements of Automata Theory. Cambridge University Press, 2009.
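For readers who want to see the second method operationally, here is a hedged Python sketch of Gaussian elimination with Arden's Lemma. It is not taken from [2]; the string representation of regexes (None for the empty set, "" for the empty word, '|' for union) and the function names are my own choices.

```python
# Hypothetical sketch: solving a DFA's language equations with Arden's Lemma.
# Regexes are plain strings: None = empty set, "" = empty word, '|' = union.

def union(r, s):
    if r is None: return s
    if s is None: return r
    return f"({r}|{s})"

def concat(r, s):
    if r is None or s is None: return None
    return r + s

def arden_solve(coeffs, consts):
    """Solve the system X_i = sum_j coeffs[i][j] X_j + consts[i].

    Returns the least solution for X_0, using Arden's Lemma:
    X = A X + B has least solution X = A* B.
    """
    n = len(coeffs)
    for i in range(n - 1, -1, -1):
        # Arden's Lemma removes the self-loop of X_i.
        a = coeffs[i][i]
        if a is not None:
            star = f"({a})*"
            consts[i] = concat(star, consts[i])
            coeffs[i] = [None if j == i else concat(star, coeffs[i][j])
                         for j in range(n)]
        # Substitute the solved X_i into the remaining equations.
        for k in range(i):
            c = coeffs[k][i]
            if c is None:
                continue
            coeffs[k][i] = None
            consts[k] = union(consts[k], concat(c, consts[i]))
            for j in range(n):
                coeffs[k][j] = union(coeffs[k][j], concat(c, coeffs[i][j]))
    return consts[0]

# Example DFA over {a,b} accepting the words ending in 'a'
# (state 0 = start, state 1 = accepting):
#   X0 = b X0 + a X1
#   X1 = b X0 + a X1 + eps
print(arden_solve([["b", "a"], ["b", "a"]], [None, ""]))
```

Each state contributes one equation, with the empty word as the constant term exactly for accepting states; the elimination order is free, which is what the heuristics in the comments below exploit.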

Sylvain
  • I find the approach of solving equations using Arden's Lemma the simplest and easiest to explain; that's why I present it that way in an introductory theory class. – Jan Johannsen Dec 01 '11 at 08:27
  • The method of a system of equations sounds brilliant. Unfortunately, the library of my university doesn't have the book you mention (Sakarovitch), but I'm going to look elsewhere. – Janoma Dec 01 '11 at 13:53
  • The comparison of constructions is also found in Sakarovitch's paper "The Language, the Expression and the (small) Automaton", CIAA 2005, LNCS 3845, Springer (2006) 15-30. See http://www.infres.enst.fr/~jsaka/PUB/Files/LESA.pdf – Hermann Gruber Dec 04 '11 at 00:37
  • Also, note that the order in which the states are processed can greatly affect the size of the resulting regular expression. This always holds, whether you do it with Arden's Lemma, McNaughton-Yamada, state elimination, or another variant. Several simple heuristics for choosing a good elimination ordering are available. – Hermann Gruber Dec 04 '11 at 00:49
16

Kozen's book "Automata & Computability" mentions an elegant generalization of this Floyd-Warshall-style algorithm. Since you mentioned appealing to algebraists, you might find it useful. You'll find it on pages 58-59 of that text. (I think Google Books has a preview.)

Basically, you can define a Kleene algebra on matrices whose entries are from a Kleene algebra. Addition/union of matrices is coordinate-wise addition. Multiplication/concatenation of matrices is just like regular matrix multiplication. Kleene star for $2 \times 2$ matrices is defined as:

$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^* = \begin{bmatrix} (a+bd^*c)^* & (a+bd^*c)^*bd^* \\ (d+ca^*b)^*ca^* & (d+ca^*b)^* \end{bmatrix} $

You can see that if the left-hand matrix is the transition matrix of a 2-state DFA, then the $i,j$-entry of the right-hand matrix describes the set of paths (of any length) from state $i$ to state $j$.

Then Kleene star of larger matrices is defined recursively: divide the $n \times n$ matrix into 4 quadrants/submatrices $a,b,c,d$, of dimensions $m\times m$, $m\times (n-m)$, $(n-m) \times m$, and $(n-m) \times (n-m)$, and apply the $2 \times 2$ rule above with the submatrices in place of "scalar" entries. (Analogously to how regular matrix multiplication can be defined recursively from the $2 \times 2$ rule.)

So suppose you have an $n$-state NFA with transition matrix $T$. Then an equivalent regular expression is $\sum_{f \in F} (T^*)_{s,f}$, where $s$ is the start state and $F$ is the set of accepting states. $T^*$ can be evaluated recursively using the definition above.
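To make the recursion concrete, here is a hedged Python sketch of the matrix star. The function names and the encoding of regexes as strings (None for the empty set, "" for the empty word, '|' for union) are my own choices, not Kozen's.

```python
# Hypothetical sketch of the Kleene algebra of matrices of regexes.
# Entries are strings: None = empty set, "" = empty word, '|' = union.

def radd(r, s):                       # union of two regexes
    if r is None: return s
    if s is None: return r
    return f"({r}|{s})"

def rmul(r, s):                       # concatenation of two regexes
    if r is None or s is None: return None
    return r + s

def rstar(r):                         # Kleene star; (empty set)* = empty word
    return "" if r in (None, "") else f"({r})*"

def madd(A, B):                       # coordinate-wise union
    return [[radd(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def mmul(A, B):                       # ordinary matrix product over (radd, rmul)
    C = []
    for row in A:
        entry_row = []
        for j in range(len(B[0])):
            acc = None
            for t, x in enumerate(row):
                acc = radd(acc, rmul(x, B[t][j]))
            entry_row.append(acc)
        C.append(entry_row)
    return C

def mstar(M):                         # recursive matrix Kleene star
    n = len(M)
    if n == 1:
        return [[rstar(M[0][0])]]
    m = n // 2                        # split into quadrants a, b, c, d
    a = [row[:m] for row in M[:m]]
    b = [row[m:] for row in M[:m]]
    c = [row[:m] for row in M[m:]]
    d = [row[m:] for row in M[m:]]
    ds, astar = mstar(d), mstar(a)
    f = mstar(madd(a, mmul(mmul(b, ds), c)))     # (a + b d* c)*
    g = mstar(madd(d, mmul(mmul(c, astar), b)))  # (d + c a* b)*
    fb = mmul(mmul(f, b), ds)                    # (a + b d* c)* b d*
    gc = mmul(mmul(g, c), astar)                 # (d + c a* b)* c a*
    return ([f[i] + fb[i] for i in range(m)] +
            [gc[i] + g[i] for i in range(n - m)])

# 2-state DFA over {a,b} accepting the words ending in 'a':
T = [["b", "a"],
     ["b", "a"]]
print(mstar(T)[0][1])   # regex for the paths from state 0 to state 1
```

Here the split point is $m = n/2$; choosing $m = 1$ instead reproduces the $R_{ij}^k$ recursion, as noted below.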

Kozen claims that the case where you evaluate the matrix-star recursively using $m=1$ corresponds to the $R_{ij}^k$ algorithm.

Another derivation of the Kleene algebra structures over matrices appears in "A Completeness Theorem for Kleene Algebras and the Algebra of Regular Events" by Kozen.

mikero
14

By far the nicest procedure I have seen is the one mentioned by Sylvain. In particular, it seems to yield more concise expressions than others.

I wrote this document explaining the method for students last summer. It directly relates to a specific lecture; the reference mentioned there is a typical definition of regular expressions. A proof of Arden's Lemma is included, but a proof of the correctness of the method is missing. Since I learned of it in a lecture, I sadly don't have a reference.

Raphael
  • I also prefer that proof. I find it elegant and easy to explain. Even Arden's Lemma is not hard. I think this will be the method I will include in my document. – Janoma Dec 03 '11 at 14:38
  • @Raphael I saw the example on cs.stackexchange and noticed a difference in notation: you used both $+$ and $\cup$ in that document, but only $\cup$ in the one above. I think both mean the same thing; for consistency of notation you could consider changing that. Anyway, that document saved my day :). Thanks for writing documents like this and releasing them under an open license. – user3162 Feb 28 '13 at 00:28