
Let's consider the following expression grammar that is ambiguous:

$E ::= E + E~|~a$

GLR parsing (recognition, actually; I am not interested in building parse trees) is $O(n^3)$ in the worst case, but how does a GLR parser behave on this grammar, which does not look like a worst case to me?

I suspect the complexity is $O(n^2)$ or close to it. Can anyone reason about the complexity of this grammar in a generalized setting like GLR, which explores all paths?

Wickoo

1 Answer


This example grammar has worst-case ambiguity, unless I do not understand what you mean by worst-case ambiguity. Why do you think it should not? Try to analyse $a+a+a+a+a$.
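To make the ambiguity concrete, here is a quick count: a string with $k$ occurrences of $a$ has one parse per way of fully bracketing the sum, i.e. per binary tree with $k$ leaves, which is the Catalan number $C_{k-1} = \frac{1}{k}\binom{2(k-1)}{k-1}$. For $a+a+a+a+a$ ($k=5$) that already gives $C_4 = 14$ parse trees, and $C_{k-1}$ grows like $4^{k}/k^{3/2}$ up to a constant factor, i.e. exponentially in the input length.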

Its time complexity is $O(n^3)$ with any general CF parser, such as CKY, Earley, GLR, GLL, etc. However, you need only $O(n^2)$ space if you do not keep parse-tree information (though I would have to check how you manage not to keep it in GLR; it may require a minor modification).

But then, if you are not interested in parse trees but only in recognition, you should forget about ambiguity and simply replace the grammar with a regular grammar that recognizes the same language in linear time.
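For this particular grammar that is trivial: its language is just $a(+a)^*$, so a one-line loop or regular expression recognizes it in linear time. A minimal sketch (the class name is made up for the illustration):

```java
// Linear-time recognizer for the language of E ::= E + E | a, which is just
// the regular language a(+a)*. (Class name invented for this sketch.)
public class ELangRecognizer {
    static boolean recognize(String w) {
        if (w.isEmpty() || w.charAt(0) != 'a') return false;
        for (int i = 1; i < w.length(); i += 2) {
            // after the first 'a', the rest must be a sequence of "+a" pairs
            if (w.charAt(i) != '+' || i + 1 >= w.length() || w.charAt(i + 1) != 'a') return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(recognize("a+a+a+a+a"));   // true
        System.out.println(recognize("a+a+"));        // false
        // equivalently: "a+a+a+a+a".matches("a(\\+a)*")
    }
}
```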

Reasoning with GLR is a bit of a pain, but you can do it easily enough with CKY. There is no essential difference in principle (see the second part of my answer to a previous question), but reasoning with the more complex construction of GLR is more obscure.
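Here is a minimal CKY sketch of that reasoning (the Java class and the non-terminal names $E$, $P$, $Q$ are mine, chosen just for the illustration): put the grammar into Chomsky normal form as $E \rightarrow E\,P \mid a$, $P \rightarrow Q\,E$, $Q \rightarrow +$, and count every successful combination of two sub-derivations, which is exactly the work of filling the shared chart.

```java
// Minimal CKY recognizer for a Chomsky-normal-form version of E ::= E + E | a:
//   E -> E P | 'a',   P -> Q E,   Q -> '+'
// (class name and non-terminals E, P, Q invented for this sketch).
// The counter records every successful combination of two sub-derivations,
// i.e. the packed derivation steps that the chart construction performs.
public class CkyCounterDemo {
    static final int E = 0, P = 1, Q = 2;
    static long steps;                               // successful combinations

    static boolean recognize(String w) {
        int n = w.length();
        // chart[i][len][A] == true  iff  non-terminal A derives w[i .. i+len-1]
        boolean[][][] chart = new boolean[n][n + 1][3];
        for (int i = 0; i < n; i++) {                // terminal rules, spans of length 1
            if (w.charAt(i) == 'a') chart[i][1][E] = true;
            if (w.charAt(i) == '+') chart[i][1][Q] = true;
        }
        for (int len = 2; len <= n; len++)
            for (int i = 0; i + len <= n; i++)
                for (int split = 1; split < len; split++) {
                    int j = i + split, rest = len - split;
                    if (chart[i][split][E] && chart[j][rest][P]) { chart[i][len][E] = true; steps++; } // E -> E P
                    if (chart[i][split][Q] && chart[j][rest][E]) { chart[i][len][P] = true; steps++; } // P -> Q E
                }
        return chart[0][n][E];
    }

    public static void main(String[] args) {
        steps = 0;
        System.out.println(recognize("a+a+a+a+a") + ", combinations = " + steps);
    }
}
```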

Alternatively, take the code of a GLR parser and add counters in the proper places. Then run a few examples and you will see that the counts agree with $O(n^3)$. The last reference of the question I answered actually did that (though I do not recall which general CF parser flavors they tried).
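With the CKY sketch above, the same check is immediate (again only an illustration; the driver class name is made up): double the number of tokens and the counter should grow by a factor of roughly $2^3 = 8$.

```java
// Doubling check: if the work is Theta(n^3), doubling the number of tokens
// should multiply the counter by roughly 8. Assumes CkyCounterDemo (from the
// sketch above) is compiled in the same directory/package.
public class CubicCheck {
    static String input(int k) {                     // k occurrences of 'a': a+a+...+a
        StringBuilder sb = new StringBuilder("a");
        for (int i = 1; i < k; i++) sb.append("+a");
        return sb.toString();
    }

    public static void main(String[] args) {
        long previous = 0;
        for (int k = 100; k <= 800; k *= 2) {
            CkyCounterDemo.steps = 0;
            CkyCounterDemo.recognize(input(k));
            String ratio = previous == 0 ? "-" : String.format("%.2f", (double) CkyCounterDemo.steps / previous);
            System.out.printf("k=%d  combinations=%d  ratio=%s%n", k, CkyCounterDemo.steps, ratio);
            previous = CkyCounterDemo.steps;
        }
    }
}
```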

Answering comments, and sketching a proof:

Regarding the worst-case grammar, you suggest $S \rightarrow SSS \mid SS \mid a$, given in a paper by Mark Johnson. I do not have the paper handy, but my guess is that he uses the $S \rightarrow SSS$ rule to show that the complexity will be $O(n^4)$ unless specific steps are taken to binarize this rule (there are actually 2 ways to do it). The reason is that the exponent of the polynomial complexity can be one more than the length of the longest right-hand side (RHS), counted at least in non-terminals. So he deliberately uses an RHS with 3 non-terminals. Your example grammar does not have that problem, and in that sense you can indeed consider that it is not as bad. Johnson's example would be even worse with an RHS of length 4 or more.
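To make the binarization explicit (a sketch, using a fresh non-terminal $T$): $S \rightarrow SSS$ can be replaced either by $S \rightarrow T\,S,\ T \rightarrow S\,S$ (grouping to the left) or by $S \rightarrow S\,T,\ T \rightarrow S\,S$ (grouping to the right). Either way the longest RHS drops to 2 non-terminals, and the exponent comes back down to 3.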

I questioned your "not a worst case" statement because it is easy to bring time complexity down to $O(n^3)$ by binarization of right-hand sides, which is straightforward and changes the trees only in an obvious and very readable way. However, in the current state of the art, you cannot go below $O(n^3)$ unless you use very complex recognition techniques that do not give you parse trees and are based on a reduction to matrix multiplication, as shown by Leslie Valiant in 1975. Since then, the theoretical complexity of these multiplication algorithms has gone down from the $O(n^{2.807})$ of Strassen's algorithm to the $O(n^{2.373})$ of Williams's algorithm, though the ones faster than Strassen's are unusable in practice. It is not known (afaik) whether this bound can be brought down further. These bounds certainly apply to your grammar, as they do to all others, but I would not call it "parsing".

Going back to the more classical parsing algorithms, I guess one can give a common proof based on the following remark: all classical parsing algorithms are based on the (non-deterministic) pushdown automaton (PDA) model (it is not always very visible, but it is always there). The deterministic algorithms, like precedence, LR(k) or LL(k) and their variants, are just deterministic PDAs built in some systematic way (when possible), corresponding to a technique that uses the grammar to walk the (unique) parse tree as efficiently as possible. These techniques are limited to subclasses of CF grammars and languages.

The general context-free parsing algorithms can handle all CF grammars and languages. They are usually built using the very same techniques, but the construction accepts that there may be conflicts in the steps to be applied. These conflicts are simply non-deterministic choices. But the construction is basically the same as in the deterministic case. That means that the resulting non-deterministic PDA can walk all possible parse trees, sometimes having to walk parse-tree fragments that do not belong to a full parse tree (for example in unambiguous CF languages that are not deterministic).

So these algorithms walk all parse trees, independently of whether they actually produce these parse trees as output. There may be exponentially many (even sometimes infinitely many) such parse trees. The general CF parsers use a tabular (dynamic programming / memoization) technique to share computation, in order to do the work a bit faster. They all use the same tabular technique (from an abstract point of view) and they all get the same worst-case complexity $O(n^3)$. They may however differ on some examples, for instance when a grammar falls in the domain where one PDA construction gives a deterministic PDA while another does not. But they all have $O(n^3)$ time complexity when there are exponentially many parse trees, as in your example.

I do not believe I ever encountered a parsing algorithm that does not fall in this schematic description (up to some bells and whistles). This is essentially the basis of the work in the Billot & Lang paper cited above.

Now, as I said, if you are not interested in parse trees, you can always modify one of these algorithms to do better in some cases. For example, as I said, the rules $S \rightarrow SS \mid a$ could be noticed by the recognizer (no longer parser) construction technique and replaced by the simpler rules $S \rightarrow aS \mid a$. But then it is no longer the same "parsing" algorithm.

For an analysis of tree sharing into forests and its effect on parsing complexity, you may want to read "Observations on Context Free Parsing" by Beau Sheil, in Statistical Methods in Linguistics, 1976:71–109. The point is that all general CF parsing algorithms walk this shared forest completely, and it has size $O(n^3)$ for your example grammar.
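To sketch why it is cubic: in a Billot–Lang style shared forest, a node is a (non-terminal, span) pair, so there are $O(n^2)$ nodes, and each ambiguous $E$ node whose span contains $m$ occurrences of $a$ has one packed alternative per split point, i.e. $m-1$ of them. Summing $m-1$ over the $\Theta(n^2)$ spans gives $\Theta(n^3)$ packed alternatives, each of which a general CF parser has to construct, or at least examine.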

babou
  • Let me clarify things, so maybe you can give a concrete answer. The worst-case grammars look something like S ::= SSS | SS | a, which was introduced by Mark Johnson in the paper "The Complexity of GLR Parsing" to show the worst case O(n^k).

    I don't question the general case, just this simple grammar. It may indeed be the worst case, but I need a proof; using any algorithm is fine. I tried this grammar with a GLL parser and to me it looks like the number of steps is bounded by O(n^2). However, because of left-recursion, termination in GLL is pretty hard for me to analyse.

    – Wickoo May 21 '14 at 00:57
  • I suggested GLR because I thought maybe it's easier, as left-recursion is encoded in the parse table and the search space is not affected by it. So I can reformulate this question: can someone, theoretically, regardless of any algorithm, using some combinatorial formulas and taking sharing into account, tell me the upper bound for parsing this grammar? Can you give me reasons in terms of any particular parsing algorithm? – Wickoo May 21 '14 at 01:02
  • I still don't see a concrete answer or a sketch of a proof here. You are just giving general facts that I'm aware of. You are right about S ::= S S S being used to show O(n^k), as I already said myself. My point was that that grammar is much more ambiguous than E ::= E + E. So, you're saying that any ambiguous grammar triggers worst-case complexity? My question is that this example seems simpler than the worst case, and my experiments also show that. Can someone prove it? Or show that it is not the case. – Wickoo May 21 '14 at 10:19
  • The sketch of a proof is: all general CF algorithms walk the whole forest, because that is what they are built for. Whatever computation sharing they do is forest sharing. The shared forest representation is cubic in general, and always so for exponential ambiguity (which is your case). Hence all those parsers will be cubic on your grammar example. As I said, this is only a sketch, on which an actual proof can be built by checking all the claims. Johnson's example is not much more ambiguous than yours: both are exponential and that is all that matters. The fact that you ignore parse trees is irrelevant. – babou May 21 '14 at 11:01
  • You may be right, but I'm not convinced yet by your argument. How do you define exponential ambiguity? I suggest you run the grammar to see that in practice it doesn't trigger the worst case, but SSS does. – Wickoo May 21 '14 at 11:35
  • Exponential ambiguity = an exponential number of parse trees. What difference do you see with SSS, compared to SS? One way to get a better bound, while retaining the general approach of these parsing algorithms, would be to find a more compact shared representation of parse forests that can still be walked so as to simulate all parse-tree walks. But even then, this would no longer be one of the classical general CF algorithms. It would be a very interesting result, though ... but I doubt there is anything to be found. – babou May 21 '14 at 11:43
  • @Ali I do not have the software (and time) to run examples, but there are some figures in the Billot-Lang paper, though probably not enough to actually show that the parsing is cubic. Look at appendix C. – babou May 21 '14 at 11:52
  • Thanks! Appendix C shows a similar grammar, A ::= AA | x, named UBDA. I saw a proof in the original Earley paper that this grammar has an O(n^3) bound. I redid my experiments, and for very large inputs, 800 a's, I get the cubic growth rate. I guess the reason I concluded that it may not be cubic was that its growth was much smaller than for SSS; I ran the experiments for inputs of size 200, which shows the cubic behaviour for SSS but not for E+E, but still they are both worst-case cubic. – Wickoo May 21 '14 at 13:17
  • @Ali Good you succeeded. How did you do your measurements? If you measure execution time, you always have to be careful with small values as they may hide behind system overhead. On the other hand, large values sometimes trigger garbage collection. There are more formal ways of measuring that avoid these pitfalls, but it is sometimes more work, or delicate to set up accurately. – babou May 21 '14 at 13:34
  • I do it in the pretty standard way of measuring the performance of Java programs: warm up, then average the results, throwing out the outliers. I also take the system time instead of nano time to factor out GC and other noise (as much as possible). The problem with E + E was that it reached the n^3 behaviour quite late. I basically was doubling the input and checking whether the time got 8 times bigger; it was much less than 8. The first time I observed 8 was when increasing from 400 a's to 800 a's. For the other grammar it is much easier to observe the cubic behaviour, and that misled me. – Wickoo May 21 '14 at 17:30
  • @Ali This is a bit strange. Maybe there is a heavy initialization cost. Would you mind sending me the times you got for various values of n, especially those below 400? Here or at babou(at)inbox.com – babou May 21 '14 at 17:58
  • Sure, the numbers are on my workstation at work. I will send them to you tomorrow. – Wickoo May 21 '14 at 18:48