2

I'm trying to compute the FIRST and FOLLOW sets of a CFG (using Python).

To do this I need to recognise terminals and non-terminals. I can (as a human) read them, but I was wondering if there was computational logic for inferring terminals and non-terminals, if one's given only the grammar.

It sounds like one'd need to construct the parse tree and then label terminals and non-terminals? A terminal is a node that has no childs.


One could also use the definition of "terminals" and "non-terminals":

Terminals

A terminal is a symbol which does not appear on the left-hand side of any production.

Nonterminals

Nonterminals are the non-leaf nodes in a parse tree.

Or perhaps non-terminals are also all of those that don't fit into the definition of terminals.

http://web.cs.wpi.edu/~kal/PLT/PLT2.1.2b.html

mavavilj
  • 485
  • 4
  • 13
  • This seems like it depends on the exact syntax you use for your BNF. Personally, I'm in the habit of writing my BNFs with <>s around all non-terminals, and ""s around all terminals/string literals, which makes this issue pretty straightforward (I'd probably have to define a " escape sequence for completeness, but that's about it). Could you describe the exact syntax that you need to be able to parse? – Ixrec Dec 31 '15 at 22:34
  • @Ixrec I'm just looking for a general way for doing this. Could be the kind of labelling that you're mentioning as well. – mavavilj Dec 31 '15 at 22:42
  • I think the general way is just to pick a rigorously precise definition of "non-terminal", which could be "does not appear on the left-hand side of anything" or "is surrounded by double quotes" or something else. As a random example, yacc seems to rely on single quotes. – Ixrec Dec 31 '15 at 22:48

1 Answers1

2

This should be fairly straight-forward. Assuming a grammar like:

expr:   ID | STR | NUM | list
list:   ( seq )  
seq:       | expr seq

You can collect the symbols in each production (ID, STR, NUM, list, (, seq, ), expr, and nothing) and then pull out the production symbols themselves (expr, list, seq). This gives you two sets, the terminals and non-terminals (and nothing, which you may want or not depending on your needs).

Telastyn
  • 109,398