Skip to content

Generated File Contents

realistschuckle edited this page Sep 14, 2010 · 4 revisions

This page attempts to provide detail about the contents that you’ll find in a goll1e-generated parser. Each section below describes, in order, the pieces that you’ll find in the output.

yystype

The yystype block contains the declaration for the value n-tuple of which any one piece a terminal or nonterminal can adopt. Because Go does not have union language structures, the parser uses a struct to describe yystype.

yytable

The yytable variable contains the parser routing table. Assuming that len(t) describes the number of terminals in your grammar and len(nt) describes the number of nonterminals in your grammar, then the size of yytable will equal [len(t) + 1] * len(nt).

The rows in yytable correspond to the nonterminals in your grammar and the columns correspond to terminals.

Entries in the yytable greater than -1 refer to the number of a production defined in the Rules Section of your grammar. These correspond to entries in the yyprods table. Values of -1 in the yytable signify an error in the input to the parser.

yycharmap

The yycharmap provides mappings from character literals listed in the Rules Section of the grammar file to input symbols matched by the parser.

yyMINTOKEN, yyMAXTOKEN, yyUSER, and In-Between

  • yyMINTOKEN provides a value that separates character literals and terminals in the input grammar
  • yyMAXTOKEN provides a value that separates terminals and nonterminals in the input grammar
  • yyUSER provides the a value that separates nonterminals and user-provided input values
  • Constants with a value between yyMINTOKEN and yyMAXTOKEN provide the terminals a specific integral value

yyprods

The yyprods table describes the input literals, terminals, and nonterminals. Entries in the yyprods table have the following meanings.

  • If the entry is less than yyMINTOKEN, then that value represents an entry in the yycharmap
  • If the entry exists in the exclusive interval yyMINTOKEN@, @yyMAXTOKEN, then that value represents a terminal
  • If the entry exceeds the value of yyMAXTOKEN, then that value represents a nonterminal the value of which you can determine by entry value - yyMAXTOKEN

yyrunrule

The yyrunrule function executes the syntax-directed translations contained in the Rules Section of your grammar definition file. Each case in the switch statement corresponds with the order of each production contained in the Rules Section.

yytranslate

The yytranslate function converts provided token values into zero-based values of terminals, nonterminals, and character literals, respectively.

yyparse

The yyparse function does just that, parses the input that it gets from the supplied nextWord function. This differs from the way that YACC works, which will call a global int yylex() function. I like this design better because it allows the use of fewer global variables and allows, at runtime, to switch out implementations of the scanner.

yyparse also expects a value that represents the EOF value. This removes conflicts that can occur between different scanner implementations and their respective EOF value.

yyparse returns a flag that indicates success. If it does succeed, then the second return value will contain the value assigned to the start node after all rule running has occurred.

Variables in yyparse

The following sections dissect the yyparse function and detail the inner workings of it.

yyres vector.Vector

A stack of *yystype*s that contains the resolution stack during the rule execution at the end of yyparse.

inputs vector.IntVector

The inputs variable contains the recognized inputs as they get recognized and resolved through the parser.

values vector.Vector

For terminals, which are the values contained in the input and valued by the *yystype provided the nextWord function, the values variable acts as a stack containing the results. Each time the parser calls newWord it checks the result; if the result lies between yyMINTOKEN and yyMAXTOKEN, the parser puts the *yystype into values.

word int

word contains the most recently read int from the nextWord function.

stack vector.Vector

stack provides the LL(1) its parsing workspace for resolving rules.

tos int

tos represents the value on the top of the stack at any time and exists only for readability.

Actually Parsing Input

In pseudocode, the parsing routine follows the following logic.

loop forever
  if the top of the stack equals the end-of-file value and the value of current word equals the end-of-file value
    set the value of the function to *true*
    exit the loop
  if the top of the stack is either the end-of-file value or a terminal or a character literal
    if the value of the top of the stack equals the value of the current word
      push the current word onto the inputs stack
      pop the resolution stack twice
      get the next word of the input
      push the value of the scan onto the values stack if the word represents a terminal
      set the new top of stack value
    else the top of the stack does not equal the current word
      exit the loop because the parser doesn't consider the input a valid word
  else the top of the stack represents a nonterminal
    translate the top of stack and the word into row and column values used by the parsing routing table
    look up the rule number to apply in the parsing routing table
    if the rule number is -1
      exit the loop because the input cannot match a production rule
    push the translated rule number onto the inputs stack
    pop the resolution stack twice to remove the just used nonterminal
    for each entry in the production rules table for the rule number from last to first
      push the entry onto the resolution stack
    set the new top of stack value

The previous parsing populates the inputs and values stacks with the values needed to run the rules contained in the Rules Section of the grammar definition file. If the parser recognized the provided input, then it runs the rules by adhering to the algorithm as detailed in the following pseudocode.

while the inputs stack has entries
  pop the top entry in the inputs stack
  if the value is a character literal
    push nil onto the resolution stack
  else if the value is a terminal
    pop the top entry from the values stack
    push that value onto the resolution stack
  else if the value is a rule number
    create a new resolution value
    run the code associated with the rule number with the new resolution value
    pop the number of entries in the rule from the resolution stack
    push the new resolution value onto the resolution stack

When the rule running finishes, a single value that represents the completed computation of the grammar will exist as the only entry in the resolution stack. yyparse then uses that value to return a result.