Amrita Bagchi Amrita Bagchi

Top-down parsing

Consider the following simple language, which derives a subset of the types found in the (now somewhat dated) programming language Pascal. I do not assume you know pascal.

We have two nonterminals, type, which is the start symbol, andsimple, which represents the simple types.

There are 8 terminals, which are tokens produced by the lexer and correspond closely with constructs in pascal itself. Specifically, we have.

  1. integer and char
  2. id for identifier
  3. array and of used in array declarations
  4. ↑ meaning pointer to
  5. num for a (positive whole) number
  6. dotdot for .. (used to give a range like 6..9)

The productions are

    type →   simple
    type →   ↑ id
    type →   array [ simple ] of type
    simple → integer
    simple → char
    simple → num dotdot num
  

Parsing is easy in principle and for certain grammars (e.g., the one above) it actually is easy. We start at the root since this is top-down parsing and apply the two fundamental steps.

  1. At the current (nonterminal) node, select a production whose LHS is this nonterminal and whose RHS matches the input at this point. Make the RHS the children of this node (one child per RHS symbol).
  2. Go to the next node needing a subtree.

When programmed this becomes a procedure for each nonterminal that chooses a production for the node and calls procedures for each nonterminal in the RHS. Thus it is recursive in nature and descends the parse tree. We call these parsers recursive descent.

The big problem is what to do if the current node is the LHS of more than one production. The small problem is what do we mean by the next node needing a subtree.

The easiest solution to the big problem would be to assume that there is only one production having a given terminal as LHS. There are two possibilities

  1. No circularity. For example
    expr → term + term - 9
    term → factor / factor
    factor → digit
    digit → 7
  
  1. But this is very boring. The only possible sentence is 7/7+7/7-9 

  2. Circularity
    expr → term + term
    term → factor / factor
    factor → ( expr )
  
  1. This is even worse; there are no (finite) sentences. Only an infinite sentence beginning (((((((((.

So this won't work. We need to have multiple productions with the same LHS.

How about trying them all? We could do this! If we get stuck where the current tree cannot match the input we are trying to parse, we would backtrack.

Instead, we will look ahead one token in the input and only choose productions that can yield a result starting with this token. Furthermore, we will (in this section) restrict ourselves to predictive parsing in which there is only production that can yield a result starting with a given token. This solution to the big problem also solves the small problem. Since we are trying to match the next token in the input, we must choose the leftmost (nonterminal) node to give children to.



Amrita Bagchi

Amrita Bagchi Creator

(No description available)

Suggested Creators

Amrita Bagchi