Amrita Bagchi Amrita Bagchi

Lexical Analysis (or Scanning)


The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following

  x3 = y + 3;
  x3  =   y   +   3   ;
  x3   =y+ 3  ;

but not

  x 3 = y + 3;

would be grouped into the lexemes x3, =, y, +, 3, and ;.

A token is a <token-name,attribute-value> pair. For example

  1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
  2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair, whose second component is ignored. The point is that there are many different identifiers so we need the second component, but there is only one assignment symbol =.
  3. The lexeme y is mapped to the token <id,2>
  4. The lexeme + is mapped to the token <+>.
  5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something. On the one hand there is only one 3 so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs double). Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
  6. The lexeme ; is mapped to the token <;>.

Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.

Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion 

Amrita Bagchi

Amrita Bagchi Creator

(No description available)

Suggested Creators

Amrita Bagchi