The Structure of a Compiler


Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right and the back end, shown in pink.

The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.

The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.

This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of NM compilers, we need N front ends and M back ends. For gcc (originally standing for Gnu C Compiler, but now standing for Gnu Compiler Collection), N=7 and M~30 so the savings is considerable.


Multiple Phases


The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous. Sometime a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.

The diagram is definitely not drawn to scale, in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.

Conceptually, there are three phases of analysis with the output of one phase the input of the next. The phases are called lexical analysis or scanningsyntax analysisor parsing, and semantic analysis.

Amrita Bagchi

Amrita Bagchi Creator

(No description available)

Suggested Creators

Amrita Bagchi