The front end of the compiler translates programs into an intermediate language with an unbounded number of temporaries. This program must run on a machine with a bounded number of registers. Two temporaries a and b can fit into the same register if a and b are never “in use” at the same time. Thus, many temporaries can fit in few registers; if they don't all fit, the excess temporaries can be kept in memory.
Therefore, the compiler needs to analyze the intermediate-representation program to determine which temporaries are in use at the same time. We say a variable is live if it holds a value that may be needed in the future, so this analysis is called liveness analysis.
To perform analyses on a program, it is often useful to make a control-flow graph. Each statement in the program is a node in the flow graph; if statement x can be followed by statement y, there is an edge from x to y. Graph 10.1 shows the flow graph for a simple loop.
Let us consider the liveness of each variable (Figure 10.2). A variable is live if its current value will be used in the future, so we analyze liveness by working from the future to the past. Variable b is used in statement 4, so b is live on the 3 → 4 edge.
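Working backward in this way can be expressed, as a sketch, by the standard dataflow equations for liveness, where use[n] and def[n] are the variables used and defined at node n and succ[n] is the set of its successors in the flow graph:

    in[n] = use[n] ∪ (out[n] − def[n])
    out[n] = ⋃ { in[s] | s ∈ succ[n] }

Starting from empty sets and applying these equations repeatedly until nothing changes yields, at each node, the set of variables live on entry (in) and on exit (out).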
func-tion: a mathematical correspondence that assigns exactly one element of one set to each element of the same or another set
Webster's Dictionary
The mathematical notion of function is that if f(x) = a “this time,” then f(x) = a “next time”; there is no other value equal to f(x). This allows the use of equational reasoning familiar from algebra: that if a = f(x) then g(f(x), f(x)) is equivalent to g(a, a). Pure functional programming languages encourage a kind of programming in which equational reasoning works, as it does in mathematics.
Imperative programming languages have similar syntax: a ← f(x). But if we follow this by b ← f(x) there is no guarantee that a = b; the function f can have side effects on global variables that make it return a different value each time. Furthermore, a program might assign into variable x between calls to f(x), so f(x) really means a different thing each time.
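A minimal ML sketch of the problem (the function f and the variable count here are illustrative, not taken from any particular program):

    val count = ref 0
    fun f x = (count := !count + 1;  x + !count)

    val a = f 3     (* a = 4 *)
    val b = f 3     (* b = 5, so a and b differ even though both were computed as f 3 *)

Because f updates the global count, equational reasoning about f no longer works.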
Higher-order functions. Functional programming languages also allow functions to be passed as arguments to other functions, or returned as results. Functions that take functional arguments are called higher-order functions.
Higher-order functions become particularly interesting if the language also supports nested functions with lexical scope (also called block structure). As in Tiger, lexical scope means that each function can refer to variables and parameters of any function in which it is nested. A higher-order functional language is one with nested scope and higher-order functions.
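As a small ML sketch (the function names are illustrative): addTo is higher-order because it returns a function as its result, and the nested function h refers to the parameter n of the function it is nested in:

    fun addTo n =
        let fun h m = n + m     (* h uses n from the enclosing function: lexical scope *)
        in h
        end

    val add5 = addTo 5      (* add5 is a function *)
    val twelve = add5 7     (* evaluates to 12 *)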
dom-i-nate: to exert the supreme determining or guiding influence on
Webster's Dictionary
Many dataflow analyses need to find the use-sites of each defined variable or the definition-sites of each variable used in an expression. The def-use chain is a data structure that makes this efficient: for each statement in the flow graph, the compiler can keep a list of pointers to all the use sites of variables defined there, and a list of pointers to all definition sites of the variables used there. In this way the compiler can hop quickly from use to definition to use to definition.
An improvement on the idea of def-use chains is static single-assignment form, or SSA form, an intermediate representation in which each variable has only one definition in the program text. The one (static) definition-site may be in a loop that is executed many (dynamic) times, thus the name static single-assignment form instead of single-assignment form (in which variables are never redefined at all).
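As a small illustrative sketch (the variable names are hypothetical), converting straight-line code to SSA form renames each definition of a variable to a fresh name and renames the uses to match:

    b ← a + 1          b1 ← a + 1
    a ← b · 2    ⇒     a1 ← b1 · 2
    b ← a − 1          b2 ← a1 − 1

Each of a1, b1, and b2 now has exactly one definition site. (Where control-flow paths merge, SSA form inserts φ-functions to combine the renamed values; that refinement is beyond this brief sketch.)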
The SSA form is useful for several reasons:
Dataflow analysis and optimization algorithms can be made simpler when each variable has only one definition.
If a variable has N uses and M definitions (which occupy about N + M instructions in a program), it takes space (and time) proportional to N · M to represent def-use chains – a quadratic blowup (see Exercise 19.8). For almost all realistic programs, the size of the SSA form is linear in the size of the original program (but see Exercise 19.9).
Heap-allocated records that are not reachable by any chain of pointers from program variables are garbage. The memory occupied by garbage should be reclaimed for use in allocating new records. This process is called garbage collection, and is performed not by the compiler but by the runtime system (the support programs linked with the compiled code).
Ideally, we would say that any record that is not dynamically live (will not be used in the future of the computation) is garbage. But, as Section 10.1 explains, it is not always possible to know whether a variable is live. So we will use a conservative approximation: we will require the compiler to guarantee that any live record is reachable; we will ask the compiler to minimize the number of reachable records that are not live; and we will preserve all reachable records, even if some of them might not be live.
Figure 13.1 shows a Tiger program ready to undergo garbage collection (at the point marked garbage-collect here). There are only three program variables in scope: p, q, and r.
MARK-AND-SWEEP COLLECTION
Program variables and heap-allocated records form a directed graph. The variables are roots of this graph. A node n is reachable if there is a path of directed edges r → ··· → n starting at some root r.
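The reachable nodes can be found by depth-first search from the roots. Below is a minimal ML sketch over an explicit record graph; the node type and field names are assumptions made for illustration, since a real collector traverses the runtime heap representation rather than an ML datatype:

    datatype node = Node of {marked: bool ref, children: node list ref}

    (* mark n and every node reachable from it *)
    fun mark (Node {marked, children}) =
        if !marked then ()
        else (marked := true; List.app mark (!children))

    (* the roots are the program variables in scope, e.g. p, q, and r *)
    fun markReachable roots = List.app mark roots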
A compiler was originally a program that “compiled” subroutines [a link-loader]. When in 1954 the combination “algebraic compiler” came into use, or rather into misuse, the meaning of the term had already shifted into the present one.
Bauer and Eickel [1975]
This book describes techniques, data structures, and algorithms for translating programming languages into executable code. A modern compiler is often organized into many phases, each operating on a different abstract “language.” The chapters of this book follow the organization of a compiler, each covering a successive phase.
To illustrate the issues in compiling real programming languages, I show how to compile Tiger, a simple but nontrivial language of the Algol family, with nested scope and heap-allocated records. Programming exercises in each chapter call for the implementation of the corresponding phase; a student who implements all the phases described in Part I of the book will have a working compiler. Tiger is easily modified to be functional or object-oriented (or both), and exercises in Part II show how to do this. Other chapters in Part II cover advanced techniques in program optimization. Appendix A describes the Tiger language.
The interfaces between modules of the compiler are almost as important as the algorithms inside the modules. To describe the interfaces concretely, it is useful to write them down in a real programming language. This book uses ML – a strict, statically typed functional programming language with modular structure.
ab-stract: disassociated from any specific instance
Webster's Dictionary
A compiler must do more than recognize whether a sentence belongs to the language of a grammar – it must do something useful with that sentence. The semantic actions of a parser can do useful things with the phrases that are parsed.
In a recursive-descent parser, semantic action code is interspersed with the control flow of the parsing actions. In a parser specified in Yacc, semantic actions are fragments of C program code attached to grammar productions.
SEMANTIC ACTIONS
Each terminal and nonterminal may be associated with its own type of semantic value. For example, in a simple calculator using Grammar 3.35, the type associated with exp and INT might be int; the other tokens would not need to carry a value. The type associated with a token must, of course, match the type that the lexer returns with that token.
For a rule A → B C D, the semantic action must return a value whose type is the one associated with the nonterminal A. But it can build this value from the values associated with the matched terminals and nonterminals B, C, D.
RECURSIVE DESCENT
In a recursive-descent parser, the semantic actions are the values returned by parsing functions, or the side effects of those functions, or both.
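For instance, a minimal ML sketch (the token type and the grammar E → int + E | int are assumptions chosen for illustration): each parsing function returns its semantic value, here the integer value of the expression, together with the remaining tokens:

    datatype token = INT of int | PLUS

    (* parseE : token list -> int * token list *)
    fun parseE (INT i :: PLUS :: rest) =
            let val (v, rest') = parseE rest
            in (i + v, rest') end
      | parseE (INT i :: rest) = (i, rest)
      | parseE _ = raise Fail "expected an integer"

    (* parseE [INT 1, PLUS, INT 2, PLUS, INT 3] evaluates to (6, []) *)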
loop: a series of instructions that is repeated until a terminating condition is reached
Webster's Dictionary
Loops are pervasive in computer programs, and a great proportion of the execution time of a typical program is spent in one loop or another. Hence it is worthwhile devising optimizations to make loops go faster. Intuitively, a loop is a sequence of instructions that ends by jumping back to the beginning. But to be able to optimize loops effectively we will use a more precise definition.
A loop in a control-flow graph is a set of nodes S including a header node h with the following properties:
From any node in S there is a path of directed edges leading to h.
There is a path of directed edges from h to any node in S.
There is no edge from any node outside S to any node in S other than h.
Thus, the dictionary definition (from Webster's) is not the same as the technical definition.
Figure 18.1 shows some loops. A loop entry node is one with some predecessor outside the loop; a loop exit node is one with a successor outside the loop. Figures 18.1c, 18.1d, and 18.1e illustrate that a loop may have multiple exits, but may have only one entry. Figures 18.1e and 18.1f contain nested loops.
REDUCIBLE FLOW GRAPHS
A reducible flow graph is one in which the dictionary definition of loop corresponds more closely to the technical definition; but let us develop a more precise definition.
sched-ule: a procedural plan that indicates the time and sequence of each operation
Webster's Dictionary
A simple computer can process one instruction at a time. First it fetches the instruction, then decodes it into opcode and operand specifiers, then reads the operands from the register bank (or memory), then performs the arithmetic denoted by the opcode, then writes the result back to the register bank (or memory); and then fetches the next instruction.
Modern computers can execute parts of many different instructions at the same time. At the same time the processor is writing results of two instructions back to registers, it may be doing arithmetic for three other instructions, reading operands for two more instructions, decoding four others, and fetching yet another four. Meanwhile, there may be five instructions delayed, awaiting the results of memory-fetches.
Such a processor usually fetches instructions from a single flow of control; it's not that several programs are running in parallel, but the adjacent instructions of a single program are decoded and executed simultaneously. This is called instruction-level parallelism (ILP), and is the basis for much of the astounding advance in processor speed in the last decade of the twentieth century.
A pipelined machine performs the write-back of one instruction in the same cycle as the arithmetic “execute” of the next instruction and the operand-read of the previous one, and so on. A very-long-instruction-word (VLIW) machine issues several instructions in the same processor cycle; the compiler must ensure that they are not data-dependent on each other.
anal-y-sis: an examination of a complex, its elements, and their relations
Webster's Dictionary
An optimizing compiler transforms programs to improve their efficiency without changing their output. There are many transformations that improve efficiency:
Register allocation: Keep two nonoverlapping temporaries in the same register.
Common-subexpression elimination: If an expression is computed more than once, eliminate one of the computations.
Dead-code elimination: Delete a computation whose result will never be used.
Constant folding: If the operands of an expression are constants, do the computation at compile time.
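For example (a small illustrative sketch), constant folding would rewrite a ← 2 * 8 to a ← 16, and if a is never used afterward, dead-code elimination would then delete the assignment altogether.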
This is not a complete list of optimizations. In fact, there can never be a complete list.
NO MAGIC BULLET
Computability theory shows that it will always be possible to invent new optimizing transformations.
Let us say that a fully optimizing compiler is one that transforms each program P to a program Opt(P) that is the smallest program with the same input/output behavior as P. We could also imagine optimizing for speed instead of program size, but let us choose size to simplify the discussion.
For any program Q that produces no output and never halts, Opt(Q) is short and easily recognizable: it is essentially the one-line infinite loop, L1: goto L1.
Therefore, if we had a fully optimizing compiler we could use it to solve the halting problem; to see if there exists an input on which P halts, just see if Opt(P) is the one-line infinite loop. But we know that no computable algorithm can always tell whether programs halt, so a fully optimizing compiler cannot be written either.
ca-non-i-cal: reduced to the simplest or clearest schema possible
Webster's Dictionary
The trees generated by the semantic analysis phase must be translated into assembly or machine language. The operators of the Tree language are chosen carefully to match the capabilities of most machines. However, there are certain aspects of the tree language that do not correspond exactly with machine languages, and some aspects of the Tree language interfere with compile-time optimization analyses.
For example, it's useful to be able to evaluate the subexpressions of an expression in any order. But the subexpressions of Tree.exp can contain side effects – eseq and call nodes that contain assignment statements and perform input/output. If tree expressions did not contain eseq and call nodes, then the order of evaluation would not matter.
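As a small sketch of the problem, using the constructors of the Tree language, consider the expression

    BINOP(PLUS, TEMP a, ESEQ(MOVE(TEMP a, CONST 0), CONST 1))

If the left operand TEMP a is evaluated before the right operand, the result is the old value of a plus 1; if the ESEQ's MOVE is executed first, a is overwritten with 0 and the result is 1. Different evaluation orders give different answers.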
Some of the mismatches between Trees and machine-language programs are:
The cjump instruction can jump to either of two labels, but real machines' conditional jump instructions fall through to the next instruction if the condition is false.
eseq nodes within expressions are inconvenient, because they make different orders of evaluating subtrees yield different results.
call nodes within expressions cause the same problem.
call nodes within the argument-expressions of other call nodes will cause problems when trying to put arguments into a fixed set of formal-parameter registers.
Why does the Tree language allow eseq and two-way cjump, if they are so troublesome? Because they make it much more convenient for the Translate (translation to intermediate code) phase of the compiler.