Jean-Daniel Boissonnat, Institut National de Recherche en Informatique et en Automatique (INRIA), Rocquencourt, and Mariette Yvinec, Institut National de Recherche en Informatique et en Automatique (INRIA), Rocquencourt
To triangulate a region is to describe it as the union of a collection of simplices whose interiors are pairwise disjoint. The region is then decomposed into elementary cells of bounded complexity. The words to triangulate and triangulation originate from the two-dimensional problem, but are commonly used in a broader context for regions and simplices of any dimension.
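As a rough illustration of this definition (a minimal sketch in ML, not code from the book; the names point, triangle, and area2 are chosen here for the two-dimensional case only), a triangulation can be stored simply as a collection of simplices, each required to be non-degenerate:

(* A 2D triangulation stored as a bare list of triangles. *)
type point = { x : float; y : float }
type triangle = point * point * point

(* Twice the signed area; nonzero means the three points are affinely
   independent, i.e. they really span a 2-simplex. *)
let area2 (a, b, c) =
  (b.x -. a.x) *. (c.y -. a.y) -. (b.y -. a.y) *. (c.x -. a.x)

let is_non_degenerate (t : triangle) = area2 t <> 0.0

(* A candidate triangulation: every cell is a genuine triangle.  Checking
   that the interiors are pairwise disjoint is omitted in this sketch. *)
let valid_cells (tri : triangle list) = List.for_all is_non_degenerate tri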
Triangulations and related meshes are ubiquitous in domains where the ambient space needs to be discretized, for instance in order to interpolate functions of several variables, or to numerically solve multi-dimensional differential equations using finite-element methods. Triangulations are widely used in the context of robotics to decompose the free configuration space of a robot, in the context of artificial vision to perform three-dimensional reconstructions of objects from their cross-sections, or in computer graphics to solve windowing problems or to compute illumination when rendering an image. Finally, in the context of computational geometry, the triangulation of a set of points, a planar map, a polygon, a polyhedron, an arrangement, or any other spatial structure, is often a prerequisite to running another algorithm on the data. For instance, this is the case for algorithms performing point location in a planar map by using a hierarchy of triangulations, or for the numerous applications of triangulations to shortest paths and visibility problems.
Triangulations form the topic of the next three chapters. Chapter 11 recalls the basic definitions related to triangulations, and studies the combinatorics of triangulations in dimensions 2 and 3.
Termination is an important property of term rewriting systems. For a finite terminating rewrite system, a normal form of a given term can be found by a simple depth-first search. If the system is also confluent, the normal forms are unique, which makes the word problem for the corresponding equational theory decidable. Unfortunately, as shown in the first section of this chapter, termination is an undecidable property of term rewriting systems. This is true even if one allows for only unary function symbols in the rules, or for only one rewrite rule (but then for function symbols of arity greater than 1). In the restricted case of ground rewrite systems, i.e. rewrite systems whose rules must not contain variables, termination becomes decidable, though. In the second section of this chapter, we introduce the notion of a reduction order. These orders are an important tool for proving termination of rewrite systems. The main problem for a given rewrite system is to find an appropriate reduction order that shows its termination. Thus, it is desirable to have a wide range of different possible reduction orders available. In the third and fourth sections of the chapter, we introduce two different ways of defining reduction orders.
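To make the remark about finding normal forms concrete, here is a small sketch (not taken from the text): terms over string-named function symbols, and naive normalization with a ground rewrite system, i.e. one whose rules contain no variables, which is assumed to be terminating.

type term = Fun of string * term list

(* One rewrite step: rewrite the first redex found, checking the root of a
   subterm before descending into its arguments.  Returns None if no rule
   applies anywhere, i.e. the term is in normal form. *)
let rec step rules t =
  match List.assoc_opt t rules with
  | Some rhs -> Some rhs
  | None ->
      let (Fun (f, args)) = t in
      let rec try_args = function
        | [] -> None
        | a :: rest ->
            (match step rules a with
             | Some a' -> Some (a' :: rest)
             | None ->
                 (match try_args rest with
                  | Some rest' -> Some (a :: rest')
                  | None -> None))
      in
      (match try_args args with
       | Some args' -> Some (Fun (f, args'))
       | None -> None)

(* For a terminating system this loop is guaranteed to reach a normal form;
   without termination it may run forever. *)
let rec normalize rules t =
  match step rules t with Some t' -> normalize rules t' | None -> t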
The decision problem
First, we show undecidability of the termination problem for term rewriting systems, and then we consider the decidable subcase of right-ground term rewriting systems (which can be treated by a slight generalization of the well-known proof for ground systems).
The purpose of this chapter is twofold. On the one hand, it introduces basic notions from universal algebra (such as terms, substitutions, and identities) on a syntactic level that does not require (or give) much mathematical background. On the other hand, it presents the semantic counterparts of these syntactic notions (such as algebras, homomorphisms, and equational classes), and proves some elementary results on their connections. Most of the definitions and results presented in subsequent chapters can be understood knowing only the syntactic level introduced in Section 3.1. In order to obtain a deeper understanding of the meaning of these results, and of the context in which they are of interest, a study of the other sections in this chapter is recommended, however. For more information on universal algebra see, for example, [100, 55, 173].
Terms, substitutions, and identities
Terms will be built from function symbols and variables in the usual way. For example, if f is a binary function symbol, and x, y are variables, then f(x,y) is a term. To make clear which function symbols are available in a certain context, and which arity they have, one introduces signatures.
Definition 3.1.1 A signature Σ is a set of function symbols, where each f ∈ Σ is associated with a non-negative integer n, the arity of f. For n ≥ 0, we denote the set of all n-ary elements of Σ by Σ(n). The elements of Σ(0) are also called constant symbols.
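Definition 3.1.1 and the preceding example can be sketched directly in code (the OCaml names below are illustrative, not the book's notation): a signature pairs each symbol with its arity, and terms are built from variables and applications of function symbols.

type signature = (string * int) list        (* each symbol with its arity *)

type term =
  | Var of string                           (* variables x, y, ... *)
  | App of string * term list               (* f(t1, ..., tn) *)

(* Check that every application respects the arity given by the signature. *)
let rec well_formed (sigma : signature) = function
  | Var _ -> true
  | App (f, args) ->
      (match List.assoc_opt f sigma with
       | Some n -> n = List.length args && List.for_all (well_formed sigma) args
       | None -> false)

(* Example: f is binary, so f(x, y) is a term over this signature. *)
let sigma = [ ("f", 2) ]
let fxy = App ("f", [ Var "x"; Var "y" ])
let _ = assert (well_formed sigma fxy)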
Chapters 2–11 have described the fundamental components of a good compiler: a front end, which does lexical analysis, parsing, construction of abstract syntax, type-checking, and translation to intermediate code; and a back end, which does instruction selection, dataflow analysis, and register allocation.
What lessons have we learned? I hope that the reader has learned about the algorithms used in different components of a compiler and the interfaces used to connect the components. But the author has also learned quite a bit from the exercise.
My goal was to describe a good compiler that is, to use Einstein's phrase, “as simple as possible – but no simpler.” I will now discuss the thorny issues that arose in designing Tiger and its compiler.
Nested functions. Tiger has nested functions, requiring some mechanism (such as static links) for implementing access to nonlocal variables. But many programming languages in widespread use – C, C++, Java – do not have nested functions or static links. The Tiger compiler would become simpler without nested functions, for then variables would not escape, and the FindEscape phase would be unnecessary. But there are two reasons for explaining how to compile nonlocal variables. First, there are programming languages where nested functions are extremely useful – these are the functional languages described in Chapter 15.
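For readers who have not met the issue before, a brief ML illustration (not Tiger, and not the compiler's own code) shows what is at stake: the inner function refers to a variable of the enclosing function, so the compiled code needs some mechanism (a static link, a display, or a closure) to reach it.

let count_in_range limit xs =
  (* [limit] is a nonlocal variable of [within]; in the Tiger compiler such
     a variable would be flagged by FindEscape and reached via a static link. *)
  let within x = x <= limit in
  List.length (List.filter within xs)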
Over the past decade, there have been several shifts in the way compilers are built. New kinds of programming languages are being used: object-oriented languages with dynamic methods, functional languages with nested scope and first-class function closures; and many of these languages require garbage collection. New machines have large register sets and a high penalty for memory access, and can often run much faster with compiler assistance in scheduling instructions and managing instructions and data for cache locality.
This book is intended as a textbook for a one- or two-semester course in compilers. Students will see the theory behind different components of a compiler, the programming techniques used to put the theory into practice, and the interfaces used to modularize the compiler. To make the interfaces and programming examples clear and concrete, I have written them in the ML programming language. Other editions of this book are available that use the C and Java languages.
Implementation project. The “student project compiler” that I have outlined is reasonably simple, but is organized to demonstrate some important techniques that are now in common use: abstract syntax trees to avoid tangling syntax and semantics, separation of instruction selection from register allocation, copy propagation to give flexibility to earlier phases of the compiler, and containment of target-machine dependencies. Unlike many “student compilers” found in textbooks, this one has a simple but sophisticated back end, allowing good register allocation to be done after instruction selection.
A compiler was originally a program that “compiled” subroutines [a link-loader]. When in 1954 the combination “algebraic compiler” came into use, or rather into misuse, the meaning of the term had already shifted into the present one.
Bauer and Eickel [1975]
This book describes techniques, data structures, and algorithms for translating programming languages into executable code. A modern compiler is often organized into many phases, each operating on a different abstract “language.” The chapters of this book follow the organization of a compiler, each covering a successive phase.
To illustrate the issues in compiling real programming languages, I show how to compile Tiger, a simple but nontrivial language of the Algol family, with nested scope and heap-allocated records. Programming exercises in each chapter call for the implementation of the corresponding phase; a student who implements all the phases described in Part I of the book will have a working compiler. Tiger is easily modified to be functional or object-oriented (or both), and exercises in Part II show how to do this. Other chapters in Part II cover advanced techniques in program optimization. Appendix A describes the Tiger language.
The interfaces between modules of the compiler are almost as important as the algorithms inside the modules. To describe the interfaces concretely, it is useful to write them down in a real programming language. This book uses the C programming language.
lex-i-cal: of or relating to words or the vocabulary of a language as distinguished from its grammar and construction
Webster's Dictionary
To translate a program from one language into another, a compiler must first pull it apart and understand its structure and meaning, then put it together in a different way. The front end of the compiler performs analysis; the back end does synthesis.
The analysis is usually broken up into
Lexical analysis: breaking the input into individual words or “tokens”;
Syntax analysis: parsing the phrase structure of the program; and
Semantic analysis: calculating the program's meaning.
The lexical analyzer takes a stream of characters and produces a stream of names, keywords, and punctuation marks; it discards white space and comments between the tokens. It would unduly complicate the parser to have to account for possible white space and comments at every possible point; this is the main reason for separating lexical analysis from parsing.
Lexical analysis is not very complicated, but we will attack it with high-powered formalisms and tools, because similar formalisms will be useful in the study of parsing and similar tools have many applications in areas other than compilation.
LEXICAL TOKENS
A lexical token is a sequence of characters that can be treated as a unit in the grammar of a programming language. A programming language classifies lexical tokens into a finite set of token types.
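A token type is naturally expressed as a datatype. The sketch below is illustrative (the particular constructors are not the book's interface): some token types carry a value, such as the spelling of an identifier or the value of a numeric literal.

type token =
  | ID of string        (* identifiers, e.g. foo *)
  | NUM of int          (* integer literals, e.g. 73 *)
  | IF | THEN | ELSE    (* reserved words *)
  | LPAREN | RPAREN
  | PLUS | MINUS
  | EOF

(* The character sequence "if (x + 7)" might be lexed as: *)
let example = [ IF; LPAREN; ID "x"; PLUS; NUM 7; RPAREN; EOF ]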
Heap-allocated records that are not reachable by any chain of pointers from program variables are garbage. The memory occupied by garbage should be reclaimed for use in allocating new records. This process is called garbage collection, and is performed not by the compiler but by the runtime system (the support programs linked with the compiled code).
Ideally, we would say that any record that is not dynamically live (will not be used in the future of the computation) is garbage. But, as Section 10.1 explains, it is not always possible to know whether a variable is live. So we will use a conservative approximation: we will require the compiler to guarantee that any live record is reachable; we will ask the compiler to minimize the number of reachable records that are not live; and we will preserve all reachable records, even if some of them might not be live.
Figure 13.1 shows a Tiger program ready to undergo garbage collection (at the point marked garbage-collect here). There are only three program variables in scope: p, q, and r.
MARK-AND-SWEEP COLLECTION
Program variables and heap-allocated records form a directed graph. The variables are roots of this graph. A node n is reachable if there is a path of directed edges r → … → n starting at some root r. A graph-search algorithm such as depth-first search (Algorithm 13.2) can mark all the reachable nodes.
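The mark phase can be sketched as follows (the data layout is invented here for illustration and is not the runtime system's actual record representation): each record carries a mark bit and a list of pointer fields, and depth-first search from the roots marks every reachable record.

type record = { mutable marked : bool; mutable fields : record list }

let rec mark r =
  if not r.marked then begin
    r.marked <- true;
    List.iter mark r.fields          (* follow every outgoing pointer *)
  end

(* The roots are the records pointed to directly by program variables
   (p, q, and r in the example).  Anything left unmarked afterwards is
   garbage and would be reclaimed by the sweep phase, which is omitted
   from this sketch. *)
let mark_phase (roots : record list) = List.iter mark roots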
reg-is-ter: a device for storing small amounts of data
al-lo-cate: to apportion for a specific purpose
Webster's Dictionary
The Translate, Canon, and Codegen phases of the compiler assume that there are an infinite number of registers to hold temporary values and that move instructions cost nothing. The job of the register allocator is to assign the many temporaries to a small number of machine registers, and, where possible, to assign the source and destination of a move to the same register so that the move can be deleted.
From an examination of the control and dataflow graph, we derive an interference graph. Each node in the interference graph represents a temporary value; each edge (t1, t2) indicates a pair of temporaries that cannot be assigned to the same register. The most common reason for an interference edge is that t1 and t2 are live at the same time. Interference edges can also express other constraints; for example, if a certain instruction a ← b ⊕ c cannot produce results in register r12 on our machine, we can make a interfere with r12.
Next we color the interference graph. We want to use as few colors as possible, but no pair of nodes connected by an edge may be assigned the same color. Graph coloring problems derive from the old mapmakers' rule that adjacent countries on a map should be colored with different colors.
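A small illustration of the coloring constraint, not the allocator developed in the book: given the interference graph as adjacency lists keyed by temporary names, a greedy pass assigns each temporary the lowest register color not already used by a neighbor. With k machine registers, a temporary that cannot receive any of the k colors would have to be spilled.

let color_graph (k : int) (graph : (string * string list) list) =
  List.fold_left
    (fun coloring (temp, neighbors) ->
      (* Colors already taken by interfering temporaries. *)
      let forbidden =
        List.filter_map (fun n -> List.assoc_opt n coloring) neighbors
      in
      (* First register color in 0 .. k-1 not used by a neighbor. *)
      let rec pick c =
        if c >= k then None
        else if List.mem c forbidden then pick (c + 1)
        else Some c
      in
      match pick 0 with
      | Some c -> (temp, c) :: coloring
      | None -> coloring (* no color available: candidate for spilling *))
    [] graph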
mem-o-ry: a device in which information can be inserted and stored and from which it may be extracted when wanted
hi-er-ar-chy: a graded or ranked series
Webster's Dictionary
An idealized random access memory (RAM) has N words indexed by integers such that any word can be fetched or stored – using its integer address – equally quickly. Hardware designers can make a big slow memory, or a small fast memory, but a big fast memory is prohibitively expensive. Also, one thing that speeds up access to memory is its nearness to the processor, and a big memory must have some parts far from the processor no matter how much money might be thrown at the problem.
Almost as good as a big fast memory is the combination of a small fast cache memory and a big slow main memory; the program keeps its frequently used data in cache and the rarely used data in main memory, and when it enters a phase in which datum x will be frequently used, it may move x from the slow memory to the fast memory.
It's inconvenient for the programmer to manage multiple memories, so the hardware does it automatically. Whenever the processor wants the datum at address x, it looks first in the cache, and – we hope – usually finds it there.