Jean-Daniel Boissonnat and Mariette Yvinec, Institut National de Recherche en Informatique et en Automatique (INRIA), Rocquencourt
The goal of this and subsequent chapters is to introduce the algorithmic methods that are used most frequently to solve geometric problems. Generally speaking, computational geometry has recourse to all of the classical algorithmic techniques. Readers examining all the algorithms described in this book from a methodological point of view will distinguish essentially three methods: the incremental method, the divide-and-conquer method, and the sweep method.
The incremental method is perhaps the method most heavily emphasized in this book. It is also the most natural method, since it processes the input to the problem one item at a time. The algorithm starts by solving the problem for a small subset of the input, then maintains the solution as the remaining data are inserted one by one. In some cases, the algorithm may sort the input first, in order to take advantage of the fact that the data are sorted. In other cases, the order in which the data are processed does not matter, and is sometimes even deliberately random. In the latter case, we are dealing with the randomized incremental method, which is presented and analyzed at length in Chapter 5. We therefore do not expand further on the incremental method in this chapter.
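As a minimal illustration of this pattern (a toy example, not an algorithm from the book), the following C sketch maintains the smallest axis-aligned bounding box of a set of points: it solves the problem for a single point and then updates the solution as each remaining point is inserted.

    #include <stdio.h>

    typedef struct { double x, y; } Point;
    typedef struct { double xmin, xmax, ymin, ymax; } Box;

    /* Incremental method: solve the problem for a small subset (here, a single
     * point), then maintain the solution as the remaining points are inserted
     * one by one.  For this toy problem each insertion costs O(1). */
    static Box bounding_box(const Point *p, int n) {
        Box b = { p[0].x, p[0].x, p[0].y, p[0].y };   /* solution for {p[0]} */
        for (int i = 1; i < n; i++) {                 /* insert the remaining points */
            if (p[i].x < b.xmin) b.xmin = p[i].x;
            if (p[i].x > b.xmax) b.xmax = p[i].x;
            if (p[i].y < b.ymin) b.ymin = p[i].y;
            if (p[i].y > b.ymax) b.ymax = p[i].y;
        }
        return b;
    }

    int main(void) {
        Point pts[] = { {0, 0}, {2, 1}, {-1, 3}, {4, -2} };
        Box b = bounding_box(pts, 4);
        printf("x:[%g,%g] y:[%g,%g]\n", b.xmin, b.xmax, b.ymin, b.ymax);
        return 0;
    }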
The divide-and-conquer method is one of the oldest methods for the design of algorithms, and its use goes well beyond geometry. In computational geometry, this method leads to very efficient algorithms for certain problems.
To triangulate a region is to describe it as the union of a collection of simplices whose interiors are pairwise disjoint. The region is then decomposed into elementary cells of bounded complexity. The words to triangulate and triangulation originate from the two-dimensional problem, but are commonly used in a broader context for regions and simplices of any dimension.
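To make the definition concrete, here is a small C sketch (an illustration, not an algorithm from these chapters) that triangulates a convex polygon, given by its vertices in counterclockwise order, as a fan of triangles around its first vertex; the resulting triangles cover the polygon and have pairwise disjoint interiors.

    #include <stdio.h>

    typedef struct { double x, y; } Point;
    typedef struct { Point a, b, c; } Triangle;

    /* Fan triangulation of a convex polygon with n vertices:
     * produces the n-2 triangles (v[0], v[i], v[i+1]). */
    static int fan_triangulate(const Point *v, int n, Triangle *out) {
        for (int i = 1; i + 1 < n; i++) {
            out[i - 1].a = v[0];
            out[i - 1].b = v[i];
            out[i - 1].c = v[i + 1];
        }
        return n - 2;                      /* number of triangles produced */
    }

    int main(void) {
        Point square[4] = { {0, 0}, {1, 0}, {1, 1}, {0, 1} };
        Triangle t[2];
        int k = fan_triangulate(square, 4, t);
        printf("%d triangles\n", k);       /* prints: 2 triangles */
        return 0;
    }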
Triangulations and related meshes are ubiquitous in domains where the ambient space needs to be discretized, for instance to interpolate functions of several variables, or to solve multi-dimensional differential equations numerically using finite-element methods. Triangulations are widely used in robotics to decompose the free configuration space of a robot, in computer vision to reconstruct three-dimensional objects from their cross-sections, and in computer graphics to solve windowing problems or to compute illumination when rendering an image. Finally, in computational geometry, the triangulation of a set of points, a planar map, a polygon, a polyhedron, an arrangement, or any other spatial structure is often a prerequisite for running another algorithm on the data. For instance, this is the case for algorithms that locate points in a planar map using a hierarchy of triangulations, and for the numerous applications of triangulations to shortest-path and visibility problems.
Triangulations form the topic of the next three chapters. Chapter 11 recalls the basic definitions related to triangulations, and studies the combinatorics of triangulations in dimensions 2 and 3.
Termination is an important property of term rewriting systems. For a finite terminating rewrite system, a normal form of a given term can be found by a simple depth-first search. If the system is also confluent, the normal forms are unique, which makes the word problem for the corresponding equational theory decidable. Unfortunately, as shown in the first section of this chapter, termination is an undecidable property of term rewriting systems. This is true even if one allows for only unary function symbols in the rules, or for only one rewrite rule (but then for function symbols of arity greater than 1). In the restricted case of ground rewrite systems, i.e. rewrite systems whose rules must not contain variables, termination becomes decidable, though. In the second section of this chapter, we introduce the notion of a reduction order. These orders are an important tool for proving termination of rewrite systems. The main problem for a given rewrite system is to find an appropriate reduction order that shows its termination. Thus, it is desirable to have a wide range of different possible reduction orders available. In the third and fourth sections of the chapter, we introduce two different ways of defining reduction orders.
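As an illustration of finding normal forms by exhaustive rewriting, here is a small C sketch for an assumed example system (Peano addition, not an example taken from this chapter) with the rules add(0, y) -> y and add(s(x), y) -> s(add(x, y)). Because this system is terminating, the rewriting loop halts; because it is also confluent, the result is the unique normal form.

    #include <stdio.h>
    #include <stdlib.h>

    /* Terms over the signature {0, s, add}. */
    typedef enum { ZERO, SUCC, ADD } Sym;
    typedef struct Term { Sym sym; struct Term *arg[2]; } Term;

    static Term *mk(Sym s, Term *a0, Term *a1) {
        Term *t = malloc(sizeof *t);
        t->sym = s; t->arg[0] = a0; t->arg[1] = a1;
        return t;
    }

    /* Try one rewrite step at the root; return the reduct or NULL. */
    static Term *step_root(Term *t) {
        if (t->sym == ADD && t->arg[0]->sym == ZERO)
            return t->arg[1];                                   /* add(0, y) -> y */
        if (t->sym == ADD && t->arg[0]->sym == SUCC)            /* add(s(x), y) -> s(add(x, y)) */
            return mk(SUCC, mk(ADD, t->arg[0]->arg[0], t->arg[1]), NULL);
        return NULL;
    }

    /* Innermost rewriting: normalize the arguments, then rewrite at the root
     * as long as possible.  Termination of this loop is exactly termination
     * of the rewrite system.  (Memory is leaked; this is only a sketch.) */
    static Term *normalize(Term *t) {
        for (int i = 0; i < 2; i++)
            if (t->arg[i]) t->arg[i] = normalize(t->arg[i]);
        Term *r = step_root(t);
        return r ? normalize(r) : t;
    }

    static void print_term(const Term *t) {
        switch (t->sym) {
        case ZERO: printf("0"); break;
        case SUCC: printf("s("); print_term(t->arg[0]); printf(")"); break;
        case ADD:  printf("add("); print_term(t->arg[0]); printf(", ");
                   print_term(t->arg[1]); printf(")"); break;
        }
    }

    int main(void) {
        /* add(s(s(0)), s(0)) normalizes to s(s(s(0))). */
        Term *two = mk(SUCC, mk(SUCC, mk(ZERO, NULL, NULL), NULL), NULL);
        Term *one = mk(SUCC, mk(ZERO, NULL, NULL), NULL);
        print_term(normalize(mk(ADD, two, one)));
        printf("\n");
        return 0;
    }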
The decision problem
First, we show undecidability of the termination problem for term rewriting systems, and then we consider the decidable subcase of right-ground term rewriting systems (which can be treated by a slight generalization of the well-known proof for ground systems).
The purpose of this chapter is twofold. On the one hand, it introduces basic notions from universal algebra (such as terms, substitutions, and identities) on a syntactic level that does not require (or give) much mathematical background. On the other hand, it presents the semantic counterparts of these syntactic notions (such as algebras, homomorphisms, and equational classes), and proves some elementary results on their connections. Most of the definitions and results presented in subsequent chapters can be understood knowing only the syntactic level introduced in Section 3.1. In order to obtain a deeper understanding of the meaning of these results, and of the context in which they are of interest, a study of the other sections in this chapter is recommended, however. For more information on universal algebra see, for example, [100, 55, 173].
Terms, substitutions, and identities
Terms will be built from function symbols and variables in the usual way. For example, if f is a binary function symbol, and x, y are variables, then f(x,y) is a term. To make clear which function symbols are available in a certain context, and which arity they have, one introduces signatures.
Definition 3.1.1 A signature Σ is a set of function symbols, where each f ∈ Σ is associated with a non-negative integer n, the arity of f. For n ≥ 0, we denote the set of all n-ary elements of Σ by Σ(n). The elements of Σ(0) are also called constant symbols.
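As one possible concrete representation (an illustration, not a definition from the book), terms over a signature can be stored as a small C data structure in which each node is either a variable or a function symbol of recorded arity applied to argument terms:

    #include <stdio.h>

    /* A term is a variable, or a function symbol applied to 'arity' terms;
     * constants are the 0-ary case. */
    typedef struct Term Term;
    struct Term {
        int is_var;             /* 1: variable, 0: function application  */
        const char *name;       /* variable name or function symbol      */
        int arity;              /* number of arguments (0 for constants) */
        Term **args;            /* the 'arity' argument terms            */
    };

    int main(void) {
        /* The term f(x, y) for a binary symbol f and variables x, y. */
        Term x = { 1, "x", 0, NULL };
        Term y = { 1, "y", 0, NULL };
        Term *fargs[2] = { &x, &y };
        Term fxy = { 0, "f", 2, fargs };
        printf("%s/%d applied to %s and %s\n",
               fxy.name, fxy.arity, fxy.args[0]->name, fxy.args[1]->name);
        return 0;
    }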
Chapters 2–11 have described the fundamental components of a good compiler: a front end, which does lexical analysis, parsing, construction of abstract syntax, type-checking, and translation to intermediate code; and a back end, which does instruction selection, dataflow analysis, and register allocation.
What lessons have we learned? I hope that the reader has learned about the algorithms used in different components of a compiler and the interfaces used to connect the components. But the author has also learned quite a bit from the exercise.
My goal was to describe a good compiler that is, to use Einstein's phrase, “as simple as possible – but no simpler.” I will now discuss the thorny issues that arose in designing Tiger and its compiler.
Nested functions. Tiger has nested functions, requiring some mechanism (such as static links) for implementing access to nonlocal variables. But many programming languages in widespread use – C, C++, Java – do not have nested functions or static links. The Tiger compiler would become simpler without nested functions, for then variables would not escape, and the FindEscape phase would be unnecessary. But there are two reasons for explaining how to compile nonlocal variables. First, there are programming languages where nested functions are extremely useful – these are the functional languages described in Chapter 15.
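The following C fragment is only a hand-written sketch of the idea behind static links, not the Tiger compiler's actual translation: an escaping variable of the enclosing function lives in a frame record, and the nested function receives a pointer to that record as an extra argument.

    #include <stdio.h>

    /* Suppose a source language allows a function inc nested inside outer,
     * where inc refers to the nonlocal (escaping) variable count of outer.
     * One way to compile this: keep escaping variables in a frame record and
     * pass the enclosing frame as an extra argument, the static link. */
    struct outer_frame { int count; };

    static void inc(struct outer_frame *sl) {   /* sl = static link to outer's frame */
        sl->count = sl->count + 1;              /* access to the nonlocal variable   */
    }

    static void outer(void) {
        struct outer_frame frame = { 0 };       /* count lives in outer's frame */
        inc(&frame);
        inc(&frame);
        printf("count = %d\n", frame.count);    /* prints: count = 2 */
    }

    int main(void) { outer(); return 0; }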
Over the past decade, there have been several shifts in the way compilers are built. New kinds of programming languages are being used: object-oriented languages with dynamic methods, functional languages with nested scope and first-class function closures; and many of these languages require garbage collection. New machines have large register sets and a high penalty for memory access, and can often run much faster with compiler assistance in scheduling instructions and managing instructions and data for cache locality.
This book is intended as a textbook for a one- or two-semester course in compilers. Students will see the theory behind different components of a compiler, the programming techniques used to put the theory into practice, and the interfaces used to modularize the compiler. To make the interfaces and programming examples clear and concrete, I have written them in the ML programming language. Other editions of this book are available that use the C and Java languages.
Implementation project. The “student project compiler” that I have outlined is reasonably simple, but is organized to demonstrate some important techniques that are now in common use: abstract syntax trees to avoid tangling syntax and semantics, separation of instruction selection from register allocation, copy propagation to give flexibility to earlier phases of the compiler, and containment of target-machine dependencies. Unlike many “student compilers” found in textbooks, this one has a simple but sophisticated back end, allowing good register allocation to be done after instruction selection.
A compiler was originally a program that “compiled” subroutines [a link-loader]. When in 1954 the combination “algebraic compiler” came into use, or rather into misuse, the meaning of the term had already shifted into the present one.
Bauer and Eickel [1975]
This book describes techniques, data structures, and algorithms for translating programming languages into executable code. A modern compiler is often organized into many phases, each operating on a different abstract “language.” The chapters of this book follow the organization of a compiler, each covering a successive phase.
To illustrate the issues in compiling real programming languages, I show how to compile Tiger, a simple but nontrivial language of the Algol family, with nested scope and heap-allocated records. Programming exercises in each chapter call for the implementation of the corresponding phase; a student who implements all the phases described in Part I of the book will have a working compiler. Tiger is easily modified to be functional or object-oriented (or both), and exercises in Part II show how to do this. Other chapters in Part II cover advanced techniques in program optimization. Appendix A describes the Tiger language.
The interfaces between modules of the compiler are almost as important as the algorithms inside the modules. To describe the interfaces concretely, it is useful to write them down in a real programming language. This book uses the C programming language.
lex-i-cal: of or relating to words or the vocabulary of a language as distinguished from its grammar and construction
Webster's Dictionary
To translate a program from one language into another, a compiler must first pull it apart and understand its structure and meaning, then put it together in a different way. The front end of the compiler performs analysis; the back end does synthesis.
The analysis is usually broken up into
Lexical analysis: breaking the input into individual words or “tokens”;
Syntax analysis: parsing the phrase structure of the program; and
Semantic analysis: calculating the program's meaning.
The lexical analyzer takes a stream of characters and produces a stream of names, keywords, and punctuation marks; it discards white space and comments between the tokens. It would unduly complicate the parser to have to account for possible white space and comments at every possible point; this is the main reason for separating lexical analysis from parsing.
Lexical analysis is not very complicated, but we will attack it with high-powered formalisms and tools, because similar formalisms will be useful in the study of parsing and similar tools have many applications in areas other than compilation.
LEXICAL TOKENS
A lexical token is a sequence of characters that can be treated as a unit in the grammar of a programming language. A programming language classifies lexical tokens into a finite set of token types.
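As a rough illustration (not the lexer developed in this book, which is generated from regular expressions), the following C sketch defines a handful of assumed token types and a next_token routine that discards white space and groups the remaining characters into identifiers, numbers, and punctuation.

    #include <ctype.h>
    #include <stdio.h>

    /* A small, assumed set of token types; a real language has many more. */
    typedef enum { TOK_ID, TOK_NUM, TOK_PLUS, TOK_LPAREN, TOK_RPAREN, TOK_EOF } TokType;

    typedef struct { TokType type; char text[32]; } Token;

    /* Produce the next token from *src, advancing the pointer.  White space is
     * discarded here, so the parser never has to account for it. */
    static Token next_token(const char **src) {
        const char *s = *src;
        Token t = { TOK_EOF, "" };
        int i = 0;
        while (isspace((unsigned char)*s)) s++;            /* skip white space */
        if (*s == '\0') {
            t.type = TOK_EOF;
        } else if (isalpha((unsigned char)*s)) {           /* identifier (keywords would be looked up here) */
            t.type = TOK_ID;
            while (isalnum((unsigned char)*s) && i < 31) t.text[i++] = *s++;
        } else if (isdigit((unsigned char)*s)) {           /* integer literal */
            t.type = TOK_NUM;
            while (isdigit((unsigned char)*s) && i < 31) t.text[i++] = *s++;
        } else {                                           /* punctuation: only +, (, ) in this sketch */
            t.text[i++] = *s;
            if (*s == '+') t.type = TOK_PLUS;
            else if (*s == '(') t.type = TOK_LPAREN;
            else t.type = TOK_RPAREN;
            s++;
        }
        t.text[i] = '\0';
        *src = s;
        return t;
    }

    int main(void) {
        const char *p = "count + (42)";
        for (Token t = next_token(&p); t.type != TOK_EOF; t = next_token(&p))
            printf("%d '%s'\n", t.type, t.text);
        return 0;
    }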
Heap-allocated records that are not reachable by any chain of pointers from program variables are garbage. The memory occupied by garbage should be reclaimed for use in allocating new records. This process is called garbage collection, and is performed not by the compiler but by the runtime system (the support programs linked with the compiled code).
Ideally, we would say that any record that is not dynamically live (will not be used in the future of the computation) is garbage. But, as Section 10.1 explains, it is not always possible to know whether a variable is live. So we will use a conservative approximation: we will require the compiler to guarantee that any live record is reachable; we will ask the compiler to minimize the number of reachable records that are not live; and we will preserve all reachable records, even if some of them might not be live.
Figure 13.1 shows a Tiger program ready to undergo garbage collection (at the point marked garbage-collect here). There are only three program variables in scope: p, q, and r.
MARK-AND-SWEEP COLLECTION
Program variables and heap-allocated records form a directed graph. The variables are roots of this graph. A node n is reachable if there is a path of directed edges r → … → n starting at some root r. A graph-search algorithm such as depth-first search (Algorithm 13.2) can mark all the reachable nodes.
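The following C sketch illustrates the mark phase as a recursive depth-first search, together with a simple sweep. The record layout (a mark bit plus a fixed number of pointer fields) and the array-of-records heap are assumptions for illustration; a real collector obtains field offsets from pointer maps or type descriptors and manages a free list.

    #include <stdio.h>

    #define MAX_FIELDS 4

    /* A simplified heap record: a mark bit plus up to MAX_FIELDS pointer fields. */
    typedef struct Record {
        int marked;
        struct Record *field[MAX_FIELDS];
    } Record;

    /* Mark phase: depth-first search from one root.  Every record reachable by
     * a chain of pointers from a root is marked; everything left unmarked is
     * garbage. */
    static void mark(Record *r) {
        if (r == NULL || r->marked) return;       /* no pointer, or already visited */
        r->marked = 1;
        for (int i = 0; i < MAX_FIELDS; i++)
            mark(r->field[i]);
    }

    /* Sweep phase over the assumed array-of-records heap: unmarked records are
     * reclaimed (here just counted); marks are cleared for the next collection. */
    static int sweep(Record *heap, int n) {
        int freed = 0;
        for (int i = 0; i < n; i++) {
            if (!heap[i].marked) freed++;         /* would go on the free list */
            heap[i].marked = 0;
        }
        return freed;
    }

    int main(void) {
        Record heap[3] = { 0 };
        heap[0].field[0] = &heap[1];              /* root -> heap[1]; heap[2] is unreachable */
        Record *root = &heap[0];                  /* e.g. one of the roots p, q, r */
        mark(root);
        printf("freed %d of 3 records\n", sweep(heap, 3));  /* prints: freed 1 of 3 */
        return 0;
    }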