Binary relations on states are not the only semantic domain for representing sequential, nondeterministic programs. Since Dijkstra published his first paper on weakest preconditions in 1975 ([Dij75]), a rich theory based on monotone functions mapping sets of states to sets of states has emerged. Such functions are called predicate transformers. For instance, one models programs as functions mapping each postcondition to the corresponding weakest precondition. One branch of this theory concentrates on our primary subject, namely, the stepwise refinement of programs, possibly using data refinement techniques [B80, MV94, Mor89a, vW90]. The major drawback of predicate transformers as models of programs is that they are more complicated than binary relations, because the domain of predicate transformers is richer than what we intend to model, i.e., not every predicate transformer represents one of our programs. But the predicate transformer approach also has its advantages. Several main results achieved in previous chapters, especially the completeness theorems (Chapters 4 and 9) and the calculation of maximal simulators (Chapters 7 and 9), require rather complicated proofs. The aim of this chapter is to demonstrate that more elegant and succinct proofs exist for these results, as well as for some new ones, such as isomorphism theorems between the relational and predicate transformer worlds for various forms of correctness and the use of Galois connections.
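To recall the flavor of this model: the weakest precondition of a program S with respect to a postcondition Q is the largest set of initial states from which every execution of S terminates in a state satisfying Q. The following standard Dijkstra-style clauses (quoted here for illustration only; the chapter develops its own formal treatment) show how such predicate transformers are computed compositionally, for an always-defined expression e:

\[
\mathit{wp}(x := e,\; Q) \;=\; Q[e/x], \qquad
\mathit{wp}(S_1 ; S_2,\; Q) \;=\; \mathit{wp}(S_1,\, \mathit{wp}(S_2,\, Q)),
\]

and each \(\mathit{wp}(S, \cdot)\) is monotone: \(Q \subseteq R\) implies \(\mathit{wp}(S, Q) \subseteq \mathit{wp}(S, R)\).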
In this chapter we briefly introduce and discuss three more methods: Z, Hehner's method, and Back's Refinement Calculus. We do not intend to describe them in as much detail as Reynolds' method and VDM in the previous two chapters; we concentrate on the data refinement aspects of these methods and briefly analyze how they compare to the methods already discussed.
All three methods discussed in this chapter turn out to be quite different members of the L-simulation community.
Originally Z was invented as another notation for Zermelo-Fraenkel set theory. However, it has evolved into a development style (or method) for specifications. Although invented by academics, Z is nowadays relatively popular in industry, especially in Europe. As will become apparent at the end of our discussion of Z in Section 13.1, there is not much difference between Z and VDM from the data refinement point of view. The subtle differences between these two methods, apart from the notational ones, are analyzed elsewhere; see, e.g., [HJN93].
Hehner arrives at a strikingly simple syntax-based development method by using first-order predicate logic as the specification language [Heh93]. Whereas VDM uses two predicates, namely pre- and postconditions, Hehner needs only a single predicate. Moreover, he interprets his predicates in a classical two-valued model, similar to ours from Section 5.2, over two sets of variables: input and output variables. As we shall see in Section 13.2, Hehner's notion of data transformer corresponds to a total L-simulation relation combined with the solution to the L-simulation problem given in Section 7.2.
This chapter is based on the fifth chapter of John Reynolds’ book “The Craft of Programming” [Rey81]. The material in Section 11.2 is taken verbatim from his book.
In contrast to Part I, Reynolds is mainly concerned with top-down development of programs rather than proving refinement between data types. His method of deriving programs is called stepwise refinement and was introduced in [Wir71] and [DDH72]. One of his development techniques, however, is related to data refinement. In this chapter we shall present and analyze this technique and show that it amounts to L-simulation.
In a given program Reynolds inspects each particular variable of some abstract data type separately, and shows how the choice of a way to implement that variable is guided by the number and relative frequency of the operations performed on it. This allows different variables of the same data type to be implemented differently.
Reynolds uses Hoare-style partial correctness specifications. However, none of his program transformation steps increases the domain of possible nontermination. Therefore his examples of refinement are also refinements in a total correctness interpretation.
In Section 11.3 we relate Reynolds' method to L-simulation. At the last stage of our analysis of Reynolds' method we shall see that we have to interpret some of his operations in a total correctness setting to bridge a gap between his requirements and those for partial correctness L-simulation. Formally, this is supported by the L-simulation theorem for total correctness, Theorem 9.9.
We close this chapter with some remarks on the history of this method.
In the previous chapter, our proof techniques for data refinement, namely the notions of simulation introduced in Def. 2.1, have been proven adequate in Theorem 4.17 and sound in Theorem 4.10. This establishes proving simulation as an appropriate technique for verifying data refinement.
In Chapters 9 and 10 we shall encounter other notions of data refinement and simulation defined for frameworks different from that of the binary relations considered until now, for which similar soundness and completeness results will be proven.
In Part II of this monograph a number of established methods for proving data refinement will be similarly analyzed. (These methods are: VDM, Z, and those of Reynolds, Hehner, Back, Abadi and Lamport, and Lynch.) This is done by showing to what extent they are special cases of, or are equivalent to, the previously investigated notions of simulation referred to above, which are the subject of Part I of this monograph.
This justifies considering simulation as a generic term for all these techniques, where the connection with data refinement is made through appropriate soundness and completeness theorems.
Now an immediate consequence of our goal of comparing these simulation methods for proving data refinement is that we must be able to compare semantically expressed methods such as L-simulation and the methods of Abadi and Lamport and of Lynch (see Chapter 14) with syntactically formulated ones such as VDM, Z, and Reynolds' method. This in turn requires us to distinguish between syntax and semantics. We bridge this gap by introducing interpretation functions for several classes of expressions, such as arithmetic expressions and predicates built on them (see Section 5.2), programs (see Section 5.3), and relations (see Section 5.4).
In Chapter 1 we saw that abstraction relations, rather than abstraction functions, are the natural concept to formulate proof principles for establishing data refinement, i.e., simulation. This impression was reinforced in Chapter 4 by establishing completeness of the combination of L- and L−1-simulation for proving data refinement. How then is it possible that such an apparently practical method as VDM promotes the use of total abstraction functions instead? Notice that in our set-up such functions are the most restrictive version of abstraction relations, because for them the four versions of simulation are all equivalent. Should this not lead to a serious degree of incompleteness, in that it offers a much weaker proof method than L-simulation, which is already incomplete on its own? As we shall see in this chapter this is not necessarily the case. Combining total abstraction functions with so-called auxiliary variables allows the formulation of proof principles which are equal in power to L- and L−1-simulation. Auxiliary variables are program variables to which assignments are added inside a program not for influencing the flow of control but for achieving greater expressiveness in the formulation of abstraction functions and assertions. Following [AL91] such total abstraction functions are called refinement mappings. The chances for an abstraction relation (from a concrete data type to an abstract data type) to be functional can be increased by artificially inflating the concrete level state space via the introduction of auxiliary variables on that level.
By recording part of the history of a computation in an auxiliary variable, called a history variable, and combining this with refinement mappings, a proof method equivalent to L-simulation is obtained.
During the process of stepwise, hierarchical program development, a step represents a transformation of a so-called abstract higher level result into a more concrete lower level one. In general, this development process corresponds to increasing the amount of detail required for the eventual implementation of the original specification on a given machine.
In the first part of this book we develop the relational theory of simulation and a general version of Hoare logic, show how data refinement can be expressed within this logic, extend these results to total correctness, and show how all this theory can be uniformly expressed inside the refinement calculus of Ralph Back, Paul Gardiner, Carroll Morgan and Joakim von Wright. We develop this theory as a reference point for comparing various existing data refinement methods in the second part, some of which are syntax-based methods. This is one of the main reasons why we are forced to clearly separate syntax from semantics.
The second part of this monograph focuses on the introduction of, and comparison between, various methods for proving correctness of such transformation steps. Although these methods are illustrated mainly by applying them to correctness proofs of implementations of data types, the techniques developed apply equally well to proving correctness of such steps in general, because all these methods are only variations on one central theme: that of proof by simulation, of which we analyze at least 13 different formulations.
Simulation, our main technique for proving data refinement, also works for proving refinement of total correctness between data types based on the semantic model introduced in the previous chapter. However, certain complications arise; for instance, L−1-simulation is unsound when abstract operations exhibit infinite nondeterminism, which severely restricts the use of specification statements.
Section 9.1 extends the soundness and completeness results for simulation from Chapter 4 to total correctness. As the main result, we present in Section 9.2 a total correctness version of our L-simulation theorem from Chapter 7.
Simulation
The semantics-based notions of data type, data refinement, and simulation need not be defined anew. The only notion changed is that of observation since, through our total correctness program semantics, nonterminating behaviors have also become observable. It is essential to the understanding of total correctness simulation between data types to realize that, semantically speaking, abstraction relations are directed. In particular, the relational inverse of an abstraction relation from level C to level A is not an abstraction relation in the opposite direction, as is the case for partial correctness. Now it becomes clear why several authors prefer the name downward simulation for L-simulation and upward simulation for L−1-simulation [HHS87]: the direction of an L-simulation relation is downwards, from the abstract to the concrete level, whence a more descriptive name for it would be representation relation or downward simulation relation. For this reason we redefine the meaning of ⊆Lβ such that β itself (and not its inverse) is used in the inclusions characterizing L-simulation.
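For orientation, with β relating abstract to concrete states, the per-operation inclusion characterizing downward simulation is commonly written (following [HHS87]; the precise definition of ⊆Lβ used here, including the treatment of initialization and finalization, is the one given in this chapter)

\[
\beta \,;\, \mathit{op}_C \;\subseteq\; \mathit{op}_A \,;\, \beta ,
\]

so that β itself, rather than its inverse, occurs in the inclusion.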
The definition of data refinement given in the previous chapter requires that an infinite number of inclusions should hold, namely one for every choice of program involved. Consequently it does not yield an effective proof method. In this chapter we define such a method, called simulation, and investigate its soundness and completeness w.r.t. the criterion of proving data refinement. In order to define simulation one needs the concept of an abstraction relation, relating concrete data values to abstract ones. We briefly discuss why abstraction relations rather than abstraction functions are used, and how data invariants (characterizing the reachable values in a data type) solve one of the problems associated with converting abstraction relations into abstraction functions. Since ultimately proofs are carried out in predicate logic, this raises the question of how to express abstraction relations within that logic. As we shall see, this forces us to distinguish between those variables within a program that are unaffected by the data refinement step in question (these are called normal variables) and those that are affected by that step (called representation variables). This raises a number of technical issues which are discussed and for which we present a solution. Next, two methods presently in use for proving data refinement, namely Reynolds' method and VDM, are briefly introduced by way of an example. Finally, we discuss both the distinction between, and the relative merits of, syntax-based methods for proving data refinement and semantically oriented ones.
This chapter is devoted to the semantics of the functional language that we described in the previous chapters. The point of this semantics is to define the meaning of expressions in the language; that is, to define precisely the value of each expression. The association between an expression and its value is created by rewrite rules; that is, rules that transform expressions textually. Those rules are presented and discussed in Section 3.1.
These rewrite rules are non-deterministic. That is, in general, for any expression under consideration, there is more than one rule that may be applied to it. The consistency of these rules rests on the fact that they form a convergent system. In other words, whatever the non-deterministic choices made, at every step it is always possible to make any two different computations converge toward the same expression. This property does not exclude the existence of infinite computations, but it does exclude the possibility of an expression having two distinct values. The value of an expression (when it exists) is therefore unique. We assert this convergence property here, but we will not try to prove it. References about the proof of convergence can be found at the end of this chapter.
In practice, in order to implement an evaluator for a language, we have to define a strategy that, at every step, chooses one rewrite among the set of all possible rewrites.
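As a purely illustrative sketch (not the rewrite rules or the evaluator of this book), the following Caml fragment defines a toy expression language with a single rewrite rule, Add (Num m, Num n) → Num (m + n), together with a leftmost-innermost strategy that selects one rewrite per step and iterates until a normal form is reached:

type expr = Num of int | Add of expr * expr

(* Attempt a single rewrite, leftmost-innermost; None means the expression is in normal form. *)
let rec step = function
  | Num _ -> None
  | Add (Num m, Num n) -> Some (Num (m + n))
  | Add (e1, e2) ->
      (match step e1 with
       | Some e1' -> Some (Add (e1', e2))
       | None ->
           (match step e2 with
            | Some e2' -> Some (Add (e1, e2'))
            | None -> None))

(* The evaluator iterates the strategy until no rewrite applies. *)
let rec eval e =
  match step e with
  | None -> e
  | Some e' -> eval e'

For instance, eval (Add (Add (Num 1, Num 2), Num 3)) rewrites, step by step, to Num 6.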
In this chapter, we show you how to represent exact numbers of arbitrary size. In certain applications, our ability to compute with such numbers is indispensable, especially so in computer algebra. Formal systems of symbolic computation, such as Maple [14], Mathematica [44], or Axiom [19], rely on exact rational arithmetic. Moreover, programming languages oriented toward symbolic computation generally support exact computations. This is particularly the case for Caml, with its bignum and ratio libraries.
The sets of numbers that we will treat here are the natural numbers (also known as integers for counting), the signed integers (that is, both positive and negative), and the rational numbers. The natural numbers will be represented by the sequence of their digits in a given base. The sequence itself can be represented in various ways. We will represent natural numbers primarily by ordinary lists. This choice is not very efficient because it supports traversal in only one direction. If we decide to put least significant digits at the head of the list, then we can multiply and add fairly efficiently, but division will be inefficient because we must then turn the lists around.
Nevertheless, if we represent natural numbers as lists, then we can program the usual operations simply, and that model can serve later as the point of reference for getting into various other representations, such as representations by doubly linked circular lists or by arrays—representations used in “real” implementations.
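To fix ideas, here is a minimal sketch (in the spirit of, but not taken from, the implementations developed in this chapter) of natural numbers as lists of base-10 digits with the least significant digit first, together with addition by carry propagation:

let base = 10

(* Add two digit lists (least significant digit first), propagating a carry. *)
let rec add_digits a b carry =
  match a, b, carry with
  | [], [], 0 -> []
  | [], [], c -> [c]
  | [], d :: rest, c | d :: rest, [], c ->
      let s = d + c in
      (s mod base) :: add_digits rest [] (s / base)
  | d1 :: r1, d2 :: r2, c ->
      let s = d1 + d2 + c in
      (s mod base) :: add_digits r1 r2 (s / base)

let add a b = add_digits a b 0

With this convention, add [9; 9; 1] [2; 0; 1] computes 199 + 102 and yields [1; 0; 3], i.e., 301; putting the least significant digits at the head is what makes carry propagation a simple simultaneous traversal of both lists.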
This book has a number of objectives. First, it provides the concepts and a language to produce sophisticated software. Second, the book tries to make you step back a bit from programming as an activity by highlighting basic problems linked to programming as a discipline. In the end, we hope to share our own pleasure in programming.
The language we use—Caml—makes it possible to achieve all these goals. Caml belongs to the family of “functional” languages, all of which have the following qualities:
They are particularly well adapted to writing applications for “symbolic computation”—the kind of computing that concerns computer scientists as well as mathematicians—in software engineering, artificial intelligence, formal computation, computer-aided proof, and so forth.
Functional languages are built on a fundamental theory that derives from mathematical logic. This basis provides these languages with their semantics as well as their systems of types and proof.
By the very way in which they are designed, these languages support a certain aesthetic in programming, an aesthetic which, like the aesthetic of a mathematical proof, is often an indication of their quality.
This book grew primarily out of a programming course given by Guy Cousineau at the Ecole Normale Supérieure between 1990 and 1995. The book also benefited from the teaching experience of Michel Mauny, who wrote Chapters 8 and 13 and contributed to the overall consistency of the book.
The spectacular development of the computing industry depends largely on progress in two very different areas: hardware and software. Progress in hardware has been fairly quantitative: miniaturized parts, increased performance, cost cutting; whereas the progress in software has been more qualitative: ease of use, friendliness, etc.
In fact, most users see their computer only through interfaces that let them exploit the machine while ignoring practically all its structure and internal details, just as we drive our cars without ever opening the hood, or enjoy the comfort of central heating without necessarily grasping thermodynamics.
This qualitative improvement was brought to us by progress in software as an independent discipline. It is based on a major research effort, in the course of which computer science has been structured little by little around its own concepts and methods. Those concepts and methods, of course, should be the basis for teaching computer science.
The most fundamental concept in computer science is computing, of course. A computation is a set of transformations carried out “mechanically” by means of a finite number of predefined rules. A computation operates on formalized symbolic data (information) representing, for example, numbers (as in numeric computations), mathematical expressions (as in formal computation), or data or even knowledge of all kinds. The only characteristics common to all computations are the discreteness of their data (that is, the information is finite) and the mechanical way in which the rules are applied.
This last part of the book describes techniques to implement a language like Caml. We do not pretend to give a complete description here of an implementation of Caml, but rather a demonstration that such an implementation is feasible. We treat a subset of Caml to show the major difficulties in compilation and type synthesis.
Chapter 11 defines a Caml evaluator in Caml. It highlights the main ideas that make it possible to produce a compiler: the idea of an environment is used to manage variables, and the idea of a closure is used to represent functional values.
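As an illustrative sketch only (not the evaluator of Chapter 11), the following fragment shows both ideas on a tiny lambda-calculus: an environment maps variable names to values, and a functional value is a closure pairing the function's code with the environment in force at its definition:

type expr =
  | Var of string
  | Int of int
  | Fun of string * expr            (* fun x -> body *)
  | App of expr * expr

type value =
  | VInt of int
  | VClosure of string * expr * (string * value) list

let rec eval env = function
  | Int n -> VInt n
  | Var x -> List.assoc x env            (* raises Not_found for an unbound variable *)
  | Fun (x, body) -> VClosure (x, body, env)   (* capture the current environment *)
  | App (f, arg) ->
      (match eval env f with
       | VClosure (x, body, defenv) ->
           eval ((x, eval env arg) :: defenv) body
       | VInt _ -> failwith "application of a non-function")

For example, eval [] (App (Fun ("x", Var "x"), Int 3)) evaluates to VInt 3.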
Chapter 12 tackles two topics simultaneously: compilation schemes and the techniques of memory management that come into play in the implementation of a functional language. With respect to memory management, only allocation is described precisely; techniques for recovering memory (that is, garbage collection) are only briefly touched upon.
The set of machine instructions we use is at a relatively abstract level compared to the instructions available in assembly language, but it can nevertheless be translated into true machine instructions quite directly.
Chapter 13 describes a type synthesizer. We give you a preliminary version of it in a purely functional style; then we move on to a more efficient one, one that uses a destructive variety of the unification algorithm. This version is quite close to the actual type synthesizer in Caml.
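By way of illustration (a hypothetical sketch, not the type synthesizer of Chapter 13), destructive unification is usually realized by representing type variables as mutable cells that are overwritten with a link to the type they become bound to:

type ty =
  | Tint
  | Tarrow of ty * ty
  | Tvar of tvar ref
and tvar = Unbound of int | Link of ty

(* Follow Link chains to the representative of a type, compressing paths along the way. *)
let rec repr t =
  match t with
  | Tvar ({ contents = Link t' } as r) ->
      let t'' = repr t' in
      r := Link t'';
      t''
  | _ -> t

exception Unification_failure

(* Unify two types by side effect: binding a variable overwrites its cell. *)
let rec unify t1 t2 =
  let t1 = repr t1 and t2 = repr t2 in
  match t1, t2 with
  | Tint, Tint -> ()
  | Tarrow (a1, b1), Tarrow (a2, b2) -> unify a1 a2; unify b1 b2
  | Tvar r1, Tvar r2 when r1 == r2 -> ()       (* already the same variable *)
  | Tvar r, t | t, Tvar r -> r := Link t       (* bind the variable; occurs check omitted here *)
  | _ -> raise Unification_failure

A purely functional version would instead thread an explicit substitution through the recursive calls; the destructive variety trades that bookkeeping for in-place updates, which is what makes it more efficient.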