We illustrate the use of tactical proof (semi)automation on the program shown in Figure 27.1. We compile this source program reverse.c into a C light abstract-syntax data structure using the front end of CompCert, the clightgen utility. This produces a Coq file reverse.v with a sequence of definitions in the CompCert abstract-syntax tree structures (Figure 27.2).
Clightgen comprises CompCert's (unverified) parser into CompCert C, followed by CompCert's (verified) translation into C light. The fact that one or another of these front-end phases is unverified does not concern us, because we apply the program logic to the output of these translations. If we specify correctness properties of reverse.v and prove them, then the C light program will have those properties, regardless of whether it matches the source program reverse.c. Of course, it is very desirable for reverse.v to match reverse.c, so the programmer may reason informally about unverified properties of reverse.c such as timing, information flow, or resource consumption.
This program uses linked lists of 32-bit integers. Before proceeding with the verification, we should develop the theory of list segments (a “theory” is just what we call a collection of definitions, lemmas, and tactics useful for reasoning about the subject matter). Chapter 19 explained the theory of list segments; in the file list.v we build an lseg theory parameterized by a C structure definition.
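Figure 27.1 is not reproduced in this listing. As a rough idea of the kind of program under discussion, here is a minimal sketch of in-place list reversal in C. The struct layout, the field names head and tail, and the function name reverse are assumptions made for illustration and may differ from the actual reverse.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type: a singly linked list of 32-bit integers. */
struct list {
    int32_t head;
    struct list *tail;
};

/* Reverse the list in place and return the new first node.
   Invariant: w is the already-reversed prefix, v is the
   suffix that still has to be processed. */
struct list *reverse(struct list *p) {
    struct list *w = NULL;
    struct list *v = p;
    while (v != NULL) {
        struct list *t = v->tail;  /* remember the rest of the suffix */
        v->tail = w;               /* redirect this node to the prefix */
        w = v;
        v = t;
    }
    return w;
}
```

The loop invariant in the comment is what a list-segment theory lets one state precisely: one segment for the reversed prefix and one for the remaining suffix, occupying disjoint memory.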
Synopsis: In Part III we showed how to apply a program logic interactively to a program, using tactics. Here we will show a different use of program logics: we build automatic static analyses and decision procedures as efficient functional programs, and prove their soundness using the rules of the program logic.
Synopsis: Verifiable C is a style of C programming suited to separation-logic verifications; it is similar to the C light intermediate language of the CompCert compiler. We show the assertion language of separation-logic predicates for specifying states of a C execution. The judgment form semax of the axiomatic semantics relates a C command to its precondition and postconditions, and for each kind of command there is an inference rule for proving its semax judgments. We illustrate with the proof of a C program that manipulates linked lists, and we give examples of other programs and how they can be specified in the Verifiable C program logic. Shared-memory concurrent programs with Dijkstra-Hoare synchronization can be verified using the rules of concurrent separation logic.
Synopsis: Instead of reasoning directly on the model (that is, separation algebras), we can treat separation logic as a syntactic formal system, that is, a logic. We can implement proof automation to assist in deriving separation-logic proofs.
Reasoning about recursive functions, recursive types, and recursive predicates can lead to paradox if not done carefully. Step-indexing avoids paradoxes by inducting over the number of remaining program steps that we care about. Indirection theory is a kind of step-indexing that can serve as a model of higher-order Hoare logics. Using indirection theory we can define general (not just covariant) recursive predicates.
Recursive data structures such as lists and trees are easily modeled in indirection theory, but the model is not the same one conventionally used, as it inducts over “age”—the approximation level, the amount of information left in the model—rather than list-length or tree-depth. A tiny pointer/continuation language serves as a case study for separation logic with first-class function-pointers, modeled in indirection theory. The proof of a little program in the case-study language illustrates the application of separation logic with function pointers.
Most presentations of Hoare logics assume that expressions (in a current environment) are interchangeable with their values. Implicit in this presentation is that every expression evaluates to a value in the evaluation relation. This is convenient for users of these logics in accomplishing program verification: connecting a program with a mathematical specification.
Unfortunately, C expressions do not always evaluate to values (and occasionally evaluate to unusable values). Although this occurs only in limited and predictable cases, we do not want to lose the power to reason about expressions and values interchangeably in the many cases where expressions can be statically guaranteed to evaluate. We will avoid the cases where expressions may not evaluate, because we will show that they do not arise in verified programs. We integrate a typechecker with our Hoare-logic rules to detect these cases, and (mostly) restore the link between expressions and values.
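To make the "limited and predictable cases" concrete: in C, signed integer division is undefined when the divisor is zero, and also when INT_MIN is divided by -1. The sketch below tests those two side conditions at run time. It only illustrates the conditions involved; it is not the typechecker described above, which aims to rule such cases out statically. The name checked_div is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: compute a / b only when C defines the result.
   Returns true and stores the quotient in *out on success;
   returns false when the expression a / b would not evaluate. */
bool checked_div(int a, int b, int *out) {
    if (out == NULL) return false;              /* nowhere to store the result */
    if (b == 0) return false;                   /* division by zero */
    if (a == INT_MIN && b == -1) return false;  /* quotient does not fit in int */
    *out = a / b;
    return true;
}
```

When both conditions can be established before the program runs, the quotient can be treated as an ordinary value in assertions, with no run-time test needed.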
CompCert's inductive definition of eval_expr does not assume that expressions always evaluate. CompCert denotes failure to evaluate by omitting tuples from the inductive definition of the compcert.Clight.eval_expr relation, following standard principles of contemporary structural operational semantics. In program verification, however, the cost of using an inductive definition is that in order to relate an expression to a value you must say something like: ∃v. e ⇓ v ∧ P(v), “there exists some value v such that e evaluates to v and P holds on v.”
Many kinds of recursive definitions and recursive predicates appear in the descriptions of programs and programming languages. Some recursive definitions, such as list and tree data structures, are naturally covariant; these are straightforward to handle using a simple least-fixed-point method as described in Chapter 10. But some useful kinds of self-referencing definitions are not covariant. When the recursion goes through function arguments, it may be contravariant (see Ffunopt on page 64) or some mixture that is neither covariant nor contravariant. This kind of recursion requires more difficult mathematics, yet it is essential in reasoning about certain kinds of programs:
• Object-oriented programs in which class C has methods with a “this” or “self” parameter of type C;
• Functional programming languages with mutable references at higher types—such as ML;
• Concurrent languages with dynamically creatable locks whose resource invariants can describe other locks—a typical idiom in Pthreads concurrency;
• Functional languages (such as ML) where datatype recursion can go through function-parameters.
Does the C programming language have these features? Well, yes. C's type system is rather loose (with casts to void* and back). C programs that use void* in design patterns similar to objects or function closures can be perfectly correct, but proving their correctness in a program logic may need noncovariant recursion.
This chapter, and the next two chapters (predicate implication and subtyping; general recursive predicates) present the logical machinery to reason about such recursions in the VST program logics.
The Verified Software Toolchain has many components, put together in a modular way:
msl. The proof theory and semantics of separation logics and indirection theory are independent of any particular programming language, of the memory model, and of particular theories of concurrency.
compcert. The CompCert verified C compiler is independent of any particular program logic (such as separation logic), of any particular theory of concurrency, and of the external-function context (such as an operating system-call setup). CompCert incorporates several programming languages, from C through C light to C minor and then (in various stages) to assembly languages for various target machines. The CompCert family may also include source languages such as C++ or ML. These various operational semantics all use the same memory model, and the same notion of external function call.
sepcomp. The theory of separate compilation explains how to specify the compilation of a programming language that may make shared-memory external function calls, shared-memory calls to an operating system, and shared-memory interaction with other threads. This depends on CompCert's memory model, but not on any particular one of the CompCert languages. Eventually, parts of the sepcomp theory will migrate into CompCert itself.
Some parts of the separate-compilation system concern modular program verifications of modular programs. We may even want to link program modules—and their verifications—written in different languages (C, ML, Java, assembly). This system requires that each language have a program logic that uses the same mpred (memory predicates) modeled using resource maps (rmap).
A program logic is sound when, if you can prove some specification (such as a Hoare triple {P} c {Q}) about a program c, then when c actually executes it will obey that specification.
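As a small illustration (this triple is not taken from the source, and it ignores integer overflow), consider:

```latex
\{\, x = n \,\}\quad \texttt{x = x + 1;}\quad \{\, x = n + 1 \,\}
```

Soundness says that if this triple is provable in the logic, then any execution of the command that starts in a state where x equals n, and that terminates, ends in a state where x equals n + 1.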
What does it mean to “actually execute”? If c is written in a source language L, then we can formally specify an operational semantics for L. Then we can give a formal model for the program logic in terms of the operational semantics, and formally prove the soundness of all the inference rules of the logic.
Then one is left trying to believe that the operational semantics accurately characterizes the execution of the language L. But many source languages do not directly execute; they are compiled into lower-level languages or machine language that executes in its own operational semantics. Fortunately, at this point in the 21st century we can rely on formal compiler correctness proofs, which show that execution in the operational semantics of the source language corresponds to execution in the operational semantics of the machine language. And the machine languages tend to be well specified; machine language is already a formal language, and it is even possible to formally prove that the logic gates of a computer chip correctly implement the instruction set architecture (ISA), that is, the machine language.
So, we prove the program correct using the program logic, we prove the program logic sound with respect to the source-language operational semantics, and prove the compiler correct with respect to the source- and machine-language semantics.
Here we present a simple λ-calculus with references to illustrate the use of indirection theory. The λ-calculus is well understood and its type system presents no surprises, so it provides a nice vehicle for explaining how to apply indirection theory.
One reason this language is interesting, from our point of view, is that it was historically rather difficult to find a semantic theory for general references, that is, references that may contain data of any type, including quantified types. In contrast, the theory of references at base types (e.g., only containing integers) is much simpler. Tofte had a syntactic/operational theory of general references as early as 1990 [86], but it was not until the step-indexed model of Ahmed, Appel and Virga [4, 2] in 2003 that a semantic theory of general references was found. The model of Ahmed et al. was refined and generalized in the following years by Appel et al. [11], and then further refined by Hobor et al. [52] into the indirection theory that appears in this book.
The λ-calculus with references is a bit of a detour from our main aim in this book, which is building program logics for C. However, it provides a relatively simple, self-contained example that illustrates the techniques we will be using later in more complicated settings. In particular, we will use indirection theory to build the Hoare tuple for program logics for C along similar lines to how we construct the expression typing predicate in this chapter.
In Part III we described program verification for C: tools and techniques to demonstrate that C programs satisfy correctness properties. What we ultimately want is the correctness of a compiled machine language binary image, running on some target hardware platform. We will use a correct compiler that turns source-level programs satisfying correctness properties into machine-level programs satisfying those same properties. But defining formally the interface between a compiler correctness proof and a program logic has proven to be fraught with difficulties. Resolving these difficulties is still the object of ongoing research. Here we will explore some of the issues that have arisen and report on the current state of the integration effort.
The two issues that have caused the most headaches revolve around understanding and specifying how compiled programs interact with their environment. First, how should we reason about the execution environment when it may behave in unpredictable ways at runtime? In other words, how do we reason about program nondeterminism? Second, how do we specify correctness for programs that exhibit shared memory interactions?
The first question regarding nondeterminism is treated in detail in Dockins's dissertation [38]. Dockins develops a general theory of refinements for nondeterministic programs based on bisimulation methods. This theory gracefully handles the case where the execution environment is nondeterministic, and it has the critical feature that it allows programs to become more defined as they are compiled.
Predicates (of type A → Prop) in type theory give a model for Natural Deduction. A separation algebra gives a model for separation logic. We formalize these statements in Coq.
For a more expressive logic that permits general recursive types and quasi-self-reference, we use step-indexed models built with indirection theory. We will explain this in Part V; for now it suffices to say that indirection theory requires that the type T be ageable: elements of T must contain an approximation index. A given element of the model contains only a finite approximation to some ideal predicate; these approximations become weaker as we “age” them, which we do as the operational semantics takes its steps.
To enforce that T is ageable we have a typeclass, ageable(T). Furthermore, when separation is involved, the ageable mechanism must be compatible with the separating conjunction; this requirement is also expressed by a typeclass, Age_alg(T).
Theorem: Separation Algebras serve as a model of Separation Logic.
Proof. We express this theorem in Coq by saying that given type T, the function algNatDed models an instance of NatDed(pred T). Given a SepAlg over T, the function algSepLog models an instance of SepLog(pred T). The definability of algNatDed and algSepLog serves as a proof of the theorem.
What we show in this chapter is the indirection theory version (in the Coq file msl/alg_seplog.v), so ageable and Age_alg are mentioned from time to time.
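As a reminder of what such a model has to supply, the separating conjunction is usually interpreted through the join relation of the separation algebra. The formula below gives the standard semantic definition; the notation (w for an element of T, join as a ternary relation) is an assumption made here and may differ from the Coq development.

```latex
(P * Q)(w) \;\iff\; \exists w_1\, w_2.\ \mathrm{join}(w_1, w_2, w) \,\wedge\, P(w_1) \,\wedge\, Q(w_2)
```

Read informally: P * Q holds on w exactly when w splits into two joinable pieces, one satisfying P and the other satisfying Q.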