Synopsis: Indirection theory gives a clean interface to higher-order step indexing. Many different semantic features of programming languages can be modeled in indirection theory. The models of indirection theory use dependent types to stratify quasirecursive predicates, thus avoiding paradoxes of self-reference. Lambda calculus with mutable references serves as a case study to illustrate the use of indirection theory models.
When defining both indirection and separation, one must take extra care to ensure that aging commutes over separation. We demonstrate how to build an axiomatic semantics, using higher-order separation logic, for the pointer/continuation language introduced in the case study of Part II.
In a conventional separation logic we have a “maps-to” operator a ↦ b saying that the heap contains (exactly) one cell at address a containing value b. In the separation logic, this operator corresponds to the load and store operations of the operational semantics.
Now consider two more operators of an operational semantics: function call and function definition. When function names are static and global, we can simply have a global table relating functions to their specifications—where a specification gives the function's precondition and postcondition. But when the address of a function can be kept in a variable, we want local specifications of function-pointer variables, and ideally these local specifications should be as modular as the rest of our separation logic. For example, they should satisfy the frame rule. That is, we might like to write assertions such as (a ↦ b) * (f : {P}{Q}) meaning that a is a memory location containing value b, and a different address f is a function with precondition P and postcondition Q. Furthermore, the separation * guarantees that storing to a will not overwrite the body of f.
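For reference, the frame rule in question says that an assertion standing apart from a command's footprint survives that command: from {P} c {Q} one may infer

    {P * R} c {Q * R}

provided c does not modify variables free in R. Taking R to be the function-pointer assertion f : {P}{Q}, any command that stores through a preserves f's specification, just as the separating conjunction above promises.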
To illustrate these ideas in practice, we will consider a tiny programming language called Cont. The functions in this language take parameters f(x, y, z) but they do not return; thus, they are continuations.
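Continuation-passing style in a general-purpose language conveys the flavor of such functions. The OCaml sketch below is only an analogue (Cont's actual syntax is introduced in the case study itself): a function “returns” by tail-calling its continuation argument.

    (* CPS analogue of Cont-style functions: instead of returning,
       each function ends by tail-calling a continuation k. *)
    let rec sum_to (n : int) (acc : int) (k : int -> unit) : unit =
      if n = 0 then k acc            (* "return" = invoke continuation *)
      else sum_to (n - 1) (acc + n) k

    let () = sum_to 10 0 (fun r -> Printf.printf "%d\n" r)  (* prints 55 *)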
Indirection theory is a powerful technique for using step-indexing in modeling higher-order features of programming languages. Rmaps (Chapters 39 and 40), which figure prominently in our model of Verifiable C, rely heavily on indirection theory to express self-reference.
When reasoning in a program logic, step indexes are unproblematic: the step indexes can often be hidden via use of the ▹ operator, and therefore do not often appear explicitly in assertions. Indirection theory provides a generic method for constructing the underlying step-indexed models.
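Concretely, the standard step-indexed reading of the “later” operator ▹ quantifies over strictly smaller indexes:

    (▹ P) holds at index n  iff  P holds at every index m < n

so an assertion guarded by ▹ becomes available after one computation step, and the index itself need not appear in the logic's surface syntax.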
More problematic is how to connect a step-indexed program logic like Verifiable C to a certified compiler such as CompCert. CompCert's model of state is not step-indexed, nor would it be reasonable to make it so: doing so would introduce unnecessary complication into CompCert's correctness proofs, and it would complicate the statement of CompCert's correctness theorem, since naively requiring the compiler to preserve all step indexes through compilation makes it difficult to reason about optimizations that change the number of steps.
Previous chapters of this book outlined one way in which this difficulty can be resolved, by stratifying our models into two layers: operational states corresponding to states of the operational semantics used by CompCert, and semantic worlds appearing in assertions of the program logic. Chapter 40 in particular gave some motivation for why this stratification makes sense: We may not want all the information found in operational states to be visible to Hoare logic assertions (in particular, control state should be hidden).
CompCert defines a formal small-step operational semantics for every intermediate language (including C light) between C and assembly language.
For Verifiable C, we use the C light syntax with an alternate (nonstandard) operational semantics of C light (file veric/Clight_new.v). Our nonstandard semantics is quite similar to the standard one, but it takes fewer “administrative” small-steps and is (in this and other ways) more conducive to our program-logic soundness proof. We prove a simulation relation from our alternate semantics to CompCert's standard C light semantics; this ensures that the soundness we prove relative to the alternate semantics is also valid with respect to the standard semantics.
In our operational semantics, an operational state is either internal, when the compiled program is about to execute an ordinary instruction of a single C thread; or external, when the program is requesting a system call or other externally visible event.
An internal operational state contains the following components (a type sketch follows the list):
genv Global environment, mapping identifiers to addresses of global (extern) variables and functions, and a separate mapping of function-addresses to function-bodies.
ve Variable environment, mapping identifiers to addresses of addressable local variables—those to which the C-language & (address-of) operator is somewhere applied.
te Temp environment, mapping identifiers to values of ordinary local variables—those to which & is never applied.
k Continuation, representing the stack of control and data (including the program counter, return addresses for function calls, and local variables of suspended functions).
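Gathering these components, one can render an internal state as a record. The OCaml sketch below uses simplified placeholder types; the actual definitions are Coq records with much richer structure.

    (* Simplified sketch of an internal operational state; all type
       definitions here are illustrative placeholders. *)
    type ident = string
    type addr  = int
    type value = Vint of int | Vptr of addr

    type genv = {
      symbols : (ident * addr) list;   (* globals: identifier -> address *)
      functs  : (addr * string) list;  (* function address -> body (stub) *)
    }
    type venv = (ident * addr) list    (* addressable locals: id -> address *)
    type tenv = (ident * value) list   (* ordinary locals: id -> value *)
    type cont = Kstop | Kseq of string * cont  (* simplified control stack *)

    type internal_state = { ge : genv; ve : venv; te : tenv; k : cont }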
Program logics for certified compilers: We prove that the program logic is sound with respect to the operational semantics of a source language—meaning that if the program logic proves some claim about the observable behavior of a program, then the source program actually respects that claim when interpreted in the source-language semantics. But computers don't directly execute source-language semantics: we also need a proof about the correctness of an interpreter or a compiler.
CompCert (compilateur certifié in French) is a formally verified optimizing compiler for the C language, translating to assembly language for various machines (Intel x86, ARM, PowerPC) [62]. Like most optimizing compilers, it translates in several sequential phases through a sequence of intermediate languages. Unlike most compilers, each of these intermediate languages has a formal specification written down in Coq as an operational semantics. Each phase is proved correct: the form of the proof is a simulation theorem expressing that the observable behavior of the target program corresponds to the observable behavior of the source program. The composition of all these per-phase simulation theorems gives the compiler correctness theorem.
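Schematically (eliding CompCert's precise formulation in terms of observable behaviors and behavior improvement), the composed theorem has the shape:

    if compile(S) = OK(T) and S does not go wrong,
    then every observable behavior of T is an observable behavior of S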
Although there had been formally verified compilers before [67, 36, 61, 60], CompCert is an important breakthrough for several reasons:
Language: One of Leroy's goals has been that CompCert should be able to compile real high-assurance embedded C programs, such as the avionics software for a commercial jetliner. Such software is not trivially modified: any tweak to the software—let alone rewriting it in another language—requires months or years of rebuilding an assurance case.
An exciting development of the 21st century is that the 20th-century vision of mechanized program verification is finally becoming practical, thanks to 30 years of advances in logic, programming-language theory, proof-assistant software, decision procedures for theorem proving, and even Moore's law, which gives us everyday computers powerful enough to run all this software.
We can write functional programs in ML-like languages and prove them correct in expressive higher-order logics; and we can write imperative programs in C-like languages and prove them correct in appropriately chosen program logics. We can even prove the correctness of the verification toolchain itself: the compiler, the program logic, automatic static analyzers, concurrency primitives (and their interaction with the compiler). There will be few places for bugs (or security vulnerabilities) to hide.
This book explains how to construct powerful and expressive program logics based on separation logic and indirection theory. It is accompanied by an open-source machine-checked formal model and soundness proof, the Verified Software Toolchain (VST), formalized in the Coq proof assistant. The VST components include the theory of separation logic for reasoning about pointer-manipulating programs; indirection theory for reasoning with “step-indexing” about first-class function pointers, recursive types, recursive functions, dynamic mutual-exclusion locks, and other higher-order programming; a Hoare logic (separation logic) with full reasoning about control flow and data flow of the C programming language; theories of concurrency for reasoning about programming models such as Pthreads; theories of compiler correctness for connecting to the CompCert verified C compiler; and theories of symbolic execution for implementing foundationally verified static analyses.
Synopsis: Specification of the interface between CompCert and its clients, such as the VST separation logic for C light, or proved-sound static analyses and abstract interpretations. This specification takes the form of an operational semantics with a nontrivial memory model. The need to preserve the compiler's freedom to optimize the placement of data (in memory, out of memory) requires the ability to rename addresses and adjust block sizes. Thus the specification of shared-memory interaction between subprograms (separately compiled functions, or concurrent threads) requires particular care, to keep these renamings consistent.
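CompCert expresses such renamings as memory injections. The OCaml sketch below is a simplified rendering of the idea; CompCert's actual meminj is a Coq function from blocks to optional block/offset pairs, packaged with invariants.

    (* A memory injection maps each source block either to None (the
       block is optimized out of memory) or to a target block plus an
       offset; simplified from CompCert's meminj. *)
    type block = int
    type meminj = block -> (block * int) option

    (* Under injection j, source pointer (b, ofs) corresponds to
       target pointer (b', ofs + delta) when j b = Some (b', delta). *)
    let inject_ptr (j : meminj) ((b, ofs) : block * int) =
      match j b with
      | Some (b', delta) -> Some (b', ofs + delta)
      | None -> None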
A static analysis is an algorithm that checks (or calculates) invariants of a program based on its syntactic (static) structure, in contrast to a dynamic analysis which observes properties of actual program executions. Static analysis can tell us properties of all possible executions, while dynamic analysis can only observe executions on particular inputs.
A sound static analysis is one with a proof that any invariants checked by the analysis will actually hold on all executions. A foundationally sound analysis is one where the soundness proof is (ideally) machine-checked, (ideally) with respect to the machine-language instruction-set architecture specification—not the source language—and (ideally) with no axioms other than the foundations of logic and the ISA specification.
Some of the first foundationally sound static analyses were proof-carrying code systems of the early 21st century [5, 45, 35, 3]. It was considered impractical (at that time) to prove the correctness of compilers, so these proof-carrying systems transformed source-language typechecking (or Hoare logic [14]) phase by phase through the compilation, into an assembly-language Hoare logic.
With the existence of foundationally correct compilers such as CompCert, instead of proof-carrying code we can prove the soundness of a static analysis from the source-language semantics, and compose that proof with the compiler-correctness proof. See, for example, the value analysis using abstract interpretation by Blazy et al. [22].
Some kinds of static analysis may be easier to prove sound with respect to a program logic than directly from the operational semantics.
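To fix ideas, here is a toy sign analysis in OCaml (a hypothetical illustration of the flavor of static analysis, not the value analysis of [22]): abstract values track only the sign of an integer, and abstract addition soundly over-approximates concrete addition.

    (* Toy sign analysis: Top means "sign unknown". *)
    type sign = Neg | Zero | Pos | Top

    let abstract (n : int) : sign =
      if n < 0 then Neg else if n = 0 then Zero else Pos

    let add (a : sign) (b : sign) : sign =
      match a, b with
      | Zero, s | s, Zero -> s
      | Pos, Pos -> Pos
      | Neg, Neg -> Neg
      | _ -> Top                     (* mixed signs: result unknown *)

    (* Sanity check: 3 + (-1) has statically unknown sign. *)
    let () = assert (add (abstract 3) (abstract (-1)) = Top)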
The imperative programming paradigm views programs as sequences of commands that update a memory state. A memory model specifies memory states and operations such as reads and writes. Such a memory model is a prerequisite to giving formal semantics to imperative programming languages, verifying properties of programs, and proving the correctness of program transformations.
For high-level, type-safe languages such as ML or the sequential fragment of Java, the memory model is simple and amounts to a finite map from abstract memory locations to the values they contain. At the other end of the complexity spectrum, we find memory models for shared-memory concurrent programs with data races and relaxed (non-sequentially-consistent) memory, where much effort is needed to capture the relaxations (e.g., reorderings of reads and writes) that are allowed and those that are guaranteed never to happen [1].
For CompCert we focus on memory models for the C language and for compiler intermediate languages, in the sequential case and with extensions to data race-free concurrency. C and our intermediate languages feature both low-level aspects such as pointers, pointer arithmetic, and nested objects, and high-level aspects such as separation and freshness guarantees. For instance, pointer arithmetic can result in aliasing or partial overlap between the memory areas referenced by two pointers; yet, it is guaranteed that the memory areas corresponding to two distinct variables or two successive calls to malloc are disjoint.
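A block-based memory model captures both aspects at once: pointers are (block, offset) pairs, pointer arithmetic moves the offset within a block, and each allocation returns a fresh block, so distinct variables and successive calls to malloc are disjoint by construction. The OCaml sketch below is a simplified illustration; CompCert's real memory model also tracks block bounds and permissions.

    (* Simplified block-based memory model. *)
    type block = int
    type ptr = block * int                 (* block, byte offset *)

    type mem = {
      next  : block;                       (* next fresh block id *)
      cells : ((block * int) * int) list;  (* (block, ofs) -> byte *)
    }

    let empty = { next = 0; cells = [] }

    (* Allocation always returns a fresh block: disjointness for free. *)
    let alloc (m : mem) : mem * block =
      ({ m with next = m.next + 1 }, m.next)

    let store (m : mem) ((b, ofs) : ptr) (v : int) : mem =
      { m with cells = ((b, ofs), v) :: m.cells }

    let load (m : mem) ((b, ofs) : ptr) : int option =
      List.assoc_opt (b, ofs) m.cells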
Dijkstra presented semaphore-based mutual exclusion as an extension to a sequential language [37]. Posix threads present Dijkstra-Hoare concurrency as an extension to a sequential language [55]. O'Hearn presented concurrent separation logic (CSL) as an extension to separation logic, in which all the rules of sequential separation logic still hold [71].
Can we really model concurrency as an extension to sequentiality? Boehm explains why it is very tricky to explain shared-memory concurrency as an extension to a sequential language [24]. But we have taken great care to specify our language's external-interaction model (Chapter 33), in order to do this soundly.
Therefore we do something ambitious: we present the semantic model of CSL, for the C language, in the presence of an optimizing compiler and weak cache coherency, as a modular extension to our semantic model for sequential separation logic. This chapter is based on Aquinas Hobor's PhD thesis [49, 51] and on current work by Gordon Stewart.
Concurrent separation logic with first-class locks. O'Hearn's presentation of CSL had several limitations, most importantly a lack of first-class locks (locks that can be created/destroyed dynamically, and in particular can be used to control access to other locks). Hobor et al. [51] and Gotsman et al. [44] independently extended CSL to handle first-class locks as well as a number of other features.
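In such logics a lock assertion carries its resource invariant, and the acquire/release rules transfer that invariant into and out of the local footprint. Schematically (notation varies between [51] and [44], and side conditions are elided):

    {lock(l, R)}      acquire(l)  {lock(l, R) * R}
    {lock(l, R) * R}  release(l)  {lock(l, R)}

Acquiring a lock materializes its invariant R for local use; releasing it surrenders R back to the lock.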
Chapter 30 explains our CSL with first-class locks.