PhD Abstracts

In this thesis, I present a declarative framework for parsing that explicitly integrates data elaboration. Within the framework of the thesis, I present parsing algorithms that are based on the concept of Brzozowski’s derivatives. Derivative-based parsing algorithms present several advantages: they are elegant, amenable to formal reasoning, and easy to implement. Unfortunately, the performance of these algorithms in practice is often not competitive with other approaches. In this thesis, I show a general technique inspired by Huet’s Zipper to greatly enhance the performance of derivative-based algorithms, and I do so without compromising their elegance, amenability to formal reasoning, or ease of implementation. First, I present a technique for building efﬁcient tokenisers that is based on Brzozowski’s derivatives and Huet’s zipper and that does not require the usual burdensome explicit conversion to automata. I prove the technique is correct in Coq and present SILEX, a Scala lexing library based on the technique. I demonstrate that the approach is competitive with state-of-the-art solutions. Then, I present a characterisation of LL(1) languages based on the concept of should-not-follow sets. I present an algorithm for parsing LL(1) languages with derivatives and zippers. I show a formal proof of the algorithm’s correctness and prove its worst-case linear-time complexity. I show how the LL(1) parsing with derivatives and zippers algorithm corresponds to the traditional LL(1) parsing algorithm. I then present SCALL1ON, a Scala parsing combinators library for LL(1) languages that incorporates the LL(1) parsing with derivatives and zippers algorithm. I present an expressive and familiar combinator-based interface for describing LL(1) languages. I present techniques that help precisely locate LL(1) conﬂicts in user code. I discuss several advantages of the parsing with derivatives approach within the context of a parsing library. I also present SCALL1ON’s enumeration and pretty-printing features and discuss their implementation. Through a series of benchmarks, I demonstrate the good performance and practicality of the approach. Finally, I present how to adapt the LL(1) parsing with derivatives and zippers algorithm to support arbitrary context-free languages. I show how the adapted algorithm corresponds to general parsing algorithms, such as Earley’s parsing algorithm.

Ad hoc data is everywhere.There are many data formats in use, which, due to the specificity of their domain, lack the processing tools that we otherwise take for granted.We call such data formats, and the data that they represent, ad hoc.Ad hoc data is usually stored in file systems and can be organized into larger structures that we call filestores, collections of files and folders along with the properties between them.This dissertation supports the usage of ad hoc filestores.We design domain-specific languages for processing filestores with increasingly complex requirements, building on previous work on PADS and Forest.These existing systems accept declarative specifications of ad hoc data formats, as single files and filestores respectively.From these specifications, they generate tools for loading, storing, and validating the data.Unfortunately, Forest does not adequately deal with large filestores, or cost control in general.Nor does it offer support for correctly managing concurrent operations, which are common in file systems.This dissertation offers solutions to these problems.We first introduce Incremental Forest, a domain-specific language and system that enables incremental processing of filestores.This language offers a new mechanism, a delay construct, for explicitly controlling the costs of loading and storing in filestores.Incremental Forest comes with a customizable cost model, which guarantees that a wide class of costs monotonically decrease as delays increase.Our next system, Transactional Forest, removes the delays from Incremental Forest, opting to use a new interface language and abstraction, which offer automatic incrementality.Additionally, Transactional Forest leverages this abstraction, a zipper, to provide simple, provably correct serializable transactions using optimistic concurrency control.Finally, the Zipper File System goes beyond designing a domain-specific language, targeting the file system itself to provide deeper control with respect to other users.The Zipper File System uses the ideas from Transactional Forest to provide serializable transactions in a zipper-based file system.There is a translation from POSIX that theoretically allows standard applications to be run without modification.Taken together, these systems use domain-specific languages to enable users to efficiently and correctly manage ad hoc filestores in concurrent settings.

ROMAIN EDELMANN École Polytechnique Fédérale de Lausanne, Switzerland
Date: July 2021; Advisor: Viktor Kunȏak URL: https://tinyurl.com/y5dtdb6f In this thesis, I present a declarative framework for parsing that explicitly integrates data elaboration.Within the framework of the thesis, I present parsing algorithms that are based on the concept of Brzozowski's derivatives.Derivative-based parsing algorithms present several advantages: they are elegant, amenable to formal reasoning, and easy to implement.Unfortunately, the performance of these algorithms in practice is often not competitive with other approaches.In this thesis, I show a general technique inspired by Huet's Zipper to greatly enhance the performance of derivative-based algorithms, and I do so without compromising their elegance, amenability to formal reasoning, or ease of implementation.First, I present a technique for building efficient tokenisers that is based on Brzozowski's derivatives and Huet's zipper and that does not require the usual burdensome explicit conversion to automata.I prove the technique is correct in Coq and present SILEX, a Scala lexing library based on the technique.I demonstrate that the approach is competitive with state-of-the-art solutions.Then, I present a characterisation of LL(1) languages based on the concept of should-not-follow sets.I present an algorithm for parsing LL(1) languages with derivatives and zippers.I show a formal proof of the algorithm's correctness and prove its worst-case linear-time complexity.I show how the LL(1) parsing with derivatives and zippers algorithm corresponds to the traditional LL(1) parsing algorithm.I then present SCALL1ON, a Scala parsing combinators library for LL(1) languages that incorporates the LL(1) parsing with derivatives and zippers algorithm.I present an expressive and familiar combinator-based interface for describing LL(1) languages.I present techniques that help precisely locate LL(1) conflicts in user code.I discuss several advantages of the parsing with derivatives approach within the context of a parsing library.I also present SCALL1ON's enumeration and pretty-printing features and discuss their implementation.Through a series of benchmarks, I demonstrate the good performance and practicality of the approach.Finally, I present how to adapt the LL(1) parsing with derivatives and zippers algorithm to support arbitrary context-free languages.I show how the adapted algorithm corresponds to general parsing algorithms, such as Earley's parsing algorithm.

Proving Confidentiality and Its Preservation Under Compilation for Mixed-Sensitivity Concurrent Programs
ROBERT ABELLA SISON University of New South Wales, Australia Date: November 2020; Advisor: Carroll Morgan, Toby Murray and Kai Engelhardt URL: https://tinyurl.com/2ucyjp4k Here, I pose the thesis that proving noninterference and its preservation by a compiler is feasible for mixed-sensitivity concurrent programs.Software does not always have the luxury of limiting itself to single-threaded computation with resources statically dedicated to each user to ensure the confidentiality of their data.Prior work therefore presented formal methods for proving and preserving the strictest kind of confidentiality property, noninterference, for mixed-sensitivity concurrent programs: a term I coin to describe those programs that might reuse memory shared between their threads to hold data of different sensitivity levels at different times.Although these methods addressed challenges in formalising the value-dependent coordination of such mixed-sensitivity reuse under the impact of concurrency, their practicality remained unclear: Could they be used to prove noninterference for any nontrivial mixed-sensitivity concurrent program in its entirety?Furthermore, could any compiler be verified to preserve the needed guarantees to the compiled code?
To support this claim, I prove for the first time both (1) noninterference for a nontrivial mixed-sensitivity concurrent program, modelling a real-world use case, and (2) its preservation by a compiler down to an assembly-level model.This main result rests on two major contributions.First, I demonstrate how programming-language designers can make reasoning on each thread sufficient to prove noninterference for such programs, by supplying synchronisation primitives (here, mutex locks for a generic imperative language) and proving they maintain as invariant the necessary requirements.Second, I demonstrate how compiler developers can make confidentiality-preserving refinement a feasible target for verification, by using a decomposition principle to prove that a compiler (here, from that imperative language to a generic RISC-style assembly language) establishes it for mixedsensitivity concurrent programs.Thus, per-thread reasoning proves noninterference for the case study, and the verified compiler preserves it to assembly automatically.All my results are formalised and proved in the Isabelle/HOL interactive proof assistant.
My work paves the way for more fully featured programming languages and their compilers, in replicating these results, to raise the typical level of assurance readily offered by developers of multithreaded software responsible for data of multiple sensitivity levels.