Parallel computation has been a physical reality for about two decades, and a major topic for research for at least half that time. However, given the whole spectrum of applications of computers, almost nothing is actually computed in parallel. In this chapter we suggest reasons why there is a great gap between the promise of parallel computing and the delivery of real parallel computations, and what can be done to close the gap.
The basic problem in parallel computation, we suggest, is the mismatch between the requirements of parallel software and the properties of the parallel computers on which it is executed. The gap between parallel software and hardware is a rapidly changing one because the lifespans of parallel architectures are measured in years, while a desirable lifespan for parallel software is measured in decades. The current standard way of dealing with this mismatch is to reengineer software every few years as each new parallel computer comes along. This is expensive, and as a result parallelism has only been heavily used in applications where other considerations outweigh the expense. This is mostly why the existing parallel processing community is so heavily oriented towards scientific and numerical applications – they are either funded by research institutions and are pushing the limits of what can be computed in finite time and space, or they are working on applications where performance is the only significant goal.
Chapter 2 covers desirable model properties and shows how categorical data types satisfy these properties. I developed this view of model properties during 1992 (an early survey of models using them appeared as [179]) and extended and refined it over the next two years. A preliminary version of Chapter 4 was given as a talk at the Workshop on Programming Tools for Parallel Machines at Alimini, Italy, in the summer of 1993.
Chapter 3 is based on Valiant's work, which can be found in a series of papers [197, 199, 200]. The results on emulation on SIMD architectures appear in [178], although they seem to have been widely understood before that.
The construction of lists as a categorical data type in Chapter 5 follows the general presentation in Grant Malcolm's thesis [138]. An alternative presentation that emphasises the role of adjunctions in the CDT construction is due to Mike Spivey [189]. Much of the category theory on which this work depends was done in the Sixties [25]. The demonstration that lists can be efficiently implemented comes from [178].
The material on software development in Chapter 6 is a selection from a much larger range of material developed in what has become known as the Bird-Meertens formalism [17,31–35]. The material on almost-homomorphisms (and the name) comes from work by Murray Cole [55].
The development of operations to compute recurrences (Chapter 7) and the cost calculus for lists (Chapter 8) is joint work by myself and Wentong Cai during 1992, when he was a postdoctoral fellow.
In Chapter 2 we listed some of the requirements that a model for general-purpose parallel computation should satisfy: architecture independence, intellectual abstractness, having a software development methodology, having cost measures, having no preferred scale of granularity, and being efficiently implementable. In Chapter 3, we saw how results about emulation of arbitrary computations on architectures mean that efficient implementability can only be achieved by restricting the communication allowed in computations. In this chapter, we examine existing models and parallel programming languages and see how they measure up to these requirements.
Many of these models were not developed with such an ambitious set of requirements in mind, so it is not a criticism of them if they fail to meet some. Nevertheless, measuring them against these requirements provides a picture of the current situation. The absence of a popular or standard model, even in particular application domains, and the wide range of models that have been proposed, underline some of the difficulties of using parallelism in a general-purpose way that were discussed in Chapter 2. It is not possible to cover all proposed models for parallel computation, but a representative selection has been included.
There are several other good surveys of programming models from different perspectives. Crooks and Perrott [59] survey models for distributed-memory architectures, particularly those that provide abstractions for data partitioning and distribution. Turcotte [195] surveys models suitable for networks of workstations. Bal et al. [20] survey architecture-specific models.
The distinctions between models and programming languages are not easy to make.
In this chapter, we explore, in more detail, the software development methodology that is used with CDTs. It is a methodology based on transformation. Many of the transformations that are useful for list programming were already known informally in the Lisp community, and more formally in the APL and functional programming communities. The chief contributions of the categorical data type perspective are:
a guarantee that the set of transformation rules is complete (which becomes important for more complex types); and
a style of developing programs that is terse but expressive.
This style has been extensively developed by Bird and Meertens, and by groups at Oxford, Amsterdam, and Eindhoven. A discussion of many of the stylistic and notational issues, and a comparison of the Bird–Meertens approach with Eindhoven quantifier notation, can be found in [17]. Developments in the Bird-Meertens style are an important interest of IFIP Working Group 2.1.
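As a small illustration of the kind of transformation rule involved, here are two classic list laws written in Haskell for concreteness; the book's own notation differs, and these spellings are only illustrative.

```haskell
-- Two classic list transformation rules, written in Haskell for concreteness;
-- the book's own notation differs, and these spellings are illustrative only.

-- Map fusion: mapping g and then mapping f is the same as mapping f . g.
mapTwice, mapOnce :: (b -> c) -> (a -> b) -> [a] -> [c]
mapTwice f g = map f . map g
mapOnce  f g = map (f . g)

-- Reduce promotion: for an associative op and non-empty (sub)lists,
-- reducing a concatenation equals reducing each piece, then the results.
reduceFlat, reduceNested :: (a -> a -> a) -> [[a]] -> a
reduceFlat   op = foldr1 op . concat
reduceNested op = foldr1 op . map (foldr1 op)
```

Which side of such an equation is the cheaper one depends on the target machine; deciding the profitable direction to apply a rule is exactly the role of the cost calculus of Chapter 8.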
An Integrated Software Development Methodology
A software development methodology must handle specifications that are abstract, large, and complex. The categorical data type approach we have been advocating plays only a limited role in such a methodology because it is restricted (at the moment) to a single data type at a time. Although it is useful for handling the interface to parallel architectures, it is too limited, by itself, to provide the power and flexibility needed for large application development.
We have shown how to build categorical data types for the simple type of concatenation lists. In this chapter we show the data type construction in its most general setting. While there is some overhead to understanding the construction in this more general setting, the generality is needed to build much more complex types. We illustrate this in subsequent chapters by building types such as trees, arrays, and graphs.
More category theory background is assumed in this chapter. Suitable references are [127,158].
Categorical Data Type Construction
The construction of a categorical data type is divided into four stages:
The choice of an underlying category of basic types and computations on them. This is usually the category Type, but other possibilities will certainly be of interest.
The choice of an endofunctor, T, on this underlying category. The functor is chosen so that its effect on the still-hypothetical constructed type is to unpack it into its components. Components are chosen by considering the type signatures of constructors that seem suitable for the desired type. When this endofunctor is polynomial it has a fixed point, and this fixed point is defined to be the constructed type.
The construction of a category of T-algebras, T-Alg, whose objects are algebras of the new type and their algebraic relatives, and whose arrows are homomorphisms on the new type. The constructed type algebra (the free type algebra) is the initial object in this category. The unique arrows from it to other algebras are catamorphisms.
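For readers who find code more immediate than category theory, the following Haskell fragment mirrors these stages for the simple case of cons lists (the book itself works mainly with join lists); the names Fix, ListF, Algebra, and cata are introduced here purely for illustration and are not part of the formal construction.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- An illustrative Haskell rendering of the construction stages, for cons
-- lists: the underlying category is approximated by Haskell types and
-- functions, the endofunctor is ListF, its fixed point is the list type,
-- and a catamorphism is the unique map out of the initial algebra.

-- The endofunctor chosen from the constructors nil and cons.
data ListF a r = NilF | ConsF a r
  deriving Functor

-- The fixed point of an endofunctor: the constructed type.
newtype Fix f = In { out :: f (Fix f) }

type List a = Fix (ListF a)

-- An f-algebra is an arrow f b -> b; the initial algebra is In itself.
type Algebra f b = f b -> b

-- The catamorphism: the unique homomorphism from the initial algebra
-- to any other algebra.
cata :: Functor f => Algebra f b -> Fix f -> b
cata alg = alg . fmap (cata alg) . out

-- Example: list sum as a catamorphism.
sumList :: Num a => List a -> a
sumList = cata alg
  where
    alg NilF        = 0
    alg (ConsF x s) = x + s
```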
The central theme of this book is that the structure of a computation on a data type reflects the structure of the data type. This is true in two senses:
Any homomorphism on a data type is intimately related to the algebraic structure of its codomain, which can be exploited in the search for programs; and
The evaluation of any homomorphism can follow the structure of its argument, which can be exploited in computing programs.
Structured data types and the homomorphisms on them, called catamorphisms, form a programming model for parallel computation that has many attractive properties.
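A hedged Haskell sketch of both points, for join (concatenation) lists; the function hom below is my own illustrative name, not a definition from the text. A list homomorphism is pinned down by an algebra on its result type, and its evaluation is free to follow any decomposition of the argument.

```haskell
-- A homomorphism h is determined by the algebra on its codomain: a unit e,
-- a singleton image f, and an associative operator op with unit e such that
--   h (xs ++ ys) = h xs `op` h ys.
-- Its evaluation can follow any way of splitting the argument into pieces.
hom :: b -> (a -> b) -> (b -> b -> b) -> [a] -> b
hom e f op = go
  where
    go []  = e
    go [x] = f x
    go xs  = go left `op` go right      -- evaluation follows the split
      where (left, right) = splitAt (length xs `div` 2) xs

-- Because op is associative, a balanced split gives a logarithmic-depth,
-- and therefore naturally parallel, evaluation order.
sumH :: [Int] -> Int
sumH = hom 0 id (+)
```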
There is a desperate need for a model of parallel computation that can decouple software from hardware. This decoupling occurs in two dimensions: decoupling the rate of change of parallel hardware (high) from that of parallel software (low, if it is to be economic); and decoupling the variety of parallel hardware from a single, architecture-independent version of the software.
Such a model is hard to find because its requirements are in tension with one another. A model must be opaque enough to hide target architectures and the complexity of parallel execution, while providing a semantic framework that is rich enough to allow software development. At the same time, it must be partly translucent so that the costs of programs can be visible during development, to allow intelligent choices between algorithms.
In this chapter we build a much more complex type, the type of arrays. The construction of arrays as a categorical data type is significantly different from, and more complex than, the constructions seen so far, so this chapter illustrates new aspects of the construction technique.
Arrays are an important type because of the ubiquitous use of Cartesian coordinate systems and the huge edifice of linear algebra built on top of them. Almost all scientific and numeric computations require arrays as central data structures.
While the need for a data type of arrays is undisputed, there has always been some disagreement about exactly what arrays should represent: should the entries of an array all be of the same type (homogeneous) or might they be different (inhomogeneous); should extents be the same in each dimension (rectangular) or might they differ (ragged); and are arrays of different sizes members of the same type or of different types? Different programming languages have answered these questions differently.
Arrays appeared early in languages such as Fortran, which had homogeneous, rectangular arrays but was ambivalent about how they should be typed. Arrays had to be declared with their shapes, and shapes of arguments and parameters had to agree (although some of these rules were relaxed later). Fortran even went so far as to reveal the storage allocation of arrays (by columns) at the language level.
In this chapter, we explore the constraints imposed on models by the properties of parallel architectures. We are concerned, of course, only with theoretical properties, because we cannot predict technological properties very far into the future. Recent foundational results, particularly by Valiant [200], show that arbitrary parallel programs can be emulated efficiently on certain classes of parallel architectures, but that inefficiencies are unavoidable on others. Thus a model of parallel computation that expresses arbitrary computations cannot be efficiently implementable over the full range of parallel architecture classes. The difficulty lies primarily in the volume of communication that takes place during computations. We are therefore driven to choose between two quite different approaches to designing models: accepting some inefficiency, or restricting communication in some way.
Parallel Architectures
We consider four architecture classes:
shared-memory MIMD architectures, consisting of processors executing independently, but communicating through a shared memory, visible to them all;
distributed-memory MIMD architectures, consisting of processors executing independently, each with its own memory, and communicating using an interconnection network whose capacity grows as p log p, where p is the number of processors;
distributed-memory MIMD architectures, consisting of processors executing independently, each with its own memory, and communicating using an interconnection network whose capacity grows only linearly with the number of processors (that is, the number of communication links per processor is constant);
SIMD architectures, consisting of a single instruction stream, broadcast to a set of data processors whose memory organisation is either shared or distributed.
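The emulation results in this chapter are phrased, following Valiant, in terms of a bulk-synchronous style of execution. As a point of orientation only, and with the caveat that the chapter's own treatment may state it differently, the standard cost of a single BSP superstep on such an architecture is

    w + g * h + l

where w is the largest amount of local computation done by any processor, h is the largest number of messages sent or received by any processor, g is the cost of delivering a message (inversely related to the capacity of the interconnection network), and l is the cost of the barrier synchronisation that ends the superstep. The distinction drawn above between the p log p and linear-capacity distributed-memory classes shows up, roughly, as a difference in how g behaves as the number of processors grows.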
So far we have discussed the properties that a model of parallel computation ought to have and have claimed that models built from categorical data types have these properties. In this chapter we show how to build a simple but useful categorical data type, the type of join or concatenation lists, and illustrate its use as a model. We show how such a model satisfies the requirements, although some of the details are postponed to later chapters.
The language we construct for programming with lists does not differ from other parallel list languages in major ways: most of the list operations are familiar maps, reductions, and prefixes. The differences are in the infrastructure that comes from the categorical data type construction: an equational transformation system, a deeper view of what operations on lists are, and a style of program development. When we develop more complex types, the construction suggests new operations that are not obvious from first principles.
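For orientation, here are rough Haskell analogues of the three families of operations just mentioned; they are only approximations to the operations defined later in the chapter, whose notation differs.

```haskell
-- Rough Haskell analogues of maps, reductions, and prefixes on lists;
-- the book's own notation and exact operation set differ.

-- Map: apply a function to every element independently (fully parallel).
squares :: [Int] -> [Int]
squares = map (^ 2)

-- Reduction: combine all elements with an associative operator; the
-- associativity is what permits a logarithmic-depth parallel evaluation.
total :: [Int] -> Int
total = foldr (+) 0

-- Prefix (scan): the reductions of all initial segments; despite its
-- sequential appearance it too has an efficient parallel implementation.
runningTotals :: [Int] -> [Int]
runningTotals = scanl1 (+)
```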
For the next few chapters we concentrate on aspects of parallel computation on lists. We describe the categorical data type construction in more detail in Chapter 9 and move on to more complex types. The next few sections explain how to build lists in a categorical setting. They may be skipped by those who are not interested in the construction itself. The results of the construction and its implications are summarised in Section 5.5.
We have already discussed why a set of cost measures is important for a model of parallel computation. In this chapter we develop something stronger, a cost calculus. A cost calculus integrates cost information with equational rules, so that it becomes possible to decide the direction in which an equational substitution is cost-reducing. Unfortunately, a perfect cost calculus is not possible for any parallel programming system, so some compromises are necessary. It turns out that the simplicity of the mapping problem for lists, thanks to the standard topology, is just enough to permit a workable solution.
Cost Systems and Their Properties
Ways of measuring the cost of a partially developed program are critical to making informed decisions during the development. An ideal cost system has the following two properties:
It is compositional, so that the cost of a program depends in some straightforward way on the cost of its pieces. This is a difficult requirement in a parallel setting since it amounts to saying that the cost of a program piece depends only on its internal structure and behaviour and not on its context. However, parallel operations have to be concerned about the external properties of how their arguments and results are mapped to processors since there are costs associated with rearranging them. So, for parallel computing, contexts are critically important.
It is related to the calculational transformation system, so that the cost of a transformation can be associated with its rule.
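To make the compositionality requirement concrete, here is a deliberately naive Haskell sketch, not the cost calculus developed in this chapter: each operation is given a parallel-time estimate in terms of the list length n and the processor count p, with all constant factors taken to be one, and a pipeline of stages is costed as the sum of its stages. Note that it ignores exactly the data-rearrangement contexts that the first property warns about.

```haskell
-- A deliberately naive cost sketch (not the chapter's calculus): parallel-time
-- estimates for list operations on n elements and p processors, with all
-- constant factors taken to be one, and composition costed by summation.

type Cost = Double

lg :: Double -> Double
lg = logBase 2

mapCost, reduceCost, scanCost :: Double -> Double -> Cost
mapCost    n p = n / p                    -- independent local work only
reduceCost n p = n / p + lg p             -- local reductions, then a combining tree
scanCost   n p = 2 * (n / p) + lg p       -- local scans, tree of partial sums, local fix-up

-- Composition: stages run one after another on the same machine,
-- so (in this naive model) their costs simply add.
composeCost :: [Double -> Double -> Cost] -> Double -> Double -> Cost
composeCost stages n p = sum [ c n p | c <- stages ]

-- Example: a map followed by a reduction.
mapReduceCost :: Double -> Double -> Cost
mapReduceCost = composeCost [mapCost, reduceCost]
```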
In this chapter we define the categorical data types of graphs. Graphs are ubiquitous in computation, but they are subtly difficult to work with. This is partly because there are many divergent representations for graphs and it is hard to see past the representations to the essential properties of the data type.
We follow the now familiar strategy of defining constructors and building graphs as the fixed point of the resulting polynomial functor.
Graphs have a long history as data structures. Several specialised graph languages have been built (see, for example, [69]), but they all manipulate graphs using operations that alter single vertices and edges, rather than the monolithic operations we have been advocating.
An important approach to manipulating graphs is graph grammars. Graph grammars [71] are analogues of grammars for formal languages and build graphs by giving a set of production rules. Each left hand side denotes a graph template, while each right hand side denotes a replacement. The application of a rule occurs on a subgraph that matches the left hand side of some rule. The matching subgraph is removed from the graph (really a graph form) and replaced by the right hand side of the rule. Various different conventions are used to add edges connecting the new graph to the rest of the original graph. Graph grammars are primarily used for capturing structural information about the construction and transformation of graphs. They do not directly give rise to computations on graphs.