The simulation of the ideal parallel computation model, PRAM, on the hypercube network is considered in this chapter. It is shown that a class of PRAM programs can be simulated with a slowdown of O(log N) with almost certainty and without using hashing, where N denotes the number of processors. Also shown is that general PRAM programs can be simulated with a slowdown of O(log N) with the help of hashing. Both schemes are IDA-based and fault-tolerant.
Introduction
Parallel algorithms are notoriously hard to write and debug. Hence, it is only natural to turn to ideal models that provide good abstraction. As these models do not assume any particular hardware configuration, they should have the additional benefit that programs written for them can be executed on any hardware that supports the model, similar to the situation in the sequential case where the existence of, say, a C compiler on a particular platform implies that standard C programs can be compiled and executed there [370]. The PRAM (“Parallel Random Access Machine”) is one such model. It completely abstracts out the cost issue in communication and allows us to focus on the computational aspect. However, such convenience and generality are not without their price: the PRAM model, unlike the von Neumann machine model, is not physically feasible to build [369]. The simulation of PRAMs by feasible computers is therefore important and forms the major theme of this chapter.
After briefly examining the challenges of designing ever more powerful computers, we discuss the important issues in parallel processing and outline solutions. An overview of the book is also given.
The von Neumann Machine Paradigm
The past five decades have witnessed the birth of the first electronic computer [257] and the rapid growth of the computing industry to exceed $1,000 billion in annual revenue in the U.S. alone [162]. The demand for high-performance machines is further fueled by the advent of many crucial problems whose solutions require enormous computing power: environmental issues, the search for cures for diseases, accurate and timely weather forecasting, to mention just a few [271]. Moreover, although the unrelenting decrease in feature size continues to improve the computing capability per chip, turning that into a corresponding increase in computing performance is a major challenge [152, 316]. All these factors point toward the necessity of sustained innovation in the design of computers.
It is not hard to see that the von Neumann machine paradigm [37], the conceptual framework for most computers, will impede further performance gains when considered at the system level. Input and output excluded, the von Neumann machine conceptually consists of a processing unit, a memory storing both programs and data, and a wire that connects the two.
In this final chapter, we briefly review techniques and concepts in fault-tolerant computing. Then we sketch the design of a fault-tolerant parallel computer, the HPC (“hypercube parallel computer”), based on the results and ideas from previous chapters.
Introduction
A fault-free computer, or indeed any fault-free human artifact, has never been built, and never will be. No matter how reliable each component is, there is always a possibility, however small, that it will go wrong. Statistical principles dictate that, other things being equal, this possibility increases as the number of components increases. Such an event, if not anticipated and safeguarded against, will eventually make the computer malfunction and lead to anything from minor annoyance and inconvenience to disaster.
Recently, the same enormous decrease in hardware cost that makes parallel computers economically feasible has also made fault tolerance more affordable [297]. In other words, the low cost of hardware makes possible both a high degree of fault tolerance through redundancy and high performance. Indeed, most fault-tolerant computers today employ multiple processors; see [241, 254, 317] for good surveys.
It is against this background that we take the extra step of designing a hypercube parallel computer (HPC for short). In the HPC, processors are grouped into logical clusters consisting of physically close processors, and each program execution is replicated at all members of a cluster. Clusters overlap, however. The concept of a cluster — logical or physical — introduces a two-level, rather than flat, organization and can be found in, for example, the Cm [344], Cedar [187], and FTPP (“Fault Tolerant Parallel Processor”) [148] computers.
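As a toy illustration of overlapping clusters of physically close processors (a minimal sketch only, not necessarily the HPC's actual clustering rule), one may take each hypercube node together with its immediate neighbors as one cluster; every node then belongs to several clusters, which gives the overlap mentioned above.

```python
def cluster(node: int, n_dim: int) -> frozenset[int]:
    """Illustrative clustering (an assumption, not the book's scheme): the
    cluster anchored at `node` is the node together with its n_dim hypercube
    neighbours, i.e. the physically closest processors.  Every node then
    belongs to n_dim + 1 overlapping clusters."""
    return frozenset({node} | {node ^ (1 << d) for d in range(n_dim)})

# Example: the clusters anchored at nodes 0 and 1 of the 3-cube overlap in {0, 1}.
assert cluster(0, 3) & cluster(1, 3) == {0, 1}
```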
Humans are the most versatile of creatures, and computers are their most versatile of creations. Human–Computer Interaction (HCI) is the study of what they do together; in particular, HCI aims to make interaction better suit the humans. Computers contribute to art, science, engineering, … all areas of human endeavor. It is no surprise, then, that there is heated debate about what the essence of HCI is and what it should be. What is good HCI? The answer to this question will be elusive given that there is good engineering that is not art, good art that is not science, and good science that is not engineering.
It's easier to see what form of answer there can be by taking a quick excursion into another field. Imagine the discovery of a dye, such as W. H. Perkin's breakthrough discovery of mauve. Is it science? Yes: certain chemicals must react to produce the dyestuff, and the principles of chemistry suggest other possibilities. Is it art? Yes: it makes an attractive color. Is it engineering? Yes: its quantity production, fastness in materials, and so forth, are engineering. Perkin's work made the once royal purple accessible to all. Fortunately there is no subject “Human Chemical Interaction” to slide us into thinking that there is, or should be, one right view of the work of making or using, designing, standardizing, or evaluating a dye. Nevertheless, we appreciate a readily available, stunning color, used by an able artist, and one that lasts without deteriorating.
Schemes for activity reuse are based upon the assumption that the human–computer dialog has many recurring activities. Yet there is almost no empirical evidence confirming the existence of these recurrences or suggesting how observed patterns of recurrence in one dialog would generalize to other dialogs. The next few chapters address this dearth. They provide empirical evidence that people not only repeat their activities, but that they do so in quite regular ways. This chapter starts with the general notion of recurrent systems, where most users predominantly repeat their previous activities. Such systems suggest potential for activity reuse because there is an opportunity to give preferential treatment to the large number of repeated actions. A few suspected recurrent systems from both non-computer and computer domains are examined in this context to help pinpoint salient features. Particular attention is paid to repetition of activities in telephone use, information retrieval in technical manuals, and command lines in UNIX. The following chapters further examine UNIX as a recurrent system, and then generalize the results obtained into a set of design properties.
A definition of recurrent systems
An activity is loosely defined as the formulation and execution of one or more actions whose result is expected to gratify the user's immediate intention. It is the unit entered into incremental interaction systems (as defined in Section 1.2.1) (Thimbleby, 1990). Entering command lines, querying databases, and locating and selecting items in a menu hierarchy are some examples.
This chapter serves two purposes: a review of previous work and a preview of our new results on the parallel routing problem. Our general approach to the problem is also outlined. The routing schemes and their analysis will appear in the next chapter.
Introduction
Perhaps the most important issue in parallel computers is interprocessor communication. Since processors are linked by a relatively sparse network due to cost considerations, packets have to traverse many links, and may be delayed by other packets due to conflicts or full buffers, before reaching their destinations. Communication time can therefore easily dominate the total execution time, making the design of fast communication schemes a primary concern. It is also preferable that the buffer size be a constant independent of the size of the network for the sake of scalability. Finally, as more components mean more faults [326], other things being equal, the issue of fault tolerance without loss of efficiency should be addressed at the same time.
We review previous work and preview our approach and results in this chapter. In the next chapter, it will be shown how the above-mentioned problems can be tackled simultaneously using information dispersal.
Fast, fault-tolerant communication schemes using constant-size buffers on the hypercube and de Bruijn networks are presented. All our schemes route successfully with probability 1 − N^{−Θ(log N)}, where N denotes the number of processors. A summary of this chapter's results can be found in Section 4.4.
Parallel Communication Scheme
The notion of symmetric parallel communication scheme (SPCS) captures the essence, and facilitates the analysis, of our parallel routing algorithms.
Definition 5.1 In an epoched communication scheme (ECS), packets are sent in epochs, numbered from 0. In any epoch, packets that demand transmission to neighboring nodes in that epoch, as determined by the routing algorithm, are sent along the links as requested. A symmetric parallel communication scheme (SPCS) is an ECS that satisfies the following conditions.
1. Each node initially has h packets.
2. All packets are routed independently by the routing algorithm; that is, in any epoch, a packet crosses an outgoing link independently with equal probability.
3. The expected number of packets at each node is h at the end of each epoch.
Such a scheme is called an h-SPCS.
Comment 5.2 Condition 3 implies that no packets can be lost. In all our schemes, however, pieces (that is, packets in the definition of ECS) can be lost due to buffer overflow. This is not a serious problem since we can analyze a scheme as if pieces over the capacity of the buffer were not lost.
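To make Definition 5.1 concrete, the following minimal sketch (in Python; the stay-or-cross rule and parameter names are illustrative assumptions, not the book's routing algorithm) simulates one h-SPCS on the hypercube: in every epoch each packet independently either stays put or crosses a uniformly chosen outgoing link, so by symmetry the expected load per node remains h.

```python
import random
from collections import Counter

def simulate_spcs(n_dim=6, h=4, epochs=10, seed=0):
    """Illustrative h-SPCS on the n_dim-dimensional hypercube (a sketch under
    stated assumptions, not the book's algorithm).  Each of the N = 2**n_dim
    nodes starts with h packets; in every epoch each packet independently stays
    put or crosses one of its n_dim outgoing links, all n_dim + 1 choices being
    equally likely.  The uniform distribution is stationary for this rule, so
    the expected number of packets per node stays h (Condition 3)."""
    rng = random.Random(seed)
    N = 1 << n_dim
    location = [node for node in range(N) for _ in range(h)]  # packet -> node
    for _ in range(epochs):
        for i, node in enumerate(location):
            move = rng.randrange(n_dim + 1)          # n_dim links, plus "stay"
            if move < n_dim:
                location[i] = node ^ (1 << move)     # cross the link in dimension `move`
        yield max(Counter(location).values())        # peak buffer occupancy this epoch

if __name__ == "__main__":
    for epoch, peak in enumerate(simulate_spcs()):
        print(f"epoch {epoch}: max packets at any node = {peak}")
```

Running the sketch shows the peak buffer occupancy fluctuating around h, which is precisely the quantity a constant-buffer analysis must bound.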
If I send a man to buy a horse for me, I expect him to tell me that horse's points – not how many hairs he has in his tail.
— Carl Sandburg's Abraham Lincoln
This final chapter will be brief. First, the argument of the book is reviewed. Next, the original contributions are identified. Finally, new directions for research are sketched. The individual components of the book are not evaluated or criticized because this has been done at the end of each chapter.
Argument of the book
We began with the observation that orders given to interactive computer systems resemble tools used by people. Like tools, orders are employed to pursue activities that shape one's environment and the objects it contains. People have two general strategies for keeping track of the diverse tools they wield in their physical workshops. Recently used tools are kept available for reuse, and tools are organized into functional and task-oriented collections. Surprisingly, these strategies have not been transferred effectively to interactive systems.
This raises the possibility of an interactive support facility that allows people to use, reuse, and organize their on-line activities. The chief difficulty with this enterprise is the dearth of knowledge of how users behave when giving orders to general-purpose computer systems. As a consequence, existing user support facilities are based on ad hoc designs that do not adequately support a person's natural and intuitive way of working.
A portion of a trace belonging to a randomly selected expert programmer follows in the next few pages. The nine login sessions shown cover slightly over one month of the user's UNIX interactions, and include 155 command lines in total.
As mentioned in Chapter 2, all trace records have been made publicly available through a research report and an accompanying magnetic tape (Greenberg, 1988b). This report may be obtained from the Department of Computer Science, University of Calgary, or the author.
Because the raw data collected is not easily read, it was syntactically transformed to the listing presented here. The number and starting time of each login session are marked in italics. The first column shows the lines processed by csh after history expansions were made. The current working directory is given in the middle column. Blank entries indicate that the directory has not changed since the previous command line, and the “∼” is csh shorthand for the user's home directory. The final column lists any extra annotations recorded. These include alias expansions of the line by csh, error messages given to the user, and whether history was used to enter the line. Long alias expansions are shown truncated and suffixed with “…”.
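As a rough illustration of the listing's layout (a hypothetical re-creation with made-up column widths and field names, not the report's actual formatting code), a single trace record might be rendered as follows:

```python
def format_trace_record(command: str, directory: str = "", annotation: str = "",
                        col_width: int = 32, max_annot: int = 40) -> str:
    """Hypothetical three-column rendering of one trace record: the csh command
    line after history expansion, the current working directory (blank when
    unchanged, "~" for the home directory), and any annotation such as an alias
    expansion, an error message, or a note that history was used.  Long
    annotations are truncated and suffixed with an ellipsis."""
    if len(annotation) > max_annot:
        annotation = annotation[:max_annot - 1] + "…"
    return f"{command:<{col_width}}{directory:<{col_width}}{annotation}"

print(format_trace_record("ls -l", "~/thesis", "alias expansion: ls -l -> ls -lF"))
```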
The living clockwork of the State must be repaired
while it is in motion, and here it is a case of
changing the wheels as they revolve.
—Friedrich Schiller
We demonstrate in this chapter that, if FSRA is used, a constant fraction of the wires in the hypercube network can be disabled simultaneously without disrupting the ongoing computation or degrading the routing performance. This general result can lead to efficient on-line maintenance procedures. This seems to be the first time that the important issue of on-line maintenance has been addressed analytically.
Introduction
The fact that hardware deteriorates, and the demand that machines be more available to their users, make on-line maintenance without performance penalty a desirable design goal. For example, in the Tandem/16 computer system [173], modular design allows some components to be replaced on-line. Periodic maintenance of the hardware is also key to ensuring consistent system performance; without it, one cannot safely say that a particular component retains roughly the same failure rate at different times.
In this chapter we address the issue of on-line wire maintenance on the hypercube network with FSRA as the routing algorithm. It is shown that the set of edges can be partitioned into a constant number, 352, of disjoint edge sets of roughly equal sizes such that the probability of unsuccessful routing is exponentially small if any edge set in the proposed partition is disabled (Theorem 8.3). That implies little performance penalty as re-routing is extremely unlikely. The partition is also easily and locally computable.
We survey interconnection networks that link the processors in a parallel computer. Important terms and design issues pertaining to networks are also briefly discussed. This chapter is intended to be concise yet sufficiently complete.
Introduction
The interconnection network in a multiprocessor specifies how the processors are tied together. The processors then communicate by sending information through the network. Since the interprocessor delay easily dominates the execution time [56], except where interprocessor communications are rare or only nearby processors exchange information, the choice of the network makes the difference between an efficient system and an inefficient one; clearly, a network where information has to traverse, say, 100 links on average is less efficient than one where ten suffice, other things being equal. Besides the efficiency issue, the interconnection network also sets a limit on the number of faults a parallel computing system can sustain. For example, since a disconnected network that isolates some processors from the others makes joint computation impossible, the network should be able to withstand many faulty links.
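For instance, in the n-dimensional hypercube (used here purely to illustrate the point about link traversals), two nodes are adjacent exactly when their binary labels differ in one bit, so the number of links a packet must traverse is the Hamming distance between the labels: at most n, and about n/2 on average. A small sketch in Python:

```python
def hamming_distance(u: int, v: int) -> int:
    """Number of links a packet must traverse between hypercube nodes u and v."""
    return bin(u ^ v).count("1")

def average_distance(n_dim: int) -> float:
    """Average distance over distinct node pairs of the n_dim-dimensional hypercube."""
    N = 1 << n_dim
    total = sum(hamming_distance(u, v) for u in range(N) for v in range(N) if u != v)
    return total / (N * (N - 1))

if __name__ == "__main__":
    for n in (3, 6, 8):
        print(f"{n}-cube: diameter = {n}, average distance ≈ {average_distance(n):.2f}")
```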
This chapter surveys interconnection networks and related issues. In Section 3.2 some of the basic terms for networks are reviewed. A simple and useful graph-theoretical abstraction of networks is introduced in Section 3.3. Several popular networks are then briefly surveyed in Section 3.4, followed by Section 3.5, where some key issues in the choice of networks are summarized.
This chapter examines how people use commands in command-based systems. Like previous work, it is based on an analysis of long-term records of user–computer interaction with the UNIX csh command interpreter, collected as described in the previous chapter. The results of the major studies are reevaluated, particularly those of Hanson, Kraut, and Farber (1984), and Draper (1984), and some of the work is replicated. Although the statistical results of the studies are supported, some of the conclusions made by the original researchers are found to be misleading.
The following sections provide details of how people direct command-based systems in terms of how individual commands are selected and the dependencies between these commands. It is essential to take into account the fact that pooled statistics may conceal important differences between individuals. As a consequence, the results are analyzed by user and by identifying groups of similar users, as well as by pooling data for the entire population.
For the current study, a command is the first word entered in the command line. Those lines that produced system errors were not considered. The first word is parsed by removing all white space at the beginning of the line and counting all characters up to but not including the next white space or end of line. For example, the command parsed from the command line
print -f 31 -t 40 galley.text
is “print.” The parsed word is almost always a true UNIX command or alias that invokes a program or shell script.
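A minimal sketch of this parsing rule (in Python; written from the description above rather than from the study's actual analysis code):

```python
def parse_command(line: str) -> str | None:
    """Extract the command word: drop leading white space, then take every
    character up to, but not including, the next white space or end of line."""
    stripped = line.lstrip()
    if not stripped:
        return None                      # blank line: no command word
    return stripped.split(maxsplit=1)[0]

assert parse_command("  print -f 31 -t 40 galley.text") == "print"
```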
This chapter introduces a study of natural everyday human usage of the UNIX operating system and its command line interface. Analysis of the data collected is central to the pursuit of knowledge of user behavior when interacting with general-purpose environments. The chapter begins by describing UNIX and gives reasons why it is an appropriate vehicle for research. Section 2.2 reviews several methods of data collection used with previous UNIX investigations, and Section 2.3 describes the details of the current study. Analyses of data are deferred to later chapters.
Choosing UNIX
Why perform natural studies on UNIX, with its baroque and outdated user interface, instead of controlled experiments on a modern system? This section starts by advocating a natural study for exploratory investigation of human–computer interaction. After recognizing several pragmatic problems with such investigations, UNIX is introduced and its choice is justified.
Natural studies
The thrust of the work presented in this book is that it is possible to capitalize on patterns evident in human–computer interaction by building special user support tools. A prerequisite is to “know the user” (Hansen, 1971). One way to accomplish this goal is through analyzing everyday natural user interactions with current systems so that existing patterns of activity can be discovered and exploited. Hanson, Kraut, and Farber (1984) justify this approach by contrast with traditional controlled experimentation.
Although [a controlled experiment is] appropriate and useful in theory-guided research … it is less appropriate when the researcher needs to identify new variables or complex unknown relations between new variables. […]