A portion of an Office of Naval Research (ONR) database was used to test the TEXT system. The portion used contains information about military vehicles and weapons. The ONR database was selected for TEXT in part because of its availability (it had previously been used in a joint research project with the Wharton School of the University of Pennsylvania) and in part because of its complex structure. Even using only a portion of the database provided a domain complex enough to allow for an interesting set of questions and answers.
As discussed in Chapter One, TEXT accepts three kinds of questions as input. These are:
What is a <e>?
What do you know about <e>?
What is the difference between <e1> and <e2>?
where <e>, <e1>, and <e2> represent any entity in the database. Since the TEXT system does not include a facility for interpreting English questions, the user must phrase his questions in the functional notation shown below, which corresponds to the three classes of questions.
(definition <e>)
(information <e>)
(difference <e1> <e2>)
Note that the system only handles questions about objects in the database. Although the system can include information about relations when relevant to a question about a particular object, it cannot answer questions about relations themselves.
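As an illustration of this input protocol, the following is a minimal sketch, written in Python rather than the Lisp actually used, of how the three question forms might be represented and validated. The entity names and the function name are purely hypothetical and are not part of the TEXT system.

VALID_FORMS = {'definition': 1, 'information': 1, 'difference': 2}

def check_question(question):
    # question is a tuple such as ('definition', 'SHIP'); returns the
    # question type and its entities, or raises an error for malformed input
    kind, *entities = question
    if VALID_FORMS.get(kind) != len(entities):
        raise ValueError('unrecognised question form: %r' % (question,))
    return kind, entities

print(check_question(('difference', 'DESTROYER', 'SUBMARINE')))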
System components
The TEXT system consists of six major components: a schema selector, a relevant knowledge selector, a schema filler, a focusing mechanism, a dictionary interface, and a tactical component.
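As a rough orientation before the components are described, the sketch below shows one plausible way the six components could be chained to answer a single question. It is written in Python rather than Lisp; the function bodies are placeholders, and the exact control flow is an assumption rather than a description of TEXT's actual interfaces.

def select_relevant_knowledge(question, kb):
    # relevant knowledge selector: restrict the knowledge base to the
    # entities mentioned in the question (placeholder behaviour)
    return {e: kb.get(e, {}) for e in question[1:]}

def select_schema(question, relevant):
    # schema selector: pick a rhetorical schema for this question type
    return 'schema-for-' + question[0]

def fill_schema(schema, relevant):
    # schema filler and focusing mechanism: build an ordered list of propositions
    return [(schema, entity, facts) for entity, facts in relevant.items()]

def choose_words(propositions):
    # dictionary interface: map propositions onto words (placeholder)
    return propositions

def realise(propositions):
    # tactical component: turn the propositions into English sentences
    return '%d proposition(s) would be realised as English here.' % len(propositions)

def generate_answer(question, kb):
    relevant = select_relevant_knowledge(question, kb)
    schema = select_schema(question, relevant)
    return realise(choose_words(fill_schema(schema, relevant)))

print(generate_answer(('definition', 'SHIP'), {'SHIP': {'type': 'water-going vehicle'}}))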
The approach I have taken towards text generation is based on two fundamental hypotheses about the production of text: 1) that how information is stored in memory and how a person describes that information need not be the same and 2) that people have preconceived notions about the ways in which descriptions can be achieved.
I assume that information is not described in exactly the same way it is organized in memory. Rather, such descriptions reflect one or more principles of text organization. It is not uncommon for a person to repeat himself and talk about the same thing on different occasions. Rarely, however, will he repeat himself exactly. He may describe aspects of the subject which he omitted on first telling or he may, on the other hand, describe things from a different perspective, giving the text a new emphasis. Chafe (79) has performed a series of experiments which he claims support the notion that the speaker decides as he is talking what material should go into a sentence. These experiments show that the distribution of semantic constituents among sentences often varies significantly from one version of a narrative to another.
The second hypothesis central to this research is that people have preconceived ideas about the means with which particular communicative tasks can be achieved as well as the ways in which these means can be integrated to form a text. In other words, people generally follow standard patterns of discourse structure. For example, they commonly begin a narrative by describing the setting (the scene, the characters, or the time-frame).
In the process of producing discourse, speakers and writers must decide what it is that they want to say and how to present it effectively. They are capable of disregarding information in their large body of knowledge about the world which is not specific to the task at hand and they manage to integrate pertinent information into a coherent unit. They determine how to appropriately start the discourse, how to order its elements, and how to close it. These decisions are all part of the process of deciding what to say and when to say it. Speakers and writers must also determine what words to use and how to group them into sentences. In order for a system to generate text, it, too, must be able to make these kinds of decisions.
In this work, a computational solution is sought to the problems of deciding what to say and how to organize it effectively. What principles of discourse can be applied to this task? How can they be specified so that they can be used in a computational process? A computational perspective can aid our understanding of how discourse is produced by demanding a precise specification of the process. If we want to build a system that can perform these tasks, our theory of production must be detailed and accurate. Conversely, for a system to produce discourse effectively, determining both its content and its textual shape, the development and application of principles of discourse structure, discourse coherency, and relevancy criteria are essential to its success.
The TEXT system was implemented in CMU Lisp (an extension of Franz Lisp) on a VAX 11/780. The TEXT system source code occupies a total of 1176K of memory with the following breakdown:
Knowledge base and accessing functions (not including database and database interface functions): 442K
Strategic component: 573K
Tactical component: 145K
The system, including the knowledge base, is loaded into memory in its entirety for use; only the database remains on disk. No space problems were encountered during implementation, with one exception: the particular Lisp implementation available does not allow the size of the recursive name stack to be reset. As a result, certain functions that were originally written recursively had to be rewritten iteratively, since the name stack was not large enough to handle them.
Processing speed is another question altogether. Currently the response time of the TEXT system is far from being acceptable for practical use. The bulk of the processing time, however, is used by the tactical component. Since it was not the focal point of this dissertation, no major effort was made to speed up this component. To answer a typical question posed to the TEXT system, the strategic component (including dictionary interface) uses 3290 CPU seconds, an elapsed time of approximately one and a half minutes, while the tactical component uses 43845 CPU seconds, an elapsed time of approximately 20 minutes. Times vary for different questions. These statistics were obtained when using the system in a shared environment. An improvement in speed could be achieved by using a dedicated system. It should be noted, furthermore, that the strategic component is not compiled, while the tactical component is.
Tracking the discourse history involves remembering what has been said in a single session with a user and using that information when generating additional responses. The discourse history can be used to avoid repetition within a single session and, more importantly, to provide responses that contrast with previous answers. Although the maintenance of a discourse history record was not implemented in the TEXT system, an analysis of the effects such a history could have on succeeding questions as well as the information that needs to be recorded in order to achieve those effects was made. In the following sections some examples from each class of questions that the system handles are examined to show how they would be affected by the various kinds of discourse history records that could be maintained.
Possible discourse history records
Several different discourse history types, each containing a different amount of information, are possible. One history type could simply note that a particular question was asked and an answer provided, by maintaining a list of the questions asked. On the other hand, the system could record both the question asked and the actual answer provided in its history. The answer itself could be maintained in any of a number of ways. The history could record the structure and information content of the answer (for TEXT, this would be the instantiated schema). Another possibility would be to record some representation of the surface form of the answer, whether its syntactic structure or the actual text.
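To make these alternatives concrete, the sketch below gives one hypothetical way such records might be structured in Python; the field names are illustrative assumptions and do not describe an actual TEXT data structure.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HistoryEntry:
    # one question-answer exchange within a session
    question: tuple                               # e.g. ('definition', 'SHIP')
    instantiated_schema: Optional[list] = None    # structure and content of the answer
    surface_text: Optional[str] = None            # or a record of the text actually produced

@dataclass
class DiscourseHistory:
    entries: List[HistoryEntry] = field(default_factory=list)

    def already_asked(self, question):
        # the simplest use: avoid repetition within a single session
        return any(e.question == question for e in self.entries)

    def record(self, question, schema=None, text=None):
        self.entries.append(HistoryEntry(question, schema, text))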
Interest in the generation of natural language is beginning to grow as more systems are developed which require the capability for sophisticated communication with their users. This chapter provides an overview of the development of research in natural language generation. Other areas of research, such as linguistic research on discourse structure, are also relevant to this work, but are overviewed in the pertinent chapters.
The earliest generation systems relied on the use of stored text and templates to communicate with the user. The use of stored text requires the system designer to enumerate all questions the system must be able to answer and write out the answers to these questions by hand so that they can be stored as a whole and retrieved when needed. Templates allow a little more flexibility. Templates are English phrases constructed by the designer with slots which can be instantiated with different words and phrases depending upon the context. Templates may be combined and instantiated in a variety of ways to produce different answers. One main problem with templates is that the juxtaposition of complete English phrases frequently results in awkward or illegal text. A considerable amount of time must be spent by the designer experimenting with different combinations to avoid this problem.
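As a concrete, hypothetical illustration of the template approach, the sketch below instantiates a single slot-bearing phrase in two contexts; the entity and attribute names are invented for the example.

# a hypothetical template with slots, instantiated for two different contexts
TEMPLATE = "The {entity} has a {attribute} of {value}."

print(TEMPLATE.format(entity="destroyer", attribute="draft", value="15 feet"))
print(TEMPLATE.format(entity="submarine", attribute="maximum speed", value="20 knots"))

# Juxtaposing such fixed phrases blindly can produce awkward text, for
# example a sequence of sentences that all begin "The destroyer has a ...".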
Both of these methods require a significant amount of hand-encoding, are limited to handling anticipated questions, and cannot be extended in any significant way. They are useful, however, for situations in which only a very limited range of generation is required, particularly because the system can be as eloquent as its designer.
The paragraphs from the first topic group in the Introduction to Working (Terkel 72) are reproduced below.
INTRODUCTION
This book, being about work, is, by its very nature, about violence – to the spirit as well as to the body. It is about ulcers as well as accidents, about shouting matches as well as fistfights, about nervous breakdowns as well as kicking the dog around. It is, above all (or beneath all), about daily humiliations. To survive the day is triumph enough for the walking wounded among the great many of us.
The scars, psychic as well as physical, brought home to the supper table and the TV set, may have touched, malignantly, the soul of our society. More or less. (“More or less,” that most ambiguous of phrases, pervades many of the conversations that comprise this book, reflecting, perhaps, an ambiguity of attitude toward The Job. Something more than Orwellian acceptance, something less than Luddite sabotage. Often the two impulses are fused in the same person.)
It is about a search, too, for daily meaning as well as daily bread, for recognition as well as cash, for astonishment rather than torpor; in short, for a sort of life rather than a Monday through Friday sort of dying. Perhaps immortality, too, is part of the quest. To be remembered was the wish, spoken and unspoken, of the heroes and heroines of this book.
In Chapter 2 we considered a very simple data structure, the linked-linear list; and in Chapter 3 we moved on to binary trees. In this chapter we look at two much more general structures.
Firstly we shall consider trees in which nodes may have more than two branches, and in which the number of branches may vary from node to node. For want of a better name we shall call them n-ary trees.
Secondly we shall consider even more general structures which arise when more than one branch leads into a node. These structures are called directed graphs. Clearly they are more general than n-ary trees, which, therefore, may be regarded as a special case.
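As a rough illustration of the difference, sketched here in Python rather than the Pascal used in the book: an n-ary tree node simply carries a variable-length list of branches, while a directed graph permits more than one branch to lead into the same node.

class NaryNode:
    # a node with any number of branches; the number may vary from node to node
    def __init__(self, key, children=None):
        self.key = key
        self.children = children or []

# a small directed graph as an adjacency mapping: node 'd' is reachable
# from both 'b' and 'c', which no n-ary tree allows
graph = {
    'a': ['b', 'c'],
    'b': ['d'],
    'c': ['d'],
    'd': [],
}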
B-trees
We consider first the n-ary tree, and, in this section, its use in searching applications. Such trees are usually called B-trees, a convention we shall follow.
When we discussed binary trees in Chapter 3 we noted that searching, insertion and deletion were all O(log n), provided that the tree remained balanced. Although we did not discuss the topic of balance in much detail there, we referred the reader to a number of relevant techniques. B-trees arise in this connection too, though here we shall approach them from a different point of view.
Let us imagine first of all that we have a sequence of variable-length items in the store with an item with an infinite key placed at the end.
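The item with an infinite key plays the role of a sentinel: a search along the sequence can stop as soon as it meets a key at least as large as the one sought, without a separate end-of-sequence test. A minimal sketch in Python, assuming the items are held in ascending key order:

import math

INFINITY = math.inf    # the 'infinite key' placed at the end of the sequence

def sequential_search(items, key):
    # items: (key, value) pairs in ascending key order, ending with (INFINITY, None);
    # returns the value associated with key, or None if the key is absent
    i = 0
    while items[i][0] < key:      # the sentinel guarantees the loop terminates
        i += 1
    return items[i][1] if items[i][0] == key else None

store = [(3, 'alpha'), (7, 'beta'), (12, 'gamma'), (INFINITY, None)]
print(sequential_search(store, 7))     # -> 'beta'
print(sequential_search(store, 9))     # -> None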
Recursion is the Cinderella of programming techniques where languages such as Pascal are concerned. All primers mention it, of course, but generally devote only a few pages to it. Rohl and Barrett's Programming via Pascal is one of the more generous: it contains one chapter of 12 pages on the subject!
Books appropriate to second courses in programming, such as those by Wirth (1976), Alagic & Arbib (1978), and the more modern data structures texts, have helped considerably; but currently there is no book devoted to the use of recursion in Pascal or similar languages.
And yet this used not to be the case: Barron's delightful little book Recursive Techniques in Programming was published in 1968! Sadly it is now out of print, and in any event was beginning to show its age. Recursion via Pascal is the author's attempt to fill this gap.
Of course, in functional programming, recursion has received its full due, since it is quite often the only repetitive construct, and this area is fairly well served with text-books. In Recursion via Pascal, most of the examples are procedures rather than functions, partly because that is the usual Pascal style and partly because we want to give examples which actually do something, like drawing the cover motif of this series, instead of merely having a value. Reading one of the functional texts after finishing this book would provide an alternative perspective.
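In that spirit, and purely as an illustration sketched in Python rather than Pascal, here is a small recursive procedure that does something rather than merely having a value: it prints the tick marks of a ruler using the classic divide-in-half recursion.

def ruler(left, right, depth):
    # print one tick per position between left and right; ticks at coarser
    # subdivisions are drawn longer than those at finer ones
    if depth == 0 or right - left < 2:
        return
    mid = (left + right) // 2
    ruler(left, mid, depth - 1)       # ticks in the left half
    print(' ' * mid + '-' * depth)    # tick at the midpoint
    ruler(mid, right, depth - 1)      # ticks in the right half

ruler(0, 16, 3)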