Knowledge representation and reasoning are central to all fields of Artificial Intelligence research. They include the development of formalisms for representing a given subject matter as well as the development of inference procedures for reasoning about the represented knowledge. Before developing a knowledge representation formalism, one must determine what type of knowledge is to be modeled with it. Since much of our knowledge of the world can easily be described in natural language, it is an interesting task to examine to what extent the contents of natural language utterances can be formalized and represented with a given representation formalism. Every approach to representing natural language utterances must include a method for formalizing aspects of the meaning of single lexical units.
An early attempt in this direction was Quillian's Semantic Memory (Quillian, 1968), an associational model of human memory. A semantic memory consists of nodes corresponding to English words and different associative links connecting the nodes. Based on that approach, various knowledge representation systems have been developed which can be subsumed under the term semantic network. Common to all these systems is that knowledge is represented by a network of nodes and links. The nodes usually represent concepts or meanings whereas the links represent relations between concepts. In most semantic network formalisms, a special kind of link between more specific and more general concepts exists. This link, often called IS-A or AKO (a kind of), organizes the concepts into a hierarchy in which information can be inherited from more general to more specific concepts.
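To make the idea concrete, the following is a minimal sketch, in Haskell, of a semantic network with IS-A inheritance. The concepts, properties, and function names are invented for illustration; they do not reproduce Quillian's system or any particular semantic network formalism.

```haskell
-- A minimal sketch of a semantic network with IS-A inheritance.
-- The concepts and properties below are illustrative examples only.
import qualified Data.Map as Map

type Concept  = String
type Property = String

data Network = Network
  { isA   :: Map.Map Concept Concept              -- IS-A (AKO) links to more general concepts
  , props :: Map.Map Concept [(Property, String)] -- properties attached directly to a concept
  }

-- Look up a property, walking up the IS-A hierarchy until a value is inherited.
lookupProp :: Network -> Concept -> Property -> Maybe String
lookupProp net c p =
  case lookup p (Map.findWithDefault [] c (props net)) of
    Just v  -> Just v
    Nothing -> Map.lookup c (isA net) >>= \parent -> lookupProp net parent p

example :: Network
example = Network
  { isA   = Map.fromList [("canary", "bird"), ("bird", "animal")]
  , props = Map.fromList [ ("bird",   [("locomotion", "flies")])
                         , ("canary", [("colour", "yellow")]) ]
  }

main :: IO ()
main = print (lookupProp example "canary" "locomotion")   -- Just "flies", inherited from bird
```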
The connections between the study of natural language semantics, especially lexical semantics, and knowledge representation are manifold. One of the reasons why lexical semantics holds this place is obvious when one looks at compositional denotational theories of meaning. Here, one tries to account for the meaning of expressions in terms of a relation between linguistic expressions and the world. The dictionary makes explicit what part of the world each basic item refers to, whereas the grammar rules are associated with general instructions for combining the meanings of parts into the meanings of wholes. Most natural language understanding systems cannot relegate the interpretation of basic items (as contained in the dictionary) to some mysterious interpretation function, as in the case of Montague semantics, but have to be more explicit about the world and the substantive relation between basic expressions and the assumed ontology. Actual explanatory dictionaries can be viewed as stating complex relationships between natural language expressions. This perspective focuses on the fact that definitions are stated in language. The other perspective focuses on what is described: one could also say that a definition is the representation of a constraint on the world/model, or the specification of the (now less mysterious) interpretation function for basic expressions.
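As a toy illustration of this compositional picture, the sketch below (in Haskell) gives a dictionary that assigns denotations to basic items and a single grammar rule that combines the meanings of the parts into the meaning of the whole. The model world, lexical entries, and rule are invented for the example.

```haskell
-- A toy compositional semantics: a dictionary for basic items and one
-- combination rule. The entities and entries are invented for illustration.
type Entity = String

universe :: [Entity]
universe = ["rex", "fido", "tom"]

-- Dictionary: common nouns and intransitive verbs denote predicates over entities.
nounDenotation :: String -> (Entity -> Bool)
nounDenotation "dog" = \e -> e `elem` ["rex", "fido"]
nounDenotation "cat" = \e -> e `elem` ["tom"]
nounDenotation _     = const False

verbDenotation :: String -> (Entity -> Bool)
verbDenotation "barks" = \e -> e `elem` ["rex"]
verbDenotation _       = const False

-- Grammar rule for "some N V": combine the meanings of the parts.
someNV :: String -> String -> Bool
someNV n v = any (\e -> nounDenotation n e && verbDenotation v e) universe

main :: IO ()
main = print (someNV "dog" "barks")   -- True: some dog barks in the toy model
```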
Although it may not be realistic to argue that knowledge representation problems can be totally equated with problems of lexical semantics, there is enough reason to take notice of the latter when dealing with the former. Certainly this is the case where one deals with knowledge representation for natural language understanding. Within this general perspective we take the following position.
One of the major challenges today is coping with an overabundance of potentially important information. With newspapers such as the Wall Street Journal available electronically as a large text database, the analysis of natural language texts for the purpose of information retrieval has found renewed interest. Knowledge extraction and knowledge detection in large text databases are challenging problems, most recently under investigation in the TIPSTER projects funded by DARPA, the U.S. Department of Defense research funding agency. Traditionally, the parameters in the task of information retrieval are the style of analysis (statistical or linguistic), the domain of interest (TIPSTER, for instance, focuses on news concerning micro-chip design and joint ventures), the task (filling database entries, question answering, etc.), and the representation formalism (templates, Horn clauses, KL-ONE, etc.).
It is the premise of this chapter that much more detailed information can be gleaned from a careful linguistic analysis than from a statistical analysis. Moreover, a successful linguistic analysis provides more reliable data, as we hope to illustrate here. The problem is, however, that linguistic analysis is very costly and that systems that perform complete, reliable analysis of newspaper articles do not currently exist.
The challenge then is to find ways to do linguistic analysis when it is possible and to the extent that it is feasible. We claim that a promising approach is to perform a careful linguistic preprocessing of the texts, representing linguistically encoded information in a task-independent, faithful, and reusable representation scheme.
The next part of this book concerns linguistic issues in lexical semantics. The main issues addressed are based on the notion of the Generative Lexicon (Pustejovsky, 1991) and its consequences for the construction of lexicons.
The first chapter, “Linguistic constraints on type coercion,” by James Pustejovsky, summarizes the foundations of the Generative Lexicon which he defined a few years ago. This text investigates how best to characterize the formal mechanisms and the linguistic data necessary to explain the behavior of logical polysemy. A comprehensive range of polymorphic behaviors that account for the variations in semantic expressiveness found in natural languages is studied.
Within the same formal linguistic paradigm, we then have a contribution by Sabine Bergler, “From lexical semantics to text analysis,” which illustrates several issues of the Generative Lexicon using data from the Wall Street Journal. This chapter addresses in depth an important issue: what kinds of methods can be used to derive, from linguistic analysis, Generative Lexicon entries with precise semantic content for the Qualia roles. Special attention is devoted to the production of partial representations and to the incremental analysis of texts.
The next chapter, “Lexical functions, generative lexicons and the world” by Dirk Heylen, explores the convergences and divergences between Mel'čuk's analysis of lexical functions and the generative lexicon approach. The author then proposes an interesting and original knowledge representation method based on lexical functions mainly following Mel'čuk's approach.
In this chapter, we present a synopsis of several notions of psycholinguistics and linguistics that are relevant to the field of lexical semantics. We mainly focus on the notions or theoretical approaches that are broadly used and accepted in computational linguistics. Lexical semantics now plays a central role in computational linguistics, alongside grammar formalisms for parsing and generation and the production of sentence and discourse semantic representations. This central role can be explained by the fact that lexical entries contain a considerable part of the information related to the word-senses they represent.
This introduction will provide the reader with some basic concepts in the field of lexical semantics and should also be considered as a guide to the chapters included in this book. We first present some basic concepts of psycholinguistics which have some interest for natural language processing. We then focus on the linguistic aspects which are commonly admitted to contribute substantially to the field. It has not, however, been possible to include all aspects of lexical semantics: the absence of certain approaches should not be considered as an a priori judgment on their value.
The first part of this text introduces psycholinguistic notions of interest to lexical semantics; we then present linguistic notions in more depth. At the end of this chapter, we review the chapters in this volume.
Contribution of psycholinguistics to the study of word meaning
Results from psycholinguistic research can give us a good idea of how concepts are organized in memory, and how this information is accessed in the mental lexicon.
Many words have two or more very distinct meanings. For example, the word pen can refer to a writing implement or to an enclosure. Many natural language applications, including information retrieval, content analysis and automatic abstracting, and machine translation, require the resolution of lexical ambiguity for words in an input text, or are significantly improved when this can be accomplished. That is, the preferred input to these applications is a text in which each word is “tagged” with the sense of that word intended in the particular context. However, at present there is no reliable way to automatically identify the correct sense of a word in running text. This task, called word sense disambiguation, is especially difficult for texts from unconstrained domains because the number of ambiguous words is potentially very large. The magnitude of the problem can be reduced by considering only very gross sense distinctions (e.g., between the pen-as-implement and pen-as-enclosure senses of pen, rather than between finer sense distinctions within, say, the category of pen-as-enclosure – i.e., enclosure for animals, enclosure for submarines, etc.), which is sufficient for many applications. But even so, substantial ambiguity remains: for example, even the relatively small lexicon (20,000 entries) of the TRUMP system, which includes only gross sense distinctions, finds an average of about four senses for each word in sentences from the Wall Street Journal (McRoy, 1992). The resulting combinatoric explosion demonstrates the magnitude of the lexical ambiguity problem.
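The arithmetic behind that explosion is simple. The short Haskell sketch below takes the figure of roughly four senses per word from the observation above and shows how the number of possible sense assignments grows with sentence length; the numbers are illustrative, not measurements.

```haskell
-- With an average of about four senses per word, an n-word sentence admits
-- roughly 4^n distinct sense assignments: about 10^12 for a 20-word sentence.
sensesPerWord :: Integer
sensesPerWord = 4

senseAssignments :: Integer -> Integer
senseAssignments sentenceLength = sensesPerWord ^ sentenceLength

main :: IO ()
main = mapM_ report [5, 10, 20]
  where report n = putStrLn (show n ++ " words: "
                             ++ show (senseAssignments n) ++ " sense combinations")
```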
Several different kinds of information can contribute to the resolution of lexical ambiguity.
Lexical choice in text generation cannot be carried out without appealing to a lexicon that takes into account many lexico-semantic relations. The text generation system must be able to handle both the immediate and the larger lexical context.
a) The immediate lexical context consists of the lexemes that surround the lexical item to be generated. This context must be taken into account in the case of collocational constraints, which restrict the ways of expressing a precise meaning to certain lexical items, as in expressions like pay attention, receive attention or narrow escape (Wanner & Bateman, 1990; Iordanskaja et al., 1991; Nirenburg & Nirenburg, 1988; Heid & Raab, 1989).
b) The larger textual context consists of the linguistic content of previous and subsequent clauses. This context is the source for cohesive links (Halliday & Hasan, 1976) with the lexical items to be generated in the current clause, as in:
(1) Professor Elmuck was lecturing on lexical functions to third-year students. The lecturer was interesting and the audience was very attentive.
In the second sentence, lecturer is coreferential with Professor Elmuck and the audience is coreferential with third-year students. These semantic links are due to the lexico-semantic relations between, on the one hand, lecturer (“agent noun”) and lecture, and on the other hand, between audience (“patient noun”) and lecture.
In this chapter, we will show that the Lexical Functions (LFs) of the Explanatory Combinatorial Dictionary (hereafter ECD) (Mel'čuk et al., 1984a, 1988; Mel'čuk & Polguère, 1987; Meyer & Steele, 1990) are well suited for these tasks in text generation.
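As a rough illustration of how such collocational knowledge can be recorded, the Haskell sketch below encodes a few lexical functions as a lookup table. Oper1, Oper2, and Magn are standard lexical functions from Mel'čuk's work, but the entries and the representation itself are simplified inventions for this example and do not reproduce the ECD format.

```haskell
-- A minimal sketch of lexical functions as a lookup table.
-- Oper1/Oper2 select support verbs for a keyword noun; Magn selects an intensifier.
-- The entries are illustrative, not taken from the ECD.
import qualified Data.Map as Map

data LexicalFunction = Oper1 | Oper2 | Magn
  deriving (Eq, Ord, Show)

type Lexicon = Map.Map (LexicalFunction, String) [String]

lexicon :: Lexicon
lexicon = Map.fromList
  [ ((Oper1, "attention"), ["pay"])      -- X pays attention
  , ((Oper2, "attention"), ["receive"])  -- Y receives attention
  , ((Magn,  "escape"),    ["narrow"])   -- a narrow escape
  ]

-- Choose a collocate for a keyword under a given lexical function.
collocate :: LexicalFunction -> String -> Maybe String
collocate lf keyword = Map.lookup (lf, keyword) lexicon >>= firstChoice
  where firstChoice xs = case xs of { [] -> Nothing; (x:_) -> Just x }

main :: IO ()
main = print (collocate Oper1 "attention")   -- Just "pay"
```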
In order to help characterize the expressive power of natural languages in terms of semantic expressiveness, it is natural to think in terms of semantic systems with increasing functional power. A natural way of capturing this is in terms of the type system which the grammar refers to for its interpretation. What I would like to discuss in this chapter is a method for describing how semantic systems fall on a hierarchy of increasing expressiveness and richness of descriptive power, and to investigate various phenomena in natural language which indicate (1) that we need a certain amount of expressiveness that we have not considered before in our semantics, but also (2) that the data make it clear that we need natural constraints on the mechanisms which give us such expressive systems. After reviewing the range of semantic types from monomorphic languages to unrestricted polymorphic languages, I would like to argue that we should aim for a model which permits only a restricted amount of polymorphic behavior. I will characterize this class of languages as semi-polymorphic.
I will outline what kind of semantic system produces just this class of languages. I will argue that something like the generative lexicon framework is necessary to capture the richness of type shifting and sense extension phenomena.
Let me begin the discussion on expressiveness by reviewing how this same issue was played out in the realm of syntactic frameworks in the 1950s.
The work described in this chapter starts from the observation that a word in a text has a semantic value which is seldom identical with any of the definitions found in a dictionary. This fact was of little importance as long as dictionaries were primarily intended for human beings, since the process used to convert the lexical meaning into the semantic value seems well mastered by humans – at least by the potential users of dictionaries – but it becomes of prominent importance now that we need computer-oriented dictionaries.
As a matter of fact, the computation of the semantic value for a given word requires lexical information about that word, about the other words of the text, about syntax, and about the world. The set of possible values for a given word X is open-ended: given any list of values, it is always possible to construct a context in which X takes a value not present in the list. As a consequence, no dictionary, however thick, can contain all of them. It is therefore necessary, in any case, to implement a mechanism which constructs the value from information in the dictionary, as well as from knowledge of the grammar and of the world.
In artificial intelligence (henceforth A.I.), the main objective is not to find the “correct” meaning of each word or phrase, but to get the set of consequences which can be drawn from a text; if the same set is obtained from “different” interpretations (i.e., interpretations using different values for some word), then the difference is irrelevant.
The first topic, psycholinguistic and cognitive aspects of lexical semantics, is addressed in the first two chapters. This area is particularly active but relatively ignored in computational circles, perhaps because of a lack of precise methods and formalisms. It is, however, crucial for the construction of well-designed semantic lexicons, since it brings empirical, psychologically based analysis into the domain of computational lexical semantics.
“Polysemy and related phenomena from a cognitive linguistic viewpoint” by Alan Cruse surveys the ways in which the contribution that the same grammatical word makes to the meaning of a larger unit varies with the context. Two main explanatory hypotheses that account for contextual variation are explored: lexical semantics and pragmatics. Alternatives to this approach are studied in the other parts of the volume.
The second chapter, “Mental lexicon and machine lexicon: Which properties are shared by machine and mental word representations? Which are not?” by J.-F. Le Ny, sets language comprehension and interpretation within a general cognitive science perspective. Properties common to natural and artificial semantic units (e.g., denotation, super-ordination, case roles, etc.) are first explored. Then, problems related to activability and accessibility in the memory are addressed in both a theoretical and experimental way.
This chapter describes a number of features that might be useful in practical work with qualified types. We adopt a less rigorous approach than in previous chapters and do not attempt to deal with all of the technical issues involved.
Section 6.1 suggests a number of techniques that can be used to reduce the size of the predicate set in the types calculated by the type inference algorithm, resulting in smaller types that are often easier to understand. As a further benefit, the number of evidence parameters in the translation of an overloaded term may also be reduced, leading to a potentially more efficient implementation.
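To give a flavor of what reducing the predicate set means, the toy Haskell sketch below removes duplicate predicates, the simplest possible case; each predicate eliminated is one fewer evidence parameter in the translation. The techniques of Section 6.1 are more general and are not reproduced here.

```haskell
-- A toy illustration of one very simple predicate-set simplification:
-- removing duplicate predicates. Each duplicate removed saves an
-- evidence parameter in the translated term.
import Data.List (nub)

data Pred = Pred String String   -- e.g. Pred "Eq" "a" stands for the predicate  Eq a
  deriving (Eq, Show)

simplify :: [Pred] -> [Pred]
simplify = nub

main :: IO ()
main = print (simplify [Pred "Eq" "a", Pred "Eq" "a", Pred "Ord" "b"])
-- [Pred "Eq" "a", Pred "Ord" "b"]: only one piece of evidence for Eq a is needed
```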
Section 6.2 shows how information about the satisfiability of predicate sets may be used to infer more accurate typings for some terms and to reject others for which suitable evidence values cannot be produced.
Finally, Section 6.3 discusses the possibility of adding the rule of subsumption to the type system of OML to allow the use of implicit coercions from one type to another within a given term.
It would also be useful to consider the task of extending the language of OML terms with constructs that correspond more closely to those of concrete programming languages, such as recursion, groups of local bindings and the use of explicit type signatures. One example where these features have been dealt with is the proposed static semantics for Haskell given in (Peyton Jones and Wadler, 1992) but, for reasons of space, we do not consider this here.
This chapter describes an ML-like language (i.e. implicitly typed λ-calculus with local definitions) and extends the framework of (Milner, 1978; Damas and Milner, 1982) with support for overloading using qualified types and an arbitrary system of predicates of the form described in the previous chapter. The resulting system retains the flexibility of the ML type system, while allowing more accurate descriptions of the types of objects. Furthermore, we show that this approach is suitable for use in a language based on type inference, in contrast for example with more powerful languages such as the polymorphic λ-calculus that require explicit type annotations.
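The kind of flexibility intended here can be illustrated in Haskell, used as a stand-in for OML in this small sketch: the definition below carries no type annotation, and inference assigns it a qualified type in which the predicate Eq a records the use of overloaded equality.

```haskell
-- Inference assigns `member` the qualified type  Eq a => a -> [a] -> Bool
-- without any annotation from the programmer; the predicate Eq a records
-- the overloaded use of (==) while the rest of the type stays ML-polymorphic.
member x xs = any (== x) xs

main :: IO ()
main = print (member (3 :: Int) [1, 2, 3], member 'z' "abc")   -- (True, False)
```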
Section 3.1 introduces the basic type system and Section 3.2 describes an ordering on types, used to determine when one type is more general than another. This is used to investigate the properties of polymorphic types in the system.
The development of a type inference algorithm is complicated by the fact that there are many ways in which the typing rules in our original system can be applied to a single term, and it is not clear which of these (if any!) will result in an optimal typing. As an intermediate step, Section 3.3 describes a syntax-directed system in which the choice of typing rules is completely determined by the syntactic structure of the term involved, and investigates its relationship to the original system. Exploiting this relationship, Section 3.4 presents a type inference algorithm for the syntax-directed system which can then be used to infer typings in the original system.
One of the main goals in preparing this book for publication was to preserve the thesis, as much as possible, in the form that it was originally submitted. With this in mind, we have restricted ourselves to making only very minor changes to the body of the thesis, for example, correcting typographical errors.
On the other hand, we have continued to work with the ideas presented here, to find new applications and to investigate some of the areas identified as topics for further research. In this short chapter, we comment briefly on some examples of this, illustrating both the progress that has been made and some of the new opportunities for further work that have been exposed.
We should emphasize once again that this is the only chapter that was not included as part of the original thesis.
Constructor classes
The initial ideas for a system of constructor classes as sketched in Section 9.2 have been developed in (Jones, 1993b), and full support for these ideas is now included in the standard Gofer distribution (versions 2.28 and later). The two main technical extensions that the system of constructor classes makes to the work described here are:
The use of kind inference to determine suitable kinds for all the user-defined type constructors appearing in a given program.
The extension of the unification algorithm to ensure that it calculates only kind-preserving substitutions. This is necessary to ensure soundness and is dealt with by ensuring that constructor variables are only ever bound to constructors of the corresponding kind. Fortunately, this has a very simple and efficient implementation.
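The standard illustration of a constructor class is a Functor-like class, sketched below in Haskell: the class parameter f ranges over type constructors of kind * -> *, which is exactly the kind that kind inference must assign from the way f is applied in the class declaration. This is the textbook example, not the Gofer implementation itself.

```haskell
-- A constructor class: the parameter f has kind * -> *, as kind inference
-- can determine from the applications  f a  and  f b  in the member's type.
class MyFunctor f where
  myFmap :: (a -> b) -> f a -> f b

data Tree a = Leaf a | Fork (Tree a) (Tree a)

instance MyFunctor Tree where
  myFmap g (Leaf x)   = Leaf (g x)
  myFmap g (Fork l r) = Fork (myFmap g l) (myFmap g r)

instance MyFunctor [] where
  myFmap = map

main :: IO ()
main = print (myFmap (+ 1) [1, 2, 3 :: Int])   -- [2,3,4]
```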
While the results of the preceding chapter provide a satisfactory treatment of type inference with qualified types, we have not yet made any attempt to discuss the semantics or evaluation of overloaded terms. For example, given a generic equality operator (==) of type ∀a.Eq a ⇒ a → a → Bool and integer valued expressions E and F, we can determine that the expression E == F has type Bool in any environment which satisfies Eq Int. However, this information is not sufficient to determine the value of E == F; this is only possible if we are also provided with the value of the equality operator which makes Int an instance of Eq.
Our aim in the next two chapters is to present a general approach to the semantics and implementation of objects with qualified types based on the concept of evidence. The essential idea is that an object of type π ⇒ σ can only be used if we are also supplied with suitable evidence that the predicate π does indeed hold. In this chapter we concentrate on the role of evidence for the systems of predicates described in Chapter 2 and then, in the following chapter, extend the results of Chapter 3 to give a semantics for OML.
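The idea of evidence can be made concrete with the familiar dictionary-passing sketch below, written in Haskell: evidence for the predicate Eq a is represented as a record containing an equality function, and the translation of an overloaded term such as E == F takes that evidence as an extra argument. This is only an illustration of the idea, not the formal construction developed in the following chapters.

```haskell
-- A sketch of dictionary passing: evidence for Eq a is a record holding
-- an equality function, and overloaded terms become functions of that evidence.
newtype EqDict a = EqDict { eq :: a -> a -> Bool }   -- evidence for the predicate Eq a

eqInt :: EqDict Int          -- the evidence that makes Int an instance of Eq
eqInt = EqDict (==)

-- Translation of  \x y -> x == y : the evidence is passed as an extra parameter.
equalBy :: EqDict a -> a -> a -> Bool
equalBy d = eq d

main :: IO ()
main = print (equalBy eqInt 2 (1 + 1))   -- True: E == F evaluated with evidence for Eq Int
```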
As an introduction, Section 4.1 describes some simple techniques used in the implementation of particular forms of overloading and shows why these methods are unsuitable for the more general systems considered in this thesis.