The basic elements of most recent Natural Language Understanding (NLU) systems are a syntactic parser, used to determine sentence structure, and a semantic lexicon, whose purpose is to access the system's factual knowledge from the natural language input. In this regard, the semantic lexicon plays the key role of relating words to world knowledge. But the semantic lexicon is also used to solve some specifically linguistic issues when recovering sentence structure, and should therefore contain linguistic knowledge. In this chapter we discuss the issue of lexical content in terms of linguistic and world knowledge, through the so-called dictionary–encyclopedia controversy. To illustrate the discussion we describe the lexical semantics approach adopted in our NLU program for processing sentences from medical records (Zweigenbaum and Cavazza, 1990). This program is a small-scale but fully implemented prototype adopting a broad view of NLU, from syntactic analysis to complex domain inferences through model-based reasoning (Grishman and Ksiezyk, 1990).

The dictionary–encyclopedia controversy opposes two extreme conceptions of word definitions: according to the dictionary approach, a word is described in terms of linguistic elements only, without recourse to world knowledge, whereas an encyclopedic definition includes
an indication of the different species or different stages of the object or process denoted by the word, the main types of behavior of this object or process,… (Mel'čuk and Zholkovsky, 1988).
This point has been discussed by many authors including Katz and Fodor (1963), Eco (1984), Wierzbicka (1985) and Taylor (1989).
Recent work in Computational Linguistics shows the central role played by the lexicon in language processing, and in particular by the lexical semantics component. Lexicons are no longer a mere enumeration of feature-value pairs but tend to exhibit intelligent behavior. This is the case, for example, for generative lexicons (Pustejovsky, 1991), which contain, besides feature structures, a number of rules for creating new (partial) definitions of word-senses, such as rules for conflation and type coercion. As a result, the size of lexical entries describing word-senses has substantially increased. These lexical entries become very hard to use directly by a natural language parser or generator because their size and complexity allow a priori little flexibility.
Most natural language systems treat a lexical entry as an indivisible whole which is percolated up the parse/generation tree. Access to features and feature values at the grammar-rule level is realized by more or less complex procedures (Shieber, 1986; Johnson, 1990; Günthner, 1988). The complexity of real natural language processing systems makes such an approach very inefficient and not necessarily linguistically adequate. In this document, we propose a dynamic treatment of features in grammars, embedded within a Constraint Logic Programming framework (hereafter CLP), which permits us to access a feature-value pair associated with a certain word-sense directly in the lexicon, and only when this feature is explicitly required by the grammar, for example, to perform a check. More precisely, this dynamic treatment of features makes use of both constraint propagation techniques embedded within CLP and CLP resolution mechanisms.
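The idea of demand-driven feature access can be caricatured outside the CLP setting. The following sketch (the words, features, and class names are invented, and Python stands in for the constraint framework) fetches a feature-value pair from the lexicon only at the moment a grammar rule asks for it, rather than percolating the whole entry:

```python
# Illustrative toy lexicon: large entries that we do not want to copy
# wholesale into the parse tree.
LEXICON = {
    "book": {"cat": "noun", "agr": "sg", "sem_type": "phys_obj"},
    "read": {"cat": "verb", "agr": "sg", "subcat": ["np", "np"]},
}

class LazyEntry:
    """Stands for a word-sense; features are looked up on demand and cached."""
    def __init__(self, word):
        self.word = word
        self._cache = {}

    def feature(self, name):
        # Only now do we consult the lexicon, and only for this feature.
        if name not in self._cache:
            self._cache[name] = LEXICON[self.word].get(name)
        return self._cache[name]

def agreement_check(entry_a, entry_b):
    # A grammar rule touches only the 'agr' feature, never the whole entry.
    return entry_a.feature("agr") == entry_b.feature("agr")

print(agreement_check(LazyEntry("book"), LazyEntry("read")))  # True
```

In the actual CLP setting the deferred lookup would be a constraint woven into resolution rather than an explicit cache, but the access pattern is the same: the grammar names a feature, and only that pair is retrieved.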
Lexical semantics knowledge often requires various kinds of inferences to be made in order to make more precise and more explicit the meaning of a word with respect to the context in which it is uttered. The treatment of inheritance is particularly crucial from this point of view since linguistic as well as conceptual hierarchies allow us to relate a word with more generic terms that contribute, in a certain sense, to the definition of the meaning of that word. Because hierarchies are complex, inheritance often involves non-monotonic forms of reasoning. Another relation with artificial intelligence is the development of models and techniques for the automatic acquisition of linguistic knowledge.
In “Blocking”, Ted Briscoe et al. elaborate on the introduction of default inheritance mechanisms into theories of lexical organization. Blocking is a kind of inference rule that makes a choice when two inheritances can be drawn from resources occurring at different levels of the lexical organization. For example, a general morphological rule adds -ed to verbs to form the past tense. A subclass of verbs, the irregular verbs, receives a particular mark for the past. A default rule states that the exceptional mark is preferred to the general case. However, there exist verbs such as dream which accept both inflections (dreamt, dreamed). In such cases the default rule is no longer adequate and a more flexible solution is required. This chapter introduces a framework based on a non-monotonic logic that incorporates more powerful mechanisms for resolving conflicts of various kinds.
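The interaction just described can be sketched in a few lines. The toy below (the verb lists are illustrative, and it ignores the non-monotonic logic the chapter actually develops) shows the specific irregular form blocking the general -ed rule, and verbs like dream explicitly re-admitting the regular form:

```python
# Minimal sketch of blocking in a default-inheritance lexicon.
IRREGULAR_PAST = {"sing": "sang", "dream": "dreamt"}
ALSO_REGULAR = {"dream"}          # verbs that override blocking

def past_forms(verb):
    forms = []
    if verb in IRREGULAR_PAST:
        # Specific information found: it blocks the general -ed rule...
        forms.append(IRREGULAR_PAST[verb])
    if verb not in IRREGULAR_PAST or verb in ALSO_REGULAR:
        # ...unless blocking is explicitly overridden for this verb.
        forms.append(verb + "ed")
    return forms

print(past_forms("walk"))   # ['walked']
print(past_forms("sing"))   # ['sang']
print(past_forms("dream"))  # ['dreamt', 'dreamed']
```

The hard-coded override set is exactly what a rigid default rule cannot express, which is why the chapter turns to a logic with more general conflict-resolution mechanisms.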
Various theories are nowadays used in linguistic analysis, particularly in syntax, and it does not seem reasonable to expect a reduction of their number in the near future. Nor does it seem reasonable to expect that one theory will cover them, as a kind of meta-theory. Nevertheless, all these theories have in common the need for a lexicon which would include the necessary and sufficient information for combining lexical items and extracting a representation of the meaning of such combinations.
If it is not possible to propose a canonical theory to organize the storage of lexical information, it is necessary to adopt a “polytheoretic” conception of the lexicon (Hellwig, Minkwitz, and Koch, 1991). A point shared by the different theories concerns the need for some semantic information even for the syntactic parsing of a sentence (and a fortiori, beyond this level, for the parsing of a text).
During a first period, the semantic information required was concentrated above all on thematic roles (Gruber, 1965; Jackendoff, 1972). Thematic roles were introduced because grammatical functions were insufficient for discriminating between various interpretations and for describing the similarities between sentences. For instance, in:
(1) The door opened
(2) Charlie opened the door
the door is considered as having the same semantic function but not the same grammatical function – subject in (1) and object in (2). Gruber (1965) assigns the door the same thematic role of theme in both sentences. A similar objective was assigned to case theory (Fillmore, 1968).
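The contrast can be made concrete with two predicate–argument analyses (a hedged toy representation, not any particular theory's notation) in which the door keeps the theme role while its grammatical function changes:

```python
# Toy thematic-role analyses of the two sentences above.
sentence_1 = {
    "predicate": "open",
    "roles": {"theme": "the door"},                       # subject position
}
sentence_2 = {
    "predicate": "open",
    "roles": {"agent": "Charlie", "theme": "the door"},   # object position
}

def theme_of(analysis):
    return analysis["roles"].get("theme")

# Same thematic role despite different grammatical functions:
print(theme_of(sentence_1) == theme_of(sentence_2))  # True
```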
LexLog is not a lexical theory, but a package of logical specifications for constructing explicit representations for lexical items in restricted information domains. We usually call a domain “restricted” in two main cases.
1. The domain is notionally bounded. It is sufficiently simple to be decomposed into an organized finite set of notions, which might be enriched, if necessary, by applying a finite number of explicit combination rules. Metaphorically, one could say that the domain has been “axiomatized”. A clear example of this situation can be found in Palmer (1990).
2. The domain is lexically bounded. The representations to be constructed apply to a finite set of lexical items, and not to the whole of a language, or to an indefinite set of terms.
While they are often associated, neither of these properties strictly implies the other. For example, although the use of a statistical tool like PATHFINDER (Schvaneveldt, 1990) allows the construction of limited lexical clusters (case 2), their notional homogeneity is not guaranteed by the clustering procedure. For example, in a study of the word marriage based on the definitions in the Longman Dictionary of Contemporary English (Wilks et al., 1989), the words marriage and muslim co-occur. Although the religious link is obvious, it is not easy to make precise the amount of knowledge about muslim that would be relevant to marriage. LexLog is designed only for situations which exhibit both forms of boundedness; that is, it can be of some help only when the domain allows the use of a finite notional basis and a finite lexical basis.
Knowledge representation and reasoning are central to all fields of Artificial Intelligence research. The field includes the development of formalisms for the representation of given subject matters as well as the development of inference procedures to reason about the represented knowledge. Before developing a knowledge representation formalism, one must determine what type of knowledge is to be modeled with the formalism. Since a lot of our knowledge of the world can easily be described using natural language, it is an interesting task to examine to what extent the contents of natural language utterances can be formalized and represented with a given representation formalism. Every approach to representing natural language utterances must include a method for formalizing aspects of the meaning of single lexical units.
An early attempt in this direction was Quillian's Semantic Memory (Quillian, 1968), an associational model of human memory. A semantic memory consists of nodes corresponding to English words and different associative links connecting the nodes. Based on that approach, various knowledge representation systems have been developed which can be subsumed under the term semantic network. Common to all these systems is that knowledge is represented by a network of nodes and links. The nodes usually represent concepts or meanings whereas the links represent relations between concepts. In most semantic network formalisms, a special kind of link between more specific and more general concepts exists. This link, often called IS-A or AKO (a kind of), organizes the concepts into a hierarchy in which information can be inherited from more general to more specific concepts.
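As a minimal sketch of such a hierarchy (the concepts, link names, and properties below are invented examples, not drawn from any particular system), inheritance along IS-A links can be implemented by climbing the chain until a property is found, with more specific nodes overriding more general ones:

```python
# Toy semantic network: nodes are concepts, IS-A links form the hierarchy,
# and properties attach locally to nodes.
NETWORK = {
    "animal":  {"isa": None,     "props": {"animate": True}},
    "bird":    {"isa": "animal", "props": {"can_fly": True}},
    "penguin": {"isa": "bird",   "props": {"can_fly": False}},  # override
}

def lookup(concept, prop):
    """Climb the IS-A chain until the property is found (or the root is hit)."""
    while concept is not None:
        node = NETWORK[concept]
        if prop in node["props"]:
            return node["props"][prop]
        concept = node["isa"]
    return None

print(lookup("penguin", "can_fly"))   # False (local override)
print(lookup("penguin", "animate"))   # True  (inherited from 'animal')
```

The override at the penguin node is the simplest form of the default reasoning discussed above: specific information takes precedence over inherited information.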
The connections between the study of natural language semantics, especially lexical semantics, and knowledge representation are manifold. One of the reasons why lexical semantics holds this place is obvious when one looks at compositional denotational theories of meaning. Here, one tries to account for the meaning of expressions in terms of a relation between linguistic expressions and the world. The dictionary makes explicit what part of the world each basic item refers to, whereas the grammar rules are associated with general instructions for combining the meanings of parts into the meanings of wholes. Most natural language understanding systems cannot relegate the interpretation of basic items (as contained in the dictionary) to some mysterious interpretation function, as in the case of Montague semantics, but have to be more explicit about the world and the substantive relation between basic expressions and the assumed ontology. Actual explanatory dictionaries can be viewed as stating complex relationships between natural language expressions. This perspective focuses on the fact that definitions are stated in language. The other perspective focuses on what is described: one could also say that a definition is the representation of a constraint on the world/model or the specification of the (now less mysterious) interpretation function for basic expressions.
Although it may not be realistic to argue that knowledge representation problems can be totally equated with problems of lexical semantics, there is enough reason to take notice of the latter when dealing with the former. Certainly this is the case where one deals with knowledge representation for natural language understanding. Within this general perspective we take the following position.
One of the major challenges today is coping with an overabundance of potentially important information. With newspapers such as the Wall Street Journal available electronically as a large text database, the analysis of natural language texts for the purpose of information retrieval has found renewed interest. Knowledge extraction and knowledge detection in large text databases are challenging problems, most recently under investigation in the TIPSTER projects funded by DARPA, the U.S. Department of Defense research funding agency. Traditionally, the parameters in the task of information retrieval are the style of analysis (statistical or linguistic), the domain of interest (TIPSTER, for instance, focuses on news concerning micro-chip design and joint ventures), the task (filling database entries, question answering, etc.), and the representation formalism (templates, Horn clauses, KL-ONE, etc.).
It is the premise of this chapter that much more detailed information can be gleaned from a careful linguistic analysis than from a statistical analysis. Moreover, a successful linguistic analysis provides more reliable data, as we hope to illustrate here. The problem is, however, that linguistic analysis is very costly and that systems that perform complete, reliable analysis of newspaper articles do not currently exist.
The challenge then is to find ways to do linguistic analysis when it is possible and to the extent that it is feasible. We claim that a promising approach is to perform a careful linguistic preprocessing of the texts, representing linguistically encoded information in a task independent, faithful, and reusable representation scheme.
The next part in this book concerns linguistic issues for lexical semantics. The main issues addressed are based on the notion of Generative Lexicon (Pustejovsky, 1991) and its consequences for the construction of lexicons.
The first chapter, “Linguistic constraints on type coercion,” by James Pustejovsky, summarizes the foundations of the Generative Lexicon which he defined a few years ago. This text investigates how best to characterize the formal mechanisms and the linguistic data necessary to explain the behavior of logical polysemy. A comprehensive range of polymorphic behaviors that account for the variations in semantic expressiveness found in natural languages is studied.
Within the same formal linguistic paradigm, we then have a contribution by Sabine Bergler, “From lexical semantics to text analysis,” which illustrates several issues of the Generative Lexicon using data from the Wall Street Journal. This chapter addresses in depth an important issue: what kinds of methods can be used to create Generative Lexicon entries, with precise semantic content for the Qualia roles, from linguistic analysis. Special attention is devoted to the production of partial representations and to the incremental analysis of texts.
The next chapter, “Lexical functions, generative lexicons and the world” by Dirk Heylen, explores the convergences and divergences between Mel'čuk's analysis of lexical functions and the generative lexicon approach. The author then proposes an interesting and original knowledge representation method based on lexical functions mainly following Mel'čuk's approach.
In this chapter, we present a synopsis of several notions of psycholinguistics and linguistics that are relevant to the field of lexical semantics. We mainly focus on the notions or theoretical approaches that are broadly used and admitted in computational linguistics. Lexical semantics is now playing a central role in computational linguistics, besides grammar formalisms for parsing and generation, and sentence and discourse semantic representation production. The central role of lexical semantics in computational linguistics can be explained by the fact that lexical entries contain a considerable part of the information that is related to the word-sense they represent.
This introduction will provide the reader with some basic concepts in the field of lexical semantics and should also be considered as a guide to the chapters included in this book. We first present some basic concepts of psycholinguistics which have some interest for natural language processing. We then focus on the linguistic aspects which are commonly admitted to contribute substantially to the field. It has not, however, been possible to include all aspects of lexical semantics: the absence of certain approaches should not be considered as an a priori judgment on their value.
The first part of this text introduces psycholinguistic notions of interest to lexical semantics; we then present linguistic notions more in depth. At the end of this chapter, we review the chapters in this volume.
Contribution of psycholinguistics to the study of word meaning
Results from psycholinguistic research can give us a good idea of how concepts are organized in memory, and how this information is accessed in the mental lexicon.
Many words have two or more very distinct meanings. For example, the word pen can refer to a writing implement or to an enclosure. Many natural language applications, including information retrieval, content analysis and automatic abstracting, and machine translation, require the resolution of lexical ambiguity for words in an input text, or are significantly improved when this can be accomplished. That is, the preferred input to these applications is a text in which each word is “tagged” with the sense of that word intended in the particular context. However, at present there is no reliable way to automatically identify the correct sense of a word in running text. This task, called word sense disambiguation, is especially difficult for texts from unconstrained domains because the number of ambiguous words is potentially very large. The magnitude of the problem can be reduced by considering only very gross sense distinctions (e.g., between the pen-as-implement and pen-as-enclosure senses of pen, rather than between finer sense distinctions within, say, the category of pen-as-enclosure – i.e., enclosure for animals, enclosure for submarines, etc.), which is sufficient for many applications. But even so, substantial ambiguity remains: for example, even the relatively small lexicon (20,000 entries) of the TRUMP system, which includes only gross sense distinctions, finds an average of about four senses for each word in sentences from the Wall Street Journal (McRoy, 1992). The resulting combinatoric explosion demonstrates the magnitude of the lexical ambiguity problem.
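To make the scale concrete, a one-line computation (assuming, as a rough simplification, that the cited average of about four senses per word applies uniformly and independently) shows how quickly sense combinations grow with sentence length:

```python
# With ~4 senses per word, a sentence of n ambiguous words admits
# roughly 4**n joint sense assignments.
for n in (5, 10, 15):
    print(n, 4 ** n)
# 5 words -> 1,024 readings; 10 -> ~1e6; 15 -> ~1e9
```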
Several different kinds of information can contribute to the resolution of lexical ambiguity.
Lexical choice in text generation cannot be carried out without appealing to a lexicon which takes into account many lexico-semantic relations. The text generation system must be able to handle both the immediate and the larger lexical context.
a) The immediate lexical context consists of the lexemes that surround the lexical item to be generated. This context must be taken into account in the case of collocational constraints, which restrict the ways of expressing a precise meaning to certain lexical items, as in expressions like pay attention, receive attention or narrow escape (Wanner & Bateman, 1990; Iordanskaja et al., 1991; Nirenburg & Nirenburg, 1988; Heid & Raab, 1989).
b) The larger textual context consists of the linguistic content of previous and subsequent clauses. This context is the source for cohesive links (Halliday & Hasan, 1976) with the lexical items to be generated in the current clause, as in:
(1) Professor Elmuck was lecturing on lexical functions to third-year students. The lecturer was interesting and the audience was very attentive.
In the second sentence, lecturer is coreferential with Professor Elmuck and the audience is coreferential with third-year students. These semantic links are due to the lexico-semantic relations between, on the one hand, lecturer (“agent noun”) and lecture, and on the other hand, between audience (“patient noun”) and lecture.
In this chapter, we will show that the Lexical Functions (LFs) of the Explanatory Combinatorial Dictionary (hereafter ECD) (Mel'čuk et al., 1984a, 1988; Mel'čuk & Polguère, 1987; Meyer & Steele, 1990) are well suited for these tasks in text generation.
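The flavor of LFs can be sketched as a table keyed by function name and base lexeme. The encoding below is an invented toy, not the ECD format; the function names (Magn for intensifiers, Oper1 for support verbs, S1/S2 for agent and patient nouns) follow Mel'čuk's conventions, and the values come from the examples above:

```python
# Toy lexical-function table for collocations and lexico-semantic links.
LF = {
    ("Magn",  "escape"):    "narrow",      # intensifier: narrow escape
    ("Oper1", "attention"): "pay",         # support verb: pay attention
    ("S1",    "lecture"):   "lecturer",    # agent noun
    ("S2",    "lecture"):   "audience",    # patient noun
}

def apply_lf(fn, base):
    """Return the value of lexical function fn applied to base, if listed."""
    return LF.get((fn, base))

print(apply_lf("Oper1", "attention"))  # 'pay'
print(apply_lf("S1", "lecture"))       # 'lecturer'
```

A generator consulting such a table can satisfy collocational constraints (Oper1, Magn) for the immediate context and produce cohesive links (S1, S2) for the larger textual context, which is exactly the division of labor described above.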
In order to help characterize the expressive power of natural languages in terms of semantic expressiveness, it is natural to think in terms of semantic systems with increasing functional power. Furthermore, a natural way of capturing this might be in terms of the type system which the grammar refers to for its interpretation. What I would like to discuss in this chapter is a method for describing how semantic systems fall on a hierarchy of increasing expressiveness and richness of descriptive power and investigate various phenomena in natural language that indicate (1) that we need a certain amount of expressiveness that we have not considered before in our semantics, but also (2) that by looking at the data it becomes clear that we need natural constraints on the mechanisms which give us such expressive systems. After reviewing the range of semantic types from monomorphic languages to unrestricted polymorphic languages, I would like to argue that we should aim for a model which permits only a restricted amount of polymorphic behavior. I will characterize this class of languages as semi-polymorphic.
I will outline what kind of semantic system produces just this class of languages. I will argue that something like the generative lexicon framework is necessary to capture the richness of type shifting and sense extension phenomena.
Let me begin the discussion on expressiveness by reviewing how this same issue was played out in the realm of syntactic frameworks in the 1950s.
The work described in this chapter starts from the observation that a word in a text has a semantic value which is seldom identical with any of the definitions found in a dictionary. This fact was of little importance as long as dictionaries were primarily intended for human beings, since the process used to convert the lexical meaning into the semantic value seems well mastered by humans – at least by the potential users of dictionaries – but it becomes of prominent importance now that we need computer-oriented dictionaries.
As a matter of fact, the computation of the semantic value for a given word requires lexical information about that word, about the other words of the text, about the syntax, and about the world. The set of possible values for a given word X is open-ended: given any list of values, it is always possible to build some context where X takes a value not present in the list. As a consequence, no dictionary – however thick – can contain all of them. Therefore, in any case, it is necessary to implement a mechanism which constructs the value from information taken from the dictionary, as well as from knowledge of the grammar and of the world.
In artificial intelligence (henceforth A.I.), the main objective is not to find the “correct” meaning of each word or phrase, but to get the set of consequences which can be drawn from a text; if the same set is obtained from “different” interpretations (i.e., interpretations using different values for some word), then the difference is irrelevant.
The first topic, psycholinguistic and cognitive aspects of lexical semantics, is addressed in the first two chapters. This area is particularly active but relatively ignored in computational circles, perhaps because of a lack of precise methods and formalisms. It is, however, crucial for the construction of well-designed semantic lexicons, since it introduces empirical, psychologically based analysis into the domain of computational lexical semantics.
“Polysemy and related phenomena from a cognitive linguistic viewpoint” by Alan Cruse surveys the ways in which the contribution that the same grammatical word makes to the meaning of a larger unit varies depending on the context. Two main explanatory hypotheses that account for contextual variation are explored: lexical semantics and pragmatics. Alternatives to this approach are studied in the other parts of the volume.
The second chapter, “Mental lexicon and machine lexicon: Which properties are shared by machine and mental word representations? Which are not?” by J.-F. Le Ny, sets language comprehension and interpretation within a general cognitive science perspective. Properties common to natural and artificial semantic units (e.g., denotation, super-ordination, case roles, etc.) are first explored. Then, problems related to activability and accessibility in the memory are addressed in both a theoretical and experimental way.