This chapter describes – from a mathematical perspective – the system of typed feature structures used in the ACQUILEX Lexical Knowledge Base (LKB). We concentrate on describing the type system the LKB takes as input, making explicit the necessary conditions on the type hierarchy and explaining how – mathematically – our system of constraints works. It is assumed that the reader is familiar with basic unification-based formalisms like PATR-II, as explained in Shieber (1986). It must also be said from the start that our approach draws heavily on the work on typed feature structures by Carpenter (1990, 1992).
The LKB works basically through unification on (typed) feature structures. Since most of the time we deal with typed feature structures (defined in section 10.2), we will normally drop the qualifier and talk simply of feature structures; when a distinction is needed, we refer to the structures of PATR-II and similar systems as untyped feature structures. Feature structures are defined over a (fixed) finite set of features FEAT and a (fixed) type hierarchy 〈TYPE, ⊑〉. Given FEAT and 〈TYPE, ⊑〉 we can define F, the collection of all feature structures over FEAT and 〈TYPE, ⊑〉. But we are interested in feature structures which are well-formed with respect to a set of constraints. To describe constraints and the well-formedness of feature structures we specify a function C: TYPE → F, which associates a constraint feature structure C(tᵢ) with each type tᵢ in the type hierarchy TYPE.
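As a rough illustration of how unification over a type hierarchy might work, here is a minimal Python sketch. The toy hierarchy, the dict-based representation and all names are invented for exposition; this is not the LKB's implementation, and reentrancy is ignored.

```python
# Minimal sketch of typed feature structure unification (illustrative
# only, not the LKB): a feature structure is a pair (type, features),
# and unification joins the types and recursively merges the features.

class UnificationFailure(Exception):
    pass

# Hypothetical toy hierarchy: each type is mapped to the set of types
# it subsumes ('word' is more specific than 'sign', which is below 'top').
SUBTYPES = {
    'top':  {'top', 'sign', 'word'},
    'sign': {'sign', 'word'},
    'word': {'word'},
}

def join(t1, t2):
    """Most general type subsumed by both t1 and t2, if any exists."""
    common = SUBTYPES[t1] & SUBTYPES[t2]
    if not common:
        raise UnificationFailure(f'{t1} and {t2} are incompatible')
    return max(common, key=lambda t: len(SUBTYPES[t]))

def unify(fs1, fs2):
    """Unify two feature structures, failing on incompatible types."""
    t1, feats1 = fs1
    t2, feats2 = fs2
    merged = dict(feats1)
    for feat, val in feats2.items():
        merged[feat] = unify(merged[feat], val) if feat in merged else val
    return (join(t1, t2), merged)
```

For instance, unify(('sign', {'HEAD': ('top', {})}), ('word', {})) yields ('word', {'HEAD': ('top', {})}), carrying the more specific type; checking each result against the constraint C(tᵢ) of its type would be layered on top of plain unification.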
A stand-alone version of the LKB software system, with demonstration type systems, lexicons and so on, is available for distribution; contact the authors for further details.
The Sussex DATR Implementation
The Sussex implementation of DATR comprises a compiler written in Prolog, and a wide-ranging collection of DATR example files. The compiler takes a DATR theory and produces Prolog code for query evaluation relative to that theory. The code is readily customisable, and customisations for Poplog Prolog, CProlog, Arity Prolog, Prolog2, Quintus Prolog and Sicstus Prolog are provided. Source listings, documentation and many of the example files may also be found in “The DATR Papers, Volume 1” (Cognitive Science Research Report 139).
This implementation is provided ‘as is’, with no warranty or support, and for research purposes only. Copyright remains with the University of Sussex.
The Prolog source code and the example files for the DATR system are available on a 720K 3.5 inch MS-DOS disk for £12.00 (within the UK) or US $25 (outside the UK) from: Technical Reports, School of Cognitive and Computing Sciences, University of Sussex, Brighton BN1 9QH, UK. “The DATR Papers, Volume 1” (Cognitive Science Research Report 139) is also available for £6 (US $12) from the same address.
The Traffic Information Collator (TIC) (Allport, 1988a,b) is a prototype system which takes verbatim police reports of traffic incidents, interprets them, builds a picture of what is happening on the roads and broadcasts appropriate messages automatically to motorists where necessary. Cahill and Evans (1990) described the process of converting the main TIC lexicon (a lexicon of around 1000 words specific to the domain of traffic reports) into DATR (Evans and Gazdar, 1989a,b; 1990). This chapter reviews the strategy adopted in the conversion discussed in that paper, and presents the results of converting the whole lexicon, together with statistics comparing the efficiency and performance of the original lexicon and the DATR version.
Introduction
The Traffic Information Collator (TIC) is a prototype system which takes verbatim police reports of traffic incidents, interprets them, builds a picture of what is happening on the roads and broadcasts appropriate messages automatically to motorists where necessary. Cahill and Evans (1990) described the basic strategy for defining the structure of lexical entries. That paper concentrated on the main TIC lexicon, which was just one part of a collection of different kinds of lexical information, and dealt with only a small fragment even of that. The whole conversion involved collating all of that information into a single DATR description.
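To indicate the flavour of what such a conversion targets, here is a toy Python sketch of DATR-style default inheritance. The node names, paths and lookup function are invented for illustration and bear no relation to the actual TIC lexicon or the Sussex compiler.

```python
# Toy sketch of DATR-style default inheritance (illustrative only):
# a node maps path prefixes to values or to (node, path) descriptors;
# the longest defined prefix of a query path wins, and the unmatched
# remainder of the path is carried over to the inherited node.

LEXICON = {
    'Noun': {
        ('cat',): 'noun',
        # Evaluate <root> back at the node originally queried; real
        # DATR would also add a plural suffix here, omitted for brevity.
        ('form', 'plur'): ('GLOBAL', ('root',)),
    },
    'Crash': {
        (): ('Noun', ()),        # by default, inherit everything from Noun
        ('root',): 'crash',
    },
}

def query(node, path, glob=None):
    glob = glob or node          # remember the node originally queried
    entry = LEXICON[node]
    for i in range(len(path), -1, -1):
        prefix, rest = path[:i], path[i:]
        if prefix in entry:
            value = entry[prefix]
            if isinstance(value, tuple):            # inheritance descriptor
                target, new_path = value
                target = glob if target == 'GLOBAL' else target
                return query(target, new_path + rest, glob)
            return value
    raise KeyError(f'{node}: {path} is undefined')
```

Here query('Crash', ('cat',)) returns 'noun' purely by default inheritance: the single Noun node states the generalisation once for every noun in the lexicon, which is the kind of economy a DATR conversion aims to exploit.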
In this chapter, we address some issues in the design of declarative languages based on the notion of inheritance. First, we outline the connections and similarities between the notions of object, frame, conceptual graph and feature structure, and we present a synthetic view of these notions. We then present the Typed Feature Structure (TFS) language developed at the University of Stuttgart, which reconciles the object-oriented approach with logic programming. Finally, we discuss some language design issues.
Convergences
Developing large NLP software is a very complex and time-consuming task. The complexity of NLP can be characterized by the following two main factors:
NLP is data-intensive. Any NLP application needs large amounts of complex linguistic information. For example, a realistic application typically has dictionaries with tens of thousands of lexical entries.
Sophisticated NLP applications such as database interfaces or machine translation build very complex and intricate data structures to represent the linguistic objects associated with strings of words. Part of the complexity also lies in the processing of such objects.
Object-oriented Approaches
An object-oriented approach to linguistic description addresses these two sources of complexity by providing:
facilities to manage the design process: data abstraction and inheritance.
facilities for capturing directly the interconnections and constraints in the data: properties, relations and complex objects. These features are common to object-oriented languages (OOL), object-oriented database management systems (OODBMS) and knowledge representation languages (KRL).
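As a small illustration of how inheritance removes redundancy from lexical descriptions, consider the following Python sketch; the classes and features are invented and stand in for no particular system.

```python
# Invented example: data abstraction and inheritance let a linguistic
# generalisation (e.g. what every transitive verb requires) be stated
# once, on a general class, and shared by all of its instances.

from dataclasses import dataclass, field

@dataclass
class LexicalEntry:
    orth: str
    cat: str = 'word'

@dataclass
class Verb(LexicalEntry):
    cat: str = 'verb'
    subcat: list = field(default_factory=list)

@dataclass
class TransitiveVerb(Verb):
    # Every transitive verb inherits a subject and an object requirement.
    subcat: list = field(default_factory=lambda: ['np_subj', 'np_obj'])

love = TransitiveVerb(orth='love')    # gets cat='verb' and both arguments
```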
Theories of nonmonotonic reasoning are, on the face of it, of at least two sorts. In Circumscription, generic facts like ‘Birds fly’ are taken to be essentially normative, and nonmonotonicity arises when individuals are assumed to be as normal as is consistent with available information about them. In theories like Default Logic, such facts are taken to be rules of inference, and nonmonotonicity arises when available information is augmented by adding as many as possible of the inferences sanctioned by such rules. Depending on which of the two informal views is taken, different patterns of nonmonotonic reasoning are appropriate. Here it is shown that these different patterns of reasoning cannot be combined in a single theory of nonmonotonic reasoning.
Introduction
Nonmonotonic reasoning is reasoning that lacks the monotonicity property taken to be characteristic of logical reasoning: in a theory of nonmonotonic reasoning, the consequences of a set of premises do not always accumulate as the set of premises is expanded. Nonmonotonicity has been researched in artificial intelligence because people reason nonmonotonically, and do so in ways which seem directly related to intelligence. One important example is commonsense reasoning about kinds, where jumping to conclusions enables intelligent agents to save time otherwise spent gathering information.
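The failure of monotonicity is easy to make concrete. The following sketch uses the standard Tweety example with an invented, deliberately crude encoding of a single default rule.

```python
# Crude illustration of nonmonotonicity (one hard-wired default rule,
# not a general default logic): birds fly unless known to be penguins.

def flies(facts, x):
    return ('bird', x) in facts and ('penguin', x) not in facts

facts = {('bird', 'tweety')}
assert flies(facts, 'tweety')        # jump to the default conclusion

facts.add(('penguin', 'tweety'))     # the premise set is expanded...
assert not flies(facts, 'tweety')    # ...and the conclusion is withdrawn
```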
In just what way is such jumping to logically invalid conclusions reasonable?
We present a definition of skeptical and credulous variants of default unification, the purpose of which is to add default information from one feature structure to the strict information given in another. Under the credulous definition, the default feature structure contributes as much information to the result as is consistent with the information in the strict feature structure. Credulous default unification turns out to be non-deterministic, because there may be distinct maximal subsets of the default information which can each be consistently combined with the strict information. Skeptical default unification is obtained by restricting the default information to that which is contained in every credulous result. Both definitions are fully abstract, in that they depend only on the information ordering of the feature structures being combined and not on their internal structure, allowing them to be applied to virtually any notion of feature structure and information ordering. We then consider the utility of default unification for constructing templates with default information and for defining how information is inherited in an inheritance-based grammar. In particular, we see how templates in the style of PATR-II can be defined, but conclude that such mechanisms are overly sensitive to order of presentation. Unfortunately, we obtain only limited success in applying default unification to simple systems of default inheritance. We follow the Common Lisp Object System-based approach of Russell et al.
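The shape of the two definitions can be conveyed with a small sketch over flat attribute-value maps; this is an invented simplification (atomic values only, no reentrancy), not the definition given in the chapter.

```python
# Sketch of credulous and skeptical default unification over flat
# attribute-value maps (invented simplification; real feature
# structures and the information ordering are richer than this).
from itertools import combinations

def consistent(pairs, strict):
    """True if no attribute is assigned two different atomic values."""
    seen = dict(strict)
    return all(seen.setdefault(a, v) == v for a, v in pairs)

def credulous(strict, default):
    """Add each maximal consistent chunk of the default to the strict info.

    strict:  dict mapping attributes to atomic values (non-negotiable)
    default: list of (attribute, value) pairs, possibly in mutual conflict
    """
    cons = [set(s) for n in range(len(default) + 1)
            for s in combinations(default, n) if consistent(s, strict)]
    maximal = [s for s in cons if not any(s < t for t in cons)]
    return [{**dict(s), **strict} for s in maximal]

def skeptical(strict, default):
    """Keep only the information common to every credulous result."""
    results = credulous(strict, default)
    return dict(set.intersection(*(set(r.items()) for r in results)))
```

With strict = {} and default = [('gender', 'masc'), ('gender', 'fem')] there are two credulous results, one per maximal consistent subset, and the skeptical result keeps neither value; this is the non-determinism noted above.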
In recent years, the lexicon has become the focus of considerable research in (computational) linguistic theory and natural language processing (NLP) research; the reasons for this trend are both theoretical and practical. Within linguistics, the role of the lexicon has become increasingly central as more and more linguistic generalisations have been seen to have a lexical dimension, whilst for NLP systems, the lexicon has increasingly become the chief ‘bottleneck’ in the production of habitable applications offering an adequate vocabulary for the intended task. This edited collection of essays derives from a workshop held in Cambridge in April 1991 to bring together researchers from both Europe and America and from both fields working on formal and computational accounts of the lexicon. The workshop was funded under the European Strategic Programme in Information Technology (ESPRIT) Basic Research Action (BRA) through the ACQUILEX project (‘Acquisition of Lexical Information for Natural Language Processing’) and was hosted by the Computer Laboratory, Cambridge University.
The ACQUILEX project is concerned with the exploitation of machine-readable versions of conventional dictionaries in an attempt to develop substantial lexicons for NLP in a resource efficient fashion. However, the focus of the workshop was on the representation and organisation of information in the lexicon, regardless of the mode of acquisition.
This text is intended to be a student textbook which primarily provides an introduction to (a particular kind of) categorical type theory, but which should also be useful as a reference to those mathematicians and computer scientists pursuing research in related areas. It is envisaged that it could provide the basis for a course of lectures aimed at advanced undergraduate or beginning graduate students. Given the current content of typical British undergraduate mathematics and computer science courses, it is difficult to describe an exact audience at whom the book is aimed. For example, the material on ordered sets should be readily accessible to first and second year undergraduates and indeed I know of courses which contain elements of such topics. However, the material on category theory, while probably accessible to good third year undergraduate students, does require a certain amount of mathematical maturity. Perhaps it is better suited to graduate students in their early stages. Chapters 3 and 4 are probably of second and third year undergraduate level respectively, assuming that the requisite category theory has been assimilated. The final two chapters are probably better suited to first year graduates. In summary, as well as serving as a textbook for (graduate) students, I hope “Categories for Types” will provide a useful reference for those conducting research into areas involving categorical type theory and program semantics.
Discussion 1.1.1 We shall begin by giving an informal description of some of the topics which appear in Chapter 1. The central concept is that of an ordered set. Roughly, an ordered set is a collection of items some of which are deemed to be greater or smaller than others. We can think of the set of natural numbers as an ordered set, where, for example, 5 is greater than 2, 0 is less than 100, 1234 is less than 12687 and so on. We shall see later that one way in which the concept of order arises in computer science is by regarding items of data as ordered according to how much information a certain data item gives us. Very crudely, suppose that we have two programs P and P′ which perform identical tasks, but that program P is defined (halts with success) on a greater number of inputs than does P′. Then we could record this observation by saying that P is greater than P′. These ideas will be made clearer in Discussion 1.5.1. We can perform certain operations on ordered sets, for example we have simple operations such as maxima and minima (the maximum of 5 and 2 in the ordered set of natural numbers is 5), as well as more complicated ones such as taking suprema and infima. If the reader has not met the idea of suprema and infima, then he will find the definitions in Discussion 1.2.7.
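Since suprema can differ from the familiar maxima, a concrete case may help; the following sketch (an invented example) computes suprema in the set {1, 2, 3, 6} ordered by divisibility.

```python
# Invented example of an ordered set: {1, 2, 3, 6} under divisibility,
# where x is below y exactly when x divides y. The supremum (least
# upper bound) of 2 and 3 is then 6, although neither divides the other.

ELEMENTS = [1, 2, 3, 6]

def leq(x, y):
    return y % x == 0                      # x divides y

def supremum(x, y):
    """Least element lying above both x and y, or None if there is none."""
    uppers = [z for z in ELEMENTS if leq(x, z) and leq(y, z)]
    least = [u for u in uppers if all(leq(u, v) for v in uppers)]
    return least[0] if least else None

assert supremum(2, 3) == 6
assert supremum(1, 2) == 2
```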
During the Michaelmas term of 1990, while at the University of Cambridge Computer Laboratory, the opportunity arose to lecture on categorical models of lambda calculi. The course consisted of sixteen lectures of about one hour's duration twice a week for eight weeks, and covered much of the material in this book, but excluded higher order polymorphism and some of the category theory. The lectures were delivered to an audience of computer scientists and mathematicians, with an emphasis on presenting the material to the former. It was kindly suggested by the Cambridge University Press that these lectures might form the core of a textbook, and the original suggestion has now been realised as “Categories for Types.”
What are the contents of “Categories for Types”? I will try to answer this question for those who know little about categorical type theory. In Chapter 1, we begin with a discussion of ordered sets. These are collections of things with an order placed on the collection. For example, the natural numbers form a set {1, 2, 3, …} with an order given by 1 ≤ 2 ≤ 3 ≤ … where ≤ means “less than or equal to.” A number of different kinds of ordered set are defined, and results proved about them. Such ordered sets then provide a stock of examples of categories. A category is a very general mathematical world, and various different sorts of mathematical structures form categories.
Discussion 2.1.1 A category consists of a pair of collections, namely a collection of “structures” together with a collection of “relations between the structures.” Let us illustrate this with some informal examples of categories.
The collection of all sets (thus each set is an example of one of the structures referred to above), together with the collection of all set-theoretic functions (the functions are the relations between the structures).
The collection of all posets (each poset is a structure), together with all monotone functions (the monotone functions are the relations between the structures).
The collection of all finite dimensional vector spaces, together with all linear maps.
The set of real numbers ℝ (in this case each structure is just a real number r ∈ ℝ), together with the relation of order ≤ on the set ℝ. Thus given two structures r, r′ ∈ ℝ, there is a relation between them just in case r ≤ r′.
All categories have this basic form, that is, they consist of structures and relations between the structures: the structures are usually referred to as the objects of the category and the relations between the structures as the morphisms. It is important to note that the objects of a category do not have to be sets (in the fourth example they are real numbers) and that the morphisms do not have to be functions (in the fourth example they are instances of the order relation ≤).
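The fourth example can be made quite literal in code. In this invented sketch, a morphism is a mere witness that r ≤ r′; identities exist by reflexivity of ≤, and composition exists by its transitivity.

```python
# Sketch of the poset (R, <=) viewed as a category: objects are numbers,
# and there is at most one morphism r -> r', present exactly when r <= r'.

from dataclasses import dataclass

@dataclass(frozen=True)
class Morphism:
    src: float
    tgt: float

    def __post_init__(self):
        assert self.src <= self.tgt, 'no morphism unless src <= tgt'

def identity(r):
    return Morphism(r, r)                  # exists since r <= r

def compose(g, f):
    """g after f: from r -> r' and r' -> r'' obtain r -> r''."""
    assert f.tgt == g.src
    return Morphism(f.src, g.tgt)          # exists since <= is transitive

assert compose(Morphism(1.0, 2.0), Morphism(0.0, 1.0)) == Morphism(0.0, 2.0)
```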
Discussion 3.1.1 The fundamental idea of algebraic type theory is to provide a formal framework for reasoning using the usual rules of equality. Simple algebraic type theory is far removed from the syntax and rules of real programming languages, but it is a good starting point from which to motivate and explain the ideas of categorical semantics. To a first approximation, algebraic type theory involves three entities, which will be called types, terms and equations. Think of a type as a collection of items (terms) having some common property, and an equation as a judgement that two items (terms) are essentially the same. We aim to define the notion of a theory in algebraic type theory. Recall that the general idea of a theory is that one has some basic assumptions, usually referred to as axioms, and some rules for deducing theorems from the axioms. Thus a theory in algebraic type theory consists of a given collection of types, terms and equations, where the equations are the axioms of the theory. The theorems are the equations which one is able to deduce from the axioms using the rules of algebraic type theory. These rules say that equations may be manipulated according to the rules of reflexivity, transitivity and symmetry: for example, if a term t equals a term s, then we may deduce that the term s equals the term t, and this is the idea of symmetry.
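These three rules can be animated in a few lines of code. The sketch below is an invented illustration (it omits the congruence rules that algebraic type theory also provides for operators): it treats terms as opaque strings and uses union-find, under which reflexivity, symmetry and transitivity hold automatically.

```python
# Toy deduction of equations by reflexivity, symmetry and transitivity
# (invented illustration; congruence rules for operators are omitted).
# Union-find places provably equal terms in one equivalence class.

parent = {}

def find(t):
    parent.setdefault(t, t)
    while parent[t] != t:
        parent[t] = parent[parent[t]]      # path compression
        t = parent[t]
    return t

def assert_equal(s, t):                    # record an axiom s = t
    parent[find(s)] = find(t)

def provably_equal(s, t):                  # test a candidate theorem
    return find(s) == find(t)

assert_equal('x + 0', 'x')
assert_equal('x', '0 + x')
assert provably_equal('x + 0', '0 + x')    # via symmetry and transitivity
assert provably_equal('y', 'y')            # via reflexivity
```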