This book grew out of lecture notes for a course we started teaching at the Department of Computer Science at the Technion, Haifa, in the spring of 1996, and later also taught at the University of Haifa. The students were advanced undergraduates and graduates who had good knowledge of formal languages but limited background in linguistics. We intended it to be an introductory course in computational linguistics, but we wanted to focus on contemporary linguistic theories and the necessary mechanisms for reasoning about and implementing them, rather than on traditional (statistical, corpus-based) natural language processing (NLP) techniques.
We realized that no good textbook existed that covered the material we wanted to teach. Although quite a few good introductions to NLP exist, including Pereira and Shieber (1987), Gazdar and Mellish (1989), Covington (1994), Allen (1995), and more recently, Manning and Schütze (1999), Jurafsky and Martin (2000), and Bird et al. (2009), none of them provides the mathematical and computational infrastructure needed for our purposes.
The focus of this book is two-dimensional. On the one hand, we focus on a certain formalism, unification grammars, for presenting, studying, and reasoning about grammars. Although it is not the sole formalism used in computational linguistics, it has gained much popularity, and it now underlies many ongoing projects. On the other hand, we also focus on fundamental natural language syntactic constructions and the way they are specified in grammars expressed in this formalism.
Feature structures are the building blocks of unification grammars, as they serve as the counterpart of the terminal and nonterminal symbols in context-free grammars (CFGs). However, in order to define grammars and derivations, one needs some extension of feature structures to sequences thereof. In this chapter we present multirooted feature structures, which are aimed at capturing complex, ordered information and are used for representing rules and sentential forms of unification grammars; we motivate this extension in Section 4.1. In parallel to the exposition of feature structures in Chapter 2, we start by defining multirooted feature graphs (Section 4.2), a natural extension of feature graphs. We then abstract away from the identities of nodes in the graphs in two ways: by defining multirooted feature structures, which are equivalence classes of isomorphic multirooted feature graphs, and by defining abstract multirooted structures (Section 4.3). Finally, we define the concept of multi-AVMs (Section 4.4), which are an extension of AVMs, and show how they correspond to multirooted graphs. The crucial concept of unification in context is discussed in Section 4.5.
We then utilize this machinery for defining unification grammars. We begin by defining (sentential) forms and grammar rules (Section 4.6). Then, we define the concept of derivation for unification grammars, providing a means for defining the languages generated by such grammars (Section 4.7). We explore derivation trees in Section 4.8.
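To fix ideas before the formal development, here is a deliberately naive sketch (ours, not the book's formalism; the feature names are hypothetical) of a rule as an ordered sequence of feature structures whose substructures may be shared across the roots, with sharing modeled by Python object identity:

```python
# A toy model of a unification-grammar rule as a multirooted structure:
# an ordered sequence of feature structures (nested dicts) whose
# substructures may be shared across roots. Real multirooted feature
# structures are defined over graphs; this only conveys the intuition.
agr = {"num": "sg", "pers": "3rd"}    # a single shared value

rule = [                              # S -> NP VP, with agreement
    {"cat": "s"},                     # mother
    {"cat": "np", "agr": agr},        # first daughter
    {"cat": "vp", "agr": agr},        # second daughter: same 'agr' object
]

# Because both daughters point at the *same* object (reentrancy across
# roots), information added through one path is visible through the other:
rule[1]["agr"]["case"] = "nom"
assert rule[2]["agr"]["case"] == "nom"
```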
The move from context-free grammars to unification grammars is motivated by linguistic considerations (the need to provide better generalizations and more compact representations).
The previous chapter presented four different views of feature structures, with several correspondences among them. For each of the views, a subsumption relation was defined in a natural way. In this chapter we define the operation of unification for the different views. The subsumption relation compares the information content of feature structures; unification combines the information that is contained in two (compatible) feature structures. We use the term “unification” to refer to both the operation and its result. In the sequel, whenever two feature structures are related, they are assumed to be over the same signature.
The mathematical interpretation of “combining” two members of a partially ordered set is to take the least upper bound of the two operands with respect to the partial order; in our case, subsumption. Indeed, feature structure unification is exactly that. However, since subsumption is antisymmetric for feature structures and AFSs but not for feature graphs and AVMs, a unique least upper bound cannot be guaranteed for all four views. We begin with feature graphs and define unification for this view first, extending it to feature structures in Section 3.3. We then (Section 3.2) provide a constructive definition of feature graph unification and prove that it corresponds to the least upper bound definition in a natural way. We also provide in Section 3.4 an algorithm for computing the unification of two feature graphs. AVM unification can then be defined indirectly, using the correspondence between feature graphs and AVMs. We define unification directly for AFSs in Section 3.5. We conclude this chapter with a discussion of generalization, a dual operation to unification.
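To make the least-upper-bound reading concrete, the following is a minimal sketch (ours, not the algorithm of Section 3.4) of unification over feature structures represented as nested Python dicts with strings as atomic values; it deliberately ignores reentrancy, which the graph-based definitions handle properly:

```python
FAIL = None  # the two arguments carry inconsistent information

def unify(fs1, fs2):
    """Return the least upper bound of fs1 and fs2, or FAIL."""
    if isinstance(fs1, str) or isinstance(fs2, str):
        return fs1 if fs1 == fs2 else FAIL  # atoms must match exactly
    result = dict(fs1)                      # start from a copy of fs1
    for feat, val in fs2.items():
        if feat in result:
            sub = unify(result[feat], val)  # combine shared features
            if sub is FAIL:
                return FAIL                 # a clash anywhere fails all
            result[feat] = sub
        else:
            result[feat] = val              # fs2 contributes new information
    return result

# Compatible inputs: each argument subsumes the result, their least
# upper bound, which combines the information of both.
print(unify({"cat": "np", "agr": {"num": "sg"}},
            {"cat": "np", "agr": {"pers": "3rd"}}))
# {'cat': 'np', 'agr': {'num': 'sg', 'pers': '3rd'}}
print(unify({"agr": {"num": "sg"}}, {"agr": {"num": "pl"}}))  # None
```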
Natural languages are among Nature's most extraordinary phenomena. While humans acquire language naturally and use it with great ease, the formalization of language, which is the focus of research in linguistics, remains elusive. As in other sciences, attempts at formalization involve idealization: ignoring exceptions, defining fragments, and the like. In the second half of the twentieth century, the field of linguistics underwent a revolution: The themes that are studied, the vocabulary with which they are expressed, and the methods and techniques for investigating them have changed dramatically. While the traditional aims of linguistic research have been the description of particular languages (both synchronically and diachronically), sometimes with respect to other, related languages, modern theoretical linguistics seeks the universal principles that underlie all natural languages; it looks for structural generalizations that hold across languages, as well as across various phrase types in a single language, and it attempts to delimit the class of possible natural languages by formal means.
The revolution in linguistics, which is attributed mainly to Noam Chomsky, has influenced the young field of computer science. With the advent of programming languages, research in computer science began to explore different kinds of languages: formal languages that are constructed as a product of concise, rigorous rules. The pioneering work of Chomsky provided the means for applying the results obtained in the study of natural languages to the investigation of formal languages.
We developed an elaborate theory of unification grammars, motivated by the failure of context-free grammars to capture some of the linguistic generalizations one would like to express with respect to natural languages. In this chapter, we put the theory to use by accounting for several of the phenomena that motivated the construction. Specifically, we account for all the language fragments discussed in Section 1.3.
Much of the appeal of unification-based approaches to grammar stems from their ability to account for linguistic phenomena in a concise way; in other words, unification grammars facilitate the expression of linguistic generalizations. This is mediated through two main mechanisms: First, the notion of grammatical category is expressed via feature structures, thereby allowing for complex categories as first-class citizens of the grammatical theory. Second, reentrancy provides concise machinery for expressing “movement,” or more generally, relations that hold at a deeper level than a phrase-structure tree. Still, the formalism remains monostratal, without any transformations that yield a surface structure from some other structural representation.
Complex categories are used to express similarities between utterances that are not identical. With atomic categories of the type employed by context-free grammars, two categories can be either identical or different. With feature structures as categories, two categories can be identical along one axis but different along another.
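For instance (a schematic illustration of ours, with hypothetical feature names), the two noun-phrase categories
\[
\begin{bmatrix}
  \textsc{cat}  & \mathit{np} \\
  \textsc{num}  & \mathit{sg} \\
  \textsc{case} & \mathit{nom}
\end{bmatrix}
\qquad
\begin{bmatrix}
  \textsc{cat}  & \mathit{np} \\
  \textsc{num}  & \mathit{sg} \\
  \textsc{case} & \mathit{acc}
\end{bmatrix}
\]
are identical along the num axis but differ along the case axis; and in the reentrant structure
\[
\begin{bmatrix}
  \textsc{cat}  & \mathit{s} \\
  \textsc{subj} & \begin{bmatrix} \textsc{agr} & \boxed{1}\;[\,\textsc{num}\;\mathit{sg}\,] \end{bmatrix} \\
  \textsc{agr}  & \boxed{1}
\end{bmatrix}
\]
the tag 1 marks two paths that share a single value, so the two values are token-identical rather than merely equal.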
This chapter gives the definition of ‘category’ in Section 1.1, and follows that by four sections devoted entirely to examples of categories of various kinds. If you have never met the notion of a category before, you should quite quickly read through Definition 1.1.1 and then go to Section 1.2. There you will find some examples of categories that you are familiar with, although you may not have recognized the categorical structure before. In this way you will begin to see what Definition 1.1.1 is getting at. After that you can move around the chapter as you like.
Remember that it is probably better not to start at this page and read each word, sentence, paragraph, …, in turn. Move around a bit. If there is something you don't understand, or don't see the point of, then leave it for a while and come back to it later.
Life isn't linear, but written words are.
Categories defined
This section contains the definition of ‘category’, follows that with a few bits and pieces, and concludes with a discussion of some examples. No examples are looked at in detail; that is done in the remaining four sections. Section 1.2 contains a collection of simpler examples, some of which you will know already. You might want to dip into that section as you read this section.
In Chapter 2, Sections 2.3 to 2.7, we looked at some simple examples of limits and colimits. These are brought together in Table 2.1, which is repeated here as Table 4.1. In this chapter we generalize the idea.
Before we begin the details, it is useful to outline the five steps we go through, together with the associated notions for each step. After that we look at each step in more detail.
Template
This is the shape ∇ that a particular kind of diagram can have. It is a picture consisting of nodes (blobs) and edges (arrows). The central column of Table 4.1 lists a few of the simpler templates. Technically, a template is often a directed graph or more generally a category.
Diagram
This is an instantiation of a particular template ∇ in a category C. Each node of ∇ is instantiated with an object of C, and each edge is instantiated with an arrow of C. There are some obvious source and target restrictions that must be met, and the diagram may require that some cells commute. Thus we sometimes use a category as a template.
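For instance (an illustration of ours, not one of the tabulated examples), the three-node template
\[
\nabla : \quad \bullet \longleftarrow \bullet \longrightarrow \bullet
\]
can be instantiated in the category Set by choosing sets A, B, C and functions f, g to give the diagram
\[
A \xleftarrow{\;f\;} C \xrightarrow{\;g\;} B
\]
in which each node carries an object and each edge carries an arrow, with the source and target restrictions met.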
Posed problem
Each diagram in a category C poses two problems, the left (blunt end) problem and the right (sharp end) problem. We never actually say what the problem is (which is perhaps the reason why it is rarely mentioned) but we do say what a solution is.
Eilenberg and MacLane invented (discovered) category theory in the early 1940s. They were working on Čech cohomology and wanted to separate the routine manipulations from those with more specific content. It turned out that category theory is good at that. Hence its other name, ‘abstract nonsense’, which is not always used with affection.
Another part of their motivation was to try to explain why certain ‘natural’ constructions are natural, and other constructions are not. Such ‘natural’ constructions are now called natural transformations, a term that was used informally at the time but now has a precise definition. They observed that a natural transformation passes between two gadgets. These had to be made precise, and are now called functors. In turn each functor passes between two gadgets, which are now called categories. In other words, categories were invented to support functors, and these were invented to support natural transformations.
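The precise definition can be stated compactly (a standard formulation, not a quotation from the text): a natural transformation from a functor F to a functor G, both from a source category to a target category, assigns to each object A of the source an arrow of the target such that, for each arrow f from A to B in the source, the square
\[
\begin{array}{ccc}
F(A) & \xrightarrow{\;\alpha_A\;} & G(A) \\[2pt]
{\scriptstyle F(f)}\,\big\downarrow & & \big\downarrow\,{\scriptstyle G(f)} \\[2pt]
F(B) & \xrightarrow{\;\alpha_B\;} & G(B)
\end{array}
\qquad\text{that is,}\qquad
G(f) \circ \alpha_A \;=\; \alpha_B \circ F(f)
\]
commutes.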
But why the somewhat curious terminology? This is explained on pages 29 and 30 of Mac Lane (1998).
… the discovery of ideas as general as these is chiefly the willingness to make a brash or speculative abstraction, in this case supported by the pleasure of purloining words from philosophers: “Category” from Aristotle and Kant, “Functor” from Carnap …
That, of course, is the bowdlerized version.
Most of the basic notions were set up in Eilenberg and MacLane (1945) and that paper is still worth reading.
The isolation of the notion of an adjunction is one of the most important contributions of category theory. In a sense adjoints form the first ‘non-trivial’ part of category theory; at least it can seem that way now that all the basic stuff has been sorted out. There are adjunctions all over mathematics, and examples were known before the categorical notion was formalized. We have already met several examples, and later I will point you to them.
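One classical instance, recorded here only for orientation (it is not taken from this chapter): the functor F sending a set X to the free monoid of words over X is left adjoint to the forgetful functor U from monoids to sets, which is packaged as a bijection
\[
\mathbf{Mon}\bigl(F(X),\, M\bigr) \;\cong\; \mathbf{Set}\bigl(X,\, U(M)\bigr)
\]
natural in both X and M: monoid morphisms out of the free monoid correspond exactly to arbitrary functions out of the generating set.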
In this chapter we go through the various aspects of adjunctions quite slowly. We look at each part in some detail but, I hope, not in so much detail that we lose the big picture.
There is a lot going on in adjunctions, and you will probably get confused more than once. You might get things mixed up, forget which way an arrow is supposed to go, not be able to spell contafurious, and so on. Don't worry. I've been at it for over 40 years and I still can't remember some of the details. In fact, I don't try to. You should get yourself to the position where you can recognize that perhaps there is an adjunction somewhere around, but you may not be quite sure where. You can then look up the details. If you ever have to use adjunctions every day, then the details will become second nature to you.
As it says on the front cover, this book is an introduction to Category Theory. It gives the basic definitions; goes through the various associated gadgetry such as functors, natural transformations, limits and colimits; and then explains adjunctions. This material could be developed in 50 pages or so, but here it takes some 220 pages. That is because there are many examples illustrating the various notions, some rather straightforward, and others with more content. More importantly, there are also over 200 exercises. And perhaps even more importantly, solutions to these exercises are available online.
The book is aimed primarily at the beginning graduate student, but that does not mean that other students or professional mathematicians will not find it useful. I have designed the book so that it can be used by a single student or small group of students to learn the subject on their own. The book will make a suitable text for a reading group. The book does not assume the reader has a broad knowledge of mathematics. Most of the illustrations use rather simple ideas, but every now and then a more advanced topic is mentioned. The book can also be used as a recommended text for a taught introductory course.
Every mathematician should at least know of the existence of category theory, and many will need to use categorical notions every now and then. For those groups, this is the book you should have. Other mathematicians will use category theory every day.