Because we think of feature structures as representing partial information, the question immediately arises as to whether the notion of total information makes sense in this framework and if so, what its relation to partial information might be. Our notions of “total” information in this chapter are decidedly relative in the sense that they are only defined with respect to a fixed type inheritance hierarchy and fixed appropriateness conditions.
Before tackling the more complicated case of feature structures, we consider first-order terms. First-order terms can be ordered by subsumption. The maximal elements in this ordering are the ground terms, a ground term being simply a term without variables. It is well known that the semantics of logic programming systems and even first-order logic can be characterized by restricting attention to the collection of ground terms, the so-called Herbrand Universe. The reason this is true is that every first-order term is equivalent to the meet of the collection of its ground extensions (Reynolds 1970). Thus any first-order term is uniquely determined by its ground extensions. It is also interesting to note that as a corollary, a term subsumes a second term just in case its set of ground instantiations contains the set of ground instantiations of the second term. In this chapter, we consider whether we can derive analogous results for feature structures. One motivation for deriving such results is that they make life much easier when it comes time to characterize the behavior of unification-based phrase structure grammars and logic programming systems over feature structures. Unfortunately, we achieve only limited success.
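To make the corollary concrete, the following is a minimal sketch (not taken from the text) of first-order subsumption as one-way matching: a term subsumes another just in case some substitution for its variables turns it into the other, and hence just in case its ground instances include the other's. The encoding of terms as nested tuples and the helper names are invented for illustration.

```python
# A minimal sketch (not from the text): first-order terms as nested tuples
# ('functor', arg1, ..., argN); variables are capitalized strings.
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(general, specific, subst):
    """One-way matching: extend subst so that general instantiated by subst equals specific."""
    if is_var(general):
        if general in subst:
            return subst if subst[general] == specific else None
        return {**subst, general: specific}
    if is_var(specific):
        return None                       # a non-variable term cannot subsume a variable
    if general[0] != specific[0] or len(general) != len(specific):
        return None                       # functor or arity clash
    for a, b in zip(general[1:], specific[1:]):
        subst = match(a, b, subst)
        if subst is None:
            return None
    return subst

def subsumes(general, specific):
    """general subsumes specific iff some substitution maps general onto specific."""
    return match(general, specific, {}) is not None

# f(X, g(Y)) subsumes its ground instance f(a, g(b)).
assert subsumes(('f', 'X', ('g', 'Y')), ('f', ('a',), ('g', ('b',))))
```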
Up to now, our presentation of feature structures has been fairly standard. In particular, our feature structures are identical to the finite sorted feature structures introduced by Pollard and Moshier (1990) and axiomatized by Pollard (in press). In terms of the informational ordering that they produce, our feature structures (modulo alphabetic variance) are nothing more than a notational variant of the ψ-terms of Aït-Kaci (1984, 1986) (modulo tag renaming and top “smashing,” with a bounded complete partial order of types). The important thing to note about these feature structures is that although we allowed primitive information, in the form of type symbols, to be combined with structured information, in the form of features and their values, there was no notion of typing; arbitrary labelings of directed graphs with type symbols and features were permissible.
In this chapter we introduce our notion of typing for feature structures, which is based on the notion of typing introduced informally by Pollard and Sag (1987:38) for the HPSG grammar formalism. Pollard and Sag introduced appropriateness conditions to model the distinction between features which are not appropriate for a given type and those whose values are simply unknown. Note that since appropriateness is defined as a relation between types and features, we say both that types are appropriate for features and that features are appropriate for types. We have already seen how the type symbols themselves are organized into a multiple inheritance hierarchy and how information from the types interacts with information encoded structurally in terms of features and values.
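As a rough illustration of what an appropriateness specification amounts to, the sketch below records, for each type, the features appropriate for it together with the most general type permitted as each feature's value. The particular types and features are invented (loosely HPSG-flavored) and do not reproduce any specification from the text.

```python
# A hedged sketch (types and features invented): appropriateness as a table
# from types to their appropriate features and most general value types.
APPROP = {
    'sign':   {'PHON': 'list', 'SYNSEM': 'synsem'},
    'word':   {'PHON': 'list', 'SYNSEM': 'synsem'},   # a subtype repeating its supertype's features
    'synsem': {'LOCAL': 'local'},
    'list':   {},
    'local':  {},
}

def appropriate(feature, typ):
    """Appropriateness is a relation between features and types."""
    return feature in APPROP.get(typ, {})

def value_restriction(feature, typ):
    """The most general type allowed as the feature's value on this type, if any."""
    return APPROP.get(typ, {}).get(feature)

assert appropriate('PHON', 'word')
assert not appropriate('LOCAL', 'word')          # LOCAL is not appropriate for word
assert value_restriction('SYNSEM', 'sign') == 'synsem'
```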
In this chapter we briefly show how definite clause logic programs fall out as a particular instance of unification-based phrase structure grammars. The analysis is based on the realization that parse trees for unification-based grammars bear a striking resemblance to proof trees for logic programs. More significantly, the top-down analysis of unification-based grammars generalizes the notion of SLD-resolution as it is applied in definite clause logic programming, whereas the bottom-up analysis generalizes the standard notion of denotational semantics for logic programs. The results in this chapter can be taken as a generalization of the Prolog family of programming languages (though as we have said before, we put off the analysis of inequations in grammars and hence in definite clause programs until the next chapter on constraint-resolution).
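The structural correspondence being appealed to can be sketched as follows (the clause encoding and the helper are invented for illustration, reusing the nested-tuple term representation from the earlier sketch): a definite clause head :- goal_1, ..., goal_n is read as a rule whose mother category is the head and whose daughters are the body goals, so that a proof tree for the program is exactly a parse tree for the corresponding grammar.

```python
# A hedged sketch of the clause-as-rule reading described above.  Terms are
# nested tuples, variables are capitalized strings; a clause is a pair
# (head, body) with body a list of goals.
append_clauses = [
    # append([], Ys, Ys).
    (('append', ('[]',), 'Ys', 'Ys'), []),
    # append([X|Xs], Ys, [X|Zs]) :- append(Xs, Ys, Zs).
    (('append', ('.', 'X', 'Xs'), 'Ys', ('.', 'X', 'Zs')),
     [('append', 'Xs', 'Ys', 'Zs')]),
]

def as_rule(clause):
    """Read a definite clause as a phrase structure rule: head = mother, body = daughters."""
    head, body = clause
    return {'mother': head, 'daughters': body}   # an empty body plays the role of a lexical entry

for clause in append_clauses:
    print(as_rule(clause))
```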
It has been noted in the past, most notably by Mukai (1985, 1987), Mukai and Yasukawa (1985), Aït-Kaci and Nasr (1986), and Höhfeld and Smolka (1988), that the idea of definite clause programming can be extended to domains other than first-order terms. In particular, the systems developed by Mukai and Yasukawa, Aït-Kaci and Nasr, and Höhfeld and Smolka employ a more or less standard notion of definite clauses with the simple modification of replacing first-order terms with various notions of feature structure. Of course, this move was preceded by extensions to Prolog from within the Prolog community itself by Colmerauer (1984), who developed Prolog II, a language based on definite clauses that allowed terms to contain cycles and also inequations. In this chapter, we generalize all of these systems by showing how any of our systems of feature structures can be used as the basis for defining a definite clause programming language.
From the outside, our feature structures look much like the ψ-terms of Aït-Kaci (1984, 1986) or the feature structures of Pollard and Sag (1987), Moshier (1988), or Pollard and Moshier (1990). In particular, a feature structure is modeled by a possibly cyclic directed graph with labels on all of the nodes and arcs. Each node is labeled with a symbol representing its type, and the arcs are labeled with symbols representing features. We think of our types as organizing feature structures into natural classes. In this role, our types are doing the same duty as concepts in a terminological knowledge representation system (Brachman and Schmolze 1985; Brachman, Fikes, and Levesque 1983; MacGregor 1988) or abstract data types in object-oriented programming languages (Cardelli and Wegner 1985). Thus it is natural to think of the types as being organized in an inheritance hierarchy based on their generality. Feature structure unification is then modified so that two feature structures can only be unified if their types are compatible according to the primitive hierarchy of types.
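A minimal sketch of this graph representation (node types and feature names invented) is given below: nodes carry type symbols, arcs carry feature symbols, and cycles are permitted.

```python
# A hedged sketch (types and features invented): a feature structure as a
# rooted, possibly cyclic directed graph with type-labeled nodes and
# feature-labeled arcs.
class Node:
    def __init__(self, typ):
        self.typ = typ        # type symbol labeling this node
        self.arcs = {}        # feature symbol -> target Node

elt = Node('atom')
lst = Node('ne_list')
lst.arcs['FIRST'] = elt
lst.arcs['REST'] = lst        # a cycle: the graph need not be a tree

def reachable_types(node, seen=None):
    """Collect the type symbols reachable from a node, respecting cycles."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return set()
    seen.add(id(node))
    types = {node.typ}
    for target in node.arcs.values():
        types |= reachable_types(target, seen)
    return types

assert reachable_types(lst) == {'ne_list', 'atom'}
```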
In this chapter, we discuss how type inheritance hierarchies can be specified and the restrictions that we impose on them to allow an adequate notion of type inference, which is needed during unification. These restrictions were first noted by Aït-Kaci (1984) in his unification-based reasoning system. The polymorphism allowed in our type system is based on inheritance, in which a subtype inherits information from all of its supertypes. The possibility of more than one supertype for a given type allows for multiple inheritance.
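The flavor of the restriction can be sketched as follows (the types are invented and the hierarchy is tiny): writing the hierarchy so that more specific types sit above more general ones, unifying two types means finding their least upper bound, and the restriction is that whenever two types have any common upper bound they have a unique least one, so the result of type unification is well defined.

```python
# A hedged sketch (types invented): a multiple inheritance hierarchy given by
# immediate supertypes, with type unification as least upper bound.
SUPERTYPES = {
    'entity':  [],                       # the most general type
    'nominal': ['entity'],
    'verbal':  ['entity'],
    'gerund':  ['nominal', 'verbal'],    # multiple inheritance: two supertypes
}

def ancestors(t):
    """All types a given type inherits from, including itself."""
    result = {t}
    for s in SUPERTYPES[t]:
        result |= ancestors(s)
    return result

def join(t1, t2):
    """Least upper bound of two types, or None if none (or no unique one) exists."""
    uppers = [t for t in SUPERTYPES if {t1, t2} <= ancestors(t)]
    least = [t for t in uppers if all(t in ancestors(u) for u in uppers)]
    return least[0] if len(least) == 1 else None

assert join('nominal', 'verbal') == 'gerund'    # type unification succeeds
assert join('entity', 'nominal') == 'nominal'
```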
In this chapter we consider the addition of variables ranging over feature structures to our description language. It turns out that the addition of variables does not increase the representational power of the description language in terms of the feature structures which it can distinguish. Of course, this should not be surprising given the description theorem, which tells us that every feature structure can be picked out as the most general satisfier of some description. On the other hand, we can eliminate path equations and inequations in favor of equations and inequations between variables if we so desire. We prove a theorem to this effect in the latter part of this chapter. The reason that we consider variables now is that they have shown up in various guises in the feature structure literature, and are actually useful when considering applications such as definite clause programming languages based on feature structures. Our treatment of variables most closely follows that of Smolka (1988, 1989), who treats variables as part of the language for describing feature structures. Aït-Kaci (1984, 1986) also used variable-like objects, which he called tags. Because he did not have a description language, Aït-Kaci had to consider variables to be part of the feature structures themselves, and then factor the class of feature structures with respect to alphabetic variance to recover the desired informational structure. We have informally introduced tags in our attribute-value matrix diagrams, but did not consider them to be part of the feature structures themselves.
We assume that we have a countably infinite collection Var of variables.
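As a small illustration of the trade-off (the features F, G, and H are invented; the notation follows the description language of the earlier chapters), a path equation can be re-expressed by routing both paths to a shared variable:

\[
\langle \mathrm{F}\ \mathrm{G} \rangle \doteq \langle \mathrm{H} \rangle
\qquad\text{versus}\qquad
\mathrm{F} : \mathrm{G} : X \;\wedge\; \mathrm{H} : X
\]

Both descriptions pick out the same most general satisfier: a feature structure in which the value reached by F followed by G is token-identical to the value of H.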
In our development up to this point, we have treated feature structures logically as models of descriptions expressed in a simple attribute-value language. In the last chapter, we extended the notion of description to include variables; in this chapter, we generalize the notion of model to partial algebraic structures. An algebraic model consists of an arbitrary collection of domain objects and associates each feature with a unary partial function over this domain. In the research of Smolka (1988, 1989) and Johnson (1986, 1987, 1990), more general algebraic models of attribute-value descriptions are the focus of attention. We pull the rabbit out of the hat when we show that our old notion of satisfaction as a relation between feature structures and descriptions is really just a special case of a more general algebraic definition of satisfaction. The feature structures constitute an algebraic model in which the domain of the model is the collection of feature structures, and the features pick out their natural (partial) value mappings. What makes the feature structure model so appealing from a logical perspective is that it is canonical in the sense that two descriptions are logically equivalent in general if and only if they are equivalent in the feature structure model. In this respect, the feature structure model plays the same logical role as term models play in universal algebra. This connection is strengthened in light of the most general satisfier and description theorems, which allow us to go back and forth between feature structures and their normal form descriptions.
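A toy illustration of such an algebraic model (the domain, types, and features are all invented) is sketched below: a small domain, a type assignment, features as unary partial functions, and a satisfaction check for descriptions built from type symbols and feature selection.

```python
# A hedged sketch (domain, types, and features invented): an algebraic model
# with a domain of objects, a type for each object, and each feature
# interpreted as a unary *partial* function on the domain.
TYPE_OF  = {'o1': 'sign', 'o2': 'phon'}            # the domain is {'o1', 'o2'}
FEATURES = {'PHON': {'o1': 'o2'}}                  # PHON is defined only on o1

def satisfies(obj, description):
    """Satisfaction: a description is a type symbol (a string) or a pair
    (feature, sub-description), read as feature : sub-description.
    Type satisfaction is simplified to equality, ignoring the type ordering."""
    if isinstance(description, str):
        return TYPE_OF[obj] == description
    feature, sub = description
    value = FEATURES.get(feature, {}).get(obj)     # may be undefined (partial)
    return value is not None and satisfies(value, sub)

assert satisfies('o1', ('PHON', 'phon'))           # o1's PHON value has type phon
assert not satisfies('o2', ('PHON', 'phon'))       # PHON is undefined on o2
```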
With our definition of inheritance, we have a notion of consistency and unification for our smallest conceptual unit, the type. We now turn to the task of developing structured representations that can be built out of the basic concepts, which we call (untyped) feature structures. The reason for the qualifier is that even though feature structures are defined using our type symbols, they are not typed, in the sense that there is no restriction on the co-occurrence of features or restrictions on their values. We introduce methods for specifying appropriateness conditions on features and a notion of well-typing only after studying the ordered notion of feature structures. We also hold off on introducing inequations and extensionality conditions. Before introducing these other topics, we concentrate on fully developing the notion of untyped feature structure and the logical notions we employ. Most of the results that hold for feature structures can be immediately generalized to well-typed feature structures by application of the type inference mechanism.
Our feature structures are structurally similar to the more traditional form of feature structures such as those used in the PATR-II system and those defined by Rounds and Kasper. The next major development after these initial systems was introduced by Moshier (1988). The innovation of Moshier's system was to allow atomic symbols to label arbitrary nodes in a feature structure. He also treated the identity conditions for these atoms fully intensionally. Both PATR-II and the Rounds and Kasper systems treated feature structures intensionally, but enforced extensional identity conditions on atoms.
In this chapter, we consider a phrase structure grammar formalism, or more precisely, a parameterized family of such formalisms, in which non-terminal (category) symbols are replaced by feature structures in both rewriting rules and lexical entries. Consequently, the application of a rewriting rule must be mediated by unification rather than by simple symbol matching. This explains why grammar formalisms such as the one we present here have come to be known as unification-based. Although our presentation of unification-based phrase structure grammars is self-contained, for those unfamiliar with unification-based grammars and their applications, we recommend reading Shieber's excellent introduction (Shieber 1986). Shieber lays out the fundamental principles of unification-based phrase structure formalisms along with some of their more familiar incarnations, as well as providing a wide variety of linguistic examples and motivations. Another good introductory source is the text by Gazdar and Mellish (1989).
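The point that rule application is mediated by unification can be sketched as follows (categories, attributes, and values are invented, and the categories are flattened to attribute-value dictionaries for brevity): an underspecified category in a rule combines with a more fully specified lexical category exactly when the two unify, accumulating the information from both.

```python
# A hedged sketch: categories as flat attribute-value dictionaries and rule
# application mediated by unification rather than symbol identity.
def unify(c1, c2):
    """Unify two flat categories; return the combined category or None on a clash."""
    result = dict(c1)
    for attr, val in c2.items():
        if attr in result and result[attr] != val:
            return None                   # conflicting values for a shared attribute
        result[attr] = val
    return result

rule_daughter = {'cat': 'np'}                       # underspecified category in a rule
lexical_entry = {'cat': 'np', 'num': 'sg'}          # more specific lexical category

assert unify(rule_daughter, lexical_entry) == {'cat': 'np', 'num': 'sg'}
assert unify({'cat': 'vp'}, lexical_entry) is None  # the rule does not apply here
```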
The early development of unification-based grammars was intimately connected with the development of logic programming itself, the most obvious link stemming from Colmerauer's research into Q-systems (1970) and Metamorphosis Grammars (1978). In fact, Colmerauer's development of Prolog was motivated by the desire to provide a powerful yet efficient implementation environment for natural language grammars. The subsequent popularity of Prolog led to the development of a number of so-called logic grammar systems. These grammar formalisms are typically variations of phrase structure grammars based on first-order term unification, such as the Definite Clause Grammars (DCGs) of Pereira and Warren (1980), the Extraposition Grammars of Pereira (1981), the Slot Grammars of McCord (1981), and the Gapping Grammars of Dahl and Abramson (1984; see also Popowich 1985).
In this chapter we introduce the logical attribute-value language that we employ to describe feature structures. Our description language is the same (in the sense of containing exactly the same descriptions) as that of Rounds and Kasper (1986), but following Pollard (in press) our interpretations are based on our feature structures rather than the ones provided by Rounds and Kasper. One way to look at the language of descriptions is as providing a way to talk about feature structures; the language can be displayed linearly one symbol after another and can thus be easily used in implementations. Other researchers, namely Johnson (1987, 1988), Smolka (1988) and King (1989), use similar, but much more powerful, logical description languages and take a totally different view of the nature of interpretations or models for these languages. The main difference in their descriptions is the presence of general description negation, implication, and variables. The interpretations they use are based on a domain of objects with features interpreted as unary partial functions on this domain. In the system we describe here, we view feature structures themselves as partial objects; descriptions are just a particularly neat and tidy notation for picking them out. Our view thus corresponds to Aït-Kaci's treatment of his ψ-terms, which were also taken to be partial descriptions of empirical objects. Johnson (1988:72), on the other hand, is at pains to point out that he does not view feature structures as partial models or descriptions of total linguistic objects, but rather as total objects themselves.
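One possible concrete rendering of such a linear description language (the constructor names are invented; only the positive connectives are sketched, omitting path equations, inequations, and variables) is as a small abstract syntax:

```python
# A hedged sketch of one way descriptions might be represented in an
# implementation: type symbols, feature selection, conjunction, disjunction.
from dataclasses import dataclass

@dataclass
class TypeDesc:        # a bare type symbol, e.g.  sign
    name: str

@dataclass
class FeatDesc:        # feature : description, e.g.  PHON : list
    feature: str
    body: object

@dataclass
class And:             # conjunction of two descriptions
    left: object
    right: object

@dataclass
class Or:              # disjunction of two descriptions
    left: object
    right: object

# sign AND PHON : list, written linearly one symbol after another.
example = And(TypeDesc('sign'), FeatDesc('PHON', TypeDesc('list')))
print(example)
```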