Computer science, like other mathematical fields, cannot live without a tight relationship with reality. However, such a relationship is, frankly, not very common. This is probably why people so enthusiastically welcome a true meeting of theory and practice. In that sense, the coming together of XML and tree automata theory was a beautiful marriage. Thus I have written this book in the earnest hope that the news of this marriage will be spread and celebrated all over the world!
The book is a summary of my ten years' work. It could not have been realized without my collaborators, Peter Buneman, Giuseppe Castagna, Alain Frisch, Vladimir Gapeyev, Kazuhiro Inaba, Shinya Kawanaka, Hiromasa Kido, Michael Y. Levin, Sebastian Maneth, Makoto Murata, Benjamin C. Pierce, Tadahiro Suda, Jérôme Vouillon, Takeshi Yashiro, and Philip Wadler. In particular, I thank Kazuhiro Inaba and Sebastian Maneth, who made countless comments on the draft and thus contributed to a huge improvement of the book. Lastly, I thank my dear wife, Ayako, who gave me the warmest and unceasing encouragement to finish the book.
This chapter presents efficient algorithms for several important problems related to XML processing, namely: (1) membership for tree automata (to test whether a given tree is accepted by a given tree automaton); (2) evaluation of marking tree automata (to collect the set of bindings yielded by the matching of a given tree against a given marking tree automaton); and (3) containment for tree automata (to test whether the language of one given tree automaton is a subset of the language of another). For these problems we describe several “on-the-fly” algorithms, in which only a part of the whole state space is explored to obtain the final result. In such algorithms there are two basic approaches, top-down and bottom-up. The top-down approach explores the state space from the initial states whereas the bottom-up approach does this from the final states. In general, the bottom-up approach tends to have a lower complexity in the worst case whereas the top-down approach often gives a higher efficiency in practical cases. We will also consider a further improvement obtained by combining these ideas.
Membership algorithms
In this section, we will consider three algorithms for testing membership. The first is a top-down algorithm, which can be obtained rather simply from the semantics of tree automata but takes time that is exponential in the size of the input tree, in the worst case.
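To make the top-down idea concrete, here is a minimal sketch, assuming a binary-tree encoding in which a tree is either a leaf (None) or a tuple (label, left, right), transitions map a (state, label) pair to a list of child-state pairs, and final states are checked at the leaves. The function names and the example automaton are hypothetical and chosen only for illustration; this is not the book's own formulation.

```python
# Trees are nested tuples (label, left, right); None stands for a leaf.
# This encoding and every name below are assumptions made for illustration.

def accepts_from(tree, state, transitions, final_states):
    """Top-down test: can `tree` be accepted when started in `state`?"""
    if tree is None:
        # A leaf is accepted exactly when the current state is final.
        return state in final_states
    label, left, right = tree
    # Try every transition state -> label(q1, q2). The same subtree may be
    # re-examined under many states, which is where the worst-case
    # exponential running time comes from.
    for q1, q2 in transitions.get((state, label), ()):
        if (accepts_from(left, q1, transitions, final_states)
                and accepts_from(right, q2, transitions, final_states)):
            return True
    return False


def member(tree, initial_states, transitions, final_states):
    """The tree is accepted if some initial state accepts it at the root."""
    return any(accepts_from(tree, q, transitions, final_states)
               for q in initial_states)


# Hypothetical automaton: accepts trees over {a, b} with at least one b node.
TRANSITIONS = {
    ("q0", "a"): [("q0", "q0")],
    ("q0", "b"): [("q0", "q0")],
    ("q1", "b"): [("q0", "q0")],
    ("q1", "a"): [("q1", "q0"), ("q0", "q1")],
}
INITIAL = {"q1"}
FINAL = {"q0"}

with_b = ("a", ("a", None, None), ("b", None, None))
without_b = ("a", ("a", None, None), None)
print(member(with_b, INITIAL, TRANSITIONS, FINAL))
print(member(without_b, INITIAL, TRANSITIONS, FINAL))
```

The recursion follows the acceptance condition directly, which is why it is simple to write down but backtracks over alternative transitions.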
There exist many other areas of possibly fruitful application of relational mathematics from which we select three topics – and thus omit many others. In the first section, we will again study the lifting into the powerset. Credit is mainly due to theoretical computer scientists (e.g. [22, 39]) who have used and propagated such constructs as an existential image or a power transpose. When relations (and not just mappings as for homomorphisms) are employed to compare state transitions, we will correspondingly need relators as a substitute for functors. We will introduce the power relator. While rules concerning power operations have often been only postulated, we go further here and deduce such rules in our general axiomatic setting.
In Section 19.2, we treat questions of simulation using relational means. When trying to compare the actions of two black box state transition systems, one uses the concept of a bisimulation to model the idea of behavioral equivalence. In Section 19.3, a glimpse of state transition systems and system dynamics is offered.
The key concept is that of an orbit, the sequence of sets of states that result when continuously executing transitions of the system starting from some given set of states. There exist many orbits in a delicately related relative position. Here the application of relations seems particularly promising.
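The notion of an orbit can be sketched in a few lines, assuming the transition system is given as a finite set of (state, successor) pairs. The helper names `image` and `orbit`, the stopping rule (stop at the first repeated set), and the example relation are assumptions made for illustration.

```python
def image(relation, states):
    """All states reachable in exactly one transition from `states`."""
    return frozenset(y for (x, y) in relation if x in states)


def orbit(relation, start):
    """Sequence of state sets obtained by repeatedly applying `image`.

    On a finite state space the sequence must eventually repeat; we stop
    at the first set that has already been seen.
    """
    sequence = []
    seen = set()
    current = frozenset(start)
    while current not in seen:
        seen.add(current)
        sequence.append(current)
        current = image(relation, current)
    return sequence


# Hypothetical transition relation on the states 0..3.
STEP = {(0, 1), (1, 2), (2, 0), (2, 3)}
for states in orbit(STEP, {0}):
    print(sorted(states))
```

Starting from a different set of states generally yields a different orbit, and comparing such orbits is the kind of question the text alludes to.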
Tree transducers form a class of finite-state models that not only accept trees but also transform them. While in Chapter 7 we described a simple yet very powerful tree transformation language, μXDuce, tree transducers are more restricted and work with only a finite number of states. An important aspect of the restriction is that tree transducers can never use intermediate data structures. However, this restriction is not too unrealistic, as it is one of the design principles of the XSLT language – currently the most popular language for XML transformation. In addition, this restriction leads to several nice properties that otherwise would hardly hold; among the most important is the exact typechecking property to be detailed in Chapter 11. The present chapter introduces two standard tree transducer frameworks, namely, top-down tree transducers and macro tree transducers, and shows their basic properties.
Top-down tree transducers
Top-down tree transducers involve a form of transformation that traverses a given tree from the root to the leaves while at each node producing a fragment of the output tree determined by the label of the current node and the current state. Since a state of a tree transducer can then be seen as a rule from a label to a tree fragment, we call a state a procedure from now on.
When considering the tree transducer family, we often take a nondeterministic approach, as in the case of automata.
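As a rough illustration of a procedure producing an output fragment from the current label, the following sketch implements a deterministic special case, assuming binary trees encoded as (label, left, right) tuples with None for a leaf. The `Call` marker, the rule table keyed by (procedure, label), and the mirroring example are hypothetical. A nondeterministic version would keep several rules per key and return a set of outputs.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """Placeholder in an output fragment: apply `proc` to child 1 or 2."""
    proc: str
    child: int


LEAF = "#"  # assumed key under which the rule for leaves is stored


def run(proc, tree, rules):
    """Apply procedure `proc` to `tree`, walking from the root downwards."""
    label = LEAF if tree is None else tree[0]
    return build(rules[(proc, label)], tree, rules)


def build(fragment, tree, rules):
    """Instantiate an output fragment for the current input node."""
    if fragment is None:
        return None  # a leaf in the output
    if isinstance(fragment, Call):
        # Recurse into the chosen child; tree[1] is left, tree[2] is right.
        return run(fragment.proc, tree[fragment.child], rules)
    label, left, right = fragment
    return (label, build(left, tree, rules), build(right, tree, rules))


# Hypothetical transducer with one procedure "p" that mirrors the input.
RULES = {
    ("p", "a"): ("a", Call("p", 2), Call("p", 1)),
    ("p", "b"): ("b", Call("p", 2), Call("p", 1)),
    ("p", LEAF): None,
}

source = ("a", ("b", None, None), None)
print(run("p", source, RULES))
```

Note that the output is assembled directly from rule fragments, with no intermediate structure stored between steps, in line with the restriction described above.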
A comparison may help to describe the intention of this book: natural sciences and engineering sciences have their differential and integral calculi. Whenever practical work is to be done, one will easily find a numerical algebra package at the computing center which one will be able to use. This applies to solving linear equations or determining eigenvalues, for example, in connection with finite element methods.
The situation is different for various forms of information sciences as in the study of vagueness, fuzziness, spatial or temporal reasoning, handling of uncertain/rough/qualitative knowledge in mathematical psychology, sociology, and computational linguistics, to mention a few areas. These fields also build their theoretical models with certain calculi: the calculi of logic, of sets, the calculus of relations, etc. However, for applications, practitioners will usually apply Prolog-like calculi. Hardly anybody confronted with practical problems knows how to apply relational calculi; there is almost no broadly available computer support. There is usually no package able to handle problems beyond toy size. One will have to approach theoreticians since there are not many practitioners in such fields. So it might seem that George Boole in 1854 [26, 28] was right in saying:
It would, perhaps, be premature to speculate here upon the question whether the methods of abstract science are likely at any future day to render service in the investigation of social problems at all commensurate with those which they have rendered in various departments of physical inquiry.
The data format known as extensible mark-up language (XML) describes tree structures based on mark-up texts. The tree structures are formed by inserting, between text fragments, open and end tags that are balanced, like parentheses. A data set thus obtained is often called a document. On the surface, XML resembles hypertext mark-up language (HTML), the most popular display format for the Web. The essential difference, however, is that in XML the structure permitted to documents, including the set of tag names and their usage conventions, is not fixed a priori.
More precisely, XML allows users to define their own schemas; a schema determines the permitted structure of a document. In this sense, it is often said that a schema defines a “subset of XML” and thus XML is a “format for data formats.” With the support of schemas each individual application can define its own data format, while virtually all applications can share generic software tools for manipulating XML documents. This genericity is a prominent strength of XML in comparison with other existing formats. Indeed, XML has been adopted with unprecedented speed and range: an enormous number of XML schemas have been defined and used in practice. To give a few examples, extensible HTML (XHTML) is the XML version of HTML, simple object access protocol (SOAP) is an XML message format for remote procedure calls, scalable vector graphics (SVG) is a vector graphics format in XML, and MathML is an XML format for mathematical formulas.
Many of the problems handled in applications are traditionally formulated in terms of graphs. This means that graphs will often be drawn and the reasoning will be pictorial. On the one hand, this is nice and intuitive when executed with chalk on a blackboard. On the other hand, there is a considerable gap from this point to treating the problem on a computer. Often ad hoc programs are written in which more time is spent on I/O handling than on precision of the algorithm. Graphs are well suited to visualization of a result, even with the possibility of generating the graph via a graph drawing program. What is nearly impossible is the input of a problem given by means of a graph – when not using RelView's interactive graph input (see [18], for example). In such cases some sort of relational interpretation of the respective graph is usually generated and input in some way.
We will treat reducibility and irreducibility first, mentioning also partial decomposability. Then difunctional relations are studied in the homogeneous context which provides additional results. The main aim of this chapter is to provide algorithms to determine relationally specified subsets of a graph or relation in a declarative way.
Reducibility and irreducibility
We are now going to study in more detail the reducibility and irreducibility introduced in a phenomenological form as Def. 6.12. Many of these results for relations stem from Georg Frobenius [54] and his study of eigenvalues of non-negative real-valued matrices; a comprehensive presentation is given in [92].
The work in this chapter – although stemming from various application fields – is characterized by two antitone mappings leading in opposite directions that cooperate in a certain way. In most cases they are related to one or more relations which are often heterogeneous. An iteration leads to a fixed point of a Galois correspondence. Important classes of applications lead to these investigations. Trying to find out where a program terminates, and thus correctness considerations, also invoke such iterations. Looking for the solution of games is accompanied by these iterations. Applying the Hungarian alternating chain method to find maximum matchings or to solve assignment problems is also subsumed under these iterations. All this is done in structurally the same way, and deserves to be studied separately.
Galois iteration
When Evariste Galois wrote down his last notes, in preparation for the duel in 1832, in which he expected to die, he probably could not have imagined to what extent these notes would later influence mathematics and applications. What he had observed may basically be presented with the correspondence of permutations of a set and their fixed points. Consider the 5-element sequence {1, 2, 3, 4, 5} for which there exist in total 5! = 120 permutations. The idea is now to observe which set of permutations leaves which set of elements fixed. Demanding more elements to be untouched by a permutation results, of course, in fewer permutations.
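The correspondence between sets of elements and sets of permutations can be checked by brute force. The following is a minimal sketch; the function names `fixing` and `fixed_by` are our own and not taken from the text.

```python
from itertools import permutations

ELEMENTS = (1, 2, 3, 4, 5)
# Each permutation is stored as a dict mapping an element to its image.
PERMS = [dict(zip(ELEMENTS, images)) for images in permutations(ELEMENTS)]


def fixing(elements):
    """All permutations that leave every element of `elements` fixed."""
    return [p for p in PERMS if all(p[e] == e for e in elements)]


def fixed_by(perms):
    """All elements left fixed by every permutation in `perms`."""
    return {e for e in ELEMENTS if all(p[e] == e for p in perms)}


# Demanding more fixed elements leaves fewer permutations (antitone).
for k in range(len(ELEMENTS) + 1):
    demanded = set(ELEMENTS[:k])
    print(sorted(demanded), len(fixing(demanded)))

# Going back and forth closes the set: a permutation fixing 1..4 fixes 5 too.
print(sorted(fixed_by(fixing({1, 2, 3, 4}))))
```

The two functions point in opposite directions and both reverse inclusion, which is the pattern of a Galois correspondence described at the start of this chapter summary.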
Usually, we are confronted with sets at a very early period of our education. Depending on the respective nationality, it is approximately at the age of 10 or 11 years. Thus we carry with us quite a burden of concepts concerning sets. At least in Germany, Mengenlehre as taught in elementary schools evokes bad memories when discussed with parents of schoolchildren. All too often, one will be reminded of Georg Cantor, the inventor of set theory, who became mentally ill. At a more advanced level, we encounter a number of paradoxes making set theory problematic, when treated in a naïve way. One has to avoid colloquial formulations completely and should confine oneself to an adequately restricted formal treatment.
The situation does not improve when addressing logicians. Most of them think in just one universe of discourse containing numbers, letters, pairs of numbers, etc., altogether rendering themselves susceptible to numerous semantic problems. While these, in principle, can be overcome, ideally they should nevertheless be avoided from the beginning.
In our work with relations, we will mostly be restricted to finite situations, which are much easier to work with and to which most practical work is necessarily confined. A basic decision for this text is that a (finite) set is always introduced together with a linear ordering of its elements. Only then will we have a well-defined way of presenting a relation as a Boolean matrix.
Already in the previous chapters, relations have shown up in a more or less naïve form, for example as permutation matrices or as (partial) identity relations. Here, we provide ideas for more stringent data types for relations. Not least, these will serve to model graph situations, like graphs on a set, bipartitioned graphs, or hypergraphs.
What is even more important at this point is the question of denotation. We have developed some scrutiny when denoting basesets, elements of these, and subsets; all the more will we now be careful in denoting relations. Since we restrict ourselves mostly to binary relations, this will mean denoting the source of the relation as well as its target and then denoting the relation proper. It is this seemingly trivial point which will be stressed here, namely from which set to which set the relation actually leads.
Relation representation
We aim mainly at relations over finite sets. Then a relation R between sets V, W is announced as R : V → W. It may be presented in several ways, among them:
— as a set of pairs {(x, y),…} with x ∈ V, y ∈ W
— as a list of pairs [(x,y),…] with x∷V,y∷W
— in predicate form {(x, y) ∈ V × W ∣ p(x, y)} with a binary predicate p
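These representations are easy to convert into one another once the elements of V and W are given in a fixed linear order. The sketch below assumes small example sets and uses hypothetical helper names. The Boolean matrix form is included because the linear ordering of the base sets is what makes it well defined.

```python
# Base sets, each introduced together with a linear ordering (list order).
V = [1, 2, 3]
W = [2, 4, 6, 9]


def from_predicate(p, source, target):
    """Set-of-pairs form of {(x, y) in source x target | p(x, y)}."""
    return {(x, y) for x in source for y in target if p(x, y)}


def to_pair_list(pairs, source, target):
    """List-of-pairs form, enumerated in the order of the base sets."""
    return [(x, y) for x in source for y in target if (x, y) in pairs]


def to_matrix(pairs, source, target):
    """Boolean matrix with one row per source element, one column per target."""
    return [[(x, y) in pairs for y in target] for x in source]


# Hypothetical example relation: x divides y.
divides = from_predicate(lambda x, y: y % x == 0, V, W)
print(to_pair_list(divides, V, W))
for row in to_matrix(divides, V, W):
    print(["1" if entry else "0" for entry in row])
```

Keeping the source and the target explicit in every conversion reflects the point made above: a relation is only fully denoted once it is clear from which set to which set it leads.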
Lattices penetrate all our life. Early in school we learn about divisibility and look for the greatest common divisor as well as for the least common multiple. Later we learn about Boolean lattices and use union and intersection “∪,∩” for sets as well as disjunction and conjunction “⋁,⋀” for predicates. Concept lattices give us orientation in all our techniques of discourse in everyday life – usually without coming to our attention. Nevertheless, lattices completely dominate our imagination. Mincut lattices originate from flow optimization, assignment problems, and further optimization tasks. It is well known that an ordering can always be embedded into a complete lattice. Several embedding constructions are conceivable: examples are cut completion and ideal completion.
We introduce all this step by step, concentrating first on order-theoretic functionals. Then we give several examples determining the maximal, minimal, greatest, and least elements of subsets. Thus we learn to work with orderings and compute with them.
Maxima and minima
Whenever an ordering is presented, one will be interested in finding maximal and minimal elements of subsets. Of course, we do not think of linear orderings only. It is a pity that many people – even educated people – colloquially identify the maximal element with the greatest. What makes this even worse is that the two concepts often indeed coincide. However, the definitions and meanings become different in nature as soon as orderings are not just finite and linear.
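The difference between maximal and greatest elements shows up as soon as the ordering is not linear. The following sketch uses divisibility on small sets of positive integers; the function names and example subsets are chosen for illustration only.

```python
def divides(x, y):
    """The ordering used here: x is below y when x divides y."""
    return y % x == 0


def maximal(subset, leq):
    """Elements with no strictly greater element inside `subset`."""
    return {x for x in subset
            if not any(leq(x, y) and x != y for y in subset)}


def greatest(subset, leq):
    """The element above all others in `subset`, or None if there is none."""
    candidates = [x for x in subset if all(leq(y, x) for y in subset)]
    return candidates[0] if candidates else None


# Neither 4 nor 6 divides the other: two maximal elements, no greatest one.
print(sorted(maximal({2, 3, 4, 6}, divides)), greatest({2, 3, 4, 6}, divides))

# On a chain such as 1 | 2 | 4 the two concepts coincide.
print(sorted(maximal({1, 2, 4}, divides)), greatest({1, 2, 4}, divides))
```

In the first subset both 4 and 6 are maximal, yet no element is greatest. In the second, the unique maximal element 4 is also the greatest, which is the coincidence that invites the colloquial confusion.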