To save content items to your account,
please confirm that you agree to abide by our usage policies.
If this is the first time you use this feature, you will be asked to authorise Cambridge Core to connect with your account.
Find out more about saving content to .
To save content items to your Kindle, first ensure no-reply@cambridge.org
is added to your Approved Personal Document E-mail List under your Personal Document Settings
on the Manage Your Content and Devices page of your Amazon account. Then enter the ‘name’ part
of your Kindle email address below.
Find out more about saving to your Kindle.
Note you can select to save to either the @free.kindle.com or @kindle.com variations.
‘@free.kindle.com’ emails are free but can only be saved to your device when it is connected to wi-fi.
‘@kindle.com’ emails can be delivered even when you are not connected to wi-fi, but note that service fees apply.
Part I starts by recalling more or less trivial facts on sets, their elements or subsets, and relations between them. It is rather sketchy and will probably be uninteresting for literate scientists such as mathematicians and/or logicians.
Sets, elements, subsets, and relations can be represented in different ways. We will give hints to the programmer how to work with concrete relations and put particular emphasis on methods for switching from one form to another. Such transitions may be achieved on the representation level; they may, however, also touch a higher relation-algebraic level which we hide at this early stage.
We are going to recall how a partition is presented, or a permutation. Permutations may lead to a different presentation of a set or of a relation on or between sets. Functions between sets may be given in various forms, as a table, as a list, or in some other form. A partition may reduce the problem size when factorizing according to it. We show how relations emerge. This may be simply by writing down a matrix, or by abstracting with a cut from a real-valued matrix. For testing purposes, the relation may be generated randomly.
There is a clash in attitudes and expectations between mathematicians and information engineers. While the first group is interested in reasoning about properties, the second aims at computing and evaluating around these properties and thus tries to have the relations in question in their hands.
Typechecking is a static analysis on programs that enables the detection of potential dynamic type errors. Since it is already a standard technique in programming languages and since XML's schemas are analogous to data types in programming languages, we might expect existing typechecking techniques to be usable also for the verification of XML processing programs. However, a concrete analysis algorithm must be renewed entirely since schemas have a mathematical structure completely different from conventional data types. In this chapter, we first overview various approaches to XML typechecking and then consider, as a case study, a core of the XDuce type system in detail.
Compact taxonomy
Since the aim of typechecking is the rejection of possibly incorrect programs, it is a language feature clearly visible to the user. Therefore, in constructing a typechecker we must take into account not only the internal analysis algorithm but also the external specification, which further influences the design of the whole language.
The first design consideration is the kind of type error that should be detected. That is, what sort of type error should be raised from a program in the target language? The most typical kind of error occurs when the final result does not conform to the specified “output type.” However, some languages also check intermediate results against types. For example, the XDuce language has a pattern match facility, according to which a match error occurs when a form of conformance test on an intermediate result fails (Section 5.1).
Computer science, like other mathematical fields, cannot live without a tight relationship with reality. However, such a relationship is, frankly, not very common. This is probably why people so enthusiastically welcome a true meeting of theory and practice. In that sense, the coming together of XML and tree automata theory was a beautiful marriage. Thus I have written this book in the earnest hope that the news of this marriage will be spread and celebrated all over the world!
The book is a summary of my ten years' work. It could not have been realized without my collaborators, Peter Buneman, Giuseppe Castagna, Alain Frisch, Vladimir Gapeyev, Kazuhiro Inaba, Shinya Kawanaka, Hiromasa Kido, Michael Y. Levin, Sebastian Maneth, Makoto Murata, Benjamin C. Pierce, Tadahiro Suda, Jérôme Vouillon, Takeshi Yashiro, and Philip Wadler. In particular, I thank Kazuhiro Inaba and Sebastian Maneth, who made uncountable comments on the draft and thus contributed to a huge improvement of the book. Lastly, I thank my dear wife, Ayako, who gave me the warmest and unceasing encouragement to finish the book.
This chapter presents efficient algorithms for several important problems related to XML processing, namely: (1) membership for tree automata (to test whether a given tree is accepted by a given tree automaton); (2) evaluation of marking tree automata (to collect the set of bindings yielded by the matching of a given tree against a given marking tree automaton); and (3) containment for tree automata (to test whether the languages of given two tree automata are in subset relation). For these problems we describe several “on-the-fly” algorithms, in which only a part of the whole state space is explored to obtain the final result. In such algorithms there are two basic approaches, top-down and bottom-up. The top-down approach explores the state space from the initial states whereas the bottom-up approach does this from the final states. In general, the bottom-up approach tends to have a lower complexity in the worst case whereas the top-down often gives a higher efficiency in practical cases. We will also consider a further improvement from combining these ideas.
Membership algorithms
In this section, we will consider three algorithms for testing membership. The first is a top-down algorithm, which can be obtained rather simply from the semantics of tree automata but takes time that is exponential in the size of the input tree, in the worst case.
There exist many other areas of possibly fruitful application of relational mathematics from which we select three topics – and thus omit many others. In the first section, we will again study the lifting into the powerset. Credit is mainly due to theoretical computer scientists (e.g. [22, 39]) who have used and propagated such constructs as an existential image or a power transpose. When relations (and not just mappings as for homomorphisms) are employed to compare state transitions, we will correspondingly need relators as a substitute for functors. We will introduce the power relator. While rules concerning power operations have often been only postulated, we go further here and deduce such rules in our general axiomatic setting.
In Section 19.2, we treat questions of simulation using relational means. When trying to compare the actions of two black box state transition systems, one uses the concept of a bisimulation to model the idea of behavioral equivalence. In Section 19.3, a glimpse of state transition systems and system dynamics is offered.
The key concept is that of an orbit, the sequence of sets of states that result when continuously executing transitions of the system starting from some given set of states. There exist many orbits in a delicately related relative position. Here the application of relations seems particularly promising.
Tree transducers form a class of finite-state models that not only accept trees but also transform them. While in Chapter 7 we described a simple yet very powerful tree transformation language, μXDuce, tree transducers are more restricted and work with only a finite number of states. An important aspect of the restriction is that tree transducers can never use intermediate data structure. However, this restriction is not too unrealistic, as it is one of the design principles of the XSLT language – currently the most popular language for XML transformation. In addition, this restriction leads to several nice properties that otherwise would hardly hold; among the most important is the exact typechecking property to be detailed in Chapter 11. The present chapter introduces two standard tree transducer frameworks, namely, top-down tree transducers and macro tree transducers, and shows their basic properties.
Top-down tree transducers
Top-down tree transducers involve a form of transformation that traverses a given tree from the root to the leaves while at each node producing a fragment of the output tree determined by the label of the current node and the current state. Since a state of a tree transducer can then be seen as a rule from a label to a tree fragment, we call a state a procedure from now on.
When considering the tree transducer family, we often take a nondeterministic approach, as in the case of automata.
A comparison may help to describe the intention of this book: natural sciences and engineering sciences have their differential and integral calculi. Whenever practical work is to be done, one will easily find a numerical algebra package at the computing center which one will be able to use. This applies to solving linear equations or determining eigenvalues, for example, in connection with finite element methods.
The situation is different for various forms of information sciences as in the study of vagueness, fuzziness, spatial or temporal reasoning, handling of uncertain/rough/ qualitative knowledge in mathematical psychology, sociology, and computational linguistics, to mention a few areas. These also model theoretically with certain calculi, the calculi of logic, of sets, the calculus of relations, etc. However, for applications practitioners will usually apply Prolog-like calculi. Hardly anybody confronted with practical problems knows how to apply relational calculi; there is almost no broadly available computer support. There is usually no package able to handle problems beyond toy size. One will have to approach theoreticians since there are not many practitioners in such fields. So it might seem that George Boole in 1854 [26, 28] was right in saying:
It would, perhaps, be premature to speculate here upon the question whether the methods of abstract science are likely at any future day to render service in the investigation of social problems at all commensurate with those which they have rendered in various departments of physical inquiry.
The data format known as extensible mark-up language (XML) describes tree structures based on mark-up texts. The tree structures are formed by inserting, between text fragments, open and end tags that are balanced, like parentheses. A data set thus obtained is often called a document. On the surface, XML resembles hypertext mark-up language (HTML), the most popular display format for the Web. The essential difference, however, is that in XML the structure permitted to documents, including the set of tag names and their usage conventions, is not fixed a priori.
More precisely, XML allows users to define their own schemas; a schema determines the permitted structure of a document. In this sense, it is often said that a schema defines a “subset of XML” and thus XML is a “format for data formats.” With the support of schemas each individual application can define its own data format, while virtually all applications can share generic software tools for manipulating XML documents. This genericity is a prominent strength of XML in comparison with other existing formats. Indeed, XML has been adopted with unprecedented speed and range: an enormous number of XML schemas have been defined and used in practice. To raise a few examples, extensible HTML (XHTML) is the XML version of HTML, simple object access protocol (SOAP) is an XML message format for remote procedure calls, scalable vector graphics (SVG) is a vector graphics format in XML, and MathML is an XML format for mathematical formulas.
Many of the problems handled in applications are traditionally formulated in terms of graphs. This means that graphs will often be drawn and the reasoning will be pictorial. On the one hand, this is nice and intuitive when executed with chalk on a blackboard. On the other hand there is a considerable gap from this point to treating the problem on a computer. Often ad hoc programs are written in which more time is spent on I/O handling than on precision of the algorithm. Graphs are well suited to visualization of a result, even with the possibility of generating the graph via a graph drawing program. What is nearly impossible is the input of a problem given by means of a graph – when not using RELVIEW'S interactive graph input (see [18], for example.). In such cases some sort of relational interpretation of the respective graph is usually generated and input in some way.
We will treat reducibility and irreducibility first, mentioning also partial decomposability. Then difunctional relations are studied in the homogeneous context which provides additional results. The main aim of this chapter is to provide algorithms to determine relationally specified subsets of a graph or relation in a declarative way.
Reducibility and irreducibility
We are now going to study in more detail the reducibility and irreducibility introduced in a phenomenological form as Def. 6.12. Many of these results for relations stem from Georg Frobenius [54] and his study of eigenvalues of non-negative realvalued matrices; a comprehensive presentation is given in [92].
The work in this chapter – although stemming from various application fields – is characterized by two antitone mappings leading in opposite directions that cooperate in a certain way. In most cases they are related to one or more relations which are often heterogeneous. An iteration leads to a fixed point of a Galois correspondence. Important classes of applications lead to these investigations. Trying to find out where a program terminates, and thus correctness considerations, also invoke such iterations. Looking for the solution of games is accompanied by these iterations. Applying the Hungarian alternating chain method to find maximum matchings or to solve assignment problems subsumes to these iterations. All this is done in structurally the same way, and deserves to be studied separately.
Galois iteration
When Evariste Galois wrote down his last notes, in preparation for the duel in 1832, in which he expected to die, he probably could not have imagined to what extent these notes would later influence mathematics and applications. What he had observed may basically be presented with the correspondence of permutations of a set and their fixed points. Consider the 5-element sequence {1, 2, 3, 4, 5} for which there exist in total 5! = 120 permutations. The idea is now to observe which set of permutations leaves which set of elements fixed. Demanding more elements to be untouched by a permutation results, of course, in fewer permutations.
Usually, we are confronted with sets at a very early period of our education. Depending on the respective nationality, it is approximately at the age of 10 or 11 years. Thus we carry with us quite a burden of concepts concerning sets. At least in Germany, Mengenlehre as taught in elementary schools will raise bad memories on discussing it with parents of school children. All too often, one will be reminded of Georg Cantor, the inventor of set theory, who became mentally ill. At a more advanced level, we encounter a number of paradoxes making set theory problematic, when treated in a naïve way. One has to avoid colloquial formulations completely and should confine oneself to an adequately restricted formal treatment.
The situation does not improve when addressing logicians. Most of them think in just one universe of discourse containing numbers, letters, pairs of numbers, etc., altogether rendering themselves susceptible to numerous semantic problems. While these, in principle, can be overcome, ideally they should nevertheless be avoided from the beginning.
In our work with relations, we will mostly be restricted to finite situations, which are much easier to work with and to which most practical work is necessarily confined. A basic decision for this text is that a (finite) set is always introduced together with a linear ordering of its elements. Only then we will have a well-defined way of presenting a relation as a Boolean matrix.