The book is intended for lectures on string processes and pattern matching in Master's courses of computer science and software engineering curricula. The details of algorithms are given with correctness proofs and complexity analysis, which make them ready to implement. Algorithms are described in a C-like language. The book is also a reference for students in computational linguistics or computational biology. It presents examples of questions related to the automatic processing of natural language, to the analysis of molecular sequences, and to the management of textual databases.
In the beginning, we define automata as labelled graphs. This point of view enables a simple presentation of the basic properties of the languages recognised by finite automata – called recognisable languages – and leads naturally to the successive generalisations which will be the subject of subsequent chapters.
We will then consider the family of recognisable languages as the result of a direct construction on the algebra of languages, that is, the set of subsets of a free monoid equipped with three operations, called rational operations. This is the substance of Kleene's Theorem, and the source of some of the properties of this family of languages which from now on we shall also call rational.
Next, we will return to automata but with a functional point of view which is more suitable for the modelling of a calculating machine. This leads directly to the notion of a deterministic automaton and on to that of a minimal automaton.
The fourth section is an introduction to the theory of rational expressions, which provides an axiomatic perspective on the construction of rational languages. This theory is only sketched, but its definitions allow us to compare different modes of calculation and to introduce the idea of the derivation of an expression, which is another means of linking rational languages and automata.
As its number indicates, this chapter was not written to be read. Here will be found reminders of more or less standard notions and structures which are used in this book, with their notation. The intention is that the reader should come here when the need arises.
This reader will however note that Sections 3 and 4 deal with words, a notion which it would be dishonest to brand ‘standard’ in the same sense as, for example, set union or the product of two matrices. In fact, and unlike the usual mathematical point of view which deals with numbers – a measure of continuous magnitudes – and with the related notions of functions and functionals, computer science, or at least that part which abstracts from the physical realisation of computers and concentrates on the problems of information processing, deals with sequences, usually finite, of symbols. These are words. This most general notion is not intended to conceal a vacuous concept; on the contrary, the variety of situations that it encompasses gives it its richness. It is well worth some definitions, and, for neophytes, the corresponding sections merit a detour.
Sections 7 and 8 are also rather non-standard outside the computer science community. Section 7 recalls some basic definitions of graph theory. It is not a preliminary to automata theory (all of whose definitions are given in Chapter I); rather, it allows us to refer, in some results on automata, to corresponding graph-theoretic results. Section 8 tackles more fundamental subjects, sketching the two notions of the complexity of a procedure and of the decidability of a problem.
For a directed graph G without loops or parallel edges, let β(G) denote the size of the smallest feedback arc set, i.e., the smallest subset X ⊂ E(G) such that G ∖ X has no directed cycles. Let γ(G) be the number of unordered pairs of vertices of G which are not adjacent. We prove that every directed graph whose shortest directed cycle has length at least r ≥ 4 satisfies β(G) ≤ cγ(G)/r², where c is an absolute constant. This is tight up to the constant factor and extends a result of Chudnovsky, Seymour and Sullivan.
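To make the two quantities concrete, here is a minimal brute-force sketch for very small digraphs. It relies on the standard fact that β(G) equals the minimum, over all linear orderings of the vertices, of the number of edges pointing backwards. The function names `beta` and `gamma` and the edge-list representation are illustrative assumptions, not taken from the paper, and the enumeration of orderings is only feasible for a handful of vertices.

```python
from itertools import combinations, permutations


def beta(vertices, edges):
    """Size of a smallest feedback arc set, by trying every vertex ordering.

    For a fixed ordering, deleting the backward edges leaves an acyclic
    graph; the best ordering gives the minimum feedback arc set size.
    """
    best = len(edges)
    for order in permutations(vertices):
        position = {v: i for i, v in enumerate(order)}
        backward = sum(1 for u, v in edges if position[u] > position[v])
        best = min(best, backward)
    return best


def gamma(vertices, edges):
    """Number of unordered pairs of vertices joined by no edge."""
    adjacent = {frozenset(edge) for edge in edges}
    return sum(
        1 for pair in combinations(vertices, 2)
        if frozenset(pair) not in adjacent
    )


# Hypothetical example: the directed 4-cycle 0 -> 1 -> 2 -> 3 -> 0.
cycle_vertices = [0, 1, 2, 3]
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
```

On the directed 4-cycle, removing any single edge destroys the only cycle, and the two diagonals are the non-adjacent pairs.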
This result can also be used to answer a question of Yuster concerning almost given length cycles in digraphs. We show that for any fixed 0 < θ < 1/2 and sufficiently large n, if G is a digraph with n vertices and β(G) ≥ θn², then for any 0 ≤ m ≤ θn − o(n) it contains a directed cycle whose length is between m and m + 6θ^(−1/2). Moreover, there is a constant C such that either G contains directed cycles of every length between C and θn − o(n) or it is close to a digraph G′ with a simple structure: every strong component of G′ is periodic. These results are also tight up to the constant factors.
Division? you say to yourself, Do you not rather mean the Pascaline, the adding machine which the young Blaise built to relieve his father from tiresome calculations and which once and for all set France in the firmament of computer-building nations?
— No indeed, I assure you, it is of division that I want to speak to you; but your surprise is not misplaced, and Pascal himself would be intrigued that we speak of a machine.
We can read however in his complete works an original article in which the mathematician–philosopher analyses the mechanism of division. Let us give him the floor:
Nihil tritius est apud arithmeticos quam…
On second thoughts, let us turn instead to his translator:
Nothing in arithmetic is better known than the proposition according to which any multiple of 9 is composed of digits whose sum is itself a multiple of 9. […] In this little treatise […], I shall also set out a general method which allows one to discover, by simple inspection of its digits, whether a number is divisible by an arbitrary other number; this method applies not only to our decimal system of numeration (which system rests on a convention, an unhappy one besides, and not on a natural necessity, as the vulgar think), but it also applies without fail to every system of numeration having for base whatever number one wishes, as may be discovered in the following pages.
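The method Pascal announces can be illustrated with a short sketch. One common reading of it: precompute the remainders of the successive powers of the base modulo the divisor, then test the weighted sum of the digits instead of the number itself. The function names below are illustrative assumptions, and the code is a modern paraphrase rather than Pascal's own formulation.

```python
def pascal_weights(divisor, base, count):
    """Remainders of base**0, base**1, ..., base**(count-1) modulo divisor."""
    weights = []
    weight = 1 % divisor
    for _ in range(count):
        weights.append(weight)
        weight = (weight * base) % divisor
    return weights


def digits_of(number, base):
    """Digits of a non-negative integer, least significant first."""
    if number == 0:
        return [0]
    digits = []
    while number > 0:
        number, digit = divmod(number, base)
        digits.append(digit)
    return digits


def divisible_by_digits(number, divisor, base=10):
    """Decide divisibility by inspecting the digits only.

    Since number = sum(d_i * base**i), the number and the weighted digit
    sum sum(d_i * (base**i mod divisor)) leave the same remainder.
    """
    digits = digits_of(number, base)
    weights = pascal_weights(divisor, base, len(digits))
    weighted_sum = sum(d * w for d, w in zip(digits, weights))
    return weighted_sum % divisor == 0
```

For the divisor 9 in base 10 every weight is 1, which gives back the familiar rule on the sum of the digits; for the divisor 7 the weights cycle through 1, 3, 2, 6, 4, 5.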
For a long time I would go through, head whirling, the writing of this preface. I would rattle off whole sentences to give myself the heart to work. Now that I have to do it for real, I understand that the task is no easier than the heart of the work. How to justify writing a book on automata theory? Another one! and so thick! Justify? One can always dream; present, perhaps.
A shining light of computer science research in the nineteen-sixties, a compulsory part of instruction in the discipline in the seventies and eighties, automata theory seems to have disappeared from lecture theatre and conference hall. Nonetheless, we find it, explicitly or implicitly, in the essence or the premises of a number of subjects in computer science which are currently new or fashionable. As a possible explanation, I suggest that automata theory is the linear algebra of computer science. I mean this in two ways. Properly speaking, automata theory is non-commutative linear algebra, or can be viewed as such: the theory of matrices with coefficients in suitable algebras. I am more interested, however, in the figurative sense: automata theory as a basic, fundamental subject, known and used by everyone, which has formed part of the intellectual landscape for so long that it is no longer noticed. And yet, there it is, structuring it, organising it: and knowing it allows us to orient ourselves.
A well-known result of Fraenkel and Simpson states that the number of distinct squares in a word of length n is bounded by 2n, since at each position there are at most two distinct squares whose last occurrence starts there. In this paper, we investigate squares in partial words with one hole, or sequences over a finite alphabet that have a “do not know” symbol or “hole”. A square in a partial word over a given alphabet has the form uv where u is compatible with v, and consequently, such a square is compatible with a number of words over the alphabet that are squares. Recently, it was shown that for partial words with one hole, there may be more than two squares that have their last occurrence starting at the same position. Here, we prove that if such is the case, then the length of the shortest square is at most half the length of the third shortest square. As a result, we show that the number of distinct squares compatible with factors of a partial word with one hole of length n is bounded by $\frac{7n}{2}$.
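The definitions above can be checked experimentally on short words with a brute-force sketch. It assumes the hole is written `?` and that two equal-length partial words are compatible when they agree wherever neither has a hole; the names `HOLE`, `compatible` and `compatible_squares` are illustrative and do not come from the paper, and the cubic-time enumeration is only meant for small inputs.

```python
from itertools import product

HOLE = "?"  # assumed notation for the "do not know" symbol


def compatible(u, v):
    """True if u and v have equal length and agree outside their holes."""
    return len(u) == len(v) and all(
        a == b or HOLE in (a, b) for a, b in zip(u, v)
    )


def compatible_squares(word, alphabet):
    """Distinct full-word squares compatible with some factor of word."""
    found = set()
    n = len(word)
    for start in range(n):
        for half in range(1, (n - start) // 2 + 1):
            u = word[start:start + half]
            v = word[start + half:start + 2 * half]
            if not compatible(u, v):
                continue
            # Each position of the root is forced by whichever half has
            # a letter there; if both are holes, any letter will do.
            choices = []
            for a, b in zip(u, v):
                if a != HOLE:
                    choices.append([a])
                elif b != HOLE:
                    choices.append([b])
                else:
                    choices.append(list(alphabet))
            for letters in product(*choices):
                root = "".join(letters)
                found.add(root + root)
    return found
```

On a word without holes this simply lists its distinct squares, so the count can be compared with the 2n bound; on a word with one hole it can be compared with the 7n/2 bound stated above.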
In this chapter we undertake the study of relations realised by finite automata. It will be followed in the next chapter by the particularly fruitful special case of functional relations, or functions, which are also realised by finite automata. To this end, we shall be led to use several of the notions and results worked out in the previous two chapters, about automata over the direct products of free monoids (which are not free monoids) and about automata over free monoids but with weights in semirings of suitable coefficients.
This chapter also reproduces, in brief, the structure of the preceding three chapters. In the first section we shall begin by taking a ‘set’ point of view of the free monoid: automata relate words to words. We will thus construct a theory which has its roots in the origins of automata theory, and which has been more or less elaborated in many works. Its main aim has been the classification of families of non-rational languages, principally of sub-families of algebraic languages, by means of relations between words thus defined. This aspect will not be tackled here, and we shall stick to the study of the properties of these relations for their own sake.
In the second and third sections we shall continue this study in the general context of weighted automata and series. We shall first consider the problems inherent in the definition of (additive) maps which relate series to series, then study those which are recognised by finite automata. The subject is less often presented in this form.
As this English edition is, or was intended to be, the direct translation of the French one, although already a few years old, there is not much to say in this preface. Nothing but to express my gratitude to all those who have made the volume possible and helped in its realisation.
First, to David Tranah from Cambridge University Press, whom I first met more than twenty years ago and who showed interest in the book on automata I dreamt of writing. He was encouraging when I eventually engaged in that project and supportive when I was in the throes of completing the French edition. He was quick to welcome its publication by Cambridge University Press, and of infinite patience when waiting for the final version of the manuscript.
Under the seal of secrecy, I am glad to confess that I am grateful to Wolfgang Thomas who, in a highly confidential review, warmly recommended that CUP have the French edition translated and published.
I was very lucky, thanks to James Martin, to meet Reuben Thomas, who agreed to translate the book into English. The reader will appreciate, probably even better than I, the fluidity of his English. Even more striking, and I am still the best witness to that, was his talent and eagerness not only to translate the words but also to convey the style in which I had written them.
As announced in the last chapter, this chapter deals with functional relations, or functions, which can be realised by finite automata, and which, by habit and for convenience, and despite some confusion with other areas of mathematics, we shall call rational functions. The assumption of functionality combined with that of rationality will give remarkable structural results.
We start by proving that the fact of being functional is a decidable property for a rational relation. Then we define some families of functions that we will use in what follows, namely sequential (and co-sequential) functions, which are realised by transducers which, roughly, are to arbitrary functional transducers what deterministic (and co-deterministic) automata are to arbitrary automata.
The second section deals with the uniformisation of rational relations by rational functions. We deduce this result from the construction of a Schützenberger covering of an automaton, and this proof method will permit us to use it as the basis of all succeeding developments. In particular, we deduce simply that every rational function has a semi-monomial matrix representation, which gives the principal structural result, namely that every rational function is the product of a sequential function and a co-sequential function, and it is this which allows us to say that rational functions ‘are simple’.
These results continue into the following section; first with the notion of the cross-section, which is dual to that of uniformisation, a reversal of the point of view which turns out to be fruitful. We continue with a closer and more technical study of the way in which uniformisations and cross-sections can be constructed.
We show that Dejean's conjecture holds for n ≥ 27. This brings the final resolution of the conjecture by the approach of Moulin Ollagnier within range of the computationally feasible.
We describe a technique that maps unranked trees to arbitrary hash codes using a bottom-up deterministic tree automaton (DTA). In contrast to other hashing techniques based on automata, our procedure builds a pseudo-minimal DTA for this purpose. A pseudo-minimal automaton may be larger than the minimal one accepting the same language but, in turn, it contains proper elements (states or transitions which are unique) for every input accepted by the automaton. Therefore, pseudo-minimal DTA are a suitable structure to implement stable hashing schemes, that is, schemes where the output for every key can be determined prior to the automaton construction. We provide incremental procedures to build the pseudo-minimal DTA and the mapping that associates an integer value to every transition that will be used to compute the hash codes. This incremental construction allows for the incorporation of new trees and their hash codes without the need to rebuild the whole DTA from scratch.