Certain convergent search algorithms can be turned into chaotic dynamic systems by renormalisation back to a standard region at each iteration. This allows the machinery of ergodic theory to be used for a new probabilistic analysis of their behaviour. Rates of convergence can be redefined in terms of various entropies and ergodic characteristics (Kolmogorov and Rényi entropies and Lyapunov exponent). A special class of line-search algorithms, which contains the Golden-Section algorithm, is studied in detail. Their associated dynamic systems exhibit a Markov partition property, from which invariant measures and ergodic characteristics can be computed. A case is made that the Rényi entropy is the most appropriate convergence criterion in this environment.
This is the first book devoted to a broad study of the combinatorics of words, that is to say, of sequences of symbols called letters. This subject is in fact very ancient and has cropped up repeatedly in a wide variety of contexts. Even in the most elegant parts of abstract pure mathematics, the proof of a beautiful theorem surprisingly often reduces to some very down-to-earth combinatorial lemma concerning linear arrays of symbols. In applied mathematics, that is in the subjects to which mathematics can be applied, such problems are even more to be expected. This is true especially in those areas of contemporary applied mathematics that deal with the discrete and non-commutative aspects of the world about us, notably the theory of automata, information theory, and formal linguistics.
The systematic study of words seems to have been initiated by Axel Thue in three papers [Norske Vid. Selsk. Skr. I Mat. Nat. Kl. Christiania, 1906, 1–22; 1912, 1–67; 1914, 1–34.]. Even more than for his theorems, we owe him a great debt for delineating this subject. Both before and after his time, a multitude of fragmentary results have accumulated in the most diverse contexts, and a substantial but not very widely known lore was beginning to crystallize to the point where a systematic treatment of the subject was badly needed and long overdue.
This need is splendidly fulfilled by the present volume.
The investigation of words includes a series of combinatorial studies with rather surprising conclusions that can be summarized roughly by the following statement: Each sufficiently long word over a finite alphabet behaves locally in a regular fashion. That is to say, an arbitrary word, subject only to the constraint that it be sufficiently long, possesses some regularity. This claim becomes meaningful only if one specifies the kind of regularities that are intended, of course. The discovery and the analysis of these unavoidable regularities constitute a major topic in the combinatorics of words. A typical example is furnished by van der Waerden's theorem.
It should not be concluded that any sufficiently long word is globally regular. On the contrary, the existence of unavoidable regularities leads to the dual question of avoidable regularities: properties not automatically shared by all sufficiently long words. For such a property there exist infinitely many words (finiteness of the alphabet is supposed) that do not satisfy it. The present chapter is devoted mainly to the study of one such property.
A square is a word of the form uu, with u a nonempty word. A word contains a square if one of its factors is a square; otherwise, the word is called square-free. For instance, abcacbacbc contains the square acbacb, and abcacbabcb is square-free. The answer to the question of whether every sufficiently long word contains a square is no, provided the alphabet has at least three letters.
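The definition of a square can be checked mechanically. The following is a minimal brute-force sketch in Python, not an algorithm taken from the book; the function names `find_square` and `is_square_free` are chosen here for illustration only.

```python
def find_square(word):
    """Return (start, u) for the first factor uu found in word, or None.

    Brute force: try every start position and every half-length.
    """
    n = len(word)
    for start in range(n):
        for half in range(1, (n - start) // 2 + 1):
            u = word[start:start + half]
            if u == word[start + half:start + 2 * half]:
                return start, u
    return None


def is_square_free(word):
    """True when no factor of word has the form uu with u nonempty."""
    return find_square(word) is None
```

Applied to the two examples above, `find_square("abcacbacbc")` locates the square with u = acb starting at 0-based position 3, and `is_square_free("abcacbabcb")` holds. Over two letters the same check confirms that every word of length 4 contains a square, which is why the alphabet needs at least three letters.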
The new printing of Combinatorics on words does not bring many changes. Except for the correction of some misprints and errors, the text has not been modified. I would like to thank those readers who have sent corrections and, in particular, Aldo De Luca, Pavel Goralcik and Bruno Petazzoni.
More than ten years have passed since the first publication of this book. A lot of water has flowed under the bridges of Lotharingia since then.
There is bad news, first. Roger Lyndon, the author of the Foreword of the first edition, passed away a few years ago, leaving the memory of a great mathematician and a marvellous man, as did Marcel-Paul Schützenberger this year. He was the spirit behind the scene, and most of the ideas contained in the book were inspired by him. Also, the collective group of authors almost entirely consists of his former students. It is a small tribute to dedicate this book to him.
There is also good news. A new volume on the subject of combinatorics on words is in preparation. It will contain chapters, written by new authors, on topics that had not been included in this volume, making a complementary work, but one which can be read independently. It will cover in particular some aspects of symbolic dynamics, the theory of Young tableaux through the approach of the plactic monoid, combinatorial aspects of free algebras, number systems, and word functions.
This chapter contains the main definitions used in the rest of the book. It also presents some basic results about words that are of constant use in the sequel. In the first section are defined words, free monoids, and some terms about words, such as length and factors.
Section 1.2 is devoted to submonoids and to morphisms of free monoids, one of the basic tools for words. Many of the proofs of properties of words involve a substitution from the alphabet into words over another alphabet, which is just the definition of a morphism of free monoids. A nontrivial result called the defect theorem is proved. The theorem asserts that if a relation exists among words in a set, those words can be written on a smaller alphabet. This is a weak counterpart for free monoids of the Nielsen–Schreier theorem for subgroups of a free group.
In Section 1.3 the definition of conjugate words is given, together with some equivalent characterizations. Also defined are primitive words, or words that are not a repetition of another word. A very useful result, due to Fine and Wilf, is proved that concerns the possibility of multiple repetitions. The last section introduces the notation of formal series that deal with linear combinations of words, which will be used in Chapters 5–7 and 11.
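Conjugacy and primitivity both admit short tests. The sketch below relies on two standard facts that are assumptions here, not statements quoted from the chapter: v is a conjugate of u exactly when the two words have the same length and v is a factor of uu, and a nonempty word w is primitive exactly when w occurs in ww only as a prefix and as a suffix. The function names are illustrative.

```python
def are_conjugate(u, v):
    """True when v is a cyclic shift of u (u = xy and v = yx for some x, y)."""
    return len(u) == len(v) and v in u + u


def is_primitive(w):
    """True when the nonempty word w is not a power z^k with k >= 2.

    Searching for w inside ww from position 1 onward finds it first at
    position len(w) exactly when w has no shorter period dividing len(w).
    The empty word is treated as not primitive.
    """
    return len(w) > 0 and (w + w).find(w, 1) == len(w)
```

For instance, abc and bca are conjugate, while abab is not primitive because it is the square of ab.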
A list of problems, some of them difficult, is collected at the end.
This chapter is devoted to the study of a special type of unavoidable regularities. We consider a mapping φ: A⁺ → E from A⁺ to a set E, and we search in a word w for factors of the type w₁w₂…wₙ with φ(w₁) = φ(w₂) = … = φ(wₙ). The mapping is called repetitive when such a factor appears in each sufficiently long word. This is related both to square-free words (Chapter 2), by considering the identity mapping, and to van der Waerden's theorem (Chapter 3), as will be shown later on.
It will first be shown that any mapping from A⁺ to a finite set is repetitive (Theorem 4.1.1).
After a direct proof of this fact, it will be shown how the result can also be deduced from Ramsey's theorem (which is stated without proof).
Investigated also is the special case where φ is a morphism from A⁺ to a semigroup S. First it is proved that a morphism to the semigroup of positive integers is repetitive when the alphabet is finite (Theorem 4.2.1). Then it is proved that a morphism to a finite semigroup is uniformly repetitive, in the sense that the words w₁, w₂, …, wₙ in the foregoing definition can be chosen of equal length (Theorem 4.2.2). This is, as will be shown, a generalization of van der Waerden's theorem. Finally, the chapter mentions a number of extensions and other results.
Let us consider two words x, y of the free monoid A*, satisfying the equality
xy = yx. (9.0.1)
By Proposition 1.3.2 of Chapter 1, there exist a word u ∈ A* and two integers n, p ≥ 0 such that
x = uⁿ, y = uᵖ. (9.0.2)
In this chapter, we will view x and y as the letters of an alphabet Ξ. We will say that xy = yx is an equation in the unknowns Ξ = {x, y} and that the morphism α: Ξ* → A* defined by α(x) = uⁿ and α(y) = uᵖ is a solution of the equation. Observe that all solutions of this particular equation are of this type.
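The shape of the solutions of xy = yx can be illustrated computationally. The sketch below is an assumption-laden illustration, not the book's construction: it takes u to be the shortest word of which x (or y, if x is empty) is a power, and the names `primitive_root` and `solve_commutation` are hypothetical.

```python
def primitive_root(w):
    """Shortest word u with w == u * k for some k >= 1 ('' for the empty word)."""
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]
    return ""


def solve_commutation(x, y):
    """If xy == yx, return (u, n, p) with x == u * n and y == u * p; else None."""
    if x + y != y + x:
        return None
    u = primitive_root(x) or primitive_root(y)
    if not u:  # both words are empty
        return "", 0, 0
    return u, len(x) // len(u), len(y) // len(u)
```

For example, `solve_commutation("abab", "ab")` gives u = ab with exponents 2 and 1, while the pair ab, ba does not commute and yields `None`.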
The basic notions on equations are presented in Section 9.1. In Section 9.2, we consider a few equations whose families of solutions admit a finite description, as in the preceding example. Indeed, the family of solutions of Eq. (9.0.1) is entirely described by the unique expression (9.0.2), where u runs over all words and n, p over all nonnegative integers. This idea is formalized in Section 9.3, which introduces the notion of parametrizable equations and where it is recalled that all equations in three unknowns are parametrizable.
Not all equations are parametrizable, however. We are thus led in Section 9.4 to define the rank of an equation, which is the maximum number of the letters occurring in the expression of particular solutions called principal.
Let us recall the definition: a word f in A* is a finite sequence of elements of A, called letters. We shall call a subword of a word f any sequence contained in the sequence f. The word aba for instance is a subword of the word bacbcab as well as of the word aabbaa. It can be observed immediately that two subsequences of f, distinct as subsequences, may define the same subword: thus aba is a subword of bacbcab in only one way but may be obtained as a subword of aabbaa in eight different ways.
A word f being given, it is easy to compute the set of its subwords and their multiplicity; this computation is obtained by a simple induction formula. The main problem of interest in this chapter, sometimes implicitly but more often explicitly, is the one of the inverse correspondence. Under what conditions is a given set of words S the set of subwords, or a subset of a certain kind of the set of subwords, of a word f? Once these conditions are met, what are the words f that are thus determined? In which cases are they uniquely determined? Some of these conditions on that set S are rather obvious. For instance if g is a subword of f, then any subword of g is a subword of f.
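The multiplicities quoted above (one way and eight ways) can be reproduced with a short dynamic program. This is a standard counting scheme offered as a sketch; it is not claimed to be the induction formula used in the chapter, and the name `count_embeddings` is illustrative.

```python
def count_embeddings(g, f):
    """Number of ways to obtain g as a subword (scattered subsequence) of f.

    ways[j] holds the number of ways to obtain the prefix g[:j] from the
    letters of f read so far. Updating j from high to low ensures that each
    letter of f is used at most once per embedding.
    """
    ways = [1] + [0] * len(g)
    for letter in f:
        for j in range(len(g), 0, -1):
            if g[j - 1] == letter:
                ways[j] += ways[j - 1]
    return ways[len(g)]
```

With this function, `count_embeddings("aba", "bacbcab")` is 1 and `count_embeddings("aba", "aabbaa")` is 8, matching the counts stated in the text.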
The aim of this chapter is to give a detailed presentation of the relation between plane trees and special families of words: parenthesis systems and other families. The relation between trees and parenthesis notation is classical and has been known perhaps since Catalan 1838.
Because trees play a central role in the field of combinatorial algorithms (Knuth 1968), their coding by parenthesis notation has been investigated so very often that it is quite impossible to give a complete list of all the papers dealing with the topic. These subjects are also considered in enumeration theory and are known to combinatorialists (Comtet 1970) as being counted by Catalan numbers. Note that a generalization of the type of parenthesis system often called Dyck language is a central concept in formal language theory. These remarks give a good account of the main role played by trees and their coding in combinatorics on words.
Presented here are three ways to represent trees by words. The first one consists in constructing a set of words (one for each node) associated to a plane tree. The second is the classical parenthesis coding, and the third concerns Łukasiewicz language (known also as Polish notation).
The combinatorial properties of Łukasiewicz language were investigated by Raney (1960) in order to give a purely combinatorial proof of the Lagrange inversion formula (see also Schützenberger 1971). This proof is presented in Section 11.4 of the present chapter as an application of our combinatorial constructions.
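The second and third codings can be sketched as follows. Several details are assumptions made for this illustration and may differ from the chapter's conventions: a plane tree is represented as a nested Python list of its subtrees, the parenthesis code writes one pair of parentheses per non-root node, and the Łukasiewicz (Polish) code lists the number of children of each node in prefix order.

```python
def parenthesis_code(tree):
    """Parenthesis word of a plane tree given as a nested list of subtrees.

    Each child contributes '(' + code of its subtree + ')'; the root itself
    is not wrapped (a convention assumed for this sketch).
    """
    return "".join("(" + parenthesis_code(child) + ")" for child in tree)


def lukasiewicz_code(tree):
    """Prefix-order list of the number of children of every node."""
    code = [len(tree)]
    for child in tree:
        code.extend(lukasiewicz_code(child))
    return code


# A root with two children: a leaf, and a node with two leaf children.
example_tree = [[], [[], []]]
```

For `example_tree`, the parenthesis code is `()(()())` and the prefix-order code is `[2, 0, 2, 0, 0]`. In the latter, the entries sum to one less than the number of nodes, since every node except the root is counted once as a child.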
This chapter is devoted to a study of van der Waerden's theorem, which is, according to Khinchin, one of the “pearls of number theory.” This theorem illustrates a principle of unavoidable regularity: It is impossible to produce long sequences of elements taken from a finite set that do not contain subsequences possessing some regularity, in this instance arithmetic progressions of identical elements.
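The principle can be observed on a small case by exhaustive search. The sketch below is an illustration under stated assumptions, not part of the chapter: a word over a finite alphabet plays the role of the sequence, positions are 0-based, and the name `find_monochromatic_ap` is hypothetical. It relies on the well-known fact that every two-letter word of length 9 contains three equally spaced identical letters, while some words of length 8 do not.

```python
from itertools import product


def find_monochromatic_ap(word, k):
    """Return (start, step) of k equally spaced identical letters, or None.

    Positions are 0-based and k is assumed to be at least 2.
    """
    n = len(word)
    for start in range(n):
        for step in range(1, n):
            if start + (k - 1) * step >= n:
                break  # larger steps only overshoot further
            if all(word[start + i * step] == word[start] for i in range(k)):
                return start, step
    return None


def every_word_has_ap(alphabet, length, k):
    """Check all words of the given length over the alphabet by brute force."""
    return all(
        find_monochromatic_ap("".join(letters), k) is not None
        for letters in product(alphabet, repeat=length)
    )
```

Here `every_word_has_ap("01", 9, 3)` holds, whereas the word 00110011 of length 8 contains no such progression of length 3.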
During the last fifty years, van der Waerden's theorem has stimulated a good deal of research on various aspects of the result. Efforts have been made to simplify the proof while at the same time generalizing the theorem, as well as to determine certain numerical constants that occur in the statement of the theorem. This work is of an essentially combinatorial nature. More recently, results from ergodic theory have led to the discovery of new extensions of van der Waerden's theorem, and, as a result, to a topological proof.
The plan of the chapter illustrates this diversity of viewpoints. The first section, after a brief historical note, presents several different formulations of van der Waerden's theorem. The second section gives a combinatorial proof of an elegant generalization due to Grünwald. The third section, which concerns “cadences,” gives an interpretation of the theorem in terms of the free monoid. In the fourth section is presented a topological proof of van der Waerden's theorem, due to Furstenberg and Weiss.
Combinatorics on words is a field that has grown separately within several branches of mathematics, such as group theory or probabilities, and appears frequently in problems of computer science dealing with automata and formal languages. It may now be considered as an independent theory because of both the number of results that it contains and the variety of possible applications.
This book is the first attempt to present a unified treatment of the theory of combinatorics on words. It covers the main results and methods in an elementary presentation and can be used as a textbook in mathematics or computer science at undergraduate or graduate level. It will also help researchers in these fields by putting together a lot of results scattered in the literature.
The idea of writing this book arose a few years ago among the group of people who have collectively realized it. The starting point was a mimeographed text of lectures given by M. P. Schützenberger at the University of Paris in 1966 and written down by J. F. Perrot. The title of this text was “Quelques Problèmes combinatoires de la théorie des automates.” It was widely circulated and served many people (including most of the authors of this book) as an introduction to this field.