The prediction problems studied in previous chapters have often been represented as repeated games between a forecaster and the environment. Our use of a game-theoretic formalism is not accidental: there exists an intimate connection between sequential prediction and some fundamental problems belonging to the theory of learning in games. We devote this chapter to the exploration of some of these connections.
Rather than giving an exhaustive account of the area of learning in games, we only focus on “regret-based” learning procedures (i.e., situations in which the players of the game base their strategies only on regrets they have suffered in the past) and our fundamental concern is whether such procedures lead to equilibria. We also limit our attention to finite strategic or normal form games.
In this introductory section we present the basic definitions of the games we consider, describe some notions of equilibria, and introduce the model of playing repeated games that we investigate in the subsequent sections of this chapter.
K-Person Normal Form Games
A (finite) K-person game given in its strategic (or normal) form is defined as follows. Player k (k = 1, …, K) has N_k possible actions (or pure strategies) to choose from, where N_k is a positive integer.
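As a minimal sketch, assume (as is common in this setting, though the definition above is cut off before stating it) that each player k has a loss function over action profiles and wants to minimize it. The following Python enumerates all N_1 × … × N_K pure action profiles and keeps those from which no player can lower its own loss by a unilateral deviation, i.e., the pure Nash equilibria. The names `pure_nash_equilibria` and `loss` are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Profile = Tuple[int, ...]


def pure_nash_equilibria(
    num_actions: Sequence[int],
    loss: Callable[[int, Profile], float],
) -> List[Profile]:
    """Return the pure action profiles from which no player gains by deviating alone.

    num_actions[k] is N_k, the number of actions of player k (actions 0..N_k-1);
    loss(k, profile) is the (assumed) loss of player k under that profile.
    """
    equilibria = []
    for profile in product(*(range(n) for n in num_actions)):
        # A profile is stable if every unilateral deviation is no better for the deviator.
        stable = all(
            loss(k, profile[:k] + (a,) + profile[k + 1:]) >= loss(k, profile)
            for k, n_k in enumerate(num_actions)
            for a in range(n_k)
        )
        if stable:
            equilibria.append(profile)
    return equilibria


if __name__ == "__main__":
    # Two-player coordination game: both players lose 1 unless they pick the same action.
    def coordination(k: int, p: Profile) -> float:
        return 0.0 if p[0] == p[1] else 1.0

    print(pure_nash_equilibria([2, 2], coordination))  # [(0, 0), (1, 1)]
```

Exhaustive enumeration is only practical for small games, but it makes the definition of the profile space concrete. A game such as matching pennies has no pure equilibrium at all, which is one reason mixed strategies and other equilibrium notions enter the picture.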
… beware of mathematicians, and all those who make empty prophecies.
St. Augustine, De Genesi ad Litteram libri duodecim. Liber Secundus, 17, 37.
Prediction of individual sequences, the main theme of this book, has been studied in various fields, such as statistical decision theory, information theory, game theory, machine learning, and mathematical finance. Early appearances of the problem go back as far as the 1950s, with the pioneering work of Blackwell, Hannan, and others. Even though the focus of investigation varied across these fields, some of the main principles have been discovered independently. Evolution of ideas remained parallel for quite some time. As each community developed its own vocabulary, communication became difficult. By the mid-1990s, however, it became clear that researchers of the different fields had a lot to teach each other.
When we decided to write this book, in 2001, one of our main purposes was to investigate these connections and help ideas circulate more fluently. In retrospect, we now realize that the interplay among these many fields is far richer than we suspected. For this reason, exploring this beautiful subject during the preparation of the book became a most exciting experience; we really hope we succeed in conveying this excitement to the reader. Today, several hundred pages later, we still feel there remains a lot to discover. This book just shows the first steps of some largely unexplored paths. We invite the reader to join us in finding out where these paths lead and where they connect.
Prediction, as we understand it in this book, is concerned with guessing the short-term evolution of certain phenomena. Examples of prediction problems are forecasting tomorrow's temperature at a given location or guessing which asset will achieve the best performance over the next month. Despite their different nature, these tasks look similar at an abstract level: one must predict the next element of an unknown sequence given some knowledge about the past elements and possibly other available information. In this book we develop a formal theory of this general prediction problem. To properly address the diversity of potential applications without sacrificing mathematical rigor, the theory will be able to accommodate different formalizations of the entities involved in a forecasting task, such as the elements forming the sequence, the criterion used to measure the quality of a forecast, the protocol specifying how the predictor receives feedback about the sequence, and any possible side information provided to the predictor.
In the most basic version of the sequential prediction problem, the predictor, or forecaster, observes one after another the elements of a sequence y_1, y_2, … of symbols. At each time t = 1, 2, …, before the tth symbol of the sequence is revealed, the forecaster guesses its value y_t on the basis of the previous t − 1 observations.
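The protocol just described can be sketched in a few lines of Python. This is only an illustration: it assumes binary symbols and counts mistakes (zero-one loss), and `run_protocol` and `majority_forecaster` are hypothetical names, the latter being just one simple forecaster that uses nothing but the past observations.

```python
from typing import Callable, Sequence

Forecaster = Callable[[Sequence[int]], int]


def run_protocol(outcomes: Sequence[int], forecaster: Forecaster) -> int:
    """Play the basic sequential prediction game and return the number of mistakes.

    Rounds are 0-based here: at round t the forecaster sees only outcomes[:t],
    the previously revealed symbols, before outcomes[t] is revealed.
    """
    mistakes = 0
    for t, y_t in enumerate(outcomes):
        guess = forecaster(outcomes[:t])  # prediction based on the past only
        if guess != y_t:                  # outcome revealed; zero-one loss counted
            mistakes += 1
    return mistakes


def majority_forecaster(past: Sequence[int]) -> int:
    """Hypothetical forecaster: predict the most frequent past symbol,
    breaking ties (including the empty past) in favour of 0."""
    return 1 if 2 * sum(past) > len(past) else 0


if __name__ == "__main__":
    print(run_protocol([0, 1, 0, 1], majority_forecaster))  # 2
```

Because the forecaster is an arbitrary function of the past, the same loop accommodates any prediction strategy; only the loss and the feedback would change in the variants discussed later.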
In the classical statistical theory of sequential prediction, the sequence of elements, which we call outcomes, is assumed to be a realization of a stationary stochastic process.
This chapter investigates several variants of the randomized prediction problem. These variants are more difficult than the basic version treated in Chapter 4 in that the forecaster has only limited information about the past outcomes of the sequence to be predicted. In particular, after making a prediction, the true outcome y_t is not necessarily revealed to the forecaster, and a whole range of different problems can be defined depending on the type of information the forecaster has access to.
One of the main messages of this chapter is that Hannan consistency may be achieved under significantly more restricted circumstances, a surprising fact in some of the cases described later. The price paid for not having full information about the outcomes is reflected in the deterioration of the rate at which the per-round regret approaches 0.
In the first variant, investigated in Sections 6.2 and 6.3, only a small fraction of the outcomes is made available to the forecaster. Surprisingly, even in this “label efficient” version of the prediction game, Hannan consistency may be achieved under the sole assumption that the number of outcomes revealed after n prediction rounds grows faster than log(n) log log(n).
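A minimal sketch of one way to play the label efficient game, assuming N actions with losses in [0, 1]: an exponentially weighted forecaster that requests each outcome independently with probability `eps` and, on requested rounds, updates with the importance-weighted estimates loss/eps, which are unbiased. The function name and the parameters `eps` and `eta` are illustrative assumptions; this is not claimed to be the exact forecaster or tuning analysed in Section 6.2. For simplicity the function receives the whole loss table but reads a row only on rounds where it requests the outcome (the row for the chosen action is used only to tally the loss it suffers).

```python
import math
import random
from typing import Sequence, Tuple


def label_efficient_forecaster(
    loss_rows: Sequence[Sequence[float]], eps: float, eta: float, seed: int = 0
) -> Tuple[float, int]:
    """Return (cumulative loss suffered, number of outcomes requested).

    loss_rows[t][i] is the loss in [0, 1] of action i at round t.
    """
    rng = random.Random(seed)
    n_actions = len(loss_rows[0])
    estimated = [0.0] * n_actions  # cumulative importance-weighted loss estimates
    total_loss, queries = 0.0, 0
    for losses in loss_rows:
        shift = min(estimated)  # subtract the minimum for numerical stability
        weights = [math.exp(-eta * (c - shift)) for c in estimated]
        action = rng.choices(range(n_actions), weights=weights)[0]
        total_loss += losses[action]
        if rng.random() < eps:  # request the outcome with probability eps
            queries += 1
            for i in range(n_actions):
                estimated[i] += losses[i] / eps  # unbiased estimate of losses[i]
    return total_loss, queries
```

Only about eps·n of the n outcomes are requested, and the price for this is the larger variance of the estimates losses[i]/eps, which is what slows down the rate at which the per-round regret vanishes.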
Section 6.4 formulates prediction problems with limited information in a general framework. In the setup of prediction under partial monitoring, the forecaster does not observe his own loss; instead, he only receives a feedback signal. The difficulty of the problem depends on the relationship between losses and feedback signals.
Fundamental notions of combinatorics on words underlie natural language processing. This is not surprising, since combinatorics on words can be seen as the formal study of sets of strings, and sets of strings are fundamental objects in language processing.
Indeed, language processing is obviously a matter of strings. A text or a discourse is a sequence of sentences; a sentence is a sequence of words; a word is a sequence of letters. The most universal levels are those of sentence, word, and letter (or phoneme), but intermediate levels between word and letter also exist and can be crucial in some languages: a level of morphological elements (e.g., suffixes) and the level of syllables. The discovery of this piling up of levels, in particular of the word level and the phoneme level, delighted structuralist linguists in the twentieth century. They termed this inherent, universal feature of human language “double articulation”.
It is a little more intricate to see how sets of strings are involved. There are two main reasons. First, at any point in a linguistic data stream being processed, you must be able to predict the set of possible continuations of what is already known, or at least to expect a continuation from some set of strings that depends on the language. Second, natural languages are ambiguous, that is, a written or spoken portion of text can often be understood or analysed in several ways, and the analyses are handled as a set of strings as long as they cannot be reduced to a single analysis.
The chapter presents data structures used to memorize the suffixes of a text and some of their applications. These structures are designed to give fast access to all factors of the text, which is why they have a fairly large number of applications in text processing.
Two types of objects are considered in this chapter, digital trees and automata, together with their compact versions. Trees merge the common prefixes of the words in the set. Automata additionally merge their common suffixes. The structures are presented in order of decreasing size.
The representation of all the suffixes of a word by an ordinary digital tree called a suffix trie (Section 2.1) has the advantage of being simple but can lead to a memory size that is quadratic in the length of the considered word. The compact tree of suffixes (Section 2.2) is guaranteed to hold in linear memory space.
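The size behaviour of the suffix trie is easy to observe. The Python sketch below inserts every suffix of a word into a trie represented as nested dictionaries (a representation choice of ours, not necessarily the book's) and counts its nodes. For a word of length n with pairwise distinct letters no prefixes are shared, so the trie has 1 + n(n + 1)/2 nodes, whereas for a^n it has only n + 1. The function names are hypothetical.

```python
from typing import Dict

Trie = Dict[str, "Trie"]


def suffix_trie(word: str) -> Trie:
    """Build the suffix trie of `word` by inserting each suffix from the root."""
    root: Trie = {}
    for i in range(len(word)):
        node = root
        for ch in word[i:]:
            node = node.setdefault(ch, {})  # follow or create the edge labelled ch
    return root


def count_nodes(trie: Trie) -> int:
    """Number of nodes, the root included."""
    return 1 + sum(count_nodes(child) for child in trie.values())


if __name__ == "__main__":
    print(count_nodes(suffix_trie("abcd")))  # 11 = 1 + 4*5/2
    print(count_nodes(suffix_trie("aaaa")))  # 5
```

Words such as a^k b^k also lead to a quadratic number of nodes, which is what motivates the compact tree of Section 2.2.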
Minimizing the suffix trie, viewed as an automaton, gives the minimal automaton accepting the suffixes; it is described in Section 2.4. Compaction and minimization together yield the compact suffix automaton of Section 2.5.
Most algorithms that build the structures presented in this chapter work in time O(n × log Card A), for a text of length n, assuming that there is an ordering on the alphabet A. Their execution time is thus linear when the alphabet is finite and fixed. Locating a word of length m in the text then takes O(m × log Card A) time.
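For illustration, here is a sketch of the classical online construction of the suffix automaton (the minimal automaton of the suffixes) using suffix links; its presentation may differ from the algorithm of Section 2.4. Transitions are stored in Python dictionaries, i.e., hash tables, rather than in ordered structures, so each transition lookup costs expected constant time instead of O(log Card A). Terminal states are those on the suffix-link path from the last state created. The names `build_suffix_automaton`, `is_factor`, and `accepts_suffix` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class State:
    length: int = 0                      # length of the longest word reaching this state
    link: int = -1                       # suffix link (-1 for the initial state)
    next: Dict[str, int] = field(default_factory=dict)


def build_suffix_automaton(text: str) -> Tuple[List[State], Set[int]]:
    states = [State()]
    last = 0
    for ch in text:
        cur = len(states)
        states.append(State(length=states[last].length + 1))
        p = last
        # Add transitions by ch along the suffix path until one already exists.
        while p != -1 and ch not in states[p].next:
            states[p].next[ch] = cur
            p = states[p].link
        if p == -1:
            states[cur].link = 0
        else:
            q = states[p].next[ch]
            if states[p].length + 1 == states[q].length:
                states[cur].link = q
            else:
                # Split q by cloning it with the shorter length.
                clone = len(states)
                states.append(State(length=states[p].length + 1,
                                    link=states[q].link,
                                    next=dict(states[q].next)))
                while p != -1 and states[p].next.get(ch) == q:
                    states[p].next[ch] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    # Terminal states: the suffix-link path from the state of the whole text.
    terminals: Set[int] = set()
    p = last
    while p != -1:
        terminals.add(p)
        p = states[p].link
    return states, terminals


def _run(states: List[State], word: str) -> int:
    """State reached by reading `word` from the initial state, or -1."""
    s = 0
    for ch in word:
        if ch not in states[s].next:
            return -1
        s = states[s].next[ch]
    return s


def is_factor(states: List[State], word: str) -> bool:
    # Locating a word of length m takes m transitions.
    return _run(states, word) != -1


def accepts_suffix(states: List[State], terminals: Set[int], word: str) -> bool:
    return _run(states, word) in terminals
```

For instance, with the automaton of `abcbc`, `is_factor` returns True for `cb` while `accepts_suffix` returns False for it, since `cb` occurs in the text but is not one of its suffixes.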
This chapter is an introductory chapter to the book. It gives general notions, notation, and technical background. It covers, in a tutorial style, the main notions in use in algorithms on words. In this sense, it is a comprehensive exposition of basic elements concerning algorithms on words, automata and transducers, and probability on words.
The general goal of “stringology” we pursue here is to manipulate strings of symbols, to compare them, to count them, to check some properties, and to perform simple transformations in an effective and efficient way.
A typical illustrative example of our approach is the action of circular permutations on words, because several of the aspects we mentioned above are present in this example. First, the operation of circular shift is a transduction which can be realized by a transducer. We include in this chapter a section (Section 1.5) on transducers. Transducers will be used in Chapter 3. The orbits of the transformation induced by the circular permutation are the so-called conjugacy classes. Conjugacy classes are a basic notion in combinatorics on words. The minimal element in a conjugacy class is a good representative of a class. It can be computed by an efficient algorithm (actually in linear time). This is one of the algorithms which appear in Section 1.2. Algorithms for conjugacy are again considered in Chapter 2. These words give rise to Lyndon words which have remarkable combinatorial properties already emphasized in Lothaire (1997). We describe in Section 1.2.5 the Lyndon factorization algorithm.
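Two of the linear-time algorithms mentioned above can be sketched in Python. `min_conjugate` uses the well-known two-pointer least-rotation algorithm (not necessarily the one given in Section 1.2), and `lyndon_factorization` is Duval's algorithm, which produces the factorization into a nonincreasing sequence of Lyndon words. Both function names are hypothetical.

```python
from typing import List


def min_conjugate(s: str) -> str:
    """Lexicographically least circular shift of s, in linear time."""
    n = len(s)
    i, j, k = 0, 1, 0  # two candidate start positions and the length of their match
    while i < n and j < n and k < n:
        a, b = s[(i + k) % n], s[(j + k) % n]
        if a == b:
            k += 1
            continue
        if a > b:
            i += k + 1   # no rotation starting in i..i+k can be minimal
        else:
            j += k + 1
        if i == j:
            j += 1
        k = 0
    r = min(i, j)
    return s[r:] + s[:r]


def lyndon_factorization(s: str) -> List[str]:
    """Duval's algorithm: s = l1 l2 ... lm with Lyndon words l1 >= l2 >= ... >= lm."""
    n, i = len(s), 0
    factors: List[str] = []
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:  # emit copies of the Lyndon word of length j - k
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


if __name__ == "__main__":
    print(min_conjugate("baca"))             # abac
    print(lyndon_factorization("banana"))    # ['b', 'an', 'an', 'a']
```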
A series of important applications of combinatorics on words has emerged with the development of computerized text and string processing, especially in biology and in linguistics. The aim of this volume is to present, in a unified treatment, some of the major fields of applications. The main topics that are covered in this book are
Algorithms for manipulating text, such as string searching, pattern matching, and testing a word for special properties.
Efficient data structures for retrieving information on large indexes, including suffix trees and suffix automata.
Combinatorial, probabilistic, and statistical properties of patterns in finite words, and more general patterns, under various assumptions on the sources of the text.
Inference of regular expressions.
Algorithms for repetitions in strings, such as maximal runs or tandem repeats.
Linguistic text processing, especially analysis of the syntactic and semantic structure of natural language. Applications to language processing with large dictionaries.
Enumeration, generation, and sampling of complex combinatorial structures by their encodings in words.
This book is actually the third of a series of books on combinatorics on words. Lothaire's “Combinatorics on Words” appeared in its first printing in 1984 as Volume 17 of the Encyclopedia of Mathematics. It grew out of the impetus of M. P. Schützenberger's scientific work. Since then, the theory has developed into a large scientific domain. It was reprinted in 1997 in the Cambridge Mathematical Library.
Repeated patterns and related phenomena in words are known to play a central role in many facets of computer science, telecommunications, coding, data compression, and molecular biology. One of the most fundamental questions arising in such studies is the frequency of pattern occurrences in another string known as the text. Applications of these results include gene finding in biology, code synchronization, user search in wireless communications, detecting signatures of an attacker in intrusion detection, and discovering repeated strings in the Lempel-Ziv schemes and other data compression algorithms.
In basic pattern matching, given a (fixed or random) pattern w, or a set of patterns W, and a text X, one asks how many times the pattern occurs in the text and how long it takes for it to occur in X for the first time. These two problems are not unrelated, as we have already seen in Chapter 6. Throughout this chapter we allow patterns to overlap and we count overlapping occurrences separately. For example, w = abab occurs three times in the text X = bababababb.
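Counting overlapping occurrences, together with the position at which the first occurrence ends, can be sketched with the Knuth-Morris-Pratt failure function. The choice of KMP and the function names are ours, for illustration; after each match the automaton falls back along the failure function instead of restarting, which is exactly what lets overlapping occurrences be counted separately.

```python
from typing import List


def failure_function(w: str) -> List[int]:
    """f[i] = length of the longest proper border of w[:i + 1]."""
    f = [0] * len(w)
    k = 0
    for i in range(1, len(w)):
        while k > 0 and w[i] != w[k]:
            k = f[k - 1]
        if w[i] == w[k]:
            k += 1
        f[i] = k
    return f


def occurrence_ends(w: str, text: str) -> List[int]:
    """1-based end positions of all (possibly overlapping) occurrences of w in text."""
    f = failure_function(w)
    ends, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != w[k]:
            k = f[k - 1]
        if ch == w[k]:
            k += 1
        if k == len(w):
            ends.append(i + 1)
            k = f[k - 1]  # fall back so that overlapping occurrences are found
    return ends


if __name__ == "__main__":
    ends = occurrence_ends("abab", "bababababb")
    print(ends)  # [5, 7, 9]: three occurrences, the first one ends at position 5
```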
We consider pattern matching problems in a probabilistic framework in which the text is generated by a probabilistic source while the pattern is given. In Chapter 1 various probabilistic sources were discussed. Here we succinctly summarize assumptions adopted in this chapter. In addition, we introduce a new general source known as a dynamical source recently proposed by Vallée. In Chapter 2 algorithmic aspects of pattern matching and various efficient algorithms for finding patterns were discussed.
The application of statistical methods to natural language processing has been remarkably successful over the past two decades. The wide availability of text and speech corpora has played a critical role in their success since, as for all learning techniques, these methods rely heavily on data. Many of the components of complex natural language processing systems, for example, text normalizers, morphological or phonological analyzers, part-of-speech taggers, grammars or language models, pronunciation models, context-dependency models, acoustic Hidden-Markov Models (HMMs), are statistical models derived from large data sets using modern learning techniques. These models are often given as weighted automata or weighted finite-state transducers either directly or as a result of the approximation of more complex models.
Weighted automata and transducers are the finite automata and finite-state transducers described in Chapter 1 Section 1.5 with the addition of some weight to each transition. Thus, weighted finite-state transducers are automata in which each transition, in addition to its usual input label, is augmented with an output label from a possibly different alphabet, and carries some weight. The weights may correspond to probabilities or log-likelihoods or they may be some other costs used to rank alternatives. More generally, as we shall see in the next section, they are elements of a semiring set. Transducers can be used to define a mapping between two different types of information sources, for example, word and phoneme sequences.
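To make the role of the semiring concrete, the sketch below computes the weight of a string in a weighted automaton over the tropical semiring, where ⊕ is min and ⊗ is +, so the weight is the cost of the cheapest accepting path. For brevity it treats an acceptor (transitions with input labels only) rather than a full transducer; the dictionary encoding of transitions and final weights, and the function name, are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

# state -> list of (input label, next state, weight)
Transitions = Dict[int, List[Tuple[str, int, float]]]


def tropical_weight(transitions: Transitions, initial: int,
                    finals: Dict[int, float], word: str) -> float:
    """Minimum over accepting paths of the sum of transition and final weights."""
    dist: Dict[int, float] = {initial: 0.0}  # best weight of a path reading the prefix so far
    for a in word:
        new: Dict[int, float] = {}
        for q, d in dist.items():
            for label, r, w in transitions.get(q, []):
                if label == a:
                    cand = d + w                      # semiring product: +
                    if cand < new.get(r, math.inf):   # semiring sum: min
                        new[r] = cand
        dist = new
    return min((d + finals[q] for q, d in dist.items() if q in finals), default=math.inf)


if __name__ == "__main__":
    # Two paths read "ab": 0 -a/1-> 1 -b/2-> 2 and 0 -a/0.5-> 3 -b/3-> 2.
    T = {0: [("a", 1, 1.0), ("a", 3, 0.5)], 1: [("b", 2, 2.0)], 3: [("b", 2, 3.0)]}
    print(tropical_weight(T, 0, {2: 0.0}, "ab"))  # 3.0
```

With weights taken as negative log-probabilities, the minimum corresponds to the most likely path; replacing (min, +) by (+, ×) over probabilities would instead sum over all paths.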
This chapter shows some examples of applications of combinatorics on words to number theory with a brief incursion into physics. These examples have a common feature: the notion of morphism of the free monoid. Such morphisms have been widely studied in combinatorics on words; they generate infinite words which can be considered as highly ordered, and which occur ubiquitously in mathematics, theoretical computer science, and theoretical physics.
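To illustrate how a morphism of the free monoid generates an infinite word, the sketch below iterates a morphism from a starting letter. When the image of that letter begins with the letter itself, each iterate is a prefix of the next, and their limit is a fixed point of the morphism. The two morphisms are standard examples of our own choosing (Thue-Morse, 0 → 01, 1 → 10, and Fibonacci, a → ab, b → a); the function name is hypothetical.

```python
from typing import Dict


def iterate_morphism(morphism: Dict[str, str], start: str, steps: int) -> str:
    """Apply the morphism `steps` times to `start`, letter by letter."""
    word = start
    for _ in range(steps):
        word = "".join(morphism[c] for c in word)
    return word


THUE_MORSE = {"0": "01", "1": "10"}  # uniform morphism of length 2
FIBONACCI = {"a": "ab", "b": "a"}    # non-uniform morphism

if __name__ == "__main__":
    print(iterate_morphism(THUE_MORSE, "0", 3))  # 01101001
    print(iterate_morphism(FIBONACCI, "a", 4))   # abaababa
```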
The first part of this chapter is devoted to the notion of automatic sequences and uniform morphisms, in connection with the transcendence of formal power series with coefficients in a finite field. Namely it is possible to characterize algebraicity of these series in a simple way: a formal power series is algebraic if and only if the sequence of its coefficients is automatic, that is, if it is the image by a letter-to-letter map of a fixed point of a uniform morphism. This criterion is known as Christol's theorem. A central tool in the study of automatic sequences is the notion of kernel of an infinite word (sequence) over a finite alphabet: this is the set of subsequences obtained by certain decimations. A rephrasing of Christol's theorem is that transcendence of a formal power series over a finite field is equivalent to infiniteness of the kernel of the sequence of its coefficients: this will be illustrated in this chapter.
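A small numerical illustration, with an example of our own choosing: the Thue-Morse sequence t(n), the parity of the number of 1s in the binary expansion of n, is the fixed point of the uniform morphism 0 → 01, 1 → 10, and its 2-kernel consists of the subsequences n ↦ t(2^e n + r) with e ≥ 0 and 0 ≤ r < 2^e. The sketch below (function names hypothetical) collects finitely many kernel elements and compares them on a finite prefix only, so it provides evidence of finiteness rather than a proof. Only two distinct sequences appear, t and its complement, because t(2^e n + r) = t(n) XOR t(r) when r < 2^e.

```python
from typing import List, Set, Tuple


def thue_morse_prefix(length: int) -> List[int]:
    """t(n) = parity of the number of 1s in the binary expansion of n."""
    return [bin(n).count("1") % 2 for n in range(length)]


def kernel_prefixes(seq: List[int], base: int, max_exp: int, m: int) -> Set[Tuple[int, ...]]:
    """Distinct length-m prefixes of the subsequences n -> seq[base**e * n + r],
    for 0 <= e <= max_exp and 0 <= r < base**e (requires a long enough seq)."""
    found: Set[Tuple[int, ...]] = set()
    for e in range(max_exp + 1):
        step = base ** e
        for r in range(step):
            found.add(tuple(seq[step * n + r] for n in range(m)))
    return found


if __name__ == "__main__":
    t = thue_morse_prefix(4096)
    print(len(kernel_prefixes(t, 2, 5, 64)))  # 2
```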