Early computers replaced calculators and typewriters, and programmers focused on scientific computing (calculations involving numbers) and string processing (manipulating sequences of alphanumeric characters, or strings). Ironically, in modern applications, string processing is an integral part of scientific computing, as strings are an appropriate model of the natural world in a wide range of applications, notably computational biology and chemistry. Beyond scientific applications, strings are the lingua franca of modern computing, with billions of computers having immediate access to an almost unimaginable number of strings.
Decades of research have met the challenge of developing fundamental algorithms for string processing and mathematical models for strings and string processing that are suitable for scientific studies. Until now, much of this knowledge has been the province of specialists, requiring intimate familiarity with the research literature. The appearance of this new book is therefore a welcome development. It is a unique resource that provides a thorough coverage of the field and serves as a guide to the research literature. It is worthy of serious study by any scientist facing the daunting prospect of making sense of huge numbers of strings.
The development of an understanding of strings and string processing algorithms has paralleled the emergence of the field of analytic combinatorics, under the leadership of the late Philippe Flajolet, to whom this book is dedicated. Analytic combinatorics provides powerful tools that can synthesize and simplify classical derivations and new results in the analysis of strings and string processing algorithms. As disciples of Flajolet and leaders in the field nearly since its inception, Philippe Jacquet and Wojciech Szpankowski are well positioned to provide a cohesive modern treatment, and they have done a masterful job in this volume.
Repeated patterns and related phenomena in words are known to play a central role in many facets of computer science, telecommunications, coding, data compression, data mining, and molecular biology. One of the most fundamental questions arising in such studies is the frequency of pattern occurrences in a given string known as the text. Applications of these results include gene finding in biology, executing and analyzing tree-like protocols for multiaccess systems, discovering repeated strings in Lempel–Ziv schemes and other data compression algorithms, evaluating string complexity and its randomness, synchronization codes, user searching in wireless communications, and detecting the signatures of an attacker in intrusion detection.
The basic pattern matching problem is to find for a given (or random) pattern w or set of patterns W and a text X how many times W occurs in the text X and how long it takes for W to occur in X for the first time. There are many variations of this basic pattern matching setting which is known as exact string matching. In approximate string matching, better known as generalized string matching, certain words from W are expected to occur in the text while other words are forbidden and cannot appear in the text. In some applications, especially in constrained coding and neural data spikes, one puts restrictions on the text (e.g., only text without the patterns 000 and 0000 is permissible), leading to constrained string matching. Finally, in the most general case, patterns from the set W do not need to occur as strings (i.e., consecutively) but rather as subsequences; that leads to subsequence pattern matching, also known as hidden pattern matching.
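The quantities named above can be made concrete on a small example. The following is a minimal sketch, not taken from the book: the function names are hypothetical, occurrences are allowed to overlap, and the "time" of the first occurrence is taken to be the 1-indexed position at which the earliest occurrence ends (an assumption). The last function counts occurrences of a single word as a subsequence, as in hidden pattern matching.

```python
def count_occurrences(text, patterns):
    """Count occurrences (overlaps allowed) of any word in `patterns` as a substring."""
    return sum(
        text.startswith(w, i)
        for w in patterns
        for i in range(len(text) - len(w) + 1)
    )


def first_occurrence_time(text, patterns):
    """1-indexed end position of the earliest-ending occurrence, or None if absent."""
    ends = [text.find(w) + len(w) for w in patterns if w in text]
    return min(ends) if ends else None


def count_hidden(text, w):
    """Count the ways `w` occurs in `text` as a (not necessarily consecutive) subsequence."""
    # ways[k] = number of embeddings of the prefix w[:k] into the text read so far.
    ways = [1] + [0] * len(w)
    for ch in text:
        # Go right to left so each text symbol extends each embedding at most once.
        for k in range(len(w), 0, -1):
            if w[k - 1] == ch:
                ways[k] += ways[k - 1]
    return ways[len(w)]


if __name__ == "__main__":
    print(count_occurrences("abababa", {"aba"}))
    print(first_occurrence_time("abababa", {"aba"}))
    print(count_hidden("abab", "ab"))
```

In the text "abababa" the word "aba" occurs three times as a substring because overlapping occurrences are counted separately, which is the main source of difficulty in the probabilistic analysis.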
These various pattern matching problems find a myriad of applications. Molecular biology provides an important source of applications of pattern matching, be it exact or approximate or subsequence pattern matching. There are examples in abundance: finding signals in DNA; finding split genes where exons are interrupted by introns; searching for starting and stopping signals in genes; finding tandem repeats in DNA.
In this chapter we consider generalized pattern matching, in which a set of patterns (rather than a single pattern) is given. We assume here that the pattern is a pair of sets of words (W0, W), where W = W1 ∪ ⋯ ∪ Wd consists of the sets $\mathcal{W}_i \subset \mathcal{A}^{m_i}$ (i.e., all words in Wi have a fixed length mi). The set W0 is called the forbidden set. For W0 = ∅ one is interested in the number of pattern occurrences On(W), defined as the number of patterns from W occurring in a text generated by a (random) source. Another parameter of interest is the number of positions in the text where a pattern from W appears (clearly, several patterns may occur at the same position, but words from the same Wi must occur at different positions); this quantity we denote as Πn. If we define $\Pi_n^{(i)}$ as the number of positions where a word from Wi occurs, then
$O_n(\mathcal{W}) = \Pi_n^{(1)} + \cdots + \Pi_n^{(d)}.$
Notice that at any given position of the text and for a given i only one word from Wi can occur.
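The difference between counting occurrences and counting positions can be illustrated on a short text. This is a sketch under stated assumptions: the function name is hypothetical, each set in `pattern_sets` is assumed to contain words of one fixed length, and a position is identified by where an occurrence ends (1-indexed).

```python
def occurrence_statistics(text, pattern_sets):
    """Return (total occurrences, distinct positions, per-set position counts).

    `pattern_sets` is a list of sets; all words within one set share a length,
    so at most one word of a given set can end at any position.
    """
    per_set_positions = []
    for words in pattern_sets:
        ends = {
            i + len(w)
            for w in words
            for i in range(len(text) - len(w) + 1)
            if text.startswith(w, i)
        }
        per_set_positions.append(ends)
    per_set_counts = [len(ends) for ends in per_set_positions]
    total_occurrences = sum(per_set_counts)
    distinct_positions = len(set().union(*per_set_positions))
    return total_occurrences, distinct_positions, per_set_counts


if __name__ == "__main__":
    # "ab" ends at positions 3 and 5; "aab" ends at position 3.
    print(occurrence_statistics("aabab", [{"ab"}, {"aab"}]))
```

Here the two sets share the end position 3, so the number of occurrences (the sum of the per-set counts) exceeds the number of distinct positions.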
For W0 ≠ ∅ one studies the number of occurrences On(W) under the condition that On(W0) = 0, that is, there is no occurrence of a pattern from W0 in the text. This could be called constrained pattern matching since one restricts the text to those strings that do not contain strings from W0. A simple version of constrained pattern matching was discussed in Chapter 3 (see also Exercises 3.3, 3.6, and 3.10).
In this chapter we first present an analysis of generalized pattern matching with W0 = ∅ and d = 1, which we call the reduced pattern set (i.e., no pattern is a substring of another pattern).
The discrete Green's function (without boundary) $\mathbb{G}$ is a pseudo-inverse of the combinatorial Laplace operator of a graph G = (V, E). We reveal the intimate connection between Green's function and the theory of exact stopping rules for random walks on graphs. We give an elementary formula for Green's function in terms of state-to-state hitting times of the underlying graph. Namely, $\mathbb{G}(i,j) = \pi_j \bigl( H(\pi,j) - H(i,j) \bigr)$, where πi is the stationary distribution at vertex i, H(i, j) is the expected hitting time for a random walk starting from vertex i to first reach vertex j, and H(π, j) = ∑k∈V πkH(k, j). This formula also holds for the digraph Laplace operator.
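The hitting time formula can be checked numerically on a small graph. The sketch below is illustrative only and makes two assumptions not stated in the abstract: the walk is the simple random walk with transition matrix P, and the Laplace operator is taken to be I − P, so that the pseudo-inverse property reads G(I − P) = I − 1πᵀ. All function names are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination over exact fractions."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def hitting_times(P):
    """H[i][j] = expected number of steps from i to first reach j (H[j][j] = 0)."""
    n = len(P)
    H = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        others = [i for i in range(n) if i != j]
        # For i != j: H(i,j) = 1 + sum over k != j of P(i,k) H(k,j).
        A = [[Fraction(int(a == c)) - P[a][c] for c in others] for a in others]
        for i, val in zip(others, solve(A, [Fraction(1)] * len(others))):
            H[i][j] = val
    return H


def green_from_hitting_times(adj):
    """Build P, pi and G(i,j) = pi_j (H(pi,j) - H(i,j)) for an undirected graph."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    P = [[Fraction(adj[i][j], deg[i]) for j in range(n)] for i in range(n)]
    pi = [Fraction(d, sum(deg)) for d in deg]  # stationary distribution deg(i) / 2|E|
    H = hitting_times(P)
    H_pi = [sum(pi[k] * H[k][j] for k in range(n)) for j in range(n)]
    G = [[pi[j] * (H_pi[j] - H[i][j]) for j in range(n)] for i in range(n)]
    return P, pi, G


# Triangle on vertices 0, 1, 2 with a pendant vertex 3 attached to vertex 2.
ADJ = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
P, pi, G = green_from_hitting_times(ADJ)
n = len(ADJ)
lhs = [[sum(G[i][k] * (int(k == j) - P[k][j]) for k in range(n)) for j in range(n)]
       for i in range(n)]
rhs = [[int(i == j) - pi[j] for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    print(lhs == rhs)
```

Solving one linear system per target vertex keeps the sketch short; for large graphs a single matrix inversion would be the more usual route.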
The most important characteristics of a stopping rule are its exit frequencies, which are the expected number of exits of a given vertex before the rule halts the walk. We show that Green's function is, in fact, a matrix of exit frequencies plus a rank one matrix. In the undirected case, we derive spectral formulas for Green's function and for some mixing measures arising from stopping rules. Finally, we further explore the exit frequency matrix point of view, and discuss a natural generalization of Green's function for any distribution τ defined on the vertex set of the graph.
We consider large random graphs with prescribed degrees, as generated by the configuration model. In the regime where the empirical degree distribution approaches a limit μ with finite mean, we establish the systematic convergence of a broad class of graph parameters that includes the independence number, the maximum cut size, the logarithm of the Tutte polynomial, and the free energy of the anti-ferromagnetic Ising and Potts models. Contrary to previous works, our results are not a priori limited to the free energy of some prescribed graphical model. They apply more generally to any additive, Lipschitz and concave graph parameter. In addition, the corresponding limits are shown to be Lipschitz and concave in the degree distribution μ. This considerably extends the applicability of the celebrated interpolation method, introduced in the context of spin glasses, and recently related to the challenging question of right-convergence of sparse graphs.
This special issue is devoted to papers from the meeting on Combinatorics and Probability, held at the Mathematisches Forschungsinstitut in Oberwolfach from the 14th to 20th April 2013. The lectures at this meeting focused on the common themes of Combinatorics and Discrete Probability, with many of the problems studied originating in Theoretical Computer Science. The lectures, many of which were given by young participants, stimulated fruitful discussions. The fact that the participants work in different and yet related topics, and the open problems session held during the meeting, encouraged interesting discussions and collaborations.
An important problem in the theory of impartial games is to determine the regularities of their nim-sequences. Subtraction games have periodic nim-sequences and those of octal games are conjectured to be periodic, but the possible regularities of the nim-sequence of a hexadecimal game are unknown. Periodic and arithmetic periodic nim-sequences have been discovered but other patterns also exist. We present an infinite set of hexadecimal games, based on the game 0.2048, that exhibit a regularity—ruler regularity—not yet reported or codified.
A taking-and-breaking game [Albert et al. 2007; Berlekamp et al. 2001] is an impartial combinatorial game, played with heaps of beans on a table. A move for either player consists of choosing a heap, removing a certain number of beans from the heap, and then possibly splitting the remainder into several heaps; the winner is the player making the last move. For example, both Grundy’s Game (choose a heap and split it into two unequal heaps) and Couples-Are-Forever (choose a heap with at least three beans and split it into two) are taking-and-breaking games with very simple rules; however, neither has been solved.
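The nim-sequence of such a game can be computed directly from the rules using the standard mex (minimum excluded value) recursion. The sketch below does this for Grundy's Game as described above; the function names are hypothetical, and the nim-value of a position with several heaps is taken to be the XOR of the heap values, as in the usual Sprague-Grundy theory.

```python
from functools import lru_cache


def mex(values):
    """Smallest non-negative integer not contained in `values`."""
    m = 0
    while m in values:
        m += 1
    return m


@lru_cache(maxsize=None)
def grundy_value(n):
    """Nim-value of a single heap of n beans in Grundy's Game."""
    # A move splits n into a + (n - a) with a < n - a, i.e. two unequal heaps.
    options = {
        grundy_value(a) ^ grundy_value(n - a)
        for a in range(1, (n + 1) // 2)
    }
    return mex(options)


if __name__ == "__main__":
    print([grundy_value(n) for n in range(11)])
```

Computing many terms this way is easy; what remains open is whether the resulting sequence is eventually periodic.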
We present an overview of the required theory of impartial games. The references above provide a more thorough grounding in the theory and more detail on subtraction and octal games.
The numbers in parentheses are the old numbers used in each of the lists of unsolved problems given on pp. 183–189 of AMS Proc. Sympos. Appl. Math. 43 (1991), called PSAM 43 below; on pp. 475–491 of Games of No Chance, hereafter referred to as GONC; on pp. 457–473 of More Games of No Chance (MGONC); and on pp. 475–500 of Games of No Chance 3 (GONC3). Some numbers have little more than the statement of the problem if there is nothing new to be added. References [year] may be found in Fraenkel’s bibliography at the end of this volume. References [#] are at the end of this article. A useful reference for the rules and an introduction to many of the specific games mentioned below is M. Albert, R. J. Nowakowski and D. Wolfe, Lessons in Play: An Introduction to the Combinatorial Theory of Games, A. K. Peters, 2007 (LIP) or Berlekamp, Conway and Guy, Winning Ways for your Mathematical Plays, vol. 1–4, A. K. Peters, 2000–2004 (WW).
Subtraction games with finite subtraction sets are known to have periodic nim-sequences. Investigate the relationship between the subtraction set and the length and structure of the period. The same question can be asked about partizan subtraction games, in which each player is assigned an individual subtraction set. See Fraenkel and Kotzig [1987].
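For experimentation, the period of a subtraction game can be found by computing the nim-sequence until a block of max(S) consecutive values repeats, since from that point on each value depends only on the preceding max(S) values. This is a minimal sketch with hypothetical function names, restricted to the impartial case with a finite, non-empty subtraction set of positive integers.

```python
def mex(values):
    """Smallest non-negative integer not contained in `values`."""
    m = 0
    while m in values:
        m += 1
    return m


def subtraction_nim_sequence(subtraction_set, length):
    """First `length` nim-values of the subtraction game with the given set."""
    seq = []
    for n in range(length):
        options = {seq[n - s] for s in subtraction_set if s <= n}
        seq.append(mex(options))
    return seq


def find_period(subtraction_set):
    """Return (preperiod, period) of the nim-sequence.

    Detects the first repeated window of max(S) consecutive values; the walk
    through windows is deterministic, so the first repeat marks the cycle.
    """
    window = max(subtraction_set)
    seen = {}
    seq = []
    n = 0
    while True:
        options = {seq[n - s] for s in subtraction_set if s <= n}
        seq.append(mex(options))
        n += 1
        if n >= window:
            start = n - window
            key = tuple(seq[start:n])
            if key in seen:
                return seen[key], start - seen[key]
            seen[key] = start


if __name__ == "__main__":
    print(subtraction_nim_sequence({1, 2, 3}, 8))
    print(find_period({1, 2, 3}))
```

Running `find_period` over families of subtraction sets is one simple way to gather data on how the set determines the length of the period.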