The Twentieth British Combinatorial Conference was organised jointly by the University of Durham and the Open University. It was held at Durham in July 2005. The British Combinatorial Committee had invited nine distinguished combinatorial mathematicians to give survey lectures in areas of their expertise, and this volume contains the survey articles on which these lectures were based.
In compiling this volume I am indebted to the authors for preparing their articles so accurately and in such a timely manner, and to the referees for their prompt replies and their attention to detail while commenting on the articles. I would also like to thank Roger Astley at Cambridge University Press, and Mike Grannell at the Open University, for their advice and help. Finally, without the previous efforts of the editors of earlier Surveys, my job would have been infinitely more difficult!
The British Combinatorial Committee gratefully acknowledges the financial support provided by the London Mathematical Society, the Institute of Combinatorics and its Applications, and the EPSRC.
Many fundamental combinatorial objects, including balanced incomplete block designs and error-correcting codes, can be constructed and classified via cliques in certain problem-specific graphs. Various such objects are identified and surveyed here, and the use of clique algorithms in their construction is considered. Occasionally the problem admits a formulation as an instance of the exact cover problem, which is even more desirable for computational reasons.
Introduction
Cliques and independent sets are two of the most fundamental concepts in graph theory. A clique in a graph G = (V, E) is a subset of vertices V′ ⊆ V that induces a complete graph. (A complete graph is a graph where all vertices are mutually adjacent.) An independent set, on the other hand, is a subset of vertices V′ ⊆ V that induces an empty graph. Obviously, a clique in a graph G is an independent set in the complement graph Ḡ, and vice versa, so without loss of generality one may focus on just one of these concepts. Note that cliques are occasionally defined as complete subgraphs rather than sets; we choose the latter alternative, which is much more convenient.
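To make the clique/complement duality concrete, here is a minimal sketch (ours, not the survey's; the function names are illustrative) that tests a vertex subset against a graph stored as adjacency sets and checks that an independent set in G is a clique in Ḡ.

```python
from itertools import combinations

def is_clique(adj, subset):
    """True if every pair of vertices in `subset` is adjacent."""
    return all(v in adj[u] for u, v in combinations(subset, 2))

def is_independent(adj, subset):
    """True if no pair of vertices in `subset` is adjacent."""
    return all(v not in adj[u] for u, v in combinations(subset, 2))

def complement(adj):
    """Complement graph on the same vertex set (no loops)."""
    vertices = set(adj)
    return {u: vertices - adj[u] - {u} for u in adj}

# A 4-cycle: {0, 2} is independent in G, hence a clique in the complement.
G = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
assert is_independent(G, {0, 2})
assert is_clique(complement(G), {0, 2})
```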
In the current work we study combinatorial objects that can be viewed as set systems (but when discussing these objects later, they will generally not be treated using the set system formulation). A set system is a collection of subsets of a given set X, S = {S1, S2, …, Sm}, Si ⊆ X, which has some additional specific properties.
We study the Lovász number $\vartheta$ along with two related SDP relaxations $\vartheta_{1/2}$, $\vartheta_2$ of the independence number and the corresponding relaxations $\bar\vartheta$, $\bar\vartheta_{1/2}$, $\bar\vartheta_2$ of the chromatic number on random graphs $G_{n,p}$. We prove that $\vartheta,\vartheta_{1/2},\vartheta_2(G_{n,p})$ are concentrated about their means, and that $\bar\vartheta,\bar\vartheta_{1/2},\bar\vartheta_2(G_{n,p})$ in the case $p<n^{-1/2-\varepsilon}$ are concentrated in intervals of constant length. Moreover, extending a result of Juhász [28], we estimate the probable value of $\vartheta,\vartheta_{1/2},\vartheta_2(G_{n,p})$ for edge probabilities $c_0/n\leq p\leq 1-c_0/n$, where $c_0>0$ is a constant. As an application, we give improved algorithms for approximating the independence number of $G_{n,p}$ and for deciding $k$-colourability in polynomial expected time.
We show that symmetry, represented by a graph's automorphism group, can be used to greatly reduce the computational work for the substitution method. This allows application of the substitution method over larger regions of the problem lattices, resulting in tighter bounds on the percolation threshold $p_c$. We demonstrate the symmetry reduction technique using bond percolation on the $(3,12^2)$ lattice, where we improve the bounds on $p_c$ from (0.738598,0.744900) to (0.739399,0.741757), a reduction of more than 62% in width, from 0.006302 to 0.002358.
Let $G$ be a finite group of order $n$ and let $k$ be a natural number. Let $\{x_i : i\in I\}$ be a family of elements of $G$ such that $|I|= n+k-1$. Let $v$ be the most repeated value of the family. Let $ \{ \sigma_i : 1\leq i \leq k \} $ be a family of permutations of $G$ such that $\sigma_i(1)=1$ for all $i$. We obtain the following result.
There are pairwise distinct elements $i_1, i_2, \dots ,i_k\in I$ such that \[ \prod_{1\leq j\leq k } \sigma_j \big(v^{-1}x_ {i_j }\big) =1.\]
A graph is claw-free if no vertex has three pairwise nonadjacent neighbours. At first sight, there seems to be a great variety of types of claw-free graphs. For instance, there are line graphs, the graph of the icosahedron, complements of triangle-free graphs, and the Schläfli graph (an amazingly highly-symmetric graph with 27 vertices); moreover, if we arrange vertices in a circle, choose some intervals from the circle, and make the vertices in each interval adjacent to each other, the graph we produce is claw-free. There are several other such examples, which we regard as “basic” claw-free graphs.
Nevertheless, it is possible to prove a complete structure theorem for claw-free graphs. We have shown that every connected claw-free graph can be obtained from one of the basic claw-free graphs by simple expansion operations. In this paper we explain the precise statement of the theorem, sketch the proof, and give a few applications.
Introduction
A graph is claw-free if no vertex has three pairwise nonadjacent neighbours. (Graphs in this paper are finite and simple.) Line graphs are claw-free, and it has long been recognized that claw-free graphs are an interesting generalization of line graphs, sharing some of the same properties. For instance, Minty [16] showed in 1980 that there is a polynomial-time algorithm to find a stable set of maximum weight in a claw-free graph, generalizing the algorithm of Edmonds [9, 10] to find a maximum weight matching in a graph.
We study self-avoiding walks (SAWs) on non-Euclidean lattices that correspond to regular tilings of the hyperbolic plane (‘hyperbolic graphs’). We prove that on all but at most eight such graphs, (i) there are exponentially fewer $N$-step self-avoiding polygons than there are $N$-step SAWs, (ii) the number of $N$-step SAWs grows as $\mu_w^N$ within a constant factor, and (iii) the average end-to-end distance of an $N$-step SAW is approximately proportional to $N$. In terms of critical exponents from statistical physics, (ii) says that $\gamma=1$ and (iii) says that $\nu=1$. We also prove that $\gamma$ is finite on all hyperbolic graphs, and we prove a general identity about non-reversing walks that had previously been discovered for certain special cases.
Many classical partitioning problems in combinatorics ask for a single quantity to be maximized or minimized over a set of partitions of a combinatorial object. For instance, Max Cut asks for the largest bipartite subgraph of a graph G, while Min Bisection asks for the minimum size of a cut into two equal pieces.
In judicious partitioning problems, we seek to maximize or minimize a number of quantities simultaneously. For instance, given a graph G with m edges, we can ask for the smallest f(m) such that G must have a bipartition in which each vertex class contains at most f(m) edges.
In this survey, we discuss recent extremal results on a variety of questions concerning judicious partitions, and related problems such as Max Cut.
Introduction
A wide variety of combinatorial optimization problems ask for an “optimal” partition of the vertex set of a graph or hypergraph. A good example is the Max Cut problem: given a graph G, what is the maximum of e(V1, V2) over partitions V(G) = V1 ∪ V2, where e(V1, V2) is the number of edges between V1 and V2? Similarly, Min Bisection asks for the minimum of e(V1, V2) over partitions V(G) = V1 ∪ V2 with |V1| ≤ |V2| ≤ |V1| + 1 (there are k-partite versions Max k-Cut and Min k-Section of both problems).
Both of these problems involve maximizing or minimizing a single quantity over graphs from a certain class.
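To pin down the Max Cut objective, the following brute-force sketch (illustrative only; exponential in the number of vertices, so suited only to tiny graphs) enumerates all bipartitions and maximizes e(V1, V2).

```python
from itertools import product

def max_cut(vertices, edges):
    """Brute-force Max Cut: maximize e(V1, V2) over all bipartitions.

    `edges` is an iterable of pairs (u, v). This is exponential in
    |V| and is meant only to make the definition concrete.
    """
    best = 0
    for assignment in product((0, 1), repeat=len(vertices)):
        side = dict(zip(vertices, assignment))
        best = max(best, sum(1 for u, v in edges if side[u] != side[v]))
    return best

# A triangle is not bipartite: the best cut contains 2 of its 3 edges.
assert max_cut([0, 1, 2], [(0, 1), (1, 2), (0, 2)]) == 2
```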
Fundamental notions of combinatorics on words underlie natural language processing. This is not surprising, since combinatorics on words can be seen as the formal study of sets of strings, and sets of strings are fundamental objects in language processing.
Indeed, language processing is obviously a matter of strings. A text or a discourse is a sequence of sentences; a sentence is a sequence of words; a word is a sequence of letters. The most universal levels are those of sentence, word, and letter (or phoneme), but intermediate levels between word and letter exist and can be crucial in some languages: the level of morphological elements (e.g. suffixes) and the level of syllables. The discovery of this piling up of levels, and in particular of the word level and the phoneme level, delighted structuralist linguists in the twentieth century. They termed this inherent, universal feature of human language “double articulation”.
It is a little more intricate to see how sets of strings are involved. There are two main reasons. First, at any point in a linguistic flow of data being processed, you must be able to predict the set of possible continuations of what is already known, or at least to expect any continuation among some set of strings that depends on the language. Second, natural languages are ambiguous; that is, a written or spoken portion of text can often be understood or analysed in several ways, and the analyses are handled as a set of strings as long as they cannot be reduced to a single analysis.
This chapter presents data structures used to store the suffixes of a text, together with some of their applications. These structures are designed to give fast access to all factors of the text, which is why they have a fairly large number of applications in text processing.
Two types of objects are considered in this chapter, digital trees and automata, together with their compact versions. Trees share the common prefixes of the words in the set; automata additionally share their common suffixes. The structures are presented in order of decreasing size.
The representation of all the suffixes of a word by an ordinary digital tree, called a suffix trie (Section 2.1), has the advantage of being simple but can lead to a memory size that is quadratic in the length of the word considered. The compact tree of suffixes (Section 2.2) is guaranteed to fit in linear memory space.
Minimizing the suffix trie, in the automata-theoretic sense, gives the minimal automaton accepting the suffixes; it is described in Section 2.4. Compaction and minimization together yield the compact suffix automaton of Section 2.5.
Most algorithms that build the structures presented in this chapter work in time O(n × log Card A), for a text of length n, assuming that there is an ordering on the alphabet A. Their execution time is thus linear when the alphabet is finite and fixed. Locating a word of length m in the text then takes O(m × log Card A) time.
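To illustrate the uncompacted structure of Section 2.1, here is a naive suffix trie sketch (our own, with illustrative names; the constructions in the chapter are more refined). The nested-dictionary nodes make explicit why the memory size can be quadratic, and why every factor of the text, being a prefix of some suffix, can be located by walking down from the root.

```python
def suffix_trie(text):
    """Naive suffix trie: insert every suffix letter by letter.

    Nodes are nested dicts mapping letters to children; the number
    of nodes, hence the memory size, can be quadratic in len(text).
    """
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root

def occurs(trie, factor):
    """A factor of the text is exactly a prefix of one of its suffixes."""
    node = trie
    for ch in factor:
        if ch not in node:
            return False
        node = node[ch]
    return True

t = suffix_trie("ababb")
assert occurs(t, "bab") and not occurs(t, "bbb")
```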
This chapter is an introduction to the book. It gives general notions, notation, and technical background. It covers, in a tutorial style, the main notions in use in algorithms on words. In this sense, it is a comprehensive exposition of basic elements concerning algorithms on words, automata and transducers, and probability on words.
The general goal of “stringology”, as we pursue it here, is to manipulate strings of symbols: to compare them, to count them, to check some of their properties, and to perform simple transformations in an effective and efficient way.
A typical illustrative example of our approach is the action of circular permutations on words, since it exhibits several of the aspects mentioned above. First, the operation of circular shift is a transduction which can be realized by a transducer. We include in this chapter a section (Section 1.5) on transducers; transducers will be used in Chapter 3. The orbits of the transformation induced by the circular permutation are the so-called conjugacy classes, a basic notion in combinatorics on words. The minimal element in a conjugacy class is a good representative of the class, and it can be computed by an efficient algorithm (actually in linear time); this is one of the algorithms which appear in Section 1.2. Algorithms for conjugacy are considered again in Chapter 2. These minimal words give rise to the Lyndon words, which have remarkable combinatorial properties already emphasized in Lothaire (1997). We describe in Section 1.2.5 the Lyndon factorization algorithm.
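As a sketch of the factorization algorithm of Section 1.2.5, here is the standard linear-time method usually attributed to Duval (the function name is ours); it factors a word into a nonincreasing product of Lyndon words.

```python
def lyndon_factorization(word):
    """Duval's algorithm: factor `word` into a nonincreasing
    sequence of Lyndon words, in linear time."""
    factors = []
    i, n = 0, len(word)
    while i < n:
        j, k = i + 1, i
        while j < n and word[k] <= word[j]:
            # Extend the current run of (powers of) a Lyndon prefix.
            k = i if word[k] < word[j] else k + 1
            j += 1
        while i <= k:
            # Emit copies of the Lyndon word of length j - k.
            factors.append(word[i:i + j - k])
            i += j - k
    return factors

assert lyndon_factorization("banana") == ["b", "an", "an", "a"]
```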
A series of important applications of combinatorics on words has emerged with the development of computerized text and string processing, especially in biology and in linguistics. The aim of this volume is to present, in a unified treatment, some of the major fields of applications. The main topics that are covered in this book are
Algorithms for manipulating text, such as string searching, pattern matching, and testing a word for special properties.
Efficient data structures for retrieving information on large indexes, including suffix trees and suffix automata.
Combinatorial, probabilistic, and statistical properties of patterns in finite words, and of more general patterns, under various assumptions on the source of the text.
Inference of regular expressions.
Algorithms for repetitions in strings, such as maximal runs or tandem repeats.
Linguistic text processing, especially analysis of the syntactic and semantic structure of natural language. Applications to language processing with large dictionaries.
Enumeration, generation, and sampling of complex combinatorial structures by their encodings in words.
This book is actually the third in a series of books on combinatorics on words. Lothaire's “Combinatorics on Words” appeared in its first printing in 1984 as Volume 17 of the Encyclopedia of Mathematics. It grew out of the impulse of M. P. Schützenberger's scientific work. Since then, the theory has developed into a large scientific domain. It was reprinted in 1997 in the Cambridge Mathematical Library.
Repeated patterns and related phenomena in words are known to play a central role in many facets of computer science, telecommunications, coding, data compression, and molecular biology. One of the most fundamental questions arising in such studies is the frequency of pattern occurrences in another string known as the text. Applications of these results include gene finding in biology, code synchronization, user search in wireless communications, detecting signatures of an attacker in intrusion detection, and discovering repeated strings in the Lempel-Ziv schemes and other data compression algorithms.
In basic pattern matching one asks, for a given (or random) pattern w or set of patterns W and a text X, how many times the pattern occurs in the text and how long it takes for it to occur in X for the first time. These two problems are not unrelated, as we have already seen in Chapter 6. Throughout this chapter we allow patterns to overlap and we count overlapping occurrences separately. For example, w = abab occurs three times in the text X = bababababb.
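To fix the counting convention, here is a short sketch (ours, not the chapter's) that counts overlapping occurrences and reproduces the example above.

```python
def count_overlapping(text, w):
    """Count occurrences of pattern w in text, overlaps included."""
    return sum(text.startswith(w, i) for i in range(len(text) - len(w) + 1))

assert count_overlapping("bababababb", "abab") == 3
```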
We consider pattern matching problems in a probabilistic framework in which the text is generated by a probabilistic source while the pattern is given. Various probabilistic sources were discussed in Chapter 1; here we succinctly summarize the assumptions adopted in this chapter. In addition, we introduce a general source, known as a dynamical source, recently proposed by Vallée. Algorithmic aspects of pattern matching, and various efficient algorithms for finding patterns, were discussed in Chapter 2.
The application of statistical methods to natural language processing has been remarkably successful over the past two decades. The wide availability of text and speech corpora has played a critical role in their success since, as for all learning techniques, these methods rely heavily on data. Many of the components of complex natural language processing systems, for example, text normalizers, morphological or phonological analyzers, part-of-speech taggers, grammars or language models, pronunciation models, context-dependency models, acoustic Hidden-Markov Models (HMMs), are statistical models derived from large data sets using modern learning techniques. These models are often given as weighted automata or weighted finite-state transducers either directly or as a result of the approximation of more complex models.
Weighted automata and transducers are the finite automata and finite-state transducers described in Section 1.5 of Chapter 1, with the addition of a weight on each transition. Thus, weighted finite-state transducers are automata in which each transition, in addition to its usual input label, is augmented with an output label from a possibly different alphabet, and carries some weight. The weights may correspond to probabilities or log-likelihoods, or they may be some other costs used to rank alternatives. More generally, as we shall see in the next section, they are elements of a semiring. Transducers can be used to define a mapping between two different types of information sources, for example, word and phoneme sequences.
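As a minimal sketch of the idea (our own illustrative encoding, not the book's notation), a weighted transducer can be stored as transitions of the form (input label, output label, weight, next state). With weights taken as negative log-probabilities in the tropical semiring, weights add along a path and the best analysis is the path of minimum total weight.

```python
import math

# state -> list of (input label, output label, weight, next state).
# Weights are negative log-probabilities: they add along a path, and
# the best path is the one of minimum total weight (tropical semiring).
transitions = {
    0: [("a", "x", -math.log(0.9), 1), ("a", "y", -math.log(0.1), 1)],
    1: [("b", "z", -math.log(1.0), 2)],
}
final = {2}

def best_output(state, word):
    """Cheapest output mapped to `word`, by exhaustive path search."""
    if not word:
        return ("", 0.0) if state in final else None
    results = []
    for inp, out, weight, nxt in transitions.get(state, []):
        if inp == word[0]:
            sub = best_output(nxt, word[1:])
            if sub is not None:
                results.append((out + sub[0], weight + sub[1]))
    return min(results, key=lambda r: r[1]) if results else None

# "ab" maps to "xz" (weight -log 0.9) rather than "yz" (weight -log 0.1).
assert best_output(0, "ab")[0] == "xz"
```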