This chapter addresses the problem of searching a fixed text. The associated data structure described here is known as the Suffix Array of the text. The searching procedure is presented first for a list of strings in Sections 4.1 and 4.2, and then adapted to a fixed text in the remaining sections.
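As an illustrative sketch of the underlying idea (not the efficient construction developed in this chapter), a suffix array can be obtained naively by sorting the starting positions of the suffixes:

```python
def suffix_array(text):
    # Naive construction: sort the starting positions of all suffixes
    # in lexicographic order of the suffixes. Simple to state, but
    # O(n^2 log n) in the worst case; the chapter does much better.
    return sorted(range(len(text)), key=lambda i: text[i:])

# The suffixes of "banana" in lexicographic order start at these positions:
# a (5), ana (3), anana (1), banana (0), na (4), nana (2)
print(suffix_array("banana"))  # → [5, 3, 1, 0, 4, 2]
```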
The first three sections consider the question of searching for a string in a list of strings stored in a table. The table is assumed to be fixed and can thus be preprocessed to speed up later accesses to it. Searching for a string in a lexicon or a dictionary that fits in the main memory of a computer is a typical application of this question.
We describe how to sort the strings of the list lexicographically (in time proportional to the total length of the strings) in order to be able to apply a binary search algorithm. Sorting alone, however, is not sufficient to get an efficient search. Precomputing and using the longest common prefixes between the strings of the list is the extra ingredient that makes the technique very efficient. Searching for a string of length m in a list of n strings then takes O(m + log n) time.
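The following sketch shows only the plain binary search step, using Python's standard bisect module; each of the O(log n) probes may compare up to m symbols, so without the longest-common-prefix refinement it runs in O(m log n) rather than the O(m + log n) achieved in this chapter:

```python
from bisect import bisect_left

def lookup(sorted_strings, x):
    # Binary search in a lexicographically sorted list of strings.
    # Returns the index of x, or -1 if absent. The O(m + log n) bound
    # requires the precomputed longest common prefixes described in
    # the text, which this sketch omits.
    i = bisect_left(sorted_strings, x)
    if i < len(sorted_strings) and sorted_strings[i] == x:
        return i
    return -1

lexicon = sorted(["mouse", "cat", "dog", "horse"])
print(lookup(lexicon, "dog"))    # index of "dog" in the sorted list
print(lookup(lexicon, "zebra"))  # → -1
```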
This chapter presents the algorithmic and combinatorial framework in which the following chapters are developed. It first specifies the concepts and notation used to work on strings, languages, and automata. The rest is mainly devoted to introducing the data structures chosen for implementing automata, to presenting combinatorial results, and to designing elementary pattern matching techniques. This organization reflects the observation that efficient algorithms for text processing rely on one or the other of these aspects.
Section 1.2 provides some combinatorial properties of strings that occur in numerous correctness proofs of algorithms or in their performance evaluation. They are mainly periodicity results.
The formalism for the description of algorithms is presented in Section 1.3, which focuses on the types of algorithms presented in the book and introduces some standard objects related to queues and automata processing.
Section 1.4 details several methods for implementing automata in memory; these techniques contribute, in particular, to the results of Chapters 2, 5, and 6.
The first algorithms for locating strings in texts are presented in Section 1.5. The sliding window mechanism and the notions of search automaton and bit vector described in this section are also used and improved in Chapters 2, 3, and 8, in particular.
The techniques introduced in the two previous chapters find immediate applications in the realization of an index of a text. The utility of considering the suffixes of a text for this kind of application comes from the simple remark that every factor of a string can be extended into a suffix of the text (see Figure 6.1). By storing the suffixes efficiently, we get a kind of direct access to all the factors of the text or of a language, and this is certainly the main interest of these techniques. From this property follows quite directly an implementation of the notion of index on a text or on a family of texts, with efficient algorithms for the basic operations (Section 6.2), such as the membership problem and the computation of the positions of occurrences of a pattern. Section 6.3 gives a solution in the form of a transducer. We also deduce quite directly solutions for the detection of repetitions (Section 6.4) and for the computation of forbidden strings (Section 6.5). Section 6.6 presents an inverted application of the previous techniques, using the index of a pattern to help search for the pattern itself. This method extends in a particularly efficient way to the search for the conjugates (or rotations) of a string.
In this chapter, we are interested in the approximate search for fixed strings. Several notions of approximation on strings are considered: jokers, differences, and mismatches.
A joker is a symbol meant to represent all the letters of the alphabet. The solutions to the problem of searching a text for a pattern containing jokers use specific methods that are described in Section 8.1.
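A naive quadratic-time version of joker matching can be sketched as follows (the methods of Section 8.1 are faster and more specific; the joker symbol '?' here is an arbitrary choice for illustration):

```python
def match_with_jokers(pattern, text, joker="?"):
    # Report every position i where the pattern occurs in the text,
    # with the joker symbol matching any letter. Naive O(|pattern| *
    # |text|) scan, shown only to fix the problem statement.
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if all(p == joker or p == c
                   for p, c in zip(pattern, text[i:i + m]))]

print(match_with_jokers("a?b", "aabacb"))  # → [0, 3]
```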
More generally, approximate pattern matching consists of locating all the occurrences of factors inside a text y that are similar to a string x, that is, of producing the positions of the factors of y that are at distance at most k from x, for a given natural number k. In what follows we assume that k < |x| ≤ |y|. We consider two distances for measuring the approximation: the edit distance and the Hamming distance.
The edit distance between two strings u and v, which are not necessarily of the same length, is the minimum cost of a sequence of elementary edit operations transforming one into the other (see Section 7.1). The basic method for approximate pattern matching is a natural extension of the dynamic-programming alignment method of Chapter 7. It can be improved by using a restricted notion of distance obtained by considering the minimum number of edit operations rather than the sum of their costs.
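The restricted (unit-cost) edit distance can be sketched with the classical dynamic-programming recurrence; this is the textbook formulation, not necessarily the exact presentation of Chapter 7:

```python
def edit_distance(u, v):
    # Dynamic programming over a (|u|+1) x (|v|+1) table with unit
    # costs for insertion, deletion, and substitution.
    m, n = len(u), len(v)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all of u[0..i)
    for j in range(n + 1):
        d[0][j] = j          # insert all of v[0..j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if u[i - 1] == v[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution/match
    return d[m][n]

print(edit_distance("kitten", "sitting"))  # → 3
```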
In this chapter, we address the problem of searching a text for a pattern when the pattern represents a finite set of strings. We present solutions based on the use of automata. Note first that using an automaton to solve the problem is quite natural: given a finite language X ⊆ A*, locating all the occurrences in a text y ∈ A* of strings belonging to X amounts to determining all the prefixes of y that end with a string of X; this in turn amounts to recognizing the language A*X; and since A*X is a regular language, this can be done by an automaton. We additionally note that such solutions are particularly suited to cases where a pattern has to be located in data that must be processed online: data flow analysis, downloading, virus detection, and so on.
The use of an automaton for locating a pattern has already been discussed in Section 1.5. We complete the subject here by specifying how to obtain the deterministic automata mentioned at the beginning of this section. The complexities of the methods described at the end of Section 1.5, which are valid for nondeterministic automata, are also compared with those of the methods presented in this chapter.
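A classical way to realize such a machine recognizing A*X is the Aho–Corasick construction: a trie of X completed with failure links. The sketch below is a standard formulation, not necessarily the exact one used in this chapter:

```python
from collections import deque

def build_machine(patterns):
    # Trie of the finite set X plus failure links; together they
    # simulate a deterministic automaton recognizing A*X.
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        s = 0
        for c in p:
            if c not in goto[s]:
                goto.append({}); fail.append(0); out.append([])
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].append(p)
    queue = deque(goto[0].values())       # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for c, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            if c in goto[f] and goto[f][c] != t:
                fail[t] = goto[f][c]
            out[t] += out[fail[t]]        # inherit outputs via the link
    return goto, fail, out

def occurrences(patterns, y):
    # Online scan of y: report (position, pattern) for every occurrence.
    goto, fail, out = build_machine(patterns)
    s, found = 0, []
    for i, c in enumerate(y):
        while s and c not in goto[s]:
            s = fail[s]
        s = goto[s].get(c, 0)
        for p in out[s]:
            found.append((i - len(p) + 1, p))
    return found

print(occurrences(["he", "she", "hers"], "ushers"))
# → [(1, 'she'), (2, 'he'), (2, 'hers')]
```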
This chapter is devoted to the detection of local periodicities that can occur inside a string.
The method for detecting these periodicities is based on a partitioning of the suffixes that also allows them to be sorted in lexicographic order. The process is analogous to the one used in Chapter 4 for the preparation of the suffix array of a string and achieves the same time and space complexity, but the information on the string collected during its execution is more directly useful.
In Section 9.1, we introduce a simplified partitioning method that is adapted to different questions in the rest of the chapter. The detection of periods is dealt with immediately after in Section 9.2.
In Section 9.3, we consider squares. Searching for them in optimal time relies on combinatorial properties together with the structures of Chapter 5. We also discuss the maximal number of squares that can occur in a string, which gives upper bounds on the number of local periodicities.
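To fix the notion, a square is a factor of the form ww with w nonempty. A naive cubic-time enumeration (far from the optimal algorithms of this chapter) can be sketched as:

```python
def squares(w):
    # Enumerate occurrences (i, l) such that w[i:i+l] == w[i+l:i+2l],
    # i.e. a square of root length l starting at position i.
    # Naive O(n^3); shown only to state the problem precisely.
    return [(i, l)
            for l in range(1, len(w) // 2 + 1)
            for i in range(len(w) - 2 * l + 1)
            if w[i:i + l] == w[i + l:i + 2 * l]]

print(squares("aabab"))  # "aa" at position 0, "abab" at position 1
```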
Finally, in Section 9.4, we come back to the problem of lexicographically sorting the suffixes of a string and to the computation of their common prefixes. The solution presented there is another adaptation of the partitioning method; it can be put to good use in the construction of a suffix array (Chapter 4).
Alignments constitute one of the processes commonly used to compare strings. They make it possible to visualize the resemblance between strings. This chapter deals with several methods that perform the comparison of two strings in this sense. Extending these methods to the comparison of more than two strings is delicate, leads to algorithms whose execution time is at least exponential, and is not treated here.
Alignments are based on notions of distance or of similarity between strings. The computations are usually carried out by dynamic programming. A typical example used for the design of efficient methods is the computation of a longest subsequence common to two strings. It illustrates the algorithmic techniques to implement in order to obtain an efficient computation, and possibly to extend to general alignments. In particular, the reduction of memory space achieved by one of the algorithms is a strategy that can often be applied in the solutions of related problems.
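The space-reduction strategy mentioned above can be sketched on the longest-common-subsequence computation: the dynamic-programming table is filled row by row, so only two rows need to be kept to obtain the length (recovering an actual alignment in linear space needs the further divide-and-conquer refinement, not shown here):

```python
def lcs_length(u, v):
    # Length of a longest common subsequence of u and v.
    # O(|u| * |v|) time, but only O(min(|u|, |v|)) space by keeping
    # two rows of the dynamic-programming table.
    if len(v) > len(u):
        u, v = v, u                       # make v the shorter string
    prev = [0] * (len(v) + 1)
    for a in u:
        curr = [0]
        for j, b in enumerate(v, 1):
            curr.append(prev[j - 1] + 1 if a == b
                        else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

print(lcs_length("AGCAT", "GAC"))  # → 2
```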
After the presentation of some distances defined on strings, notions of alignment and of edit graph, Section 7.2 describes the basic techniques for the computation of the edit (or alignment) distance and the production of the associated alignments. The chosen method highlights a global resemblance between two strings using assumptions that simplify the computation.
In this chapter, we present data structures for storing the suffixes of a text. These structures are designed to provide direct and fast access to the factors of the text. They make it possible to work on the factors of the string in almost the same way as the suffix array of Chapter 4 does, but here the main weight of the technique lies in the structuring of the data rather than in the algorithms that search the text.
The main application of these techniques is to provide the basis of an index implementation as described in Chapter 6. The direct access to the factors of a string allows a large number of other applications. In particular, the structures can be used for matching patterns by considering them as search machines (see Chapter 6).
Two types of objects are considered in this chapter, trees and automata, together with their compact versions. Trees factorize the common prefixes of the strings in the set. Automata additionally factorize their common suffixes. The structures are presented in decreasing order of size.
The representation of the suffixes of a string by a trie (Section 5.1) has the advantage of being simple but can require memory space quadratic in the length of the considered string.
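A minimal sketch of a suffix trie with nested dictionaries illustrates both the simplicity and the factor-access property (every factor of the text is a prefix of some suffix, hence a path from the root); the quadratic node count shows up, for instance, on texts of the form a^k b a^k:

```python
def suffix_trie(text):
    # Insert every suffix of the text into a trie of nested dicts.
    # Simple, but the number of nodes can be quadratic in len(text).
    root = {}
    for i in range(len(text)):
        node = root
        for c in text[i:]:
            node = node.setdefault(c, {})
    return root

def is_factor(trie, x):
    # x is a factor of the text iff it labels a path from the root.
    node = trie
    for c in x:
        if c not in node:
            return False
        node = node[c]
    return True

t = suffix_trie("abaab")
print(is_factor(t, "aba"), is_factor(t, "bb"))  # → True False
```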
Suppose ƒ : X* → X* is a morphism and u, v ∈ X*. For every nonnegative integer n, let z_n be the longest common prefix of ƒ^n(u) and ƒ^n(v), and let u_n, v_n ∈ X* be words such that ƒ^n(u) = z_n u_n and ƒ^n(v) = z_n v_n. We prove that there is a positive integer q such that, for any positive integer p, the prefixes of u_n (resp. v_n) of length p form an ultimately periodic sequence having period q. Further, there is a value of q which works for all words u, v ∈ X*.
The specification of the data structures used in EAT, a software system for symbolic computation in algebraic topology, is based on an operation that defines a link among different specification frameworks like hidden algebras and coalgebras. In this paper, this operation is extended using the notion of institution, giving rise to three institution encodings. These morphisms define a commutative diagram which shows three possible views of the same construction, placing it in an equational algebraic institution, in a hidden institution, or in a coalgebraic institution. Moreover, these morphisms can be used to obtain a new description of the final objects of the categories of algebras in these frameworks, which are suitable abstract models for the EAT data structures. Thus, our main contribution is a formalization allowing us to encode a family of data structures by means of a single algebra (which can be described as a coproduct on the image of the institution morphisms). With this aim, new particular definitions of hidden and coalgebraic institutions are presented.
Various static analyses of functional programming languages that permit infinite data structures make use of set constants like Top, Inf, and Bot, denoting all terms, all lists not eventually ending in Nil, and all non-terminating programs, respectively. We use a set language that permits union, constructors, and recursive definition of set constants with a greatest fixpoint semantics in the set of all (including infinite) computable trees, where all term constructors are non-strict. This paper proves decidability, in particular DEXPTIME-completeness, of inclusion of co-inductively defined sets, using algorithms and results from tree automata and set constraints. The test for set inclusion is required by certain strictness analysis algorithms for lazy functional programming languages and could also be the basis for further set-based analyses.
As 2-monotone R-automata already accept NP-complete languages, we introduce a restricted variant of j-monotonicity for restarting automata, called sequential j-monotonicity. For restarting automata without auxiliary symbols, this restricted variant still yields infinite hierarchies. However, for restarting automata with auxiliary symbols, all degrees of sequential monotonicity collapse to the first level, implying that RLWW-automata that are sequentially monotone of degree j for any j ≥ 1 accept only context-free languages.
A parallel communicating automata system consists of several automata working independently in parallel and communicating with each other by request, with the aim of recognizing a word. Rather surprisingly, returning parallel communicating finite automata systems are equivalent to the non-returning variants. We show this result by proving the equivalence of both with multihead finite automata. Finally, some open problems are formulated.
This paper analyses the complexity of model checking Fixpoint Logic with Chop (FLC), an extension of the modal μ-calculus with a sequential composition operator. It uses two known game-based characterisations to derive the following results: the combined model checking complexity as well as the data complexity of FLC are EXPTIME-complete. This is already the case for its alternation-free fragment. The expression complexity of FLC is trivially P-hard and bounded from above by the complexity of solving a parity game, i.e., it lies in UP ∩ co-UP. For any fragment of fixed alternation depth, in particular for alternation-free formulas, it is P-complete.
For polyominoes coded by their boundary word, we describe a quadratic O(n²) algorithm in the boundary length n, which improves on the naive O(n⁴) algorithm. The techniques used come from algorithmics, discrete geometry, and combinatorics on words.
In this paper we deal with the balance properties of the infinite binary words associated to β-integers when β is a quadratic simple Pisot number. Those words are the fixed points of morphisms of the type $\varphi(A)=A^pB$, $\varphi(B)=A^q$ for $p\in\mathbb N$, $q\in\mathbb N$, $p\geq q$, where $\beta=\frac{p+\sqrt{p^2+4q}}{2}$. We prove that each such word is t-balanced with $t=1+\left[(p-1)/(p+1-q)\right]$. Finally, in the case p < q, it is known [B. Adamczewski, Theoret. Comput. Sci. 273 (2002) 197–224] that the fixed point of the substitution $\varphi(A)=A^pB$, $\varphi(B)=A^q$ is not m-balanced for any m. We exhibit an infinite sequence of pairs of words witnessing the unbalance property.
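The objects of the abstract above can be made concrete with a small sketch (the helper names are ours, and the balance check is a naive test on a finite prefix, which of course only illustrates the stated theorem rather than proving it):

```python
def fixed_point_prefix(p, q, n):
    # Iterate the morphism phi(A) = A^p B, phi(B) = A^q starting from A.
    # Since phi(A) begins with A, successive images are nested prefixes
    # of the (infinite) fixed point; we truncate to length n.
    w = "A"
    while len(w) < n:
        w = "".join("A" * p + "B" if c == "A" else "A" * q for c in w)
    return w[:n]

def is_t_balanced(w, t):
    # A binary word is t-balanced if any two factors of the same
    # length have numbers of A's differing by at most t.
    for m in range(1, len(w) + 1):
        counts = [w[i:i + m].count("A") for i in range(len(w) - m + 1)]
        if max(counts) - min(counts) > t:
            return False
    return True

# For p = 2, q = 1, the stated bound gives t = 1 + (2-1)//(2+1-1) = 1.
prefix = fixed_point_prefix(2, 1, 50)
print(is_t_balanced(prefix, 1))
```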
Education is the manifestation of the perfection already in man.
Swami Vivekananda (1863–1902)
This book is entirely devoted to the area of visibility algorithms in computational geometry and covers basic algorithms for visibility problems in two dimensions. It is intended primarily for graduate students and researchers in the field of computational geometry. It will also be useful as a reference/text for researchers working in algorithms, robotics, graphics and geometric graph theory.
The area of visibility algorithms started as a sub-area of computational geometry in the late 1970s. Many researchers have contributed significantly to this area in the last three decades and have helped it mature considerably. The time has come to document the important algorithms in this area in a textbook. Although some of the existing books in computational geometry cover a few visibility algorithms, this book provides detailed algorithms for several important visibility problems. Hence, this book should not be viewed as another book on computational geometry but as complementary to the existing books.
In some published papers, visibility algorithms are presented first and the correctness arguments, based on geometric properties, are given afterwards. While presenting an algorithm in this book, the geometric properties are first established through lemmas and theorems, and then the algorithm is derived from them. My experience indicates that this style of presentation generally helps readers get a better grasp of the fundamentals of the algorithms.