To save content items to your account,
please confirm that you agree to abide by our usage policies.
If this is the first time you use this feature, you will be asked to authorise Cambridge Core to connect with your account.
Find out more about saving content to .
To save content items to your Kindle, first ensure no-reply@cambridge.org
is added to your Approved Personal Document E-mail List under your Personal Document Settings
on the Manage Your Content and Devices page of your Amazon account. Then enter the ‘name’ part
of your Kindle email address below.
Find out more about saving to your Kindle.
Note you can select to save to either the @free.kindle.com or @kindle.com variations.
‘@free.kindle.com’ emails are free but can only be saved to your device when it is connected to wi-fi.
‘@kindle.com’ emails can be delivered even when you are not connected to wi-fi, but note that service fees apply.
Classical finite-state automata represent the most important class of monoidal finite-state automata. Since the underlying monoid is free, this class of automaton has several interesting specific features. We show that each classical finite-state automaton can be converted to an equivalent classical finite-state automaton where the transition relation is a function. This form of ‘deterministic’ automaton offers a very efficient recognition mechanism since each input word is consumed on at most one path. The fact that each classical finite-state automaton can be converted to a deterministic automaton can be used to show that the class of languages that can be recognized by a classical finite-state automaton is closed under intersections, complements, and set differences. The characterization of regular languages and deterministic finite-state automata in terms of the ‘Myhill–Nerode equivalence relation’ to be introduced in the chapter offers an algebraic view on these notions and leads to the concept of minimal deterministic automata.
A fundamental task in natural language processing is the efficient representation of lexica. From a computational viewpoint, lexica need to be represented in a way directly supporting fast access to entries, and minimizing space requirements. A standard method is to represent lexica as minimal deterministic (classical) finite-state automata. To reach such a representation it is of course possible to first build the trie of the lexicon and then to minimize this automaton afterwards. However, in general the intermediate trie is much larger than the resulting minimal automaton. Hence a much better strategy is to use a specialized algorithm to directly compute the minimal deterministic automaton in an incremental way. In this chapter we describe such a procedure.
This chapter describes a special construction based on finite-state automata with important applications: the Aho–Corasick algorithm is used to efficiently find all occurrences of a finite set of strings (also called pattern set, or dictionary) in a given input string, called the ‘text’. Search is ‘online’, which means that the input text is neither fixed nor preprocessed in any way. This problem is a special instance of pattern matching in strings, and other automata constructions are used to solve other pattern matching tasks. From an automaton point of view, the Aho–Corasick algorithm comes in two variants. We first present the more efficient version where a classical deterministic finite-state automaton is built for text search. The disadvantage of this first construction is that the resulting automaton can become very large, in particular for large pattern alphabets. Afterwards we present the second version, where an automaton with additional transitions of a particular kind is built, yielding a much smaller device for text search.
A common task arising in many contexts is rewriting parts of a given input string to another form. Subparts of the input that match specific conditions are replaced by other output parts. In this way, the complete input string is translated to a new output form. Due to the importance of text rewriting, many programming languages offer matching/rewriting operations for subexpressions of strings, also called replace rules. When using strictly regular relations and functions for representing replace rules, a cascade of replace rules can be composed into a single transducer. If the transducer is functional, an equivalent bimachine or (in some cases) a subsequential transducer can be built, thus achieving theoretically and practically optimal text processing speed. In this chapter we introduce basic constructions for building text rewriting transducers and bimachines from replace rules and provide implementations. A first simple version in general leads to an ambiguous form of text rewriting with several outputs. A second more sophisticated construction solves conflicts using the leftmost-longest match strategy and leads to functional devices.
An important generalization of classical finite-state automata are multi-tape automata, which are used for recognizing relations of a particular type. The so-called regular relations (also refered to as ‘rational relations’) offer a natural way to formalize all kinds of translations and transformations, which makes multi-tape automata interesting for many practical applications and explains the general interest in this kind of device. A natural subclass are monoidal finite-state transducers, which can be defined as two-tape automata where the first tape reads strings. In this chapter we present the most important properties of monoidal multi-tape automata in general and monoidal finite-state transducers in particular. We show that the class of relations recognized by n-tape automata is closed under a number of useful relational operations like composition, Cartesian product, projection, inverse etc. We further present a procedure for deciding the functionality of classical finite-state transducers.
In this chapter we introduce the C(M) language, a new programming language. C(M) statements and expressions closely resemble the notation commonly used for the presentation of formal constructions in a Tarskian style set theoretical language. The usual set theoretic objects such as sets, functions, relations, tuples etc. are naturally integrated in the language. In contrast to imperative languages such as C or Java, C(M) is a functional declarative programming language. C(M) has many similarities with Haskell but makes use of the standard mathematical notation like SETL. The C(M) compiler translates a well-formed C(M) program into efficient C code, which can be executed after compilation. Since it is easy to read C(M) programs, a pseudo-code description becomes obsolete.
In this chapter we introduce the bimachine, a deterministic finite-state device that exactly represents the class of all regular string functions. We prove this correspondence, using as a key ingredient a procedure for converting transducers to bimachines. Methods for pseudo-minimization and direct composition of bimachines are added.
In this chapter we present C(M) implementations of the main automata constructions. Our aim is to provide full descriptions of the implementations that are clear and easy to follow. In some cases the simplicity of the implementation is achieved at the expense of some inefficiency.
In this chapter we explore deterministic finite-state transducers. Obviously, it only makes sense to ask for determinism if we restrict attention to transducers with a functional input-output behaviour. In this chapter we focus on transducers that are deterministic on the input tape (called sequential or subsquential transducers). We shall see that only a proper subset of all regular string functions can be represented by this kind of device and we describe a decision procedure for testing whether a functional transducer can be determinized. Further we present a subsequential transducer minimization procedure based on theMyhill–Nerode relation for string functions.
Finite-state methods are the most efficient mechanisms for analysing textual and symbolic data, providing elegant solutions for an immense number of practical problems in computational linguistics and computer science. This book for graduate students and researchers gives a complete coverage of the field, starting from a conceptual introduction and building to advanced topics and applications. The central finite-state technologies are introduced with mathematical rigour, ranging from simple finite-state automata to transducers and bimachines as 'input-output' devices. Special attention is given to the rich possibilities of simplifying, transforming and combining finite-state devices. All algorithms presented are accompanied by full correctness proofs and executable source code in a new programming language, C(M), which focuses on transparency of steps and simplicity of code. Thus, by enabling readers to obtain a deep formal understanding of the subject and to put finite-state methods to real use, this book closes the gap between theory and practice.
This special issue is devoted to the Mathematical Analysis of Algorithms, which aims to predict the performance of fundamental algorithms and data structures in general use in Computer Science. The simplest measure of performance is the expected value of a cost function under natural models of randomness for the data, and finer properties of the cost distribution provide a deeper understanding of the complexity. Research in this area, which is intimately connected to combinatorics and random discrete structures, uses a rich variety of combinatorial, analytic and probabilistic methods.
Let G(n,M) be a uniform random graph with n vertices and M edges. Let ${\wp_{n,m}}$ be the maximum block size of G(n,M), that is, the maximum size of its maximal 2-connected induced subgraphs. We determine the expectation of ${\wp_{n,m}}$ near the critical point M = n/2. When n − 2M ≫ n2/3, we find a constant c1 such that
This study relies on the symbolic method and analytic tools from generating function theory, which enable us to describe the evolution of $n^{-1/3}\,\E{\left({\wp_{n,{{(n/2)}({1+\lambda n^{-1/3}})}}}\right)}$ as a function of λ.
Given graphs G and H, a family of vertex-disjoint copies of H in G is called an H-tiling. Conlon, Gowers, Samotij and Schacht showed that for a given graph H and a constant γ>0, there exists C>0 such that if $p \ge C{n^{ - 1/{m_2}(H)}}$, then asymptotically almost surely every spanning subgraph G of the random graph 𝒢(n, p) with minimum degree at least
contains an H-tiling that covers all but at most γn vertices. Here, χcr(H) denotes the critical chromatic number, a parameter introduced by Komlós, and m2(H) is the 2-density of H. We show that this theorem can be bootstrapped to obtain an H-tiling covering all but at most $\gamma {(C/p)^{{m_2}(H)}}$ vertices, which is strictly smaller when $p \ge C{n^{ - 1/{m_2}(H)}}$. In the case where H = K3, this answers the question of Balogh, Lee and Samotij. Furthermore, for an arbitrary graph H we give an upper bound on p for which some leftover is unavoidable and a bound on the size of a largest H -tiling for p below this value.
There was an incorrect argument in the proof of the main theorem in ‘On percolation and the bunkbed conjecture’, in Combin. Probab. Comput. (2011) 20 103–117 doi: 10.1017/S0963548309990666. I thus no longer claim to have a proof for the bunkbed conjecture for outerplanar graphs.