In 1948, in the introduction to his classic paper, “A mathematical theory of communication,” Claude Shannon wrote:
“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.”
To solve that problem he created, in the pages that followed, a completely new branch of applied mathematics, which is today called information theory and/or coding theory. This book's object is the presentation of the main results of this theory as they stand 30 years later.
In this introductory chapter we illustrate the central ideas of information theory by means of a specific pair of mathematical models, the binary symmetric source and the binary symmetric channel.
The binary symmetric source (the source, for short) is an object which emits one of two possible symbols, which we take to be “0” and “1,” at a rate of R symbols per unit of time. We shall call these symbols bits, an abbreviation of binary digits. The bits emitted by the source are random, and a “0” is as likely to be emitted as a “1.” We imagine that the source rate R is continuously variable, that is, R can assume any nonnegative value.
The binary symmetric channel (the BSC, for short) is an object through which it is possible to transmit one bit per unit of time. However, the channel is not completely reliable: there is a fixed probability p (called the raw bit error probability), 0 ≤ p ≤ ½, that the output bit will not be the same as the input bit.
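To make the two models concrete, here is a minimal simulation sketch in Python; the function names and the choice p = 0.1 are ours, for illustration only:

```python
import random

rng = random.Random(0)  # fixed seed for reproducibility

def binary_symmetric_source(n):
    """Emit n independent bits, "0" and "1" equally likely."""
    return [rng.randrange(2) for _ in range(n)]

def binary_symmetric_channel(bits, p):
    """Flip each input bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

sent = binary_symmetric_source(100_000)
received = binary_symmetric_channel(sent, p=0.1)
error_rate = sum(s != r for s, r in zip(sent, received)) / len(sent)
print(error_rate)  # empirical rate, close to the raw bit error probability p
```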
Young tableaux have had a long history since their introduction by A. Young a century ago. It was only in the 1960s that a monoid structure on them came to the fore, a structure that takes into account most of their combinatorial properties and has applications to the different fields in which Young tableaux are used.
Summarizing what had been his motivation to spend so much time on the plactic monoid, M.-P. Schützenberger gave three reasons: (1) it allows us to embed the ring of symmetric polynomials into a noncommutative ring; (2) it is the syntactic monoid of a function on words generalizing the maximal length of a nonincreasing subword; (3) it is a natural generalization, to alphabets with more than two letters, of the monoid of parentheses.
The starting point of the theory is an algorithm, due to C. Schensted, for the determination of the maximal length of a nondecreasing subword of a given word. The output of this algorithm is a tableau, and if one decides to identify the words leading to the same tableau, one arrives at the plactic monoid, whose defining relations were determined by D. Knuth.
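As an illustration of Schensted's algorithm, here is a minimal Python sketch of row insertion (identifier names are ours). By Schensted's theorem, the length of the first row of the resulting tableau is the maximal length of a nondecreasing subword:

```python
import bisect

def row_insert(tableau, x):
    """Schensted row insertion of letter x into a tableau,
    represented as a list of nondecreasing rows."""
    for row in tableau:
        i = bisect.bisect_right(row, x)  # first entry strictly greater than x
        if i == len(row):
            row.append(x)                # x fits at the end of this row
            return
        row[i], x = x, row[i]            # bump the displaced entry downwards
    tableau.append([x])                  # start a new row at the bottom

def longest_nondecreasing(word):
    """Maximal length of a nondecreasing subword: the length of the
    first row of the tableau built by successive row insertions."""
    tableau = []
    for x in word:
        row_insert(tableau, x)
    return len(tableau[0]) if tableau else 0

assert longest_nondecreasing("21123") == 4   # e.g. the subword "1123"
```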
The first significant application of the plactic monoid was to provide a complete proof of the Littlewood-Richardson rule, a combinatorial algorithm for multiplying Schur functions (or, equivalently, for decomposing tensor products of representations of unitary groups, a fundamental issue in many applications, e.g., in particle physics), which had been in use for almost 50 years before being fully understood.
A large body of mathematics consists of facts that can be presented and described much like any other natural phenomenon. These facts, at times explicitly brought out as theorems, at other times concealed within a proof, make up most of the applications of mathematics, and are the most likely to survive changes of style and of interest.
This ENCYCLOPEDIA will attempt to present the factual body of all mathematics. Clarity of exposition, accessibility to the non-specialist, and a thorough bibliography are required of each author. Volumes will appear in no particular order, but will be organized into sections, each one comprising a recognizable branch of present-day mathematics. Numbers of volumes and sections will be reconsidered as times and needs change.
It is hoped that this enterprise will make mathematics more widely used where it is needed, and more accessible in fields in which it can be applied but where it has not yet penetrated because of insufficient information.
Information theory is a success story in contemporary mathematics. Born out of very real engineering problems, it has left its imprint on such far-flung endeavors as the approximation of functions and the central limit theorem of probability. It is an idea whose time has come.
Most mathematicians cannot afford to ignore the basic results in this field. Yet, because of the enormous outpouring of research, it is difficult for anyone who is not a specialist to single out the basic results and the relevant material.
In Chapter 1, avoidable and unavoidable sets of words were defined, with the focus on finite sets of words. In the present chapter, we turn to particular infinite sets of words, defined as pattern languages. A pattern is a word that contains special symbols called variables, and the associated pattern language is obtained by replacing the variables with arbitrary nonempty words, with the condition that two occurrences of the same variable must be replaced with the same word.
The archetype of a pattern is the square, αα. The associated pattern language is L = {uu | u ∈ A+}, and it is now a classical result that L is an avoidable set of words if A has at least three elements, whereas it is an unavoidable set of words if A has only one or two elements. Indeed, an infinite square-free word on three letters can be constructed, and it is easy to check that every binary word of length 4 contains a square. For short, we will say that the pattern αα is 3-avoidable and 2-unavoidable.
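Both claims about length 4 can be checked by brute force; the following Python sketch (ours, for illustration) verifies that every binary word of length 4 contains a square, while square-free ternary words of that length exist:

```python
from itertools import product

def has_square(w):
    """True if w contains a factor uu with u nonempty."""
    n = len(w)
    return any(w[i:i + m] == w[i + m:i + 2 * m]
               for i in range(n)
               for m in range(1, (n - i) // 2 + 1))

# every binary word of length 4 contains a square ...
assert all(has_square("".join(w)) for w in product("01", repeat=4))
# ... while square-free ternary words of length 4 exist
assert not has_square("0102")
```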
General patterns can contain more than just one variable. For instance, αβα represents words of the form uvu, with u, v ∈ A+ (this pattern is unavoidable whatever the size of the alphabet; see Proposition 3.1.2). Patterns could also be allowed to contain constant letters, which, unlike variables, are never replaced with arbitrary words; but this is not very useful in the context of avoidability, so we will consider here only “pure” patterns, consisting only of variables.
This chapter serves the same function for Part two as Chapter 6 served for Part one, that is, it summarizes some of the most important results in coding theory which have not been treated in Chapters 7–11. In Sections 12.2, 12.3, and 12.4 we treat channel coding (block codes, convolutional codes, and a comparison of the two). Finally in Section 12.5 we discuss source coding.
Block codes
The theory of block codes is older and richer than the theory of convolutional codes, and so this section is much longer than Section 12.3. (This imbalance does not apply to practical applications, however; see Section 12.4.) In order to give this section some semblance of organization, we shall classify the results to be cited according to Berlekamp's [15] list of the three major problems of coding theory:
How good are the best codes?
How can we design good codes?
How can we decode such codes?
How good are the best codes? One of the earliest problems which arose in coding theory was that of finding perfect codes. If we view a code of length n over the finite field Fq as a subset {x1, x2, …, xM} of the vector space Vn(Fq), the code is said to be perfect (or close packed) if for some integer e, the Hamming spheres of radius e around the M codewords completely fill Vn(Fq) without overlap.
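The perfect-code condition is a counting identity: M times the volume of a Hamming sphere of radius e must equal q^n. A small Python sketch (function names ours) checks this identity for two classical perfect codes:

```python
from math import comb

def sphere_volume(n, e, q=2):
    """Number of vectors of Vn(Fq) within Hamming distance e of a fixed vector."""
    return sum(comb(n, k) * (q - 1) ** k for k in range(e + 1))

def satisfies_sphere_packing(n, M, e, q=2):
    """Necessary counting condition for a perfect code: the M spheres
    of radius e exactly fill the q**n vectors of Vn(Fq)."""
    return M * sphere_volume(n, e, q) == q ** n

assert satisfies_sphere_packing(7, 16, 1)      # [7,4] binary Hamming code
assert satisfies_sphere_packing(23, 4096, 3)   # [23,12] binary Golay code
```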
Periodicity is an important property of words that has applications in various domains. The first significant results on periodicity are the theorem of Fine and Wilf and the critical factorization theorem. These two results refer to two kinds of phenomena concerning periodicity: the theorem of Fine and Wilf considers the simultaneous occurrence of different periods in one finite word, whereas the critical factorization theorem relates local and global periodicity of words. Starting from these basic results the study of periodicity has grown along both directions. This chapter contains a systematic and self-contained exposition of this theory, including very recent results.
In Section 8.1 we analyze the structure of the set of periods of one finite word. This section includes a proof of the theorem of Fine and Wilf and also a generalization of this result to words having three periods. We next give the characterization of Guibas and Odlyzko concerning those sets of integers that yield the periods that can simultaneously occur in a single finite word. Another property is further investigated (similar to the one stated by the theorem of Fine and Wilf) in which the occurrence of two periods in a word of a certain length forces the word to have a shorter period only in a prefix (or suffix) of the word. The golden ratio appears in such a result as an extremal value of a parameter involved in the property. This section also contains some results concerning the squares that can appear as factors in a word. This is a prelude to the next section, since squares describe a special kind of local periodicity.
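As a quick illustration of the theorem of Fine and Wilf (in the form: a word of length at least p + q − gcd(p, q) with periods p and q also has period gcd(p, q)), here is a brute-force Python check over short binary words; the code is ours, for illustration:

```python
from itertools import product
from math import gcd

def has_period(w, p):
    """p is a period of w if w[i] == w[i + p] whenever both sides exist."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))

def fine_and_wilf_holds(w, p, q):
    """If w has periods p and q and |w| >= p + q - gcd(p, q), the
    theorem forces the period gcd(p, q); otherwise it says nothing."""
    if has_period(w, p) and has_period(w, q) and len(w) >= p + q - gcd(p, q):
        return has_period(w, gcd(p, q))
    return True  # hypothesis not met

# exhaustive check over all binary words of length 8
assert all(fine_and_wilf_holds("".join(w), p, q)
           for w in product("ab", repeat=8)
           for p in range(1, 8) for q in range(1, 8))
```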
It is a well-known and not too difficult result of combinatorics on words that if two words commute under the concatenation product, then they are both powers of the same word: they have a common root. This fact is essentially equivalent to the following one: the centralizer of a nonempty word, that is, the set of words commuting with it, is the set of powers of the shortest root of the given word.
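This fact is easy to experiment with; the following Python sketch (ours) computes the shortest root of a word and illustrates the commutation criterion:

```python
def shortest_root(w):
    """Shortest u such that w is a power of u (w nonempty)."""
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w == w[:d] * (n // d):
            return w[:d]

# two words commute iff they are powers of a common root
u, v = "abab", "ab"
assert u + v == v + u
assert shortest_root(u) == shortest_root(v) == "ab"

x, y = "ab", "ba"
assert x + y != y + x
assert shortest_root(x) != shortest_root(y)
```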
The main results of this chapter are an extension of this latter result to noncommutative series and polynomials: Cohn's and Bergman's centralizer theorems. The first asserts that the centralizer of an element of the algebra of noncommutative formal series is isomorphic to an algebra of formal series in one variable. The second is the analogous result for noncommutative polynomials. Note that these theorems admit the following consequence: if two noncommutative series (resp. polynomials) commute, then they may both be expressed as a series (resp. a polynomial) in a third one. This formulation stresses the similarity with the result on words given above.
We begin with Cohn's theorem, since it is needed for Bergman's theorem. Its proof rests mainly on a divisibility property of noncommutative series. The proof of Bergman's theorem is rather indirect: it uses Cohn's noncommutative Euclidean division; the difficult result that the centralizer of a noncommutative polynomial is integrally closed in its field of fractions; the embeddability of this centralizer in a one-variable polynomial ring, which relies on a pretty argument of combinatorics on words; and finally another result of Cohn characterizing free subalgebras of a one-variable polynomial algebra.
The main changes in this edition are in Part two. The old Chapter 8 (“BCH, Goppa, and Related Codes”) has been revised and expanded into two new chapters, numbered 8 and 9. The old Chapters 9, 10, and 11 have been renumbered 10, 11, and 12. The new Chapter 8 (“Cyclic codes”) presents a fairly complete treatment of the mathematical theory of cyclic codes and their implementation with shift register circuits. It culminates with a discussion of the use of cyclic codes in burst error correction. The new Chapter 9 (“BCH, Reed-Solomon, and Related Codes”) is much like the old Chapter 8, except that increased emphasis has been placed on Reed-Solomon codes, reflecting their importance in practice. Both of the new chapters feature dozens of new problems.