In Chapters 2–6 we introduced turbo, LDPC and RA codes and their iterative decoding algorithms. Simulation results show that these codes can perform extremely close to Shannon's capacity limit with practical implementation complexity. In this chapter we analyze the performance of iterative decoders, determine how close they can in fact get to the Shannon limit and consider how the design of the codes affects this performance.
Ideally, for a given code and decoder we would like to know for which channel noise levels the message-passing decoder will be able to correct the errors and for which it will not. Unfortunately this is still an open problem. Instead, we will consider the set, or ensemble, of all possible codes with certain parameters (for example, a certain degree distribution) rather than a particular choice of code having those parameters.
For example, a turbo code ensemble is defined by its component encoders and consists of the set of codes generated by all possible interleaver permutations, while an LDPC code ensemble is specified by the degree distribution of the Tanner graph nodes and consists of the set of codes generated by all possible permutations of the Tanner graph edges.
When very long codes are considered, the extrinsic LLRs passed between the component decoders can be assumed to be independent and identically distributed. Under this assumption the expected iterative decoding performance of a particular ensemble can be determined by tracking the evolution of the probability density functions of these LLRs through the iterative decoding process, a technique called density evolution.
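As an illustration of the idea, the following sketch (not taken from the text) applies density evolution to a regular LDPC ensemble on the binary erasure channel, where the message densities reduce to a single erasure probability per iteration; the (3, 6)-regular degrees and the bisection search for the threshold are illustrative choices.

```python
# Hedged sketch: density evolution for a (dv, dc)-regular LDPC ensemble on the
# binary erasure channel (BEC). On the BEC the message densities collapse to a
# single number per iteration: the probability that a message is an erasure.

def bec_density_evolution(eps, dv=3, dc=6, max_iter=1000, tol=1e-12):
    """Iterate x <- eps * (1 - (1 - x)**(dc-1))**(dv-1), starting from x = eps."""
    x = eps
    for _ in range(max_iter):
        x_new = eps * (1.0 - (1.0 - x) ** (dc - 1)) ** (dv - 1)
        if abs(x_new - x) < tol:
            break
        x = x_new
    return x

def threshold(dv=3, dc=6, lo=0.0, hi=1.0, iters=40):
    """Bisection search for the largest channel erasure probability eps
    for which the recursion converges to (essentially) zero."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bec_density_evolution(mid, dv, dc) < 1e-9:
            lo = mid          # decoder succeeds: the threshold is higher
        else:
            hi = mid          # decoder fails: the threshold is lower
    return lo

print(threshold())  # roughly 0.429 for the (3, 6)-regular ensemble
```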
In this chapter we introduce convolutional codes, the building blocks of turbo codes. Our starting point is to introduce convolutional encoders and their trellis representation. Then we consider the decoding of convolutional codes using the BCJR algorithm for the computation of maximum a posteriori message probabilities and the Viterbi algorithm for finding the maximum likelihood (ML) codeword. Our aim is to enable the presentation of turbo codes in the following chapter, so this chapter is by no means a thorough consideration of convolutional codes – we shall only present material directly relevant to turbo codes.
Convolutional encoders
Unlike a block code, which acts on the message in finite-length blocks, a convolutional code acts like a finite-state machine, taking in a continuous stream of message bits and producing a continuous stream of output bits. The convolutional encoder has a memory of the past inputs, which is held in the encoder state. The output depends on the value of this state, as well as on the present message bits at the input, but is completely unaffected by any subsequent message bits. Thus the encoder can begin encoding and transmission before it has the entire message. This differs from block codes, where the encoder must wait for the entire message before encoding.
When discussing convolutional codes it is convenient to use time to mark the progression of input bits through the encoder.
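As a concrete illustration of such a finite-state encoder working through time, here is a minimal sketch of a rate-1/2 feedforward convolutional encoder; the memory-2 generator polynomials (7, 5) in octal are an illustrative assumption, not necessarily the encoder used in this chapter.

```python
# Hedged sketch: a rate-1/2 feedforward convolutional encoder with memory 2 and
# generator polynomials (7, 5) in octal, i.e. g1 = 1 + D + D^2 and g2 = 1 + D^2.

def convolutional_encode(message_bits):
    s1 = s2 = 0                # encoder state: the two most recent input bits
    out = []
    for u in message_bits:     # one time step per message bit
        v1 = u ^ s1 ^ s2       # output of g1 = 1 + D + D^2
        v2 = u ^ s2            # output of g2 = 1 + D^2
        out.extend([v1, v2])   # two coded bits per message bit (rate 1/2)
        s1, s2 = u, s1         # shift the register
    return out

print(convolutional_encode([1, 0, 1, 1]))
```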
The idea of concatenating two or more error correction codes in series in order to improve the overall decoding performance of a system was introduced by Forney in 1966. Applying random-like interleaving and iterative decoding to these codes gives a whole new class of turbo-like codes that straddle the gap between parallel concatenated turbo codes and LDPC codes.
Concatenating two convolutional codes in series gives serially concatenated convolutional codes (SC turbo codes). We arrive at turbo block codes by concatenating two block codes and at repeat–accumulate codes by concatenating a repetition code and a convolutional (accumulator) code.
This chapter will convey basic information about the encoding, decoding and design of serially concatenated (SC) turbo codes. Most of what we need for SC turbo codes has already been presented in Chapter 4. The turbo encoder uses two convolutional encoders, from Section 4.2, while the turbo decoder uses two copies of the log BCJR decoder from Section 4.3. The section on design principles will refer to information presented in Chapter 5 and the discussion of repeat–accumulate codes will use concepts presented in Chapter 2. A deeper understanding of SC turbo codes and their decoding process is explored in Chapters 7 and 8.
Serial concatenation
The first serial concatenation schemes concatenated a high-rate block code with a short convolutional code. The first code, called the outer code, encoded the source message and passed the resulting codeword to the second code, called the inner code, which re-encoded it to obtain the final codeword to be transmitted.
In the previous chapter we analyzed the performance of iterative codes by calculating their threshold and thus comparing their performance in high-noise channels with the channel's capacity. In that analysis we considered the iterative decoding of code ensembles with given component codes, averaging over all possible interleaver–edge permutations. In this chapter we will also use the concept of code ensembles but will turn our focus to low-noise channels and consider the error floor performance of iterative code ensembles. Except for the special case of the binary erasure channel, our analysis will consider the properties of the codes independently of their respective iterative decoding algorithms. In fact, we will assume maximum likelihood (ML) decoding, for which the performance of a code depends only on its codeword weight distribution. Using ML analysis we can
• demonstrate the source of the interleaver gain for iterative codes,
• show why recursive encoders are so important for concatenated codes, and
• show how the error floor performance of iterative codes depends on the chosen component codes.
Lastly, for the special case of the binary erasure channel we will use the concept of stopping sets to analyze the finite-length performance of LDPC ensembles and message-passing decoding.
Maximum likelihood analysis
Although it is impractical to use ML decoding for the long, pseudo-random codes designed for iterative decoding, the ML decoder is the best possible decoder (assuming equiprobable source symbols) and so provides an upper bound on the performance of iterative decoders.
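To make the connection between the codeword weight distribution and ML performance concrete, the following sketch evaluates the standard union bound on the ML word-error probability for BPSK transmission over an AWGN channel; the toy weight distribution is hypothetical and only illustrates the computation.

```python
# Hedged sketch: the union bound on ML word-error probability for BPSK over an
# AWGN channel, driven by the codeword weight distribution {A_d}.
import math

def q_func(x):
    """Gaussian tail function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def union_bound(weight_dist, rate, ebn0_db):
    """P_e <= sum_d A_d * Q(sqrt(2 * d * rate * Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return sum(A_d * q_func(math.sqrt(2.0 * d * rate * ebn0))
               for d, A_d in weight_dist.items())

# Hypothetical weight distribution: 3 codewords of weight 6, 10 of weight 8, ...
toy_weights = {6: 3, 8: 10, 10: 40}
print(union_bound(toy_weights, rate=0.5, ebn0_db=3.0))
```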
The construction of binary low-density parity-check (LDPC) codes simply involves replacing a small number of the values in an all-zeros matrix by 1s in such a way that the rows and columns have the required degree distribution. In many cases, randomly allocating the entries in H will produce a reasonable LDPC code. However, the construction of H can affect the performance of the sum–product decoder, significantly so for some codes, and also the implementation complexity of the code.
While there is no one recipe for a “good” LDPC code, there are a number of principles that inform the code designer. The first obvious decisions are which degree distribution to choose and how to construct the matrix with the chosen degrees, i.e. pseudo-randomly or with some sort of structure. Whichever construction is chosen, the features to consider include the girth of the Tanner graph and the minimum distance of the code.
In this chapter we will discuss those properties of an LDPC code that affect its iterative decoding performance and then present the common construction methods used to produce codes with the preferred properties. Following common practice in the field, we will call the selection of the degree distributions for an LDPC code "code design" and the methods used to assign the locations of the 1 entries in the parity-check matrix "code construction".
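As a minimal illustration of pseudo-random construction, the sketch below randomly matches column and row "sockets" to build a (wc, wr)-regular parity-check matrix; it deliberately ignores the refinements discussed later in the chapter (such as avoiding repeated edges and short cycles), so it is a starting point rather than a recipe for a good code.

```python
# Hedged sketch: pseudo-random construction of a (wc, wr)-regular parity-check
# matrix H by randomly pairing column sockets with row sockets.
import random

def random_regular_H(n, wc, wr, seed=1):
    assert (n * wc) % wr == 0, "degrees must be consistent"
    m = n * wc // wr                       # number of parity checks
    rng = random.Random(seed)
    col_sockets = [j for j in range(n) for _ in range(wc)]
    row_sockets = [i for i in range(m) for _ in range(wr)]
    rng.shuffle(row_sockets)
    H = [[0] * n for _ in range(m)]
    for j, i in zip(col_sockets, row_sockets):
        H[i][j] = 1                        # repeated pairs silently collapse,
    return H                               # slightly breaking regularity

H = random_regular_H(n=12, wc=3, wr=6)
for row in H:
    print(row)
```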
In this chapter we introduce our task: communicating a digital message without error (or with as few errors as possible) despite an imperfect communications medium. Figure 1.1 shows a typical communications system. In this text we will assume that our source is producing binary data, but it could equally be an analog source followed by analog-to-digital conversion.
Through the early 1940s, engineers designing the first digital communications systems, based on pulse code modulation, worked on the assumption that, although information could be transmitted usefully in digital form over noise-corrupted communication channels, the transmission was unavoidably compromised by the noise. The effects of noise could be managed, it was believed, only by increasing the transmitted signal power enough to ensure that the received signal-to-noise ratio was sufficiently high.
Shannon's revolutionary 1948 work changed this view in a fundamental way, showing that it is possible to transmit digital data with arbitrarily high reliability, over noise-corrupted channels, by encoding the digital message with an error correction code prior to transmission and subsequently decoding it at the receiver. The error correction encoder maps each vector of K digits representing the message to longer vectors of N digits known as codewords. The redundancy implicit in the transmission of codewords, rather than the raw data alone, is the quid pro quo for achieving reliable communication over intrinsically unreliable channels. The code rate r = K/N defines the amount of redundancy added by the error correction code.
In this chapter we introduce low-density parity-check (LDPC) codes, a class of error correction codes proposed by Gallager in his 1962 PhD thesis 12 years after error correction codes were first introduced by Hamming (published in 1950). Both Hamming codes and LDPC codes are block codes: the messages are broken up into blocks to be encoded at the transmitter and similarly decoded as separate blocks at the receiver. While Hamming codes are short and very structured with a known, fixed, error correction ability, LDPC codes are the opposite, usually long and often constructed pseudo-randomly with only a probabilistic notion of their expected error correction performance.
The chapter begins by presenting parity bits as a means to detect and, when more than one is employed, to correct errors in digital data. Block error correction codes are described as a linear combination of parity-check equations and thus defined by their parity-check matrix representation. The graphical representation of codes by Tanner graphs is presented and the necessary graph theoretic concepts introduced.
In Section 2.4 iterative decoding algorithms are introduced using a hard decision algorithm (bit flipping), so that the topic is developed first without reference to probability theory. Subsequently the sum–product decoding algorithm is presented.
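As a hedged sketch of the hard-decision idea, the following implements a simple bit-flipping decoder that repeatedly flips the single bit involved in the largest number of unsatisfied parity checks (one common variant of the algorithm); the [7,4] Hamming parity-check matrix used here is only a toy example and is, of course, not low-density.

```python
# Hedged sketch: a hard-decision bit-flipping decoder.
import numpy as np

def bit_flip_decode(H, received, max_iter=20):
    H = np.asarray(H)
    x = np.array(received, dtype=int).copy()
    for _ in range(max_iter):
        syndrome = H.dot(x) % 2                # 1s mark unsatisfied checks
        if not syndrome.any():
            break                              # valid codeword found
        failed_checks = H.T.dot(syndrome)      # per-bit count of failed checks
        x[int(np.argmax(failed_checks))] ^= 1  # flip the worst offender
    return x

# Toy parity-check matrix ([7,4] Hamming code, for illustration only).
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
received = [0, 0, 1, 0, 0, 0, 0]      # all-zeros codeword with one bit flipped
print(bit_flip_decode(H, received))   # expect the all-zeros codeword back
```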
This chapter serves as a self-contained introduction to LDPC codes and their decoding. It is intended that the material presented here will enable the reader to implement LDPC encoders and iterative decoders.
In this chapter we introduce turbo codes, the ground-breaking codes introduced by Berrou, Glavieux and Thitimajshima in 1993 [82], which sparked a complete rethink of how we do error correction. Turbo codes are the parallel concatenation of two convolutional codes which, at the receiver, share information between their respective decoders. Thus most of what we need for a turbo code has already been presented in the previous chapter. The turbo encoder uses two convolutional encoders (see Section 4.2) while the turbo decoder uses two copies of the log BCJR decoder (see Section 4.3).
The exceptional performance of turbo codes is due to the long pseudo-random interleaver, introduced below, which produces codes reminiscent of the random codes of Shannon's noisy channel coding theorem, and to the low-complexity iterative algorithm that makes their implementation feasible.
In the first part of this chapter we discuss how the component convolutional codes and interleaver combine to form a turbo code. We then consider the properties of a turbo code that affect its iterative decoding performance and describe turbo code design strategies. Our aim here is to convey basic information about the encoding, decoding and design of turbo codes; a deeper understanding of these codes and of the decoding process is left to later chapters.
Turbo encoders
At the encoder, turbo codes use a parallel concatenation of two convolutional component encoders, as shown in Figure 5.1.
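The following is a minimal sketch of this structure, assuming two identical memory-2 recursive systematic convolutional (RSC) component encoders with octal generators (1, 5/7) and a pseudo-random interleaver; the generator choice is illustrative and not necessarily the component code of Figure 5.1.

```python
# Hedged sketch: parallel concatenation of two RSC encoders joined by a
# pseudo-random interleaver (unpunctured, overall rate 1/3).
import random

def rsc_parity(bits):
    """Parity stream of an RSC encoder: feedback 1 + D + D^2, forward 1 + D^2."""
    s1 = s2 = 0
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2        # feedback (recursive) bit
        parity.append(a ^ s2)  # forward taps 1 + D^2
        s1, s2 = a, s1
    return parity

def turbo_encode(message, seed=0):
    pi = list(range(len(message)))
    random.Random(seed).shuffle(pi)            # pseudo-random interleaver
    interleaved = [message[i] for i in pi]
    systematic = list(message)                 # message bits sent as-is
    parity1 = rsc_parity(message)              # first component encoder
    parity2 = rsc_parity(interleaved)          # second component encoder
    return systematic, parity1, parity2

print(turbo_encode([1, 0, 1, 1, 0, 0, 1, 0]))
```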
The field of error correction coding was launched with Shannon's revolutionary 1948 work showing – quite counter to intuition – that it is possible to transmit digital data with arbitrarily high reliability over noise-corrupted channels, provided that the rate of transmission is below the capacity of the channel. The mechanism for achieving this reliable communication is to encode a digital message with an error correction code prior to transmission and apply a decoding algorithm at the receiver.
Classical block and convolutional error correction codes were described soon afterwards and the first iterative codes were published by Gallager in his 1962 thesis; however, they received little attention until the late 1990s. In the meantime, the highly structured algebraic codes introduced by Hamming, Elias, Reed, Muller, Solomon and Golay among others dominated the field. Despite the enormous practical success of these classical codes, their performance fell well short of the theoretically achievable performance set down by Shannon in his seminal 1948 paper. By the late 1980s, despite decades of attempts, researchers were largely resigned to this seemingly insurmountable theory–practice gap.
The relative quiescence of the coding field was utterly transformed by the introduction of “turbo codes”, proposed by Berrou, Glavieux and Thitimajshima in 1993, wherein all the key ingredients of successful error correction codes were replaced: turbo codes involve very little algebra, employ iterative, distributed, algorithms and focus on average (rather than worst-case) performance. This was a ground-shifting paper, forcing coding theorists to revise strongly held beliefs.
In the prisoners' problem, Alice and Bob are allowed to communicate but all messages they exchange are closely monitored by warden Eve looking for traces of secret data that may be hidden in the objects that Alice and Bob exchange. Eve's activity is called steganalysis and it is a complementary task to steganography. In theory, the steganalyst is successful in attacking the steganographic channel (i.e., the steganography has been broken) if she can distinguish between cover and stego objects with probability better than random guessing. Note that, in contrast to cryptanalysis, it is not necessary to be able to read the secret message to break a steganographic system. The important task of extracting the secret message from an image once it is known to contain secretly embedded data belongs to forensic steganalysis.
In Section 10.1, we take a look at various aspects of Eve's job depending on her knowledge about the steganographic channel. Then, in Section 10.2 we formulate steganalysis as a problem in statistical signal detection. If Eve knows the steganographic algorithm, she can accordingly target her activity to the specific stegosystem, in which case we speak of targeted steganalysis (Section 10.3). On the other hand, if Eve has no knowledge about the stegosystem the prisoners may be using, she is facing the significantly more difficult problem of blind steganalysis detailed in Section 10.4 and Chapter 12. She now has to be ready to discover traces of an arbitrary stegosystem. Both targeted and blind steganalysis work with one or more numerical features extracted from images and then classify them into two categories – cover and stego.
The goal of steganalysis is to detect the presence of secretly embedded messages. Depending on how much information the warden has about the steganographic channel she is trying to attack, the detection problem can take many different forms. In the previous chapter, we dealt with the situation when the warden knows the steganographic method that Alice and Bob might be using. With this knowledge, Eve can tailor her steganalysis to the particular steganographic channel using several strategies outlined in Section 10.3. If Eve has no information about the steganographic method, she needs blind steganalysis capable of detecting as wide a spectrum of steganographic methods as possible. Design and implementation of practical blind steganalysis detectors is the subject of this chapter.
The first and most fundamental step for Eve is to accept a model of cover images and represent each image using a vector of features. In contrast to targeted steganalysis, where a single feature (e.g., an estimate of message length) was often enough to construct an accurate detector, blind steganalysis by definition requires many features. This is because the role of features in blind steganalysis is significantly more fundamental – in theory they need to capture all possible patterns natural images follow so that every embedding method the prisoners can devise disturbs at least some of the features. In Section 10.4, we loosely formulated this requirement as completeness of the feature space and outlined possible strategies for constructing good features.
The definition of steganographic security given in the previous chapter should be a guiding design principle for constructing steganographic schemes. The goal is clear – to preserve the statistical distribution of cover images. Unfortunately, digital images are quite complicated objects that do not allow accurate description using simple statistical models. The biggest problem is their non-stationarity and heterogeneity. While it is possible to obtain simple models of individual small flat segments in the image, more complicated textures often present an insurmountable challenge for modeling because of a lack of data to fit an accurate local model. Moreover, and most importantly, as already hinted in Chapter 3, digital images acquired using sensors exhibit many complicated local dependences that the embedding changes may disturb and leave statistically detectable artifacts. Consequently, the lack of good image models leaves room for heuristic methods.
In this chapter, we discuss four major guidelines for construction of practical steganographic schemes:
• Preserve a model of the cover source (Section 7.1);
• Make the embedding resemble some natural process (Section 7.2);
• Design the steganography to resist known steganalysis attacks (Section 7.3);
• Minimize the impact of embedding (Section 7.4).
Steganographic schemes from the first class are based on a simplified model of the cover source. The schemes are designed to preserve the model and are thus undetectable within this model.
Steganalysis is the activity directed towards detecting the presence of secret messages. Due to their complexity and dimensionality, digital images are typically analyzed in a low-dimensional feature space. If the features are selected wisely, cover images and stego images will form clusters in the feature space with minimal overlap. If the warden knows the details of the embedding mechanism, she can use this side-information and design the features accordingly. This strategy is known as targeted steganalysis. The histogram attack and the attack on Jsteg from Chapter 5 are two examples of targeted attacks.
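As a hedged sketch of the first of these, the code below implements the core of the histogram (chi-square) attack on LSB replacement: after embedding a full-length random message, the histogram counts of each pair of values (2i, 2i + 1) tend to equalize, which the chi-square statistic detects. Practical refinements such as bin pruning and sliding-window application are omitted here.

```python
# Hedged sketch: core of the histogram (chi-square) attack on LSB replacement
# in an 8-bit grayscale image, returning a value near 1 when the histogram
# pairs (2i, 2i+1) are suspiciously equalized.
import numpy as np
from scipy.stats import chi2

def chi_square_attack(pixels):
    hist, _ = np.histogram(pixels, bins=256, range=(0, 256))
    even, odd = hist[0::2], hist[1::2]
    expected = (even + odd) / 2.0
    mask = expected > 0                       # skip empty pairs of bins
    stat = np.sum((even[mask] - expected[mask]) ** 2 / expected[mask])
    dof = int(mask.sum()) - 1
    return 1.0 - chi2.cdf(stat, dof)          # near 1 suggests LSB embedding

# Usage idea: chi_square_attack(image.flatten()) for an 8-bit grayscale image.
```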
Three general strategies for constructing features for targeted steganalysis were described in the previous chapter. This chapter presents specific examples of four targeted attacks on steganography in images stored in raster, palette, and JPEG formats. The first attack, called Sample Pairs Analysis, detects LSB embedding in the spatial domain by considering pairs of neighboring pixels. It is one of the most accurate methods for steganalysis of LSB embedding known today. Section 11.1 contains a detailed derivation of this attack as well as several of its variants formulated within the framework of structural steganalysis. The Pairs Analysis attack is the subject of Section 11.2. It was designed to detect steganographic schemes that embed messages in LSBs of color indices to a preordered palette. The EzStego algorithm from Chapter 5 is an example of this embedding method. Pairs Analysis is based on an entirely different principle than Sample Pairs Analysis because it uses information from pixels that can be very distant.
In the previous chapter, we learned that one of the general guiding principles for design of steganographic schemes is the principle of minimizing the embedding impact. The plausible assumption here is that it should be more difficult for Eve to detect Alice and Bob's clandestine activity if they leave behind smaller embedding distortion or “impact.” This chapter introduces a very general methodology called matrix embedding, with which the prisoners can minimize the total number of changes they need to carry out to embed their message and thus increase the embedding efficiency. Even though special cases of matrix embedding can be explained in an elementary fashion on an intuitive level, it is extremely empowering to formulate it within the framework of coding theory. This will require the reader to become familiar with some basic elements of the theory of linear codes. The effort is worth the results because the reader will be able to design more secure stegosystems, acquire a deeper understanding of the subject, and realize connections to an already well-developed research field. Moreover, according to the studies that appeared in [143, 95], matrix embedding is one of the most important design elements of practical stegosystems.
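To give a concrete feel for the technique before the formal development, here is a minimal sketch of matrix embedding with the binary [7,4] Hamming code, which embeds three message bits into the LSBs of seven cover pixels while changing at most one of them; the variable names and framing are illustrative rather than the book's notation.

```python
# Hedged sketch: matrix embedding (syndrome coding) with the [7,4] Hamming code.
import numpy as np

# Parity-check matrix of the [7,4] Hamming code; column j is the binary
# representation of j + 1, so a syndrome directly indexes the pixel to change.
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

def embed(cover_lsbs, message_bits):
    """Flip at most one of the 7 LSBs so that their syndrome equals the message."""
    x = np.array(cover_lsbs) % 2
    d = (H.dot(x) + np.array(message_bits)) % 2   # difference syndrome
    if d.any():
        j = int(d[0]) * 4 + int(d[1]) * 2 + int(d[2]) - 1  # column matching d
        x[j] ^= 1                                 # flip exactly one LSB
    return x

def extract(stego_lsbs):
    return H.dot(np.array(stego_lsbs) % 2) % 2    # recipient recomputes syndrome

lsbs = [1, 0, 1, 1, 0, 0, 1]
msg = [1, 0, 1]
stego = embed(lsbs, msg)
print(stego, extract(stego))                      # extract(stego) == msg
```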
As discussed in Chapter 5, in LSB embedding or ±1 embedding one pixel communicates exactly one message bit. This was the case for both OutGuess and Jsteg.
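The following sketch contrasts these two baseline operations, each of which communicates exactly one message bit per used pixel; boundary handling and other practical details are simplified, and the function names are illustrative.

```python
# Hedged sketch: LSB replacement and ±1 embedding, one message bit per pixel.
import random

def lsb_embed(pixels, bits):
    """LSB replacement: overwrite the least significant bit of each pixel."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def plus_minus_one_embed(pixels, bits, seed=0):
    """±1 embedding: if the LSB already matches, leave the pixel alone;
    otherwise randomly add or subtract 1 (which also sets the correct LSB)."""
    rng = random.Random(seed)
    out = []
    for p, b in zip(pixels, bits):
        if p % 2 != b:
            if p == 0:
                p = 1
            elif p == 255:
                p = 254
            else:
                p += rng.choice((-1, 1))
        out.append(p)
    return out

print(lsb_embed([12, 13, 200], [1, 1, 0]))
print(plus_minus_one_embed([12, 13, 200], [1, 1, 0]))
```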