This enthusiastic introduction to the fundamentals of information theory builds from classical Shannon theory through to modern applications in statistical learning, equipping students with a uniquely well-rounded and rigorous foundation for further study. The book introduces core topics such as data compression, channel coding, and rate-distortion theory using a unique finite blocklength approach. With over 210 end-of-part exercises and numerous examples, students are introduced to contemporary applications in statistics, machine learning, and modern communication theory. This textbook presents information-theoretic methods with applications in statistical learning and computer science, such as f-divergences, PAC-Bayes and variational principle, Kolmogorov’s metric entropy, strong data-processing inequalities, and entropic upper bounds for statistical estimation. Accompanied by additional stand-alone chapters on more specialized topics in information theory, this is the ideal introductory textbook for senior undergraduate and graduate students in electrical engineering, statistics, and computer science.
The topic of this chapter is the deterministic (worst-case) theory of quantization. The main object of interest is the metric entropy of a set, which allows us to answer two key questions (both formalized after the list below):
(1) covering number: the minimum number of points to cover a set up to a given accuracy;
(2) packing number: the maximal number of elements of a given set with a prescribed minimum pairwise distance.
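For concreteness, one standard formalization is the following (the chapter’s notation and conventions may differ slightly). For a subset Θ of a metric space (V, d), the ε-covering number and ε-packing number are
\[ N(\Theta, d, \epsilon) = \min\Bigl\{ n : \exists\, v_1,\dots,v_n \in V \text{ such that } \Theta \subseteq \bigcup_{i=1}^{n} \{ v : d(v, v_i) \le \epsilon \} \Bigr\}, \]
\[ M(\Theta, d, \epsilon) = \max\Bigl\{ m : \exists\, \theta_1,\dots,\theta_m \in \Theta \text{ such that } d(\theta_i, \theta_j) > \epsilon \text{ for all } i \ne j \Bigr\}, \]
and the metric entropy is $\log N(\Theta, d, \epsilon)$. Under these conventions the two quantities sandwich each other: $M(\Theta, d, 2\epsilon) \le N(\Theta, d, \epsilon) \le M(\Theta, d, \epsilon)$.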
The foundational theory of metric entropy was put forth by Kolmogorov, who, together with his students, also determined the behavior of metric entropy in a variety of problems for both finite and infinite dimensions. Kolmogorov’s original interest in this subject stems from Hilbert’s thirteenth problem, which concerns the possibility or impossibility of representing multivariable functions as compositions of functions of fewer variables. Metric entropy has found numerous connections to and applications in other fields, such as approximation theory, empirical processes, small-ball probability, mathematical statistics, and machine learning.
In Chapter 17 we introduce the concept of an error-correcting code (ECC). We discuss what it means for a code to have a low probability of error and what the optimal (ML or MAP) decoder is. For the special case of coding over the binary symmetric channel (BSC), we showcase the evolution of our understanding of fundamental limits, from pre-Shannon approaches to the modern finite-blocklength view. We also briefly review the history of ECCs. We conclude with a conceptually important proof of a weak converse (impossibility) bound on the performance of ECCs.
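One standard route to such a weak converse, via Fano’s inequality (the chapter’s argument may differ in detail), runs as follows: for an (n, M, ε)-code with equiprobable messages $W$, used over a memoryless channel of capacity $C$ bits per channel use without feedback,
\[ \log_2 M = H(W) = I(W; \hat{W}) + H(W \mid \hat{W}) \le nC + 1 + \epsilon \log_2 M, \]
where the inequality combines the data-processing inequality $I(W;\hat{W}) \le I(X^n; Y^n) \le nC$ with Fano’s inequality $H(W \mid \hat{W}) \le 1 + \epsilon \log_2 M$. Rearranging gives $\log_2 M \le \frac{nC + 1}{1 - \epsilon}$; for the BSC with crossover probability δ, $C = 1 - h(\delta)$.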
Chapter 33 introduces strong data-processing inequalities (SDPIs), which are quantitative strengthenings of the DPIs from Part I. As applications, we show how SDPIs yield lower bounds for various estimation problems on graphs and in distributed settings. The purpose of this chapter is twofold. First, we introduce general properties of SDPI coefficients. Second, we show how SDPIs help prove sharp lower (impossibility) bounds for statistical estimation problems. The flavor of the statistical problems in this chapter differs from the rest of the book: here the information about the unknown parameter θ is either spread more “thinly” across a high-dimensional vector of observations than in classical X = θ + Z models (see the spiked Wigner and tree-coloring examples), or distributed across different terminals (as in the correlation and mean estimation examples).
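To fix notation, the SDPI (contraction) coefficient of a channel $P_{Y|X}$ for KL divergence is commonly defined as
\[ \eta_{\mathrm{KL}}(P_{Y|X}) = \sup_{P_X \ne Q_X} \frac{D(P_Y \| Q_Y)}{D(P_X \| Q_X)}, \]
where $P_Y$ and $Q_Y$ are the output distributions induced by inputs $P_X$ and $Q_X$. The ordinary DPI says $\eta_{\mathrm{KL}} \le 1$; an SDPI is the statement that $\eta_{\mathrm{KL}} < 1$ for a given channel, which in particular yields $I(U;Y) \le \eta_{\mathrm{KL}}(P_{Y|X})\, I(U;X)$ for every Markov chain $U \to X \to Y$.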
So far our discussion on information-theoretic methods has been mostly focused on statistical lower bounds (impossibility results), with matching upper bounds obtained on a case-by-case basis. In Chapter 32 we will discuss three information-theoretic upper bounds for statistical estimation under KL divergence (Yang–Barron), Hellinger (Le Cam–Birgé), and total variation (Yatracos) loss metrics. These three results apply to different loss functions and are obtained using completely different means. However, they take on exactly the same form, involving the appropriate metric entropy of the model. In particular, we will see that these methods achieve minimax optimal rates for the classical problem of density estimation under smoothness constraints.
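Schematically, and suppressing constants, the choice of covering metric, and whether the loss is squared (all of which differ across the three results), the common form is a balance of approximation error against metric entropy per sample: for $n$ i.i.d. observations from a model class $\mathcal{P}$,
\[ \text{minimax risk} \;\lesssim\; \inf_{\epsilon > 0} \left\{ \epsilon^2 + \frac{\log N(\mathcal{P}, \epsilon)}{n} \right\}, \]
where $N(\mathcal{P}, \epsilon)$ denotes an ε-covering number of $\mathcal{P}$ in the appropriate metric.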
Chapter 29 gives an exposition of the classical large-sample asymptotics for smooth parametric models in fixed dimension, highlighting the role of the Fisher information introduced in Chapter 2. Notably, we discuss how to deduce the classical lower bounds (Hammersley–Chapman–Robbins, Cramér–Rao, van Trees) from the variational characterization and the data-processing inequality (DPI) of the χ2-divergence introduced in Chapter 7.
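To give one instance (stated in the scalar, single-observation case): the Hammersley–Chapman–Robbins bound says that any estimator $\hat\theta$ based on $X \sim P_\theta$ satisfies
\[ \mathrm{Var}_\theta(\hat\theta) \;\ge\; \sup_{\theta' \ne \theta} \frac{\bigl( \mathbb{E}_{\theta'}[\hat\theta] - \mathbb{E}_{\theta}[\hat\theta] \bigr)^2}{\chi^2(P_{\theta'} \| P_{\theta})}, \]
a direct consequence of the variational characterization of $\chi^2$. Letting $\theta' \to \theta$ and using the local expansion $\chi^2(P_{\theta'} \| P_{\theta}) = (\theta' - \theta)^2 I_F(\theta) + o((\theta' - \theta)^2)$ recovers the Cramér–Rao bound for unbiased estimators.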
A commonly used method in combinatorics for bounding the number of certain objects from above involves a clever application of Shannon entropy. The precision of this method can be increased in three ways: the marginal bound, the pairwise bound (Shearer’s lemma and its generalization; see Theorem 1.8), and the chain rule (exact calculation).
In this chapter, we give three applications using the above three methods, respectively, in order of increasing difficulty:
(1) enumerating binary vectors of a given average weight;
(2) counting triangles and other subgraphs; and
(3) Brégman’s theorem.
Finally, to demonstrate how the entropy method can also be used for questions in Euclidean spaces, we prove the Loomis–Whitney and Bollobás–Thomason theorems based on analogous properties of differential entropy.
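As a small illustration of the marginal bound behind item (1) (a standard argument, not necessarily the chapter’s exact worked example): if $S \subseteq \{0,1\}^n$ consists of strings with at most $k \le n/2$ ones and $X = (X_1, \dots, X_n)$ is uniform on $S$, then
\[ \log_2 |S| = H(X) \le \sum_{i=1}^{n} H(X_i) = \sum_{i=1}^{n} h(p_i) \le n\, h\Bigl( \tfrac{1}{n} \textstyle\sum_{i=1}^{n} p_i \Bigr) \le n\, h\bigl( \tfrac{k}{n} \bigr), \]
where $p_i = \mathbb{P}[X_i = 1]$ and $h$ is the binary entropy in bits, using subadditivity of entropy, concavity of $h$, and its monotonicity on $[0, 1/2]$ together with $\frac{1}{n} \sum_i p_i = \frac{1}{n} \mathbb{E}[\mathrm{weight}(X)] \le k/n$.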
Chapter 26 evaluates the rate-distortion function for Gaussian and Hamming sources. We also discuss the important foundational implication that an optimal (lossy) compressor paired with an optimal error-correcting code together form an optimal end-to-end communication scheme (known as the joint source–channel coding separation principle). This principle explains why “bits” are the natural currency of the digital age.
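For reference, the two answers take the following form (in bits, with $h$ the binary entropy): for a Gaussian source $\mathcal{N}(0, \sigma^2)$ under mean-squared error distortion and for an equiprobable binary source under Hamming distortion, respectively,
\[ R(D) = \tfrac{1}{2} \log_2 \frac{\sigma^2}{D}, \quad 0 < D \le \sigma^2, \qquad\text{and}\qquad R(D) = 1 - h(D), \quad 0 \le D \le \tfrac{1}{2}, \]
with $R(D) = 0$ for larger distortion levels.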
In Chapter 31 we study three commonly used techniques for proving minimax lower bounds, namely, Le Cam’s method, Assouad’s lemma, and Fano’s method. Compared to the results in Chapter 29, which are geared toward large-sample asymptotics in smooth parametric models, the approach here is more generic, less tied to mean-squared error, and applicable in non-asymptotic settings such as nonparametric or high-dimensional problems. The common rationale of all three methods is reducing statistical estimation to hypothesis testing.
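To fix ideas, the conclusion of Fano’s method (stated schematically; the chapter’s formulation may differ) is the following: if $\theta_1, \dots, \theta_M$ are $2\delta$-separated under a metric loss $d$ and $\theta$ is uniform on $\{\theta_1, \dots, \theta_M\}$, then
\[ \inf_{\hat\theta} \max_{j} \mathbb{E}_{\theta_j}\bigl[ d(\hat\theta, \theta_j) \bigr] \;\ge\; \delta \left( 1 - \frac{I(\theta; X) + \log 2}{\log M} \right), \]
so a minimax lower bound follows from exhibiting a large packing of the parameter space whose elements are statistically hard to distinguish (small mutual information).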
In Chapter 2 we introduced the Kullback–Leibler (KL) divergence that measures the dissimilarity between two distributions. This turns out to be a special case of the family of f-divergences between probability distributions, introduced by Csiszár. Like KL divergence, f-divergences satisfy a number of useful properties: operational significance, invariance to bijective transformations, data-processing inequality, variational representations (à la Donsker–Varadhan), and local behavior.
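Concretely, for a convex function $f$ with $f(1) = 0$ and distributions $P \ll Q$ (the chapter treats the general case more carefully), the f-divergence is
\[ D_f(P \| Q) = \mathbb{E}_Q\!\left[ f\!\left( \frac{dP}{dQ} \right) \right], \]
so that $f(x) = x \log x$ recovers the KL divergence, $f(x) = (x-1)^2$ the $\chi^2$-divergence, $f(x) = \frac{1}{2}|x - 1|$ the total variation, and $f(x) = (1 - \sqrt{x})^2$ the squared Hellinger distance.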
The purpose of Chapter 7 is to establish these properties and prepare the ground for applications in subsequent chapters. An important highlight is the joint-range theorem of Harremoës and Vajda, which gives the sharpest possible comparison inequality between arbitrary f-divergences (and puts an end to a long sequence of results starting from Pinsker’s inequality, Theorem 7.10).
Chapter 3 defines perhaps the most famous concept in the entire field of information theory, mutual information. It was originally defined by Shannon, although the name was coined later by Robert Fano. It has two equivalent expressions (as a Kullback–Leibler divergence and as difference of entropies), both having their merits. In this chapter, we collect some basic properties of mutual information (non-negativity, chain rule, and the data-processing inequality). While defining conditional information, we also introduce the language of directed graphical models, and connect the equality case in the data-processing inequality with Fisher’s concept of sufficient statistics. The connection between information and estimation is furthered in Section 3.7*, in which we relate mutual information and minimum mean-squared error in Gaussian noise (I-MMSE relation). From the latter we also derive the entropy-power inequality, which plays a central role in high-dimensional probability and concentration of measure.
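In symbols, the two equivalent expressions are
\[ I(X;Y) = D\bigl( P_{XY} \,\|\, P_X \otimes P_Y \bigr) = H(X) - H(X \mid Y) = H(X) + H(Y) - H(X, Y), \]
with the entropy forms valid whenever the entropies involved are finite (e.g., for discrete $X$ and $Y$).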
In Chapter 30 we describe a strategy for proving statistical lower bounds that we call the mutual information method (MIM): compare the amount of information the data provide about the parameter with the minimum amount of information needed to achieve a given estimation accuracy. As in Section 29.2, the main information-theoretic ingredient is the data-processing inequality, this time for mutual information rather than for f-divergences.
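Schematically, if $\theta \to X \to \hat\theta$ denotes the Markov chain of parameter, data, and estimator, and the estimator achieves expected distortion $\mathbb{E}[d(\theta, \hat\theta)] \le D$ under a prior on $\theta$, then
\[ \varphi(D) \;\le\; I(\theta; \hat\theta) \;\le\; I(\theta; X), \qquad \varphi(D) := \inf\bigl\{ I(\theta; \tilde\theta) : \mathbb{E}[d(\theta, \tilde\theta)] \le D \bigr\}, \]
so a lower bound on the achievable accuracy $D$ follows by comparing the rate-distortion-type quantity $\varphi(D)$ (the minimum information needed) with an upper bound on $I(\theta; X)$ (the information the data provide).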
So far, our discussion of channel coding has mostly followed the same lines as M-ary hypothesis testing (HT) in statistics. In Chapter 18 we introduce a key departure from this: the principal and most interesting goal in information theory is the design of the encoder mapping an input message to the channel input. Once the codebook is chosen, the problem indeed becomes one of M-ary HT and can be tackled by standard statistical tools. However, the task of choosing the encoder has no exact analog in statistical theory (the closest being the design of experiments). It turns out that the problem of choosing a good encoder is much simplified if we adopt a suboptimal way of resolving the M-ary HT, based on thresholding the information density.
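For reference, the information density of a joint distribution $P_{XY}$ is
\[ i(x; y) = \log \frac{P_{Y|X}(y \mid x)}{P_Y(y)}, \]
so that $\mathbb{E}[i(X;Y)] = I(X;Y)$; roughly speaking, the suboptimal decision rule alluded to above decodes to a message whose codeword has information density with the received output exceeding a fixed threshold (the precise variants are developed in the chapter).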
Consider the following problem: given a stream of independent Ber(p) bits with unknown p, we want to turn them into pure random bits, that is, independent Ber(1/2) bits. Our goal is to find a universal way to extract as many bits as possible; in other words, we want to extract as many fair coin flips as possible from possibly biased coin flips, without knowing the actual bias. In 1951 von Neumann proposed the following scheme: divide the stream into pairs of bits; output 0 if the pair is 10, output 1 if it is 01, and otherwise output nothing and move on to the next pair. Since both 01 and 10 occur with probability pq, where q = 1 − p, regardless of the value of p, we obtain fair coin flips at the output. To measure the efficiency of von Neumann’s scheme, note that, on average, 2n input bits produce 2pqn output bits, so the efficiency (rate) is pq. The question is: can we do better? It turns out that the fundamental limit (maximal efficiency) is given by the binary entropy $h(p)$. In this chapter we discuss two optimal randomness extractors, due to Elias and to Peres, respectively, and several related problems.
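As a quick illustration of von Neumann’s scheme described above, here is a minimal sketch in Python (the function name is ours, not the book’s):

import random

def von_neumann_extractor(bits):
    # Turn a stream of i.i.d. Ber(p) bits (p unknown) into fair bits.
    # Process non-overlapping pairs: 10 emits 0, 01 emits 1,
    # and 00/11 pairs are discarded, matching the convention above.
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if (a, b) == (1, 0):
            out.append(0)
        elif (a, b) == (0, 1):
            out.append(1)
    return out

# Example: extract fair bits from a biased source with p = 0.8.
p, n_pairs = 0.8, 100_000
biased = [1 if random.random() < p else 0 for _ in range(2 * n_pairs)]
fair = von_neumann_extractor(biased)
print(len(fair) / len(biased))        # empirical rate, close to p*(1-p) = 0.16
print(sum(fair) / max(len(fair), 1))  # fraction of ones, close to 1/2

The Elias and Peres extractors discussed in the chapter achieve rates approaching the entropy limit $h(p)$; Peres’s construction, for instance, does so by recursively reusing the randomness that von Neumann’s scheme discards.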