In Chapter 19 we apply methods developed in the previous chapters (namely the weak converse and random/maximal coding achievability) to compute the channel capacity. This notion quantifies the maximal number of (data) bits that can be reliably communicated per channel use in the limit of using the channel many times. Formalizing this statement will require introducing the concept of a communication channel. Then, for special kinds of channels (the memoryless and the information-stable ones), we will show that computing the channel capacity reduces to maximizing the (sequence of the) mutual information. This result, known as Shannon’s noisy channel coding theorem, is very special in that it relates the value of a (discrete, combinatorial) optimization problem over codebooks to that of a (convex) optimization problem over information measures. It builds a bridge between the abstraction of information measures (Part I) and practical engineering problems.
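To see concretely how capacity becomes an optimization over information measures, the following sketch (ours, not taken from the chapter) runs the classical Blahut–Arimoto iteration, which alternately updates a reverse channel and the input distribution in order to maximize the mutual information of a discrete memoryless channel. The function name and the binary symmetric channel example are illustrative assumptions.

```python
import numpy as np

def blahut_arimoto(W, n_iter=500, tol=1e-12):
    """Approximate C = max_p I(X;Y) (in bits) for a discrete memoryless
    channel with transition matrix W[x, y] = P(Y = y | X = x)."""
    nx, _ = W.shape
    p = np.full(nx, 1.0 / nx)                          # start from the uniform input
    for _ in range(n_iter):
        joint = p[:, None] * W                         # p(x) W(y|x)
        q = joint / joint.sum(axis=0, keepdims=True)   # "reverse channel" q(x|y)
        logq = np.log(q, out=np.full_like(q, -np.inf), where=q > 0)
        score = np.where(W > 0, W * logq, 0.0).sum(axis=1)
        p_new = np.exp(score - score.max())
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    # mutual information at the final input distribution, in bits
    joint = p[:, None] * W
    py = joint.sum(axis=0)
    mask = joint > 0
    mi = (joint[mask] * np.log2(joint[mask] / np.outer(p, py)[mask])).sum()
    return mi, p

# Binary symmetric channel with crossover 0.1
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])
C, p_opt = blahut_arimoto(W)
print(C, p_opt)    # C ≈ 0.531 bits, optimal input ≈ [0.5, 0.5]
```

For the binary symmetric channel with crossover probability 0.1 the iteration recovers the familiar value 1 - h(0.1) ≈ 0.531 bits per channel use, attained by the uniform input distribution.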
Chapter 1 introduces the first information measure – Shannon entropy. After studying its standard properties (chain rule, conditioning), we will briefly describe how one could arrive at its definition. We discuss its axiomatic characterization, the historical development in statistical mechanics, and the underlying combinatorial foundation (“method of types”). We close the chapter with Han’s and Shearer’s inequalities, both of which exploit the submodularity of entropy.
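As a small numerical illustration (ours, not the book’s), the sketch below computes entropies of a toy joint distribution and verifies the chain rule H(X,Y) = H(X) + H(Y|X); the helper `entropy` and the example numbers are assumptions made for this snippet.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (0 log 0 := 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# Joint distribution of (X, Y): rows index X, columns index Y
pxy = np.array([[0.30, 0.20],
                [0.10, 0.40]])
px = pxy.sum(axis=1)                                   # marginal of X
H_xy = entropy(pxy.ravel())                            # H(X, Y)
H_x = entropy(px)                                      # H(X)
H_y_given_x = sum(px[i] * entropy(pxy[i] / px[i])      # H(Y | X)
                  for i in range(len(px)))
print(H_xy, H_x + H_y_given_x)   # chain rule: both ≈ 1.846 bits
```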
Chapter 2 is a study of divergence (also known as information divergence, Kullback–Leibler (KL) divergence, relative entropy), which is the first example of a dissimilarity (information) measure between a pair of distributions P and Q. Defining KL divergence and its conditional version in full generality requires some measure-theoretic acrobatics (Radon–Nikodym derivatives and Markov kernels) that we spend some time on. (We stress again that all this abstraction can be ignored if one is willing to work only with finite or countably infinite alphabets.) Besides the definitions, we prove the “main inequality” showing that KL divergence is non-negative. Coupled with the chain rule for divergence, this inequality implies the data-processing inequality, which is arguably the central pillar of information theory and of this book. We conclude the chapter by studying the local behavior of divergence when P and Q are close. In the special case when P and Q belong to a parametric family, we will see that divergence is locally quadratic, with the Hessian given by the Fisher information, explaining the fundamental role of the latter in classical statistics.
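The local quadratic behavior is easy to check numerically. The sketch below (our illustration, for a Bernoulli family) compares D(Ber(theta) || Ber(theta + eps)) with the quadratic approximation (eps^2 / 2) * I(theta), where I(theta) = 1 / (theta * (1 - theta)) is the Fisher information of the Bernoulli family.

```python
import numpy as np

def kl_bernoulli(p, q):
    """D(Ber(p) || Ber(q)) in nats."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

theta, eps = 0.3, 1e-3
fisher = 1.0 / (theta * (1 - theta))       # Fisher information of Bernoulli(theta)
exact = kl_bernoulli(theta, theta + eps)   # exact divergence
approx = 0.5 * eps**2 * fisher             # local quadratic approximation
print(exact, approx)                       # both ≈ 2.38e-6 nats
```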
This enthusiastic introduction to the fundamentals of information theory builds from classical Shannon theory through to modern applications in statistical learning, equipping students with a uniquely well-rounded and rigorous foundation for further study. The book introduces core topics such as data compression, channel coding, and rate-distortion theory using a unique finite blocklength approach. With over 210 end-of-part exercises and numerous examples, students are introduced to contemporary applications in statistics, machine learning, and modern communication theory. This textbook presents information-theoretic methods with applications in statistical learning and computer science, such as f-divergences, PAC-Bayes and variational principle, Kolmogorov’s metric entropy, strong data-processing inequalities, and entropic upper bounds for statistical estimation. Accompanied by additional stand-alone chapters on more specialized topics in information theory, this is the ideal introductory textbook for senior undergraduate and graduate students in electrical engineering, statistics, and computer science.
In Chapter 13 we will discuss how to produce compression schemes that do not require a priori knowledge of the generative distribution. It turns out that designing a compression algorithm able to adapt to an unknown distribution is essentially equivalent to the problem of estimating an unknown distribution, which is a major topic of statistical learning. The plan for this chapter is as follows: (1) We will start by discussing the earliest example of a universal compression algorithm (due to Fitingof). It makes no reference to probability distributions at all, yet it turns out to be asymptotically optimal simultaneously for all iid distributions and, with small modifications, for all finite-order Markov chains. (2) The next class of universal compressors is based on assuming that the true distribution belongs to a given class. These methods proceed by choosing a good model distribution serving as the minimax approximation to each distribution in the class; the compression algorithm for a single distribution is then designed as in the previous chapters. (3) Finally, algorithms of the Lempel–Ziv type embody an entirely different idea: they automatically adapt to the distribution of the source, without any prior assumptions.
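To give a feel for the Lempel–Ziv idea, here is a minimal sketch of LZ78-style incremental parsing (our illustration; the chapter may treat a different variant): the input is split into phrases, each being the shortest prefix of the remaining input that has not appeared as a phrase before, and each phrase is encoded as a pointer to an earlier phrase plus one new symbol.

```python
def lz78_parse(s):
    """LZ78 parsing: emit (index of longest previously seen prefix, new symbol)."""
    dictionary = {"": 0}          # phrase -> index; index 0 is the empty phrase
    phrases, current = [], ""
    for ch in s:
        if current + ch in dictionary:
            current += ch                               # keep extending the match
        else:
            phrases.append((dictionary[current], ch))   # new phrase = old phrase + ch
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:                                         # leftover partial phrase
        phrases.append((dictionary[current[:-1]], current[-1]))
    return phrases

print(lz78_parse("abababababa"))
# [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (2, 'a'), (2, 'a')]
```

The number of distinct phrases, and hence the code length, automatically tracks the statistics of the source, which is what makes the scheme universal.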
In this chapter we introduce the problem of analyzing low-probability events, the subject of large deviation theory. Such problems are usually solved by computing moment-generating functions and applying Fenchel–Legendre conjugation. It turns out, however, that these steps can be interpreted information-theoretically in terms of information projection. We show how to solve the information projection problem in the special case of linear constraints, connecting the solution to exponential families.
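As a small numerical check (ours, not from the chapter), the sketch below evaluates the Fenchel–Legendre conjugate of the log moment-generating function of a Bernoulli(p) variable on a grid and compares it with the binary divergence d(a||p), which is the exponent predicted by the information-projection viewpoint.

```python
import numpy as np

# Chernoff exponent for P( (X1 + ... + Xn)/n >= a ), Xi iid Bernoulli(p), a > p:
# psi(t) = log E[exp(t X)];  rate(a) = sup_t { t*a - psi(t) } = d(a || p).
p, a = 0.3, 0.6

t = np.linspace(0.0, 10.0, 100001)          # grid over the dual variable t
psi = np.log(1 - p + p * np.exp(t))         # log-MGF of Bernoulli(p)
rate_legendre = np.max(t * a - psi)         # Fenchel-Legendre conjugate at a

rate_kl = a * np.log(a / p) + (1 - a) * np.log((1 - a) / (1 - p))
print(rate_legendre, rate_kl)               # both ≈ 0.192 nats
```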
In Chapter 20 we study data transmission with constraints on the channel input. For example, how many bits per channel use can we transmit under constraints on the codewords? To answer this question in general, we need to extend the setup and the coding theorems to channels with input constraints. After doing that, we will apply these results to compute the capacities of various Gaussian channels (memoryless, with intersymbol interference, and subject to fading).
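For the Gaussian case the answer takes a closed form: a single AWGN channel with signal-to-noise ratio P/N has capacity (1/2) log2(1 + P/N) bits per channel use, and for parallel Gaussian sub-channels (a natural model for intersymbol interference after a frequency decomposition) the power budget is split by water-filling. The sketch below is our illustration; the function name and example numbers are assumptions, not material from the chapter.

```python
import numpy as np

def waterfill_capacity(noise, P):
    """Water-filling over parallel Gaussian sub-channels.

    noise : noise variances of the sub-channels
    P     : total power budget
    Returns (capacity in bits per channel use, per-channel power allocation,
    reported in increasing order of noise level)."""
    noise = np.sort(np.asarray(noise, dtype=float))
    for k in range(len(noise), 0, -1):
        nu = (P + noise[:k].sum()) / k       # candidate water level using k channels
        if nu > noise[k - 1]:                # all k channels get positive power
            break
    power = np.maximum(nu - noise, 0.0)
    capacity = 0.5 * np.log2(1.0 + power / noise).sum()
    return capacity, power

# Single AWGN channel, P/N = 3: C = 0.5 * log2(4) = 1 bit per channel use
print(waterfill_capacity([1.0], 3.0))
# Two sub-channels with unequal noise: the quieter one gets more power
print(waterfill_capacity([1.0, 4.0], 5.0))   # powers [4, 1], C ≈ 1.32 bits
```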
This study introduces an innovative methodology for mortality forecasting, which integrates signature-based methods within the functional data framework of the Hyndman–Ullah (HU) model. This new approach, termed the Hyndman–Ullah with truncated signatures (HUts) model, aims to enhance the accuracy and robustness of mortality predictions. By utilizing signature regression, the HUts model is able to capture complex, nonlinear dependencies in mortality data, which enhances forecasting accuracy across various demographic conditions. The model is applied to mortality data from 12 countries, and its forecasting performance is compared against variants of the HU models across multiple forecast horizons. Our findings indicate that, overall, the HUts model not only provides more precise point forecasts but also shows robustness against data irregularities, such as those observed in countries with historical outliers. The integration of signature-based methods enables the HUts model to capture complex patterns in mortality data, making it a powerful tool for actuaries and demographers. Prediction intervals are also constructed with bootstrapping methods.
This paper introduces a novel theoretical framework that offers a closed-form expression for the tail variance (TV) for the family of generalised hyper-elliptical (GHE) distributions. The GHE family combines an elliptical distribution with the generalised inverse Gaussian (GIG) distribution, resulting in a highly adaptable and powerful model. Expanding upon the findings of Ignatieva and Landsman ((2021) Insurance: Mathematics and Economics, 101, 437–465) regarding the tail conditional expectation (TCE), this study demonstrates the significance of the TV as an additional risk measure that provides valuable insights into the tail risk and effectively captures the variability within the loss distribution’s tail. To validate the theoretical results, we perform an empirical analysis on two specific cases: the Laplace–GIG and the Student-t–GIG mixtures. By incorporating the TV derived for the GHE family, we are able to quantify correlated risks in a multivariate portfolio more efficiently. This contribution is particularly relevant to the insurance and financial industries, as it offers a reliable method for accurately assessing the risks associated with extreme losses. Overall, this paper presents an innovative and rigorous approach that enhances our understanding of risk assessment within the financial and insurance sectors. The derived expressions for the TV, in addition to the TCE, within the GHE family of distributions provide valuable insights and practical tools for effectively managing risk.
Engineering machines are becoming increasingly complex and possess more control variables, increasing the complexity and versatility of the control systems. Different configurations of the control system, referred to as policies, can result in similar output behavior but with different resource or component life usage. There is therefore an opportunity to find optimal policies with respect to economic decisions. While many solutions have been proposed to find such economic policy decisions at the asset level, we consider this problem at the fleet level. In this case, the optimal operation of each asset is affected by the state of all other assets in the fleet. Challenges introduced by considering multiple assets include the construction of economic multi-objective optimization criteria, the handling of rare events such as failures, the application of fleet-level constraints, and scalability. The proposed solution presents a framework for economic fleet optimization. The framework is demonstrated for economic criteria relating to resource usage, component lifing, and maintenance scheduling, but is generically extensible. Direct optimization of lifetime distributions is considered in order to avoid the computational burden of discrete event simulation of rare events. Results are provided for a real-world case study targeting the optimal economic operation of a fleet of aerospace gas turbine engines.
This paper proposes to solve the vortex gust mitigation problem on a 2D, thin flat plate using onboard measurements. The objective is to solve the discrete-time optimal control problem of finding the pitch rate sequence that minimizes the lift perturbation, that is, a criterion defined in terms of the lift coefficient obtained by the unsteady vortex lattice method. The controller is modeled as an artificial neural network, and it is trained to minimize this criterion using deep reinforcement learning (DRL). To be optimal, we show that the controller must take as inputs the locations and circulations of the gust vortices, but these quantities are not directly observable from the onboard sensors. We therefore propose to use a Kalman particle filter (KPF) to estimate the gust vortices online from the onboard measurements. The reconstructed input is then used by the controller to calculate the appropriate pitch rate. We evaluate the performance of this method for gusts composed of one to five vortices. Our results show that (i) controllers deployed with full knowledge of the vortices are able to efficiently mitigate the lift disturbance induced by the gusts, (ii) the KPF performs well in reconstructing gusts composed of fewer than three vortices, but shows more mixed results in the reconstruction of gusts composed of more vortices, and (iii) adding a KPF to the controller recovers a significant part of the performance loss due to the unobservable gust vortices.