In this chapter, we discuss the decision-theoretic framework of statistical estimation and introduce several important examples. Section 28.1 presents the basic elements of a statistical experiment and statistical estimation. Section 28.3 introduces the Bayes risk (average case) and the minimax risk (worst case) as the respective fundamental limits of statistical estimation in the Bayesian and frequentist settings, with the latter being our primary focus in this part. We discuss several versions of the minimax theorem (and prove a simple one), which equates the minimax risk with the worst-case Bayes risk. Two variants are introduced next that extend a basic statistical experiment to either large sample size or large dimension: Section 28.4 on independent observations and Section 28.5 on tensorization of experiments. Throughout this chapter the Gaussian location model (GLM), introduced in Section 28.2, serves as a running example, with a different focus in each place (such as the role of loss functions, parameter spaces, and low versus high dimensions). In Section 28.6, we discuss a key result known as Anderson’s lemma for determining the exact minimax risk of the (unconstrained) GLM in any dimension for a broad class of loss functions; this provides a benchmark for the more general techniques introduced in later chapters.
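For concreteness, the two fundamental limits can be written in standard notation as follows (a generic formulation; the chapter states the precise conditions). For a loss function \(\ell\), prior \(\pi\), and estimator \(\hat\theta\),
\[
R_{\mathrm{Bayes}}(\pi) = \inf_{\hat\theta}\, \mathbb{E}_{\theta\sim\pi}\,\mathbb{E}_{X\sim P_\theta}\,\ell\big(\theta,\hat\theta(X)\big), \qquad
R^* = \inf_{\hat\theta}\, \sup_{\theta\in\Theta}\, \mathbb{E}_{X\sim P_\theta}\,\ell\big(\theta,\hat\theta(X)\big),
\]
and the minimax theorem asserts, under regularity conditions, that \(R^* = \sup_\pi R_{\mathrm{Bayes}}(\pi)\).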
In Chapter 12, we shall examine results for a large class of processes with memory, known as ergodic processes. We start this chapter with a quick review of the main concepts of ergodic theory, then state our main results: the Shannon–McMillan theorem, the compression limit, and the asymptotic equipartition property (AEP). Subsequent sections are dedicated to proofs of the Shannon–McMillan and ergodic theorems. Finally, in the last section we introduce Kolmogorov–Sinai entropy, which associates to a fully deterministic transformation a measure of how “chaotic” it is. This concept plays a very important role in formalizing an apparent paradox: large mechanical systems (such as collections of gas particles) are on the one hand fully deterministic (described by Newton’s laws of motion) and on the other hand exhibit many probabilistic properties (the Maxwell distribution of velocities, fluctuations, etc.). Kolmogorov–Sinai entropy shows how these two notions can coexist. In addition, it was used to resolve a long-standing open problem in dynamical systems regarding the isomorphism of Bernoulli shifts.
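For orientation, the AEP takes the following standard form: for a stationary ergodic source \(\{X_i\}\) on a finite alphabet,
\[
-\frac{1}{n}\log P_{X^n}(X_1,\dots,X_n) \;\longrightarrow\; \mathcal{H} \quad \text{(a.s. and in } L^1\text{)},
\]
where \(\mathcal{H} = \lim_{n\to\infty} \frac{1}{n} H(X_1,\dots,X_n)\) is the entropy rate of the source.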
In Chapter 14 we first define a performance metric giving a full description of the binary hypothesis testing (BHT) problem. A key result in this theory, the Neyman–Pearson lemma, determines the form of the optimal test and at the same time characterizes the given performance metric. We then specialize to the setting of iid observations and consider two types of asymptotics: Stein’s regime (where the type-I error is held constant) and Chernoff’s regime (where errors of both types are required to decay exponentially). In this chapter we discuss only Stein’s regime and find that the fundamental limit is given by the KL divergence. Subsequent chapters will address Chernoff’s regime.
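In standard notation, Stein’s regime reads as follows: when testing \(H_0\colon X^n \sim P^{\otimes n}\) against \(H_1\colon X^n \sim Q^{\otimes n}\) with the type-I error held at a fixed \(\epsilon \in (0,1)\), the optimal type-II error \(\beta_n(\epsilon)\) satisfies
\[
\lim_{n\to\infty} -\frac{1}{n}\log \beta_n(\epsilon) = D(P\|Q),
\]
so the KL divergence is the best achievable exponential decay rate.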
In the previous chapter we introduced the concept of variable-length compression and studied its fundamental limits (with and without the prefix-free condition). In some situations, however, one may desire that the output of the compressor always has a fixed length, say, k bits. Unless k is unreasonably large, this will require relaxing the losslessness condition. This is the focus of Chapter 11: compression in the presence of a (typically vanishingly small) probability of error. It turns out that allowing even a very small error enables several beautiful effects: (1) the possibility to compress data via matrix multiplication over finite fields (linear compression); (2) the possibility to reduce the compression length if side information is available at the decompressor (Slepian–Wolf); (3) the possibility to reduce the compression length if access to a compressed representation of side information is available at the decompressor (Ahlswede–Körner–Wyner).
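To illustrate the first effect, here is a minimal sketch (ours, not the book’s construction) of linear compression over GF(2): the compressor multiplies the source string by a random binary matrix, and the decompressor searches a toy “typical set” for a consistent preimage.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

n, k = 16, 12                          # compress n source bits into k < n bits
H = rng.integers(0, 2, size=(k, n))    # random binary compression matrix

def compress(x):
    """Linear compression over GF(2): y = Hx mod 2."""
    return (H @ x) % 2

def decompress(y, candidates):
    """Search a small candidate set (a toy 'typical set') for a preimage."""
    for x in candidates:
        if np.array_equal(compress(x), y):
            return x
    return None                        # error event: no candidate matched

# Toy typical set: a low-entropy source emits strings of Hamming weight <= 2.
typical = [np.array(bits) for bits in itertools.product([0, 1], repeat=n)
           if sum(bits) <= 2]

x = np.zeros(n, dtype=int)
x[3] = x[7] = 1                        # a typical source realization
x_hat = decompress(compress(x), typical)
# With small probability (over the draw of H) two typical strings collide;
# this is exactly the vanishing error probability that the chapter permits.
print("recovered correctly:", np.array_equal(x_hat, x))
```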
In Chapter 19 we apply methods developed in the previous chapters (namely the weak converse and the random/maximal coding achievability) to compute the channel capacity. This notion quantifies the maximal number of (data) bits that can be reliably communicated per channel use in the limit of using the channel many times. Formalizing this statement will require introducing the concept of a communication channel. Then for special kinds of channels (the memoryless and the information-stable ones) we will show that computing the channel capacity reduces to maximizing (a sequence of) mutual informations. This result, known as Shannon’s noisy channel coding theorem, is very special as it relates the value of a (discrete, combinatorial) optimization problem over codebooks to that of a (convex) optimization problem over information measures. It builds a bridge between the abstraction of information measures (Part I) and practical engineering problems.
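For a memoryless channel the answer takes the familiar single-letter form (stated here without the chapter’s regularity conditions):
\[
C = \max_{P_X} I(X;Y).
\]
For example, for the binary symmetric channel with crossover probability \(\delta\) this evaluates to \(C = 1 - h(\delta)\) bits per channel use, where \(h\) is the binary entropy function.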
In order to take on arbitrary geometries, shape-changing arrays must introduce gaps between their elements. To enhance performance, this unused area can be filled with metamaterial-inspired switched passive networks on flexible sheets in order to compensate for the effects of increased spacing. These flexible meta-gaps can easily fold and deploy when the array changes shape. This work investigates the promise of meta-gaps through the measurement of a 5-by-5 λ-spaced array with 40 meta-gap sheets and 960 switches. The optimization and measurement problems associated with such a high-dimensional phased array are discussed. Simulated and in-situ optimization experiments are conducted to examine the differential performance of metaheuristic algorithms and to characterize the underlying optimization problem. Measurement results demonstrate that, in our implementation, meta-gaps increase the average main-beam power within the field of view (FoV) by 0.46 dB, suppress the average side-lobe level within the FoV by 2 dB, and widen the FoV by 23.5° compared with a ground-plane-backed array.
Chapter 1 introduces the first information measure – Shannon entropy. After studying its standard properties (chain rule, conditioning), we will briefly describe how one could arrive at its definition. We discuss axiomatic characterization, the historical development in statistical mechanics, as well as the underlying combinatorial foundation (“method of types”). We close the chapter with Han’s and Shearer’s inequalities, which both exploit the submodularity of entropy.
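For reference, in the discrete case the definition and the chain rule read
\[
H(X) = \sum_{x} P_X(x) \log \frac{1}{P_X(x)}, \qquad H(X,Y) = H(X) + H(Y \mid X).
\]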
Chapter 2 is a study of divergence (also known as information divergence, Kullback–Leibler (KL) divergence, or relative entropy), which is the first example of a dissimilarity (information) measure between a pair of distributions P and Q. Defining KL divergence and its conditional version in full generality requires some measure-theoretic acrobatics (Radon–Nikodym derivatives and Markov kernels) that we spend some time on. (We stress again that all this abstraction can be ignored if one is willing to work only with finite or countably infinite alphabets.) Besides the definitions, we prove the “main inequality” showing that KL divergence is non-negative. Coupled with the chain rule for divergence, this inequality implies the data-processing inequality, which is arguably the central pillar of information theory and of this book. We conclude the chapter by studying the local behavior of divergence when P and Q are close. In the special case when P and Q belong to a parametric family, we will see that divergence is locally quadratic, with the Hessian given by the Fisher information, explaining the fundamental role of the latter in classical statistics.
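In standard notation, the divergence and its local expansion read (assuming \(P \ll Q\) and, for the expansion, a smooth parametric family):
\[
D(P\|Q) = \mathbb{E}_P\!\left[\log \frac{dP}{dQ}\right] \ge 0, \qquad
D(P_{\theta}\|P_{\theta'}) = \tfrac{1}{2}(\theta'-\theta)^{\top} J_F(\theta)\,(\theta'-\theta) + o(\|\theta'-\theta\|^2),
\]
where \(J_F\) denotes the Fisher information matrix.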
This enthusiastic introduction to the fundamentals of information theory builds from classical Shannon theory through to modern applications in statistical learning, equipping students with a uniquely well-rounded and rigorous foundation for further study. The book introduces core topics such as data compression, channel coding, and rate-distortion theory using a unique finite-blocklength approach. With over 210 end-of-part exercises and numerous examples, students are introduced to contemporary applications in statistics, machine learning, and modern communication theory. This textbook presents information-theoretic methods with applications in statistical learning and computer science, such as f-divergences, PAC-Bayes bounds, the variational principle, Kolmogorov’s metric entropy, strong data-processing inequalities, and entropic upper bounds for statistical estimation. Accompanied by additional stand-alone chapters on more specialized topics in information theory, this is the ideal introductory textbook for senior undergraduate and graduate students in electrical engineering, statistics, and computer science.
Tightly focused proton beams generated from helical coil targets have been shown to be highly collimated over short distances and to display characteristic spectral bunching. We show, for the first time, proton spectra from such targets at high resolution via a Thomson parabola spectrometer. The proton spectral peaks reach energies above 50 MeV, with cutoffs approaching 70 MeV and particle numbers greater than $10^{10}$. The spectral bunch width has also been measured to be as narrow as approximately 8.5 MeV (17% energy spread). The proton beam pointing and divergence measured at metre-scale distances are found to be stable, with an average pointing stability below 10 mrad and average half-angle beam divergences of approximately 6 mrad. Evidence of the influence of the final turn of the coil on beam pointing over long distances is also presented, corroborated by particle-tracing simulations, indicating the scope for further improvement and control of the beam pointing by modifying target parameters.
In Chapter 13 we will discuss how to produce compression schemes that do not require a priori knowledge of the generative distribution. It turns out that designing a compression algorithm able to adapt to an unknown distribution is essentially equivalent to the problem of estimating an unknown distribution, which is a major topic of statistical learning. The plan for this chapter is as follows: (1) We will start by discussing the earliest example of a universal compression algorithm (due to Fitingof). It does not reference probability distributions at all; nevertheless, it turns out to be asymptotically optimal simultaneously for all iid distributions and, with small modifications, for all finite-order Markov chains. (2) The next class of universal compressors is based on assuming that the true distribution belongs to a given class. These methods proceed by choosing a good model distribution serving as the minimax approximation to each distribution in the class. The compression algorithm for a single distribution is then designed as in previous chapters. (3) Finally, an entirely different idea underlies the algorithms of the Lempel–Ziv type, sketched below. These automatically adapt to the distribution of the source, without any prior assumptions required.
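As a simplified illustration of the third approach (not necessarily the exact variant treated in the chapter), here is LZ78-style incremental parsing: the input is split into phrases, each the shortest prefix not seen before, and each phrase is encoded by the index of its longest previously seen prefix plus one new symbol.

```python
def lz78_parse(s):
    """LZ78 incremental parsing: split s into phrases, each the shortest
    prefix of the remaining input not seen as a phrase before; emit each
    phrase as (index of its longest previously seen prefix, new symbol)."""
    dictionary = {"": 0}               # phrase -> index; empty phrase is 0
    output, phrase = [], ""
    for ch in s:
        if phrase + ch in dictionary:
            phrase += ch               # keep extending the current match
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                         # flush a trailing partial phrase
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output

# A repetitive source parses into few phrases; no source model is needed.
print(lz78_parse("abababababab"))
# -> [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (2, 'a'), (5, 'b')]
```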
In this chapter we introduce the problem of analyzing low-probability events, the subject of large deviations theory. It is usually solved by computing moment-generating functions and applying Fenchel–Legendre conjugation. It turns out, however, that these steps can be interpreted information-theoretically in terms of information projection. We show how to solve the information projection in the special case of linear constraints, connecting the solution to exponential families.
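In standard notation (for iid \(X_i\) with log moment-generating function \(\psi(\lambda) = \log \mathbb{E}\, e^{\lambda X}\) and \(\gamma > \mathbb{E} X\)), the Chernoff bound reads
\[
\mathbb{P}\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i \ge \gamma\right] \le e^{-n\,\psi^*(\gamma)}, \qquad
\psi^*(\gamma) = \sup_{\lambda \ge 0}\big(\lambda\gamma - \psi(\lambda)\big),
\]
and, under suitable conditions, the conjugate \(\psi^*(\gamma)\) coincides with the information projection \(\min\{D(Q\|P) : \mathbb{E}_Q X \ge \gamma\}\).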
In Chapter 20 we study data transmission with constraints on the channel input. For example, how many bits per channel use can we transmit under constraints on the codewords? To answer this question in general, we need to extend the setup and coding theorems to channels with input constraints. After doing that we will apply these results to compute the capacities of various Gaussian channels (memoryless, with intersymbol interference and subject to fading).
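The prototypical answer here is the capacity of the memoryless AWGN channel with noise variance \(\sigma^2\) under the average power constraint \(\mathbb{E}[X^2] \le P\):
\[
C = \frac{1}{2}\log\!\left(1 + \frac{P}{\sigma^2}\right)
\]
per real channel use (in nats with the natural logarithm, in bits with \(\log_2\)).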
A quadrotor unmanned aerial vehicle (UAV) must achieve desired flight missions despite internal uncertainties and external disturbances. This paper proposes an adaptive trajectory-tracking control method that attenuates unknown uncertainties and disturbances. Although the quadrotor is underactuated, a fully actuated controller is designed using backstepping control. To avoid repeated differentiation of the control inputs, a dynamic surface method introduces a filter and an auxiliary controller. Lyapunov criteria guide the adaptive laws for tuning the controller gains and filters. A low-power observer is integrated for state estimation. Additionally, a disturbance observer is developed and combined with the control scheme to handle unknown disturbances. Simulations on a DJI F450 quadrotor demonstrate that the proposed control algorithm offers strong trajectory-tracking performance and system stability under multiple uncertainties and external disturbances during flight.
The transport industry of Ukraine is an integral part of its economy. According to the National Transport Strategy of Ukraine, a critical strategic goal is to enhance transport safety. Currently, there is a lack of mobile devices capable of automatically measuring the slopes and evenness of both runways and road surfaces in two coordinates. This paper addresses the creation of new methods for assessing longitudinal and transverse slopes using micromechanical systems. The study highlights international experience, presents practical applications, and proposes strategies for overcoming implementation challenges. A detailed roadmap for deployment and further improvements is provided.
This paper analyses the performance of the Australian and New Zealand Satellite-Based Augmentation System (Aus-NZ SBAS) test-bed to evaluate its use in civil aviation applications, with a focus on dual-frequency multi-constellation (DFMC) signals. The Aus-NZ SBAS test-bed performance metrics were determined using kinematic data recorded in flight across a variety of environments and operational conditions. A total of 14 tests, adding up to 32 h of flight, were evaluated. Flight test data were processed in both the L1 SBAS and DFMC SBAS modes supported by the test-bed broadcasts. The performance results are reviewed in terms of accuracy, availability and integrity metrics and compared with the requirement thresholds defined by the International Civil Aviation Organisation (ICAO) for Precision Approach (PA) flight operations. The experiments performed do not allow continuity to be assessed as specified in the standard, owing to its long-term statistical requirement and the inherent limitations of the reference station network. Analysis of flight test results shows that DFMC SBAS provides several performance improvements over single-frequency SBAS, tightening both horizontal and vertical protection levels and resulting in greater service availability during the approach.