At the forefront of cutting-edge technologies, this text provides a comprehensive treatment of a crucial network performance metric, ushering in new opportunities for rethinking the whole design of communication systems. Detailed exposition of the communication and network theoretic foundations of Age of Information (AoI) gives the reader a solid background, and discussion of the implications for signal processing and control theory sheds light on the important potential of recent research. The text includes extensive real-world applications of this vital metric, including caching, the Internet of Things (IoT), and energy harvesting networks. The far-reaching applications of AoI include networked monitoring systems, cyber-physical systems such as the IoT, and information-oriented systems and data analytics applications ranging from the stock market to social networks. The future of this exciting subject in 5G communication systems and beyond makes this a vital resource for graduate students, researchers and professionals.
This self-contained introduction to machine learning, designed from the start with engineers in mind, will equip students with everything they need to start applying machine learning principles and algorithms to real-world engineering problems. With a consistent emphasis on the connections between estimation, detection, information theory, and optimization, it includes: an accessible overview of the relationships between machine learning and signal processing, providing a solid foundation for further study; clear explanations of the differences between state-of-the-art techniques and more classical methods, equipping students with all the understanding they need to make informed technique choices; demonstration of the links between information-theoretical concepts and their practical engineering relevance; reproducible examples using Matlab, enabling hands-on student experimentation. Assuming only a basic understanding of probability and linear algebra, and accompanied by lecture slides and solutions for instructors, this is the ideal introduction to machine learning for engineering students of all disciplines.
We continue our discussion of hidden Markov models (HMMs) and consider in this chapter the solution of decoding problems. Specifically, given a sequence of observations, we would like to devise mechanisms that allow us to estimate the underlying sequence of state or latent variables. That is, we would like to recover the state evolution that “most likely” explains the measurements. We already know how to perform decoding for the case of mixture models with independent observations by using (38.12a)–(38.12b). The solution is more challenging for HMMs because of the dependency among the states.
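As a rough, self-contained illustration of the decoding task (not the chapter's own derivation), the following Python sketch runs the classical Viterbi recursion on a toy two-state HMM; the transition matrix, emission matrix, initial distribution, and observation sequence are placeholder assumptions.

```python
import numpy as np

# Toy HMM: the numbers below are illustrative assumptions, not values from the text.
A = np.array([[0.7, 0.3],      # A[i, j] = P(state j at time n+1 | state i at time n)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],      # B[i, k] = P(observation k | state i)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])      # initial state distribution
obs = [0, 1, 1, 0]             # observed symbol indices

def viterbi(obs, A, B, pi):
    """Return the most likely hidden state sequence for the given observations."""
    S, T = A.shape[0], len(obs)
    logd = np.full((T, S), -np.inf)        # log-probability of the best path ending in each state
    back = np.zeros((T, S), dtype=int)     # backpointers for path recovery
    logd[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        for j in range(S):
            scores = logd[t - 1] + np.log(A[:, j])
            back[t, j] = np.argmax(scores)
            logd[t, j] = scores[back[t, j]] + np.log(B[j, obs[t]])
    # backtrack from the most likely final state
    path = [int(np.argmax(logd[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print(viterbi(obs, A, B, pi))   # most likely state sequence explaining the observations
```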
The various reinforcement learning algorithms described in the last two chapters rely on estimating state values or state–action values directly.
One prominent application of the variational inference methodology of Chapter 36 arises in the context of topic modeling. In this application, the objective is to discover similarities between texts or documents such as news articles. For example, given a large library of articles, running perhaps into the millions, such as a database of newspaper articles written over 100 years, it would be useful to be able to discover in an automated manner the multitude of topics that are covered in the database and to cluster together articles dealing with similar topics such as sports or health or politics. In another example, when a user is browsing an article online, it would be useful to be able to identify automatically the subject matter of the article in order to recommend to the reader other articles of similar content. Latent Dirichlet allocation (or LDA) refers to the procedure that results from applying variational inference techniques to topic modeling in order to address questions of this type.
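For a concrete feel of topic modeling in practice, here is a minimal sketch using scikit-learn's LatentDirichletAllocation on a tiny toy corpus; the documents, the number of topics, and the preprocessing choices are illustrative assumptions rather than anything prescribed by the chapter.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus with (roughly) sports, health, and politics documents.
docs = [
    "the team won the final match after a late goal",
    "the striker scored twice in the championship game",
    "the new diet improves heart health and blood pressure",
    "doctors recommend regular exercise for better health",
    "parliament passed the new budget after a long debate",
    "the senator announced a campaign for the election",
]

counts = CountVectorizer(stop_words="english").fit(docs)
X = counts.transform(docs)                                  # document-term count matrix
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)

# print the top words of each discovered topic
terms = counts.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-4:]]
    print(f"topic {k}: {top}")
```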
This chapter summarizes recent advances in the analysis of the optimization landscape of neural network training. We first review classical results for linear networks trained with a squared loss and without regularization. Such results show that, under certain conditions on the input-output data, spurious local minima are guaranteed not to exist, i.e., critical points are either saddle points or global minima. Moreover, the globally optimal weights can be found by factorizing certain matrices obtained from the input-output covariance matrices. We then review recent results for deep networks with parallel structure, positively homogeneous network mapping and regularization, and trained with a convex loss. Such results show that the non-convex objective on the weights can be lower-bounded by a convex objective on the network mapping. Moreover, when the network is sufficiently wide, local minima of the non-convex objective that satisfy a certain condition yield global minima of both the non-convex and convex objectives, and there is always a non-increasing path to a global minimizer from any initialization.
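The first of these results can be made concrete with a small numerical sketch. Assuming a two-layer linear network with a squared loss and no bottleneck (assumptions of this example, not a statement of the chapter's exact conditions), the globally optimal end-to-end map reduces to the least-squares solution built from the input-output covariance matrices:

```python
import numpy as np

# For y_hat = W2 @ W1 @ x with a squared loss and no bottleneck, the product
# W2 @ W1 at a global minimum equals Sigma_yx @ inv(Sigma_xx).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 1000))             # 5-dimensional inputs
W_true = rng.standard_normal((3, 5))
Y = W_true @ X + 0.01 * rng.standard_normal((3, 1000))

Sigma_xx = X @ X.T / X.shape[1]                # input covariance
Sigma_yx = Y @ X.T / X.shape[1]                # input-output cross-covariance
W_star = Sigma_yx @ np.linalg.inv(Sigma_xx)    # globally optimal end-to-end map

# Any factorization W_star = W2 @ W1 with hidden width >= 3 attains the same loss.
W1 = np.vstack([W_star, np.zeros((2, 5))])     # first layer, hidden width 5
W2 = np.hstack([np.eye(3), np.zeros((3, 2))])  # second layer
print(np.allclose(W2 @ W1, W_star))            # True
```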
In this chapter we discuss the algorithmic and theoretical underpinnings of layer-wise relevance propagation (LRP), apply the method to a complex model trained for the task of visual question answering (VQA), and demonstrate that it produces meaningful explanations, revealing interesting details about the model’s reasoning. We conclude the chapter by commenting on the general limitations of current explanation techniques and interesting future directions.
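As a hedged illustration of the redistribution step at the core of LRP (not the VQA model analyzed in this chapter), the sketch below applies a basic epsilon-style propagation rule to a tiny random two-layer ReLU network; the weights, input, and stabilizer value are all assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((4, 6)), np.zeros(4)   # layer 1: 6 inputs -> 4 hidden units
W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)   # layer 2: 4 hidden -> 2 output scores
x = rng.standard_normal(6)

# forward pass, keeping the activations of every layer
a1 = np.maximum(W1 @ x + b1, 0.0)
out = W2 @ a1 + b2

def lrp_epsilon(a_in, W, R_out, eps=1e-6):
    """Redistribute relevance R_out from a layer's output back to its input."""
    z = W @ a_in                               # contributions (bias omitted in this sketch)
    z = z + eps * np.sign(z + (z == 0))        # stabilizer against division by zero
    s = R_out / z
    return a_in * (W.T @ s)

# start from the score of the predicted class and propagate back to the input
R_out = np.zeros_like(out)
R_out[np.argmax(out)] = out[np.argmax(out)]
R_hidden = lrp_epsilon(a1, W2, R_out)
R_input = lrp_epsilon(x, W1, R_hidden)
print(R_input)    # per-input relevance scores (approximately conserving the chosen score)
```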
The maximum-likelihood (ML) formulation is one of the most formidable tools for the solution of inference problems in modern statistical analysis. It allows the estimation of unknown parameters in order to fit probability density functions (pdfs) onto data measurements. We introduce the ML approach in this chapter and limit our discussions to properties that will be relevant for the future developments in the text. The presentation is not meant to be exhaustive, but targets key concepts that will be revisited in later chapters. We also avoid anomalous situations and focus on the main features of ML inference that are generally valid under some reasonable regularity conditions.
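As a simple worked example in the spirit of this chapter (the distribution and the data are assumptions of the example, not the text's), the ML estimate of the rate of an exponential pdf has a closed form: setting the derivative of the log-likelihood to zero yields the reciprocal of the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rate = 2.5
data = rng.exponential(scale=1.0 / true_rate, size=10_000)   # synthetic measurements

rate_ml = 1.0 / data.mean()          # closed-form ML estimate of the exponential rate
print(true_rate, rate_ml)            # the estimate lands close to the true rate
```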
The temporal-difference learning algorithms TD(0) and TD(λ) of the previous chapter are useful procedures for state value evaluation; i.e., they permit the estimation of the state value function for a given target policy by observing actions and rewards arising from this policy (on-policy learning) or another behavior policy (off-policy learning). In most situations, however, we are not interested in state values but rather in determining optimal policies (i.e., in selecting what optimal actions an agent should follow in a Markov decision process (MDP)).
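To make the distinction concrete, the following sketch shows tabular Q-learning, a standard off-policy procedure that learns state–action values and hence a greedy policy; the chain environment, step size, discount, and episode count are illustrative assumptions, not the chapter's example.

```python
import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = move left, 1 = move right
alpha, gamma = 0.1, 0.9               # step size and discount factor
Q = np.zeros((n_states, n_actions))   # state-action value table
rng = np.random.default_rng(0)

def step(s, a):
    """Move left/right on a chain; reaching the last state gives reward 1 and ends the episode."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r, s_next == n_states - 1

for _ in range(2000):                             # episodes
    s, done = 0, False
    while not done:
        a = int(rng.integers(n_actions))          # random behavior policy (off-policy)
        s_next, r, done = step(s, a)
        # Q-learning update: bootstrap with the greedy (target-policy) value of the next state
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() * (not done) - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))   # greedy action per state: "right" (1) for the non-terminal states
```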
We have described a range of supervised learning algorithms in the previous chapters, including several neural network implementations and their training by means of the backpropagation algorithm. The performance of some of these algorithms has been demonstrated in practice to match or even exceed human performance in important applications. At the same time, it has also been observed that the algorithms are susceptible to adversarial attacks that can drive them to erroneous decisions under minimal perturbations to the data. For instance, adding small perturbations to an image that may not even be perceptible to the human eye has been shown to cause learning algorithms to classify the image incorrectly.
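A hedged toy illustration of this phenomenon (using a hand-set logistic regression classifier rather than any network from the chapter) is a fast-gradient-sign style perturbation: nudging the input a small step along the sign of the loss gradient is enough to flip the predicted class in this example.

```python
import numpy as np

# A "trained" linear classifier and a clean input; all values are illustrative assumptions.
w, b = np.array([3.0, -4.0, 2.0]), 0.0
x = np.array([0.1, -0.2, 0.3])        # clean input, true label y = 1
y = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# gradient of the cross-entropy loss with respect to the *input* x
grad_x = (sigmoid(w @ x + b) - y) * w

eps = 0.25
x_adv = x + eps * np.sign(grad_x)     # small sign-based perturbation of the input

print(sigmoid(w @ x + b))             # ~0.85: confidently class 1 on the clean input
print(sigmoid(w @ x_adv + b))         # ~0.37: the predicted class flips to 0
```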
We have seen how the expectation-maximization (EM) algorithm can be used to estimate the underlying parameters of the conditional probability density functions (pdfs) by approximating the maximum-likelihood (ML) solution. We found that the algorithm operates on a collection of independent observations, where each observation is generated independently from one of the mixture components. In this chapter and the next, we extend this construction and consider hidden Markov models (HMMs), where the mixture component for one observation is now dependent on the component used to generate the most recent past observation.
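As a quick refresher on the mixture-model case with independent observations (an illustrative example, not the chapter's), scikit-learn's GaussianMixture fits a two-component mixture by running EM under the hood:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# independent observations, each drawn from one of two Gaussian components
data = np.concatenate([
    rng.normal(-2.0, 0.5, size=300),
    rng.normal(+3.0, 1.0, size=700),
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)   # EM iterations inside .fit
print(gmm.weights_)            # estimated mixing proportions (close to 0.3 / 0.7, up to ordering)
print(gmm.means_.ravel())      # estimated component means (close to -2 and +3)
print(gmm.predict(data[:5]))   # most likely component for each observation
```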
Oftentimes, the dimension of the feature space is prohibitively large, either for computational or visualization purposes. In these situations, it becomes necessary to perform an initial dimensionality reduction step in which each feature vector is replaced by a lower-dimensional vector.
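A minimal sketch of such a reduction step, assuming PCA computed via the SVD of the centered data (one common choice among several; the data and reduced dimension are illustrative), looks as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))             # 200 feature vectors of dimension 50

Xc = X - X.mean(axis=0)                        # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3                                          # target (reduced) dimension
Z = Xc @ Vt[:k].T                              # project onto the top-k principal directions

print(X.shape, "->", Z.shape)                  # (200, 50) -> (200, 3)
```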
Maximum likelihood (ML) is a powerful statistical tool that determines model parameters in order to fit probability density functions (pdfs) onto data measurements. The estimated pdfs can then be used for at least two purposes. First, they can help construct optimal estimators or classifiers (such as the conditional mean estimator, the maximum a posteriori (MAP) estimator, or the Bayes classifier) since, as we already know from previous chapters, these optimal constructions require knowledge of the conditional or joint probability distributions of the variables involved in the inference problem. Second, once a pdf is learned, we can sample from it to generate additional observations. For example, consider a database consisting of images of cats and assume we are able to characterize (or learn) the pdf of the pixel values in these images. Then, we could use the learned pdf to generate “fake” cat-like images (i.e., ones that look like real cats). We will learn later in this text that this construction is possible and that some machine-learning architectures are based on this principle: They use data to learn what we call a “generative model,” and then use the model to generate “similar” data. We provide a brief explanation to this effect in the next section, where we explain the significance of posterior distributions.
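The “learn a pdf, then sample from it” idea can be sketched in a few lines; here the pdf is a plain Gaussian fitted by ML, which is of course far simpler than the generative models discussed later in the text, and all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic "training data" drawn from some unknown two-dimensional distribution
data = rng.multivariate_normal([1.0, -2.0], [[1.0, 0.3], [0.3, 0.5]], size=5000)

mu_ml = data.mean(axis=0)                          # ML estimate of the mean
Sigma_ml = np.cov(data, rowvar=False, bias=True)   # ML estimate of the covariance (divide by N)

new_samples = rng.multivariate_normal(mu_ml, Sigma_ml, size=3)
print(new_samples)                                 # fresh "fake" observations from the learned pdf
```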