Discover the foundations of classical and quantum information theory in the digital age with this modern introductory textbook. Familiarise yourself with core topics such as uncertainty, correlation, and entanglement before exploring modern techniques and concepts including tensor networks, quantum circuits, and quantum discord. Deepen your understanding and extend your skills with over 250 thought-provoking end-of-chapter problems (with solutions for instructors), and explore curated further reading. Understand how abstract concepts connect to real-world scenarios through over 400 examples, including numerical and conceptual illustrations that emphasise practical applications. Build confidence as chapters progressively increase in complexity, alternating between classical and quantum systems. This is the ideal textbook for senior undergraduate and graduate students in electrical engineering, computer science, and applied mathematics looking to master the essentials of contemporary information theory.
Emphasizing how and why machine learning algorithms work, this introductory textbook bridges the gap between the theoretical foundations of machine learning and its practical algorithmic and code-level implementation. Over 85 thorough worked examples, in both Matlab and Python, demonstrate how algorithms are implemented and applied while illustrating the end result. Over 75 end-of-chapter problems empower students to develop their own code to implement these algorithms, equipping them with hands-on experience. Matlab coding examples demonstrate how a mathematical idea is converted from equations to code and provide a jumping-off point for students, supported by in-depth coverage of essential mathematics including multivariable calculus, linear algebra, probability and statistics, numerical methods, and optimization. Accompanied online by instructor lecture slides, downloadable Python code, and additional appendices, this is an excellent introduction to machine learning for senior undergraduate and graduate students in engineering and computer science.
'High-Dimensional Probability,' winner of the 2019 PROSE Award in Mathematics, offers an accessible and friendly introduction to key probabilistic methods for mathematical data scientists. Streamlined and updated, this second edition integrates theory, core tools, and modern applications. Concentration inequalities are central, including classical results like Hoeffding's and Chernoff's inequalities, and modern ones like the matrix Bernstein inequality. The book also develops methods based on stochastic processes – Slepian's, Sudakov's, and Dudley's inequalities, generic chaining, and VC-based bounds. Applications include covariance estimation, clustering, networks, semidefinite programming, coding, dimension reduction, matrix completion, and machine learning. New to this edition are 200 additional exercises, alongside extra hints to assist with self-study. Material on analysis, probability, and linear algebra has been reworked and expanded to help bridge the gap from a typical undergraduate background to a second course in probability.
Regression and classification are closely related, as shown in this chapter, which discusses methods for mapping a linear regression function to a probability function, via either the logistic function (for binary classification) or the softmax function (for multi-class classification). Based on this probability function, an unlabeled sample can be assigned to one of the classes. The optimal model parameters can be obtained from the training set by maximizing either the likelihood or the posterior probability of the parameters.
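The two mappings mentioned above can be sketched directly in NumPy; the function names and toy inputs below are illustrative choices, not taken from the chapter.

```python
import numpy as np

def logistic(z):
    """Map a linear regression output z = w^T x + b to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Map a vector of K linear scores to a probability distribution over K classes."""
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

p = logistic(0.0)                            # 0.5: the binary decision boundary
probs = softmax(np.array([2.0, 1.0, 0.1]))   # three-class probabilities summing to 1
```

The probabilities returned by either function can then be thresholded (binary case) or maximized over classes (multi-class case) to label a sample.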
This chapter offers a comprehensive overview of large language models (LLMs), examining their theoretical foundations, core mechanisms, and broad-ranging implications. We begin by situating LLMs within the domain of natural language processing (NLP), tracing the evolution of language modeling from early statistical approaches to modern deep learning methods.

The focus then shifts to the transformative impact of the Transformer architecture, introduced in the seminal paper 'Attention Is All You Need'. By leveraging self-attention and parallel computation, Transformers have enabled unprecedented scalability and efficiency in training large models.

We explore the pivotal role of transfer learning in NLP, emphasizing how pretraining on large text corpora followed by task-specific fine-tuning allows LLMs to generalize across a wide range of linguistic tasks. The chapter also discusses reinforcement learning from human feedback (RLHF), a crucial technique for refining model outputs to better align with human preferences and values.

Key theoretical developments are introduced, including scaling laws, which describe how model performance improves predictably with increased data, parameters, and compute resources, and emergence, the surprising appearance of complex behaviors in sufficiently large models.

Beyond technical aspects, the chapter engages with deeper conceptual questions: Do LLMs genuinely "understand" language? Could advanced AI systems one day exhibit a form of consciousness, however rudimentary or speculative? These discussions draw on perspectives from cognitive science, philosophy of mind, and AI safety.

Finally, we explore future directions in the field, including the application of Transformer architectures beyond NLP and the development of generative methods that extend beyond Transformer-based models, signaling a dynamic and rapidly evolving landscape in artificial intelligence.
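The scaled dot-product self-attention at the heart of the Transformer can be sketched in a few lines; the function name and toy shapes below are illustrative assumptions, not from the chapter.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Self-attention as in 'Attention Is All You Need':
    softmax(Q K^T / sqrt(d_k)) V, computed for all positions in parallel."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of value vectors

# toy example: 4 tokens, embedding dimension 8, with Q = K = V (self-attention)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(X, X, X)          # shape (4, 8)
```

Because every position attends to every other position in one matrix product, the whole sequence is processed in parallel rather than step by step, which is the source of the scalability discussed above.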
This chapter is concerned with the constrained optimization problem, which plays an important role in ML, as many ML algorithms essentially maximize or minimize a given objective function subject to either equality or inequality constraints. Such constrained optimization problems can be reformulated in terms of the Lagrangian function, which includes the original objective as well as an extra term for the constraints weighted by their Lagrange multipliers. The chapter also considers the important duality principle, based on which the constrained optimization problem can be addressed as either the primal (original) problem or the dual problem; the dual is equivalent to the primal when a set of KKT conditions is satisfied, in the sense that the solution of the dual is the same as that of the primal. The chapter further considers two methods, linear and quadratic programming, of which the latter is the foundation of the support vector machine (SVM), an important classification algorithm considered in a later chapter.
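As a concrete illustration of the Lagrangian reformulation, the sketch below solves a toy equality-constrained quadratic program by writing the stationarity and constraint conditions of the Lagrangian as one linear (KKT) system; the specific matrices are invented for illustration only.

```python
import numpy as np

# Toy equality-constrained QP: minimize (1/2) x^T Q x  subject to  A x = b.
# Setting the gradient of the Lagrangian L(x, lam) = (1/2) x^T Q x + lam^T (A x - b)
# to zero, together with the constraint, gives the linear KKT system solved below.
Q = np.array([[2.0, 0.0], [0.0, 2.0]])   # objective: x1^2 + x2^2
A = np.array([[1.0, 1.0]])               # constraint: x1 + x2 = 1
b = np.array([1.0])

KKT = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
rhs = np.concatenate([np.zeros(2), b])
sol = np.linalg.solve(KKT, rhs)
x, lam = sol[:2], sol[2:]                # x = [0.5, 0.5], multiplier lam = -1
```

The multiplier `lam` measures how sensitive the optimal objective value is to relaxing the constraint, which is the quantity that the dual problem optimizes over.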
This chapter considers unsupervised learning methods for cluster analysis, used when the data samples in the given dataset are no longer labeled, including the K-means method and the Gaussian mixture model (GMM). The K-means algorithm is straightforward in theory and simple to implement. Starting from a set of K randomly initialized seeds assumed to be the mean vectors of K clusters, the algorithm modifies them iteratively until they become stable. The drawback of this method is that the resulting clusters are characterized only by their means, while the shapes of their distributions are not considered. If the distributions of the actual clusters in the dataset are not spherical, they will not be properly represented. This problem can be addressed if the dataset is modeled as a mixture of Gaussian distributions, each characterized by its mean and covariance, which are estimated iteratively by the expectation-maximization (EM) method. The resulting Gaussian clusters reveal the structure of the dataset much more accurately. The K-means and Gaussian mixture methods are analogous, respectively, to the discriminative minimum-distance classifier and the generative Bayesian classifier. Following the same idea as the GMM, the last section of this chapter also considers the Bernoulli mixture model for clustering binary data.
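The K-means loop described above can be sketched as follows; the empty-cluster guard and the toy two-blob data are illustrative additions, not part of the chapter's presentation.

```python
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    """Minimal K-means sketch: alternate nearest-center assignment and
    mean updates until the centers stabilize."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # K random seeds
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each sample to the nearest current center
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        # recompute each center as the mean of its assigned samples
        new_centers = centers.copy()
        for k in range(K):
            members = X[labels == k]
            if len(members):                # guard against a transiently empty cluster
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break                           # centers have stabilized
        centers = new_centers
    return centers, labels

# toy data: two well-separated spherical blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
centers, labels = kmeans(X, K=2)
```

Note that only the means are updated; replacing the mean update with per-cluster mean-and-covariance estimates, weighted by posterior responsibilities, is essentially the EM step of the GMM mentioned above.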
This chapter reviews basic numerical methods for solving equation systems, including fixed-point iteration, which will be used when discussing reinforcement learning, and the Newton-Raphson method for both univariate and multivariate systems, which is closely related to the optimization methods discussed in the following chapters. Newton's method approximates the function in question by the first two terms (constant and linear) of its Taylor expansion at an initial guess of the root, which is then improved iteratively to approach the true root where the function equals zero. The appendices of the chapter further discuss some important computational issues, such as the order of convergence of these methods, which may be of interest to more advanced readers.
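The univariate Newton-Raphson iteration described above is short enough to sketch directly; the tolerance and the example function below are arbitrary illustrative choices.

```python
def newton(f, fprime, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson: truncating the Taylor expansion at the linear term
    and solving for the root gives the update x <- x - f(x)/f'(x)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:     # stop once the update is negligibly small
            break
    return x

# example: the positive root of f(x) = x^2 - 2, i.e. sqrt(2)
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

In the multivariate case the same update applies with the derivative replaced by the Jacobian matrix, so each iteration solves a linear system instead of performing a scalar division.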
This chapter considers a set of algorithms for statistical pattern classification, including two simple classifiers based on nearest neighbors and minimum distances, and two more powerful methods, naïve Bayes and adaptive boosting (AdaBoost). The Bayes classifier is a typical generative method based on the assumption that, in the training set, all data points of the same class are samples from the same Gaussian distribution; it classifies any unlabeled data sample into the class with the highest posterior probability given the sample, which is proportional to the product of the likelihood and the prior probability. In contrast, the AdaBoost classifier is a typical boosting (ensemble learning) algorithm that iteratively improves a set of weak classifiers.
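A minimal sketch of the generative Gaussian Bayes classifier described above: one Gaussian is fitted per class and the posterior is compared in the log domain. The function names and toy data are illustrative assumptions; this sketch uses a full covariance per class, whereas a naïve Bayes variant would assume independent features (a diagonal covariance).

```python
import numpy as np

def fit_gaussian_bayes(X, y):
    """Fit one Gaussian per class: mean, full covariance, and class prior."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False), len(Xc) / len(X))
    return params

def predict(params, x):
    """Assign x to the class with the highest posterior, proportional to
    likelihood times prior (compared via log-density plus log-prior)."""
    best, best_score = None, -np.inf
    for c, (mu, cov, prior) in params.items():
        diff = x - mu
        score = (-0.5 * diff @ np.linalg.solve(cov, diff)
                 - 0.5 * np.log(np.linalg.det(cov))
                 + np.log(prior))
        if score > best_score:
            best, best_score = c, score
    return best

# toy data: two labeled Gaussian blobs
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(6.0, 1.0, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
model = fit_gaussian_bayes(X, y)
pred = predict(model, np.array([6.0, 6.0]))
```

Terms that are the same for every class (such as the normalizing constant of the Gaussian density) are dropped from the score, since only the argmax over classes matters.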