This chapter introduces the sequent calculus as a formal system for constructing proofs and as a basis for automated proof search. It contrasts the book’s approach with Gentzen’s original formulation. The chapter details the different types of inference rules within the sequent calculus, including structural rules (weakening, contraction), identity rules (initial, cut), and introduction rules for logical connectives. It distinguishes between additive and multiplicative rules. Finally, it defines sequent calculus proofs as trees of inference rules and briefly touches upon the properties of cut elimination. Bibliographic notes point to relevant literature.
Regression and classification are closely related, as shown in this chapter, which discusses methods that map a linear regression function into a probability function using either the logistic function (for binary classification) or the softmax function (for multi-class classification). Based on this probability function, an unlabeled sample can be assigned to one of the classes. The optimal model parameters can be obtained from the training set so that either the likelihood or the posterior probability of these parameters is maximized.
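The two mappings named above can be sketched in a few lines. This is a minimal illustration, not the chapter's code; the scores and names (`w`, `x`, `scores`) are illustrative.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function: squashes a linear score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Softmax: maps a vector of class scores to a probability distribution."""
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

# An unlabeled sample is assigned to the class with the highest probability.
scores = np.array([2.0, 1.0, 0.1])   # illustrative linear scores w_k . x
probs = softmax(scores)
label = int(np.argmax(probs))
```
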
This chapter offers a comprehensive overview of large language models (LLMs), examining their theoretical foundations, core mechanisms, and broad-ranging implications. We begin by situating LLMs within the domain of natural language processing (NLP), tracing the evolution of language modeling from early statistical approaches to modern deep learning methods.

The focus then shifts to the transformative impact of the Transformer architecture, introduced in the seminal paper "Attention Is All You Need". By leveraging self-attention and parallel computation, Transformers have enabled unprecedented scalability and efficiency in training large models.

We explore the pivotal role of transfer learning in NLP, emphasizing how pretraining on large text corpora followed by task-specific fine-tuning allows LLMs to generalize across a wide range of linguistic tasks. The chapter also discusses reinforcement learning from human feedback (RLHF), a crucial technique for refining model outputs to better align with human preferences and values.

Key theoretical developments are introduced, including scaling laws, which describe how model performance improves predictably with increased data, parameters, and compute resources, and emergence, the surprising appearance of complex behaviors in sufficiently large models.

Beyond technical aspects, the chapter engages with deeper conceptual questions: Do LLMs genuinely "understand" language? Could advanced AI systems one day exhibit a form of consciousness, however rudimentary or speculative? These discussions draw from perspectives in cognitive science, philosophy of mind, and AI safety.

Finally, we explore future directions in the field, including the application of Transformer architectures beyond NLP, and the development of generative methods that extend beyond Transformer-based models, signaling a dynamic and rapidly evolving landscape in artificial intelligence.
Although deep reinforcement learning (DRL) techniques have been extensively studied in the field of robotic manipulators, there is limited research on directly mapping the output of policy functions to the joint space of manipulators. This paper proposes a DRL-based motion planning scheme for redundant manipulators to avoid obstacles, considering the actual shapes of obstacles in the environment. This scheme not only accomplishes the path planning task for the end-effector but also enables autonomous obstacle avoidance while obtaining the joint trajectories of the manipulator. First, a reinforcement learning framework based on the joint space is proposed. This framework uses the joint accelerations of the manipulator to calculate the Cartesian coordinates of the end-effector through forward kinematics, thereby performing end-to-end path planning for the end-effector. Second, the distance between all the linkages of the manipulator and irregular obstacles is calculated in real time based on the Gilbert–Johnson–Keerthi distance algorithm. The reward function containing joint acceleration is constructed with this distance to realize the obstacle avoidance task of the redundant manipulator. Finally, simulations and physical experiments were conducted on a 7-degree-of-freedom manipulator, demonstrating that the proposed scheme can generate efficient, collision-free trajectories in environments with irregular obstacles.
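The abstract does not give the paper's actual reward function; the following is a hypothetical sketch of a reward of the kind described, combining goal progress, a clearance penalty driven by a GJK-style minimum distance, and a joint-acceleration penalty. All weights and names are illustrative assumptions.

```python
import numpy as np

def reward(goal_dist, min_obstacle_dist, joint_accel,
           w_goal=1.0, w_obs=5.0, w_acc=0.01, safe_dist=0.05):
    """Hypothetical shaped reward: NOT the paper's formulation.

    goal_dist         -- distance from end-effector to goal (forward kinematics)
    min_obstacle_dist -- minimum linkage-obstacle distance (e.g. from GJK)
    joint_accel       -- joint acceleration vector (the policy's action)
    """
    r = -w_goal * goal_dist                          # encourage goal progress
    if min_obstacle_dist < safe_dist:                # penalize small clearance
        r -= w_obs * (safe_dist - min_obstacle_dist)
    r -= w_acc * float(np.sum(np.square(joint_accel)))  # encourage smooth motion
    return r
```
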
This chapter is concerned with the constrained optimization problem, which plays an important role in ML since many ML algorithms essentially maximize or minimize a given objective function subject to either equality or inequality constraints. Such constrained optimization problems can be reformulated in terms of the Lagrangian function, which includes, in addition to the original objective, an extra term for the constraints weighted by their Lagrange multipliers. The chapter also considers the important duality principle, based on which the constrained optimization problem can be addressed as either the primal (original) problem or the dual problem; the dual is equivalent to the primal, in the sense that its solution is the same as that of the primal, if a set of KKT conditions is satisfied. The chapter further considers two methods, linear and quadratic programming, of which the latter is the foundation of the support vector machine (SVM), an important classification algorithm considered in a later chapter.
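The Lagrangian idea can be made concrete for the simplest case, an equality-constrained quadratic program: minimize (1/2) x^T Q x + c^T x subject to A x = b. Setting the gradient of L(x, lam) = f(x) + lam^T (A x - b) to zero yields a linear KKT system. The problem data below are illustrative, not from the chapter.

```python
import numpy as np

# Illustrative QP: minimize x1^2 + x2^2 - 2 x1 - 4 x2  subject to x1 + x2 = 1.
Q = np.array([[2.0, 0.0], [0.0, 2.0]])   # positive-definite Hessian
c = np.array([-2.0, -4.0])
A = np.array([[1.0, 1.0]])               # equality constraint A x = b
b = np.array([1.0])

n, m = Q.shape[0], A.shape[0]
# Stationarity (Q x + A^T lam = -c) and feasibility (A x = b) in one system:
kkt = np.block([[Q, A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([-c, b])
sol = np.linalg.solve(kkt, rhs)
x, lam = sol[:n], sol[n:]                # primal solution and multiplier
```
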
This chapter demonstrates how logic programming, particularly using linear logic, can be used to encode and reason about simple security protocols. It discusses specifying communication processes and protocols, including communication on a public network, static key distribution, and dynamic symbol creation. The chapter explores how to represent encrypted data as an abstract data type and model protocols as theories in linear logic. It also covers techniques for abstracting the internal states of agents and representing agents as nested implications. Bibliographic notes cite relevant work on formal methods for security protocol analysis.
This chapter introduces the idea that computation can be viewed and reasoned about through the lens of proof theory. It highlights the historical context of defining computation, noting the equivalence of formal systems like lambda-calculus and Turing machines. The chapter discusses the benefits of using logic to specify computations, emphasizing the universally accepted descriptions of logics, which can ensure the precision of meaning for logic programs. It also outlines the book’s structure, dividing it into two parts: the first covering the proof-theoretic foundations of logic programming languages, and the second exploring their applications. The chapter concludes with bibliographic notes.
This chapter presents sequent calculus proof systems for classical and intuitionistic logics, which are variations of Gentzen’s LK and LJ proof systems. It highlights the differences in their inference rules, particularly regarding the right-hand side of sequents. The chapter discusses the cut-elimination theorem for these logics and its significance. It also explores the increase in proof size that can result from eliminating cuts. Furthermore, the chapter considers the choices involved in proof search within these systems, distinguishing between don’t-know and don’t-care nondeterminism. Bibliographic notes direct the reader to relevant historical and contemporary works.
This chapter delves into the formal proof theory of focused proofs in linear logic. It defines paths in linear logic formulas, using them to describe right-introduction and left-introduction phases. The chapter proves the admissibility of the non-atomic initial rule and demonstrates the elimination of cut rules in the focused system for linear logic. Finally, it establishes the completeness of the focused proof system with respect to the unfocused proof system for linear logic. Readers primarily interested in the applications of linear logic programming can skip this chapter.
This chapter considers unsupervised learning methods for clustering analysis when the data samples in the given dataset are no longer labeled, including the K-means method and the Gaussian mixture model (GMM). The K-means algorithm is straightforward in theory and simple to implement. Based on a set of K randomly initialized seeds assumed to be the mean vectors of K clusters, the algorithm modifies them iteratively until they stabilize. The drawback of this method is that the resulting clusters are characterized only by their means, while the shapes of their distributions are not considered. If the distributions of the actual clusters in the dataset are not spherical, they will not be properly represented. This problem can be addressed if the dataset is modeled as a mixture of Gaussian distributions, each characterized by its mean and covariance, which are estimated iteratively by the expectation-maximization (EM) method. The resulting Gaussian clusters reveal the structure of the dataset much more accurately. The K-means and Gaussian mixture methods are analogous, respectively, to the discriminative minimum-distance classifier and the generative Bayesian classifier. Following the same idea as the GMM, the last section of this chapter also considers the Bernoulli mixture model for clustering binary data.
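The K-means loop described above can be sketched as follows. For determinism this sketch uses a simple farthest-point initialization rather than purely random seeds; the iterative refinement is the same as in the chapter's description.

```python
import numpy as np

def kmeans(X, K, n_iter=100):
    """Minimal K-means sketch: refine K mean vectors until they stabilize."""
    means = [X[0]]                         # first seed: an arbitrary sample
    for _ in range(1, K):
        d = np.min(np.linalg.norm(X[:, None] - np.array(means)[None], axis=2),
                   axis=1)
        means.append(X[np.argmax(d)])      # next seed: farthest from current seeds
    means = np.array(means)
    for _ in range(n_iter):
        # assign each sample to its nearest mean (minimum-distance rule)
        labels = np.argmin(np.linalg.norm(X[:, None] - means[None], axis=2),
                           axis=1)
        new_means = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_means, means):  # means have stabilized
            break
        means = new_means
    return means, labels
```

Note the drawback mentioned above: only the means are updated, so elongated or differently shaped clusters are not captured; the GMM/EM approach adds covariances for that purpose.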
This chapter reviews basic numerical methods for solving equation systems, including fixed-point iteration, which will be used later in the discussion of reinforcement learning, and the Newton-Raphson method for solving both univariate and multivariate systems, which is closely related to the optimization methods discussed in the following chapters. Newton's method approximates the function in question by the constant and linear terms of its Taylor expansion at an initial guess of the root, which is then iteratively improved to approach the true root where the function equals zero. The appendices of the chapter further discuss important computational issues, such as the order of convergence of these methods, which may be of interest to more advanced readers.
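The univariate Newton-Raphson iteration described above amounts to repeatedly replacing x by the root of the linear Taylor approximation at x. A minimal sketch, with an illustrative example:

```python
def newton(f, df, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration: x <- x - f(x)/f'(x) until |f(x)| < tol."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x = x - fx / df(x)   # root of the linear Taylor approximation at x
    return x

# Example: sqrt(2) as the positive root of f(x) = x^2 - 2, starting from x0 = 1.
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
```
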