Possibilistic logic programs (poss-programs) under the stable model semantics are a major variant of answer set programming. While their semantics (possibilistic stable models) and properties have been investigated in depth, the problem of inductive reasoning has not yet been addressed. This paper presents an approach to extracting poss-programs from a background program and examples (parts of intended possibilistic stable models). To this end, the notion of an induction task is first formally defined, its properties are investigated, and two algorithms, ilpsm and ilpsmmin, for computing induction solutions are presented. An implementation of ilpsmmin is also provided, and experimental results show that, when the inputs are ordinary logic programs, the prototype outperforms a major inductive learning system for normal logic programs from stable models on randomly generated datasets.
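As a hedged illustration of the semantics involved (not the paper's algorithms), the following sketch computes the possibilistic least model of a small *definite* poss-program by fixed-point iteration: each rule carries a necessity degree, a derived atom receives the minimum of the rule degree and its body atoms' degrees, and the best (maximum) such value over all derivations. The program and degrees are illustrative.

```python
# Hedged sketch: fixed-point computation of the possibilistic least model of
# a definite poss-program. Each rule is (head, body, degree); a derived atom
# gets min(rule degree, degrees of body atoms), maximized over derivations.

def possibilistic_least_model(rules):
    model = {}                      # atom -> necessity degree
    changed = True
    while changed:                  # iterate until no degree improves
        changed = False
        for head, body, degree in rules:
            if all(b in model for b in body):
                val = min([degree] + [model[b] for b in body])
                if val > model.get(head, 0.0):
                    model[head] = val
                    changed = True
    return model

rules = [
    ("a", [], 0.9),          # a is known with necessity 0.9
    ("b", ["a"], 0.7),       # b follows from a with rule certainty 0.7
    ("c", ["a", "b"], 0.8),  # c needs both a and b
]
model = possibilistic_least_model(rules)
```

With these degrees, c is derived at min(0.8, 0.9, 0.7) = 0.7.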
This chapter discusses the most basic frequentist method, linear least squares (LLS) regression, for obtaining the optimal weights of the linear regression model that minimize the squared error between the model prediction and the observations. The chapter also considers how the goodness of its results can be evaluated quantitatively by the coefficient of determination (R-squared). The chapter then discusses some variations of LLS, including ridge regression, which adds a regularization term to trade off overfitting against underfitting, and linear methods based on basis functions for nonlinear regression problems. Finally, the last section of the chapter briefly discusses Bayesian methods that maximize the likelihood and the posterior probability of the parameters of the linear regression model. This section prepares the reader for the discussion of various Bayesian learning algorithms in later chapters.
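A minimal sketch of the ideas above, for a single feature without intercept (data and regularization strength are illustrative): the closed-form LLS/ridge weight and the coefficient of determination.

```python
# Minimal sketch of linear least squares and ridge regression for one
# feature without intercept; the data and lambda below are illustrative.

def ridge_1d(xs, ys, lam=0.0):
    """Closed-form weight w = sum(x*y) / (sum(x*x) + lambda)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def r_squared(xs, ys, w):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]            # roughly y = 2x
w_ols = ridge_1d(xs, ys)             # lambda = 0: ordinary least squares
w_ridge = ridge_1d(xs, ys, lam=1.0)  # regularization shrinks w toward 0
```

The ridge weight is strictly smaller than the LLS weight, showing the shrinkage effect of the regularization term.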
This chapter discusses the method of principal component analysis (PCA) for dimensionality reduction, by which the original high-dimensional feature space can be mapped into a much lower-dimensional space that still contains most of the separability information, based either on the total scatter matrix of the given dataset or on the within-class and between-class scatter matrices. This transformation from the high- to the low-dimensional space can be considered a preprocessing stage before the main classification process, which can then be carried out more efficiently and effectively in the low-dimensional space.
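As an illustrative sketch, the first principal component of 2-D data can be found from the total scatter matrix with a closed-form 2x2 eigensolver (the toy data below lie near the line y = x, so the leading direction should be close to (1, 1)/sqrt(2)):

```python
import math

# Illustrative PCA sketch on 2-D data via the total scatter matrix; the
# closed-form 2x2 eigensolver is only adequate for this toy case.

def pca_first_component(points):
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Total scatter matrix entries S = sum (p - mean)(p - mean)^T
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Largest eigenvalue of the symmetric matrix [[sxx, sxy], [sxy, syy]]
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = tr / 2 + math.sqrt(tr * tr / 4 - det)
    # Corresponding eigenvector, normalized (assumes sxy != 0)
    vx, vy = sxy, lam - sxx
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)

pts = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
v = pca_first_component(pts)
```

Projecting each sample onto v gives the 1-D representation that retains the most scatter.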
This chapter is dedicated to the method of independent component analysis (ICA), which can be considered a counterpart of PCA, as both methods extract essential information, either the principal or the independent components, from a given dataset for further processing. Unlike PCA, however, ICA assumes that the signals in the given data are linear combinations of a set of independent signal components (hence the alternative name blind source separation, or BSS), which can be recovered based on the fact that a linear combination of multiple random variables is more Gaussian than each of them individually. ICA is therefore carried out based on the principle of maximizing non-Gaussianity, often using measures such as kurtosis or negentropy to identify statistically independent components.
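The central-limit intuition behind ICA can be checked numerically: a mixture of independent non-Gaussian sources has excess kurtosis closer to zero (more Gaussian) than the sources themselves. The uniform sources and mixing weights below are illustrative.

```python
import random

# Sketch of the CLT intuition behind ICA: a linear mixture of independent
# sources is "more Gaussian" (excess kurtosis nearer 0) than the sources.

def excess_kurtosis(xs):
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / var ** 2 - 3.0   # 0 for a Gaussian

rng = random.Random(0)
s1 = [rng.uniform(-1, 1) for _ in range(50_000)]   # uniform source, kurtosis ~ -1.2
s2 = [rng.uniform(-1, 1) for _ in range(50_000)]   # second independent source
mix = [0.6 * a + 0.8 * b for a, b in zip(s1, s2)]  # linear mixture

k_src = excess_kurtosis(s1)
k_mix = excess_kurtosis(mix)
# ICA rotates mixed signals back toward maximal |excess kurtosis|,
# i.e. back toward the original non-Gaussian independent sources.
```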
This chapter introduces a set of distances and scatter matrices of various kinds used to measure the difference or similarity between two sample points, between a sample and a class/cluster, and between two classes/clusters, as well as the within-class, between-class, and total scatter of classes/clusters. These quantities serve the further purpose of measuring the separability of the classes/clusters in a subspace composed of features that are either selected or extracted from the original high-dimensional feature space.
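A minimal 1-D sketch of these quantities (the two toy classes are illustrative): the within-class, between-class, and total scatter, which satisfy the identity S_T = S_W + S_B.

```python
# Within-, between-, and total-class scatter for two 1-D classes,
# verifying the identity S_T = S_W + S_B used in separability measures.

def scatters(class_a, class_b):
    all_pts = class_a + class_b
    m = sum(all_pts) / len(all_pts)            # overall mean
    ma = sum(class_a) / len(class_a)           # class means
    mb = sum(class_b) / len(class_b)
    s_w = (sum((x - ma) ** 2 for x in class_a)
           + sum((x - mb) ** 2 for x in class_b))
    s_b = (len(class_a) * (ma - m) ** 2
           + len(class_b) * (mb - m) ** 2)
    s_t = sum((x - m) ** 2 for x in all_pts)
    return s_w, s_b, s_t

a = [1.0, 1.2, 0.8]       # class/cluster 1
b = [4.0, 4.3, 3.7]       # class/cluster 2
s_w, s_b, s_t = scatters(a, b)
# A large S_B relative to S_W indicates well-separated classes.
```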
This chapter introduces two abstract logic programming languages, one based on first-order Horn clauses (fohc) and another on first-order hereditary Harrop formulas (fohh). It shows how these can be understood within the framework of uniform proofs in intuitionistic logic. The chapter discusses the limitations of these languages. It also explores the concept of focused proofs as a way to structure proof search. The chapter proves the completeness of focused proofs for a fragment of intuitionistic logic. Bibliographic notes provide references to related work in logic programming theory.
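As a hedged illustration of the goal-directed search underlying such languages (a toy propositional restriction, not the chapter's first-order setting): a goal atom is proved by selecting a clause whose head matches it and recursively proving the clause's body. The program below is illustrative.

```python
# Minimal goal-directed proof search for propositional Horn clauses, in the
# spirit of uniform proofs: prove a goal by backchaining on clause heads.

def prove(goal, clauses, depth=10):
    """clauses: list of (head, body) pairs; facts have an empty body.
    The depth bound keeps recursive clauses from looping forever."""
    if depth == 0:
        return False
    return any(head == goal and all(prove(g, clauses, depth - 1) for g in body)
               for head, body in clauses)

program = [
    ("ancestor", ["parent"]),              # ancestor :- parent.
    ("ancestor", ["parent", "ancestor"]),  # ancestor :- parent, ancestor.
    ("parent", []),                        # parent.  (a fact)
]
```

Here `prove("ancestor", program)` succeeds via the first clause and the fact, while an atom with no matching clause head fails.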
This chapter introduces the important idea of kernel mapping, which maps the feature space to a much higher-dimensional space where class separability may be improved significantly for better classification results. Under the assumption that all data samples appear in the algorithm only in the form of inner products, the kernel mapping is carried out implicitly, in the sense that the mapping function never needs to be specified explicitly. The chapter then introduces the method of kernel PCA, a variant of PCA, together with another variant, probabilistic PCA. The chapter further considers the method of factor analysis, based on the two important concepts of latent variables and expectation maximization (EM), both of which play important roles in other learning algorithms discussed in later chapters. Finally, the chapter discusses two additional methods, multidimensional scaling (MDS) and t-distributed stochastic neighbor embedding (t-SNE), which serve the same general purpose of dimensionality reduction.
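The implicit-mapping idea can be made concrete with the degree-2 polynomial kernel on 2-D inputs, whose explicit feature map is small enough to write by hand (the inputs below are illustrative): the kernel value computed in the original space equals the inner product in the 6-D mapped space.

```python
import math

# Kernel trick sketch: k(x, y) = (x.y + 1)^2 on 2-D inputs equals an
# ordinary inner product after an explicit map into 6 dimensions, so the
# map never has to be applied in practice.

def poly_kernel(x, y):
    return (x[0] * y[0] + x[1] * y[1] + 1) ** 2

def feature_map(x):
    r2 = math.sqrt(2)
    return [x[0] ** 2, x[1] ** 2, r2 * x[0] * x[1], r2 * x[0], r2 * x[1], 1.0]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, -1.0)
implicit = poly_kernel(x, y)                    # computed in the 2-D space
explicit = dot(feature_map(x), feature_map(y))  # computed in the 6-D space
```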
This chapter discusses both supervised and unsupervised algorithms that are carried out in a tree-like hierarchy, in which a classification or clustering problem is solved in a divide-and-conquer manner while traversing a binary tree. For supervised classification, the tree classifier is first constructed in the training phase; then, in the test phase, a set of classes is subdivided into two subsets at each node of the tree based on a subset of features specifically selected to best separate the two subsets. This operation is carried out along a path in the tree from the root node down to one of the leaf nodes, each representing one of the classes. For unsupervised clustering, the tree structure is constructed in either a top-down or a bottom-up fashion. In the former case, the given dataset, represented by the root node, is recursively split into two subsets represented by the two child nodes; in the latter case, all samples, each represented by one of the leaf nodes, are merged sequentially until they form a single group at the root node. In either case, the splitting or merging is carried out based on one of the distances previously considered. The splitting or merging process can be truncated somewhere between the root and leaf nodes to obtain a set of clusters.
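The top-down (divisive) case can be sketched on 1-D data (the samples and gap threshold are illustrative): each node's set is split at its largest gap into two children, and the recursion is truncated once the gap falls below a threshold, yielding the clusters.

```python
# Top-down divisive clustering sketch on 1-D data: split each node's set
# at its largest gap; stop ("truncate" the tree) when gaps are small.

def split_tree(samples, min_gap=1.0):
    xs = sorted(samples)
    gaps = [(xs[i + 1] - xs[i], i) for i in range(len(xs) - 1)]
    if not gaps or max(gaps)[0] < min_gap:
        return xs                      # leaf: one cluster
    _, i = max(gaps)                   # widest gap decides the split point
    return [split_tree(xs[:i + 1], min_gap), split_tree(xs[i + 1:], min_gap)]

data = [0.1, 0.3, 0.2, 5.0, 5.2, 9.7, 9.9]
tree = split_tree(data)
# Nested lists mirror the binary tree: leaves are the resulting clusters.
```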
This chapter discusses nonlinear regression methods based on gradient descent and its variants for obtaining the optimal parameters of any given nonlinear regression function.
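A minimal sketch of plain gradient descent for a one-parameter nonlinear model y = exp(w * x) (the data, learning rate, and iteration count are illustrative): the squared error is minimized by repeatedly stepping against its gradient.

```python
import math

# Gradient descent sketch for the nonlinear model y = exp(w * x).

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.5 * x) for x in xs]   # data generated with the true w = 0.5

def grad(w):
    # d/dw of sum_i (exp(w * x_i) - y_i)^2
    return sum(2 * (math.exp(w * x) - y) * x * math.exp(w * x)
               for x, y in zip(xs, ys))

w = 0.0                                # initial guess
for _ in range(2000):
    w -= 0.01 * grad(w)                # fixed learning rate
# w converges toward the true parameter 0.5
```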
This chapter focuses on specifying computations using multisets within a logic programming framework. It illustrates this by encoding numerals, letters, and words as multisets. The chapter provides examples of encoding computational models such as finite automata and pushdown automata using linear logic and multisets. It also touches upon the properties of these encodings. Bibliographic notes point to related work on multiset rewriting and its applications.
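The word-as-multiset encoding can be sketched with Python's built-in multiset type (the example words are illustrative): two words are built from the same multiset of letters exactly when one is an anagram of the other, and multiset sum adds multiplicities.

```python
from collections import Counter

# Words encoded as multisets of letters; Counter is Python's multiset.

def as_multiset(word):
    return Counter(word)

listen = as_multiset("listen")
silent = as_multiset("silent")      # an anagram: the same multiset
combined = listen + silent          # multiset sum adds multiplicities
```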
This chapter is dedicated to the sole topic of the support vector machine (SVM), a typical discriminative algorithm, mostly for binary classification. The goal of the algorithm is to find an optimal hyperplane that separates the two classes (assumed to be linearly separable) in the feature space in such a way that the two classes are best separated, in the sense that the distance (called the margin) between the plane and the samples closest to it (called the support vectors) on either side is maximized. This is a constrained optimization problem that could be solved directly, but it is first converted into its dual problem and then solved by quadratic programming. The dual problem is solved because all data points appear in it in the form of inner products, so that the kernel method can be used to carry out the classification in a higher-dimensional space in which the two classes become linearly separable even if they are not so in the original space. The chapter further considers some variants of SVM, such as sequential minimal optimization and the generalized multiclass SVM.
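The maximum-margin geometry has a closed form in one dimension, which makes for a hedged sketch (a real SVM solves the dual quadratic program; the data below are illustrative): the support vectors are the closest opposing samples, the optimal boundary is their midpoint, and the margin is half their gap.

```python
# Hard-margin SVM geometry in 1-D, where it has a closed form.

def max_margin_1d(neg, pos):
    """neg: class -1 samples, pos: class +1 samples, assumed separable
    with all of neg to the left of pos."""
    sv_neg, sv_pos = max(neg), min(pos)   # the support vectors
    boundary = (sv_neg + sv_pos) / 2      # maximum-margin separating point
    margin = (sv_pos - sv_neg) / 2        # distance from boundary to each SV
    return boundary, margin

neg = [0.5, 1.0, 2.0]   # class -1
pos = [5.0, 6.5, 7.0]   # class +1
boundary, margin = max_margin_1d(neg, pos)
```

Only the two support vectors (2.0 and 5.0) determine the solution; moving any other sample leaves the boundary unchanged, which is the defining property of the SVM solution.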
This chapter lays the syntactic foundations for the book, covering topics in both first-order and higher-order logic. It introduces untyped lambda-terms and their properties, including beta-conversion and beta-normal forms. The chapter then defines types, signatures, and typed terms, restricting the typing judgment to beta-normal formulas. Finally, it introduces the concept of formulas and sequents, which are central to the proof-theoretic approach discussed in the book. Bibliographic notes provide references for further reading on these foundational concepts.
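Beta-conversion and beta-normal forms can be sketched with a naive term representation (a hedged toy, not the book's formulation): terms are nested tuples, and normal-order beta-steps are applied until no redex remains. The substitution below is not capture-avoiding, so the example keeps all bound and free names distinct.

```python
# Naive beta-reduction for untyped lambda-terms as nested tuples:
# ("var", name), ("lam", name, body), ("app", fun, arg).
# WARNING: subst is not capture-avoiding; examples must use distinct names.

def subst(term, name, arg):
    kind = term[0]
    if kind == "var":
        return arg if term[1] == name else term
    if kind == "lam":
        return term if term[1] == name else \
            ("lam", term[1], subst(term[2], name, arg))
    return ("app", subst(term[1], name, arg), subst(term[2], name, arg))

def normalize(term):
    """Reduce to beta-normal form using normal-order steps."""
    if term[0] == "app":
        f = normalize(term[1])
        if f[0] == "lam":                 # beta-redex: (lam x. body) arg
            return normalize(subst(f[2], f[1], term[2]))
        return ("app", f, normalize(term[2]))
    if term[0] == "lam":
        return ("lam", term[1], normalize(term[2]))
    return term

# (lam x. lam y. x) a  beta-reduces to  lam y. a
K = ("lam", "x", ("lam", "y", ("var", "x")))
result = normalize(("app", K, ("var", "a")))
```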