This part contains two chapters on reducing the dimension of the feature space, which plays a vital role in improving both learning efficiency and prediction performance.
Chapter 3 covers the most prominent subspace projection approach, namely classical principal component analysis (PCA), cf. Algorithm 3.1. Theorems 3.1 and 3.2 establish the optimality of PCA under both the minimum-reconstruction-error and maximum-entropy criteria. The optimal error and entropy attainable by PCA are given in closed form. Algorithms 3.2, 3.3, and 3.4 describe the numerical procedures for computing PCA via the data matrix, the scatter matrix, and the kernel matrix, respectively.
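To make the scatter-matrix route concrete, the sketch below computes a PCA projection by eigendecomposition of the centered scatter matrix. It is a minimal NumPy illustration (the function name pca_via_scatter and the random example data are our own assumptions), not a verbatim rendering of Algorithm 3.2 or 3.3.

import numpy as np

def pca_via_scatter(X, m):
    """Project an N x M data matrix X onto its top-m principal components."""
    X_centered = X - X.mean(axis=0)        # remove the sample mean
    S = X_centered.T @ X_centered          # M x M scatter matrix
    _, eigvecs = np.linalg.eigh(S)         # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :m]          # top-m principal directions
    return X_centered @ top                # N x m projected data

# Example usage on random data
X = np.random.randn(100, 10)
Z = pca_via_scatter(X, m=2)
print(Z.shape)  # (100, 2)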
Given a finite training dataset, the PCA learning model meets the LSP condition, and thus the conventional PCA model can be kernelized. When a nonlinear kernel is adopted, it further extends to the kernel-PCA (KPCA) learning model. The KPCA algorithms can be presented in intrinsic space or in empirical space (see Algorithms 3.5 and 3.6). For several real-life datasets, visualization via KPCA reveals clearer data separability than visualization via PCA. Moreover, KPCA is closely related to the kernel-induced spectral space, which proves instrumental for error analysis in unsupervised and supervised applications.
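As an illustration of the empirical-space formulation, the following sketch performs kernel PCA by eigendecomposition of a centered kernel matrix. The choice of an RBF kernel, the bandwidth parameter gamma, and the helper name kpca_rbf are assumptions made for this example; it is not a verbatim rendering of Algorithm 3.5 or 3.6.

import numpy as np

def kpca_rbf(X, m, gamma=1.0):
    """Return the top-m kernel-PCA components of an N x M data matrix X."""
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))  # RBF kernel matrix
    N = K.shape[0]
    one = np.ones((N, N)) / N
    K_c = K - one @ K - K @ one + one @ K @ one   # center the kernel in feature space
    eigvals, eigvecs = np.linalg.eigh(K_c)        # eigenvalues in ascending order
    eigvals = eigvals[::-1][:m]
    eigvecs = eigvecs[:, ::-1][:, :m]
    # scale so each column gives the projection onto a unit-norm feature-space direction
    return eigvecs * np.sqrt(np.maximum(eigvals, 1e-12))

# Example usage on random data
X = np.random.randn(50, 5)
Z = kpca_rbf(X, m=2, gamma=0.5)
print(Z.shape)  # (50, 2)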
Chapter 4 explores various aspects of feature selection, a popular approach to dimension reduction, for both supervised and unsupervised learning scenarios. It presents several filter-based and wrapper-based methods for selecting features.
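As a simple example of the filtering idea, the sketch below ranks the features of a two-class dataset by a Fisher-style score (between-class separation over within-class spread). The specific score and the helper name fisher_scores are illustrative assumptions, not a particular criterion from Chapter 4.

import numpy as np

def fisher_scores(X, y):
    """Score each feature of X (N x M) by between-class vs. within-class spread."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2   # between-class separation
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12    # within-class spread
    return num / den

# Example usage: keep the 3 highest-scoring features
X = np.random.randn(80, 6)
y = np.random.randint(0, 2, size=80)
scores = fisher_scores(X, y)
top_features = np.argsort(scores)[::-1][:3]
print(top_features)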