Part II introduces domain-independent feature extraction methods, and this chapter presents principal component analysis (PCA). We start with its motivation, illustrated by an example. We then gradually discover and develop the PCA algorithm: first in zero dimensions, then in one dimension, and finally in its complete form. We analyze its errors under ideal and practical conditions, and establish the equivalence between maximum variance and minimum reconstruction error. Two important issues are also discussed: when PCA can be used, and the relationship between PCA and the singular value decomposition (SVD).
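As a rough illustration of the one-dimensional case described above, the pure-Python sketch below finds the first principal direction as the top eigenvector of the sample covariance matrix and projects the centered data onto it. Power iteration is our own choice for computing that eigenvector; the chapter may instead use a full eigen-decomposition or the SVD, and the function names are hypothetical.

```python
import math

def mean_vector(X):
    """Column-wise mean of a list of equal-length vectors."""
    n, d = len(X), len(X[0])
    return [sum(x[j] for x in X) / n for j in range(d)]

def covariance(X):
    """Sample covariance matrix (normalized by n) of the rows of X."""
    n, d = len(X), len(X[0])
    mu = mean_vector(X)
    C = [[0.0] * d for _ in range(d)]
    for x in X:
        for i in range(d):
            for j in range(d):
                C[i][j] += (x[i] - mu[i]) * (x[j] - mu[j]) / n
    return C

def top_eigenvector(C, iters=200):
    """Power iteration: repeatedly multiply by C and renormalize.
    Assumes the all-ones start vector is not orthogonal to the top eigenvector."""
    d = len(C)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v

def pca_1d(X):
    """Return the first principal direction and the 1-D projections of centered X."""
    mu = mean_vector(X)
    v = top_eigenvector(covariance(X))
    proj = [sum((x[j] - mu[j]) * v[j] for j in range(len(v))) for x in X]
    return v, proj
```

The variance of the returned projections equals the largest eigenvalue of the covariance matrix, which is the maximum-variance property that the chapter connects to minimum reconstruction error.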
There is no silver bullet: no model can fit all data. Hence, special data requires special algorithms. In this chapter, we deal with two types of special data: sparse data, and sequences that can be aligned to each other. We will not dive deep into sparsity learning, which is a very complex topic. Rather, we introduce its key concepts: sparsity-inducing loss functions, dictionary learning, and what exactly the word sparsity means. In the second part of this chapter, we introduce dynamic time warping (DTW), which deals with sequences that can be aligned to each other (sequences that cannot be aligned are discussed in the next chapter). We use our familiar tools (ideas, visualizations, and formalizations) to reach the DTW solution. The key idea behind its success is divide-and-conquer, and the key technique is dynamic programming.
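The dynamic-programming recurrence behind DTW fits in a few lines. This minimal sketch assumes 1-D numeric sequences, the absolute difference as the local cost, and the common three-move step pattern (advance in the first sequence, in the second, or in both); the chapter's exact cost function and step constraints may differ.

```python
import math

def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """DTW distance between sequences a and b via dynamic programming.

    D[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # Divide and conquer: the best path to (i, j) extends the best path
            # to one of its three predecessors.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

The table has (n + 1)(m + 1) entries, so both time and memory are O(nm); for example, `[1, 2, 3]` and `[1, 2, 2, 3]` align at zero cost because the repeated `2` is absorbed by a warping step.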
The normal distribution is the most widely used continuous distribution, but many of its relevant properties are somewhat advanced for an undergraduate course. Hence, Part IV introduces some of these advanced topics. This chapter is devoted to the properties of normal distributions: univariate and multivariate normal distributions, moment and canonical parameterizations, sums and products, geometry and the Mahalanobis distance, and conditional distributions. We also show that these properties make some algorithms much easier to understand, using parameter estimation and the Kalman filter as two examples.
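To make the Kalman filter example concrete, here is a scalar sketch under dynamics that the source does not specify and that we assume for illustration: a random-walk state x_t = x_{t-1} + w_t with w_t ~ N(0, q), observed as z_t = x_t + v_t with v_t ~ N(0, r). Each update combines the Gaussian prediction with the Gaussian measurement likelihood, which is where the product and conditioning properties of normal distributions enter. The function name and parameters are hypothetical.

```python
def kalman_1d(zs, x0, p0, q, r):
    """Scalar Kalman filter for a random walk observed with Gaussian noise.

    Returns a list of (posterior mean, posterior variance) after each measurement.
    """
    x, p = x0, p0
    history = []
    for z in zs:
        # Predict: the random walk keeps the mean and adds process noise q.
        p = p + q
        # Update: fuse the prediction N(x, p) with the measurement z (noise variance r).
        k = p / (p + r)  # Kalman gain
        x = x + k * (z - x)
        p = (1 - k) * p
        history.append((x, p))
    return history
```

With q = 0 the filter reproduces the Bayesian posterior for a constant state: starting from the prior N(0, 1), two unit-variance readings of 2 give a posterior mean of 4/3 and a variance of 1/3.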
We cannot omit deep learning from a modern pattern recognition textbook, and we introduce convolutional neural networks (CNNs) in this chapter. Although the mathematical derivation of a CNN, especially the back-propagation process and gradient computation, is complex, we use many tools to help readers understand what exactly is going on inside a CNN. Hence, this chapter focuses on accessibility rather than completeness. Its exercise problems introduce more related topics and methods.
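As a small, library-free picture of what one convolutional layer computes in the forward pass, the sketch below slides a kernel over a single-channel image ("valid" mode, stride 1, no padding) and then applies ReLU. Like many deep learning libraries, it computes cross-correlation, i.e., the kernel is not flipped. Multiple channels, bias terms and back-propagation are omitted, and the function names are hypothetical.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image with a kernel."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            # Dot product between the kernel and the image patch at (i, j).
            s = 0.0
            for u in range(kh):
                for v in range(kw):
                    s += image[i + u][j + v] * kernel[u][v]
            row.append(s)
        out.append(row)
    return out

def relu(feature_map):
    """Element-wise max(0, x) nonlinearity."""
    return [[max(0.0, x) for x in row] for row in feature_map]
```

An H×W input and a kh×kw kernel yield an (H - kh + 1)×(W - kw + 1) feature map, so a 3×3 image and a 2×2 kernel produce a 2×2 output.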
Unlike PCA, which is unsupervised, Fisher's linear discriminant (FLD) uses the labels associated with the data points, and hence it may obtain better linear features and higher accuracy than PCA. We start by illustrating this motivation, and then practice the problem-solving framework by gradually developing the correct mathematical formulation of the relatively simple idea behind FLD. We discuss various practical issues: the solution for the binary case, the scenario in which this solution breaks down, and how to generalize from two categories to many categories.
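For the binary case, the FLD direction is commonly written as w ∝ S_w^{-1}(m_1 - m_2), where m_1 and m_2 are the class means and S_w is the within-class scatter matrix. The pure-Python sketch below computes it for 2-D data only, so that the 2×2 inverse can be written out by hand, and raises an error if S_w is singular. The restriction to two dimensions and the function name are assumptions for illustration.

```python
def fld_direction(X1, X2):
    """Binary FLD direction w = S_w^{-1} (m1 - m2) for 2-D data."""
    def mean(X):
        return [sum(x[k] for x in X) / len(X) for k in range(2)]

    m1, m2 = mean(X1), mean(X2)
    # Within-class scatter: sum of outer products of centered points over both classes.
    S = [[0.0, 0.0], [0.0, 0.0]]
    for X, m in ((X1, m1), (X2, m2)):
        for x in X:
            d = [x[0] - m[0], x[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    S[i][j] += d[i] * d[j]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[S[1][1] / det, -S[0][1] / det],
           [-S[1][0] / det, S[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]
```

Only the direction of w matters: a point x is mapped to the scalar w·x, and the two classes are separated by thresholding that value.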
The outbreak of coronavirus disease 2019 (COVID-19) has had a dramatic impact on public health around the world. Demographic characteristics, exposure history, dates of illness onset and dates of confirmed diagnosis were collected for 24 family clusters in Beijing. The characteristics of the cases and the estimated key epidemiologic time-to-event distributions were described, and the basic reproductive number (R0) was calculated. Among 89 confirmed COVID-19 patients from the 24 family clusters, the median age was 38.0 years and 43.8% were male. The median incubation period was 5.08 days (95% confidence interval (CI) 4.17–6.21). The median serial interval was 6.00 days (95% CI 5.00–7.00). R0 was 2.06 (95% CI 2.02–2.08). The median number of days from onset to care seeking and the median number of days from onset to hospital admission were both significantly reduced after 23 January 2020, implying enhanced public health awareness among families. With epidemic containment measures in place, the results can inform health authorities about the possible extent of epidemic transmission within families. Furthermore, following the initiation of interventions, public health measures are not only important for curbing epidemic spread at the community level but also improve health-seeking behaviour at the individual level.
This chapter is a succinct introduction to basic probabilistic methods for pattern recognition and machine learning. One focus is to present clearly the exact meanings of different terms, including the taxonomy of probabilistic methods. We give a basic introduction to maximum likelihood and maximum a posteriori estimation, and a very brief example to showcase the concept of Bayesian estimation. For the nonparametric world, we start from the drawbacks of parametric methods, gradually analyze the properties desired in a nonparametric method, and finally arrive at kernel density estimation, a typical nonparametric method.
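Kernel density estimation places a kernel on every training sample and averages the results. The sketch below assumes 1-D samples and a Gaussian kernel with a fixed bandwidth h, a hypothetical choice that must be tuned in practice; the chapter may discuss other kernels or bandwidth selection rules.

```python
import math

def gaussian_kde(samples, h):
    """Return p(x) = (1 / (n h)) * sum_i K((x - x_i) / h),
    where K is the standard normal density and h > 0 is the bandwidth."""
    n = len(samples)
    scale = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)

    return density
```

Because each scaled kernel integrates to one, the estimate is itself a valid density: a Riemann sum of `gaussian_kde([0.0, 1.0, 3.0], 0.5)` over a wide interval comes out very close to 1. Unlike a parametric fit, no functional form is assumed beyond the kernel and h.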
This chapter is an overall introduction to the definition of pattern recognition, its relationship with machine learning and other relevant subject areas, and the main components and development process of a pattern recognition system. The introduction begins with an autonomous driving example.
Considering a natural generalization of the Ruzsa–Szemerédi problem, we prove that for any fixed positive integers $r, s$ with $r < s$, there are graphs on $n$ vertices containing $n^{r}e^{-O\left(\sqrt{\log{n}}\right)}=n^{r-o(1)}$ copies of $K_s$ such that any $K_r$ is contained in at most one $K_s$. We also give bounds for the generalized rainbow Turán problem $\mathrm{ex}(n, H, \text{rainbow-}F)$ when $F$ is complete. In particular, we answer a question of Gerbner, Mészáros, Methuku and Palmer, showing that there are properly edge-coloured graphs on $n$ vertices with $n^{r-1-o(1)}$ copies of $K_r$ such that no $K_r$ is rainbow.
Parameter estimation is generally difficult and can require advanced methods such as the expectation-maximization (EM) algorithm. This chapter focuses on the ideas behind EM, rather than its complex mathematical properties or proofs. We use the Gaussian mixture model (GMM) as an illustrative example to find what leads us to the EM algorithm, e.g., complete- and incomplete-data likelihoods, concave and nonconcave loss functions, and observed and hidden variables. We then derive the EM algorithm in general and its application to GMM.
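The alternation of E- and M-steps can be sketched for the simplest GMM: one-dimensional data and K components. In this sketch, responsibilities (the posterior over the hidden component label) are computed in log space to avoid underflow, and a small variance floor (1e-6, our own assumption) keeps a component from collapsing onto a single point. The caller supplies the initial parameters; the function names are hypothetical.

```python
import math

def log_normal_pdf(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def em_gmm_1d(xs, mus, variances, weights, iters=100, var_floor=1e-6):
    """EM for a 1-D Gaussian mixture. Returns (means, variances, weights).
    Assumes no component ever ends up with zero total responsibility."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    n, K = len(xs), len(mus)
    for _ in range(iters):
        # E-step: responsibility of component k for point x, i.e. P(z = k | x).
        resp = []
        for x in xs:
            logp = [math.log(weights[k]) + log_normal_pdf(x, mus[k], variances[k])
                    for k in range(K)]
            top = max(logp)  # log-sum-exp trick for numerical stability
            p = [math.exp(lp - top) for lp in logp]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: weighted maximum-likelihood updates from the soft counts.
        for k in range(K):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_floor, var_k)
    return mus, variances, weights
```

The hidden variable is the component label z of each point: the E-step replaces it with its posterior, after which the M-step has the same closed form as fitting weighted Gaussians to complete data.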
This chapter presents a simple but working face recognition system based on the nearest neighbor search algorithm. Albeit simple, it is a complete pattern recognition pipeline, so we can examine every component in it and analyze the potential difficulties and pitfalls one may encounter. Furthermore, we introduce a problem-solving framework, which will be useful in the rest of this book and in solving other tasks.
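The nearest neighbor rule at the heart of such a system can be sketched as a linear scan. The sketch assumes each face image has already been flattened into a fixed-length numeric vector (for example, raw pixel values) and uses Euclidean distance; the gallery vectors and labels below are hypothetical toy values.

```python
import math

def nearest_neighbor(train_x, train_y, query):
    """Return the label of the training vector closest to `query` (Euclidean)."""
    best_label, best_dist = None, math.inf
    for x, y in zip(train_x, train_y):
        d = math.dist(x, query)  # math.dist requires Python 3.8+
        if d < best_dist:
            best_label, best_dist = y, d
    return best_label

# Usage: two enrolled "faces" represented as toy 2-D vectors.
gallery = [[0.0, 0.0], [10.0, 10.0]]
names = ["person_a", "person_b"]
print(nearest_neighbor(gallery, names, [1.0, 2.0]))  # person_a
```

Each query costs O(nd) time for n gallery vectors of dimension d, which becomes a practical concern as the gallery grows.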