High-dimensional data is prevalent in machine learning and related areas. Indeed, it is common for the number of data dimensions to exceed the number of data examples. In such cases we seek a lower-dimensional representation of the data. In this chapter we discuss some standard methods which can also improve prediction performance by removing ‘noise’ from the representation.
High-dimensional spaces – low-dimensional manifolds
In machine learning problems data is often high-dimensional – images, bag-of-words descriptions, gene expressions, etc. In such cases we cannot expect the training data to densely populate the space, meaning that there will be large parts in which little is known about the data. For the handwritten digits from Chapter 14, the data is 784-dimensional and for binary-valued pixels the number of possible images is 2^784 ≈ 10^236. Nevertheless, we would expect that only a handful of examples of a digit should be sufficient (for a human) to understand how to recognise a 7. Digit-like images must therefore occupy a highly constrained volume in the 784 dimensions and we expect only a small number of degrees of freedom to be required to describe the data to a reasonable accuracy. Whilst the data vectors may be very high dimensional, they will therefore typically lie close to a much lower-dimensional ‘manifold’ (informally, a two-dimensional manifold corresponds to a warped sheet of paper embedded in a high-dimensional space), meaning that the distribution of the data is heavily constrained.
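As a rough illustration of this point (a sketch, not part of the original text), one can measure how few directions of variation are needed to describe digit images by examining the singular values of the centred data matrix. The snippet below uses scikit-learn’s 8×8 digits dataset as a stand-in for the 784-dimensional images discussed above; the dataset choice and the 90% variance threshold are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits

# Load the 8x8 digits (64 dimensions) as a small stand-in for
# the 784-dimensional handwritten digits discussed in the text.
X, _ = load_digits(return_X_y=True)      # shape (1797, 64)
Xc = X - X.mean(axis=0)                  # centre the data

# The singular values of the centred data matrix give the variance
# captured along each principal direction.
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = (s ** 2) / (s ** 2).sum()

# How many directions are needed to capture, say, 90% of the variance?
k = int(np.searchsorted(np.cumsum(explained), 0.90)) + 1
print(f"{k} of {X.shape[1]} directions capture 90% of the variance")
```

Typically far fewer directions than the ambient dimension suffice, consistent with the picture of the data lying close to a low-dimensional manifold.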