Bayesian Reasoning and Machine Learning

David Barber

doi:10.1017/CBO9780511804779

Chapter 8: Statistics for machine learning

pp. 165-198

David Barber, University College London

Summary

In this chapter we discuss some classical distributions and their manipulations. In previous chapters we assumed that the distributions were known and concentrated on the inference problem. In machine learning we typically do not fully know the distributions and need to learn them from the available data. This requires familiarity with standard distributions, whose parameters will later be set using the data.

Representing data

The numeric encoding of data can have a significant effect on performance, so an understanding of the options for representing data is of considerable importance. We briefly outline three central encodings below.

Categorical

For categorical (or nominal) data, the observed value belongs to one of a number of classes with no intrinsic ordering, and can be represented simply by an integer. An example of a categorical variable is the type of job that someone does, e.g. healthcare, education, financial services, transport, homeworker, unemployed, engineering, which could be represented by the values 1, 2, …, 7. Another way to transform such data into numerical values is to use 1-of-m encoding. For example, if there are four kinds of jobs (soldier, sailor, tinker, spy), we could represent a soldier as (1,0,0,0), a sailor as (0,1,0,0), a tinker as (0,0,1,0) and a spy as (0,0,0,1). In this encoding the Euclidean distance between the vectors representing any two different professions is constant (√2), so no pair of classes is treated as closer than any other.
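The two encodings described above can be compared in a few lines of Python. This is a minimal sketch and not code from the book: the helper names `one_of_m` and `euclidean` are hypothetical, and assigning integers in first-seen order is an assumption made only for illustration.

```python
import math
from itertools import combinations


def one_of_m(labels):
    """Map each distinct label to a 1-of-m indicator tuple.

    Hypothetical helper: classes are indexed in first-seen order.
    """
    classes = list(dict.fromkeys(labels))  # distinct labels, order preserved
    m = len(classes)
    return {
        c: tuple(1 if i == j else 0 for j in range(m))
        for i, c in enumerate(classes)
    }


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


jobs = ["soldier", "sailor", "tinker", "spy"]

# Integer encoding: 1, 2, ..., m (the order is arbitrary for nominal data).
integer_codes = {job: i + 1 for i, job in enumerate(jobs)}

# 1-of-m encoding: one indicator vector per class.
codes = one_of_m(jobs)

# Distances between every pair of distinct professions under 1-of-m.
pairwise = {
    (a, b): euclidean(codes[a], codes[b]) for a, b in combinations(jobs, 2)
}

for (a, b), d in pairwise.items():
    gap = abs(integer_codes[a] - integer_codes[b])
    print(f"{a:8s} {b:8s} 1-of-m distance = {d:.4f}, integer gap = {gap}")
```

Every 1-of-m distance printed is the same, whereas the integer gaps differ from pair to pair (soldier and spy are 3 apart, soldier and sailor only 1), which suggests an ordering that nominal data does not have.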

About the book

Chapter DOI https://doi.org/10.1017/CBO9780511804779.012
Book DOI https://doi.org/10.1017/CBO9780511804779
Subjects Computational Statistics, Machine Learning and Information Science, Computer Science, Machine Learning and Pattern Recognition, Statistics and Probability
Format: Hardback
- Publication date: 12 March 2012
- ISBN: 9780521518147
Format: Digital
- Publication date: 05 June 2012
- ISBN: 9780511804779