At an abstract level, the goal of computer vision problems is to use the observed image data to infer something about the world. For example, we might observe adjacent frames of a video sequence and infer the camera motion, or we might observe a facial image and infer the identity.
The aim of this chapter is to describe a mathematical framework for solving this type of problem and to organize the resulting models into useful subgroups, which will be explored in subsequent chapters.
Computer vision problems
In vision problems, we take visual data x and use them to infer the state of the world w. The world state w may be continuous (the 3D pose of a body model) or discrete (the presence or absence of a particular object). When the state is continuous, we call this inference process regression. When the state is discrete, we call it classification.
Unfortunately, the measurements x may be compatible with more than one world state w. The measurement process is noisy, and there is inherent ambiguity in visual data: a lump of coal viewed under bright light may produce the same luminance measurements as white paper in dim light. Similarly, a small object seen close-up may produce the same image as a larger object that is further away.
In the face of such ambiguity, the best that we can do is to return the posterior probability distribution Pr(w|x) over possible states w.
Review the options below to login to check your access.
Log in with your Cambridge Aspire website account to check access.
If you believe you should have access to this content, please contact your institutional librarian or consult our FAQ page for further information about accessing our content.