Search results for Image processing and machine vision

5 - Dense Correspondence and Its Applications
Richard J. Radke, Rensselaer Polytechnic Institute, New York
Book:

Computer Vision for Visual Effects

Published online:

05 December 2012

Print publication:

19 November 2012, pp 148-206
- Chapter
Summary

In the last chapter we focused on detecting and matching distinctive features. Typically, features are sparsely distributed – that is, not every pixel location has a feature centered at it. However, for several visual effects applications, we require a dense correspondence between pixels in two images, even in relatively flat or featureless areas. One of the most common applications of dense correspondence in filmmaking is slowing down or speeding up a shot for dramatic effect after it has been filmed. To create the appropriate intermediate frames, we need to estimate the trajectory of every pixel in the video sequence over the course of a shot, not just a few pixels near features.
More mathematically, we want to compute a vector field (u(x,y), v(x,y)) over the pixels of the first image I1, so that the vector at each pixel (x,y) points to a corresponding location in the second image I2. That is, the pixels I1(x,y) and I2(x + u(x,y), y + v(x,y)) correspond. We usually abbreviate the vector field as (u,v) with the understanding that both elements are functions of x and y.
Defining what constitutes a correspondence in this context can be tricky. As in feature matching, our intuition is that a correspondence implies that both pixels arise from the same point on the surface of some object in the physical world. The vector (u,v) is induced by the motion of the camera and/or the object in the interval between taking the two pictures.
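To make the notation concrete, here is a minimal sketch (not taken from the book) that looks up, for each pixel (x, y) of the first image, the corresponding location (x + u, y + v) in the second image. The function name `warp_with_flow`, the nested-list image layout, and the nearest-neighbor rounding are all assumptions chosen for illustration.

```python
def warp_with_flow(image2, u, v, fill=0):
    """Sample image2 at (x + u[y][x], y + v[y][x]) for every pixel (x, y).

    image2, u, v are lists of rows of equal size (hypothetical layout).
    Nearest-neighbor rounding is used; out-of-bounds lookups get `fill`.
    """
    height, width = len(image2), len(image2[0])
    warped = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Location in the second image that corresponds to (x, y).
            x2 = int(round(x + u[y][x]))
            y2 = int(round(y + v[y][x]))
            if 0 <= x2 < width and 0 <= y2 < height:
                warped[y][x] = image2[y2][x2]
    return warped


# Toy example: the second image is the first shifted one pixel to the right,
# so the flow is (u, v) = (1, 0) everywhere.
image1 = [[1, 2, 3, 0],
          [4, 5, 6, 0]]
image2 = [[0, 1, 2, 3],
          [0, 4, 5, 6]]
u = [[1] * 4 for _ in range(2)]
v = [[0] * 4 for _ in range(2)]
warped = warp_with_flow(image2, u, v)
```

If the flow is correct, the warped result reproduces the first image wherever the corresponding location falls inside the second image.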

B - Figure Acknowledgments
Richard J. Radke, Rensselaer Polytechnic Institute, New York
Book:

Computer Vision for Visual Effects

Published online:

05 December 2012

Print publication:

19 November 2012, pp 364-366
- Chapter

Contents
Richard J. Radke, Rensselaer Polytechnic Institute, New York
Book:

Computer Vision for Visual Effects

Published online:

05 December 2012

Print publication:

19 November 2012, pp vii-x
- Chapter

3 - Image Compositing and Editing
Richard J. Radke, Rensselaer Polytechnic Institute, New York
Book:

Computer Vision for Visual Effects

Published online:

05 December 2012

Print publication:

19 November 2012, pp 55-106
- Chapter
Summary

In this chapter, we discuss image compositing and editing, the manipulation of a single image or the combination of elements from multiple sources to make a convincing final image. Like image matting, image compositing and editing are pervasive in modern TV and filmmaking. Virtually every frame of a blockbuster movie is a combination of multiple elements. We can think of compositing as the inverse of matting: putting images together instead of pulling them apart. Consequently, the problems we consider are generally easier to solve and require less human intervention.
In the simplest case, we may just want to place a foreground object extracted by matting onto a different background image. As we saw in Chapter 2, obtaining high-quality mattes is possible using a variety of algorithms, and new images made using the compositing equation (2.3) generally look very good. On the other hand, a fair amount of user interaction is often required to obtain these mattes – for example, heuristically combining different color channels, painting an intricate trimap, or scribbling and rescribbling to refine a matte. The algorithms in the first half of this chapter take a different approach: the user roughly outlines an object in a source image to be removed and recomposited into a target image, and the algorithm automatically estimates a good blend between the object and its new background without explicitly requiring a matte. These “drag-and-drop”-style algorithms could potentially save a lot of manual effort.
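The compositing equation (2.3) is not reproduced in this summary. Assuming it is the standard matting relation I = αF + (1 − α)B, placing a matted foreground onto a new background can be sketched per pixel as follows. The function name `composite` and the grayscale nested-list layout are illustrative choices, not the book's.

```python
def composite(foreground, background, alpha):
    """Blend per pixel: I = alpha * F + (1 - alpha) * B.

    All three inputs are lists of rows of floats with the same shape;
    alpha values are expected to lie in [0, 1].
    """
    return [
        [a * f + (1.0 - a) * b for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(foreground, background, alpha)
    ]


foreground = [[1.0, 1.0, 1.0]]
background = [[0.0, 0.5, 0.2]]
alpha = [[1.0, 0.5, 0.0]]   # opaque, half-transparent, fully transparent
result = composite(foreground, background, alpha)
```

Where alpha is 1 the foreground is kept, where it is 0 the background shows through, and intermediate values mix the two.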

7 - Motion Capture
Richard J. Radke, Rensselaer Polytechnic Institute, New York
Book:

Computer Vision for Visual Effects

Published online:

05 December 2012

Print publication:

19 November 2012, pp 255-299
- Chapter
Summary

Motion capture (often abbreviated as mocap) is probably the application of computer vision to visual effects most familiar to the average filmgoer. As illustrated in Figure 7.1, motion capture uses several synchronized cameras to track the motion of special markers carefully placed on the body of a performer. The images of each marker are triangulated and processed to obtain a time series of 3D positions. These positions are used to infer the time-varying positions and angles of the joints of an underlying skeleton, which can ultimately help animate a digital character that has the same mannerisms as the performer. While the Gollum character from the Lord of the Rings trilogy launched motion capture into the public consciousness, the technology already had many years of use in the visual effects industry (e.g., to animate synthetic passengers in wide shots for Titanic). Today, motion capture is almost taken for granted as a tool to help map an actor's performance onto a digital character, and has achieved great success in recent films like Avatar.
In addition to creating computer-generated characters for feature films, motion capture is pervasive in the video game industry, especially for sports and action games. The distinctive mannerisms of golf and football players, martial artists, and soldiers are recorded by video game developers and strung together in real time by game engines to create dynamic, reactive character animations. In non-entertainment contexts, motion capture is used in orthopedics applications to analyze a patient's joint motion over the course of treatment, and in sports medicine applications to improve an athlete's performance.
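The triangulation of marker images into 3D positions can be illustrated with a minimal two-camera sketch. It assumes that each camera's center and the viewing ray toward the marker are already known in a common world frame, and it uses the midpoint method (the point halfway between the closest points of the two rays). This is one simple choice for illustration, not necessarily what a production motion capture system does, and all names are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Return the midpoint of the shortest segment between two rays.

    c1, c2: camera centers; d1, d2: ray directions toward the marker.
    Raises ValueError when the rays are (nearly) parallel.
    """
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth is not observable")
    s = (b * e - c * d) / denom   # parameter along ray 1
    t = (a * e - b * d) / denom   # parameter along ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]


# Two cameras two units apart, both looking at a marker at (1, 1, 5).
marker = triangulate_midpoint(
    c1=[0.0, 0.0, 0.0], d1=[1.0, 1.0, 5.0],
    c2=[2.0, 0.0, 0.0], d2=[-1.0, 1.0, 5.0],
)
```

With noisy measurements the two rays generally do not intersect, which is why the sketch returns a midpoint rather than an exact intersection.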

4 - Features and Matching
Richard J. Radke, Rensselaer Polytechnic Institute, New York
Book:

Computer Vision for Visual Effects

Published online:

05 December 2012

Print publication:

19 November 2012, pp 107-147
- Chapter
Summary

In many visual effects applications, we need to relate images taken from different perspectives or at different times. For example, we often want to track a point on a set as a camera moves around during a shot so that a digital creature can be later inserted at that location. In fact, finding and tracking many such points is critical for algorithms that automatically estimate the 3D path of a camera as it moves around a scene, a problem called matchmoving that is the subject of Chapter 6. However, not every point in the scene is a good choice for tracking, since many points look alike. In this chapter, we describe the process of automatically detecting regions of an image that can be reliably located in other images of the same scene; we call these special regions features. Once the features in a given image have been found, we also discuss the problems of describing, matching, and tracking them in different images of the same scene.
In addition to its core use for matchmoving, feature detection is also important for certain algorithms that estimate dense correspondence between images and video sequences (Chapter 5), as well as for both marker-based and markerless motion capture (Chapter 7). Outside the domain of visual effects, feature matching and tracking are commonly used for stitching images together to create panoramas [72], localizing mobile robots [432], and quickly finding objects [456] or places [424] in video databases.

Frontmatter
Richard J. Radke, Rensselaer Polytechnic Institute, New York
Book:

Computer Vision for Visual Effects

Published online:

05 December 2012

Print publication:

19 November 2012, pp i-vi
- Chapter

A - Optimization Algorithms for Computer Vision
Richard J. Radke, Rensselaer Polytechnic Institute, New York
Book:

Computer Vision for Visual Effects

Published online:

05 December 2012

Print publication:

19 November 2012, pp 353-363
- Chapter

Index
Richard J. Radke, Rensselaer Polytechnic Institute, New York
Book:

Computer Vision for Visual Effects

Published online:

05 December 2012

Print publication:

19 November 2012, pp 393-398
- Chapter

6 - Matchmoving
Richard J. Radke, Rensselaer Polytechnic Institute, New York
Book:

Computer Vision for Visual Effects

Published online:

05 December 2012

Print publication:

19 November 2012, pp 207-254
- Chapter
Summary

Matchmoving, also known as camera tracking, is a major aspect of modern visual effects. It's the key underlying process that allows visual effects artists to convincingly insert computer-generated elements and characters into a live-action plate, so that everything appears to “live in” a consistent three-dimensional world. In every modern action movie (and even many non-action movies), the first step after acquiring live footage is to track the camera to enable the addition of spatially accurate visual effects.
The basic problem is to determine, using a given video sequence as input, the three-dimensional location and orientation of the camera at every frame with respect to landmarks in the scene. Depending on the situation, we may have some prior information – such as estimates of the focal length from the camera's lens barrel or labeled landmarks with known 3D coordinates – or the video may come from an entirely unknown camera and environment.
Matchmoving is fundamentally the same as a computer vision problem called structure from motion. In fact, several of the main matchmoving software packages for visual effects grew directly out of academic research discussed in this chapter. In turn, structure from motion is closely related to photogrammetry, mathematical techniques used by surveyors to estimate the shape of buildings and terrain from multiple images. Many structure from motion techniques “discovered” by computer vision researchers in the 1990s share key steps with photogrammetric techniques developed by cartographers and geodesists in the 1950s or earlier. Finally, structure from motion is closely related to the problem of simultaneous localization and mapping (SLAM) from robotics, in which a mobile robot must self-localize by taking measurements of its environment.
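To make "location and orientation of the camera with respect to landmarks" concrete, the sketch below scores a candidate camera pose against observed landmark positions under a simple pinhole model. Structure from motion methods typically minimize a reprojection error of this kind. The parameterization (rotation matrix, translation vector, a single focal length, principal point at the image origin) and the function names are illustrative assumptions.

```python
import math


def project(point, rotation, translation, focal_length):
    """Project a 3D world point into the image of a pinhole camera.

    The camera-frame point is rotation * point + translation; the
    principal point is assumed to be at the image origin.
    """
    xc, yc, zc = (
        sum(r * p for r, p in zip(row, point)) + t
        for row, t in zip(rotation, translation)
    )
    if zc <= 0:
        raise ValueError("point is behind the camera")
    return (focal_length * xc / zc, focal_length * yc / zc)


def reprojection_error(landmarks, observations, rotation, translation, f):
    """Root-mean-square distance between projected and observed points."""
    total = 0.0
    for point, (ox, oy) in zip(landmarks, observations):
        px, py = project(point, rotation, translation, f)
        total += (px - ox) ** 2 + (py - oy) ** 2
    return math.sqrt(total / len(landmarks))


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
landmarks = [(0.0, 0.0, 4.0), (1.0, -1.0, 2.0)]
# Synthetic observations generated from the true pose (identity, zero shift).
observations = [project(p, identity, (0, 0, 0), 500.0) for p in landmarks]
good = reprojection_error(landmarks, observations, identity, (0, 0, 0), 500.0)
bad = reprojection_error(landmarks, observations, identity, (0.1, 0, 0), 500.0)
```

The true pose reproduces the observations exactly, while a pose shifted by a small translation yields a clearly larger error, which is the signal an optimizer would use to refine the camera estimate.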

Bibliography
Richard J. Radke, Rensselaer Polytechnic Institute, New York
Book:

Computer Vision for Visual Effects

Published online:

05 December 2012

Print publication:

19 November 2012, pp 367-392
- Chapter

Computer Vision

Models, Learning, and Inference
Simon J. D. Prince
Published online:

05 August 2012

Print publication:

18 June 2012
- Textbook
This modern treatment of computer vision focuses on learning and inference in probabilistic models as a unifying theme. It shows how to use training data to learn the relationships between the observed image data and the aspects of the world that we wish to estimate, such as the 3D structure or the object class, and how to exploit these relationships to make new inferences about the world from new image data. With minimal prerequisites, the book starts from the basics of probability and model fitting and works up to real examples that the reader can implement and modify to build useful vision systems. Primarily meant for advanced undergraduate and graduate students, the detailed methodological presentation will also be useful for practitioners of computer vision.
- Covers cutting-edge techniques, including graph cuts, machine learning and multiple view geometry
- A unified approach shows the common basis for solutions of important computer vision problems, such as camera calibration, face recognition and object tracking
- More than 70 algorithms are described in sufficient detail to implement
- More than 350 full-color illustrations amplify the text
- The treatment is self-contained, including all of the background mathematics
- Additional resources at www.computervisionmodels.com

1 - Introduction
Simon J. D. Prince, University College London
Book:

Computer Vision

Published online:

05 August 2012

Print publication:

18 June 2012, pp 1-6
- Chapter
Summary

The goal of computer vision is to extract useful information from images. This has proved a surprisingly challenging task; it has occupied thousands of intelligent and creative minds over the last four decades, and despite this we are still far from being able to build a general-purpose “seeing machine.”
Part of the problem is the complexity of visual data. Consider the image in Figure 1.1. There are hundreds of objects in the scene. Almost none of these are presented in a “typical” pose. Almost all of them are partially occluded. For a computer vision algorithm, it is not even easy to establish where one object ends and another begins. For example, there is almost no change in the image intensity at the boundary between the sky and the white building in the background. However, there is a pronounced change in intensity on the back window of the SUV in the foreground, although there is no object boundary or change in material here.
We might have grown despondent about our chances of developing useful computer vision algorithms if it were not for one thing: we have concrete proof that vision is possible because our own visual systems make light work of complex images such as Figure 1.1. If I ask you to count the trees in this image or to draw a sketch of the street layout, you can do this easily.

7 - Modeling complex data densities
from II - Machine learning for machine vision
Simon J. D. Prince, University College London
Book:

Computer Vision

Published online:

05 August 2012

Print publication:

18 June 2012, pp 71-107
- Chapter
Summary

In the last chapter we showed that classification with generative models is based on building simple probability models. In particular, we build class-conditional density functions Pr(x|w = k) over the observed data x for each value of the world state w.
In Chapter 3 we introduced several probability distributions that could be used for this purpose, but these were quite limited in scope. For example, it is not realistic to assume that all of the complexities of visual data are well described by the normal distribution. In this chapter, we show how to construct complex probability density functions from elementary ones using the idea of a hidden variable.
As a representative problem we consider face detection; we observe a 60 × 60 RGB image patch, and we would like to decide whether it contains a face or not. To this end, we concatenate the RGB values to form the 10800 × 1 vector x. Our goal is to take the vector x and return a label w ∈ {0,1} indicating whether it contains background (w = 0) or a face (w = 1). In a real face detection system, we would repeat this procedure for every possible subwindow of an image (Figure 7.1).
We will start with a basic generative approach in which we describe the likelihood of the data in the presence/absence of a face with a normal distribution. We will then extend this model to address its weaknesses.
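The basic generative approach can be sketched as follows: fit one normal distribution per class, then label a new vector with the class whose likelihood is higher. For brevity, this sketch assumes a diagonal covariance and equal priors, uses tiny 2D vectors in place of 10800-dimensional patches, and uses hypothetical function names. None of these choices is claimed to be the book's.

```python
import math


def fit_diagonal_normal(samples):
    """Return per-dimension (means, variances) estimated from the samples."""
    n, dims = len(samples), len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    variances = [
        # A small floor keeps the variance strictly positive.
        max(sum((s[d] - means[d]) ** 2 for s in samples) / n, 1e-6)
        for d in range(dims)
    ]
    return means, variances


def log_likelihood(x, model):
    """Log of the diagonal-covariance normal density Pr(x | w = k)."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        for xi, mu, var in zip(x, means, variances)
    )


def classify(x, background_model, face_model):
    """Return w = 1 (face) if the face likelihood is higher, else w = 0."""
    return int(log_likelihood(x, face_model) > log_likelihood(x, background_model))


# Toy training data standing in for vectorized image patches.
background = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]
faces = [[0.8, 0.9], [0.9, 0.7], [1.0, 0.8]]
models = fit_diagonal_normal(background), fit_diagonal_normal(faces)
label_face = classify([0.85, 0.8], *models)
label_background = classify([0.1, 0.15], *models)
```

Working with log likelihoods avoids numerical underflow, which matters when the vectors have thousands of dimensions.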

13 - Image preprocessing and feature extraction
from IV - Preprocessing
Simon J. D. Prince, University College London
Book:

Computer Vision

Published online:

05 August 2012

Print publication:

18 June 2012, pp 269-294
- Chapter
Summary

This chapter provides a brief overview of modern preprocessing methods for computer vision. In Section 13.1 we introduce methods in which we replace each pixel in the image with a new value. Section 13.2 considers the problem of finding and characterizing edges, corners and interest points in images. In Section 13.3 we discuss visual descriptors; these are low-dimensional vectors that attempt to characterize the interesting aspects of an image region in a compact way. Finally, in Section 13.4 we discuss methods for dimensionality reduction.
Per-pixel transformations
We start our discussion of preprocessing with per-pixel operations: these methods return a single value corresponding to each pixel of the input image. We denote the original 2D array of pixel data as P, where p_ij is the element at the ith of I rows and the jth of J columns. The element p_ij is a scalar representing the grayscale intensity. Per-pixel operations return a new 2D array X of the same size as P containing elements x_ij.
Whitening
The goal of whitening (Figure 13.1) is to provide invariance to fluctuations in the mean intensity level and contrast of the image. Such variation may arise because of a change in ambient lighting intensity, the object reflectance, or the camera gain. To compensate for these factors, the image is transformed so that the resulting pixel values have zero mean and unit variance.
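As a minimal sketch of this transformation, the code below subtracts the mean intensity and divides by the standard deviation. The function name `whiten`, the nested-list image layout, and the handling of constant images are assumptions made for illustration.

```python
import math


def whiten(image):
    """Rescale a grayscale image to zero mean and unit variance.

    `image` is a list of rows of numbers. A constant image has zero
    variance, so it is returned as all zeros rather than dividing by zero.
    """
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = math.sqrt(variance)
    if std == 0:
        return [[0.0 for _ in row] for row in image]
    return [[(p - mean) / std for p in row] for row in image]


dark = [[10, 20], [30, 40]]
# Same scene with doubled contrast and a brighter mean level.
bright = [[2 * p + 50 for p in row] for row in dark]
whitened_dark = whiten(dark)
whitened_bright = whiten(bright)
```

Because the second image differs from the first only by a gain and an offset, both whiten to the same values, which is the invariance described above.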

12 - Models for grids
from III - Connecting local models
Simon J. D. Prince, University College London
Book:

Computer Vision

Published online:

05 August 2012

Print publication:

18 June 2012, pp 227-266
- Chapter
Summary

In Chapter 11, we discussed models that were structured as chains or trees. In this chapter, we consider models that associate a label with each pixel of an image. Since the unknown quantities are defined on the pixel lattice, models defined on a grid structure are appropriate. In particular, we will consider graphical models in which each label has a direct probabilistic connection to each of its four neighbors. Critically, this means that there are loops in the underlying graphical model and so the dynamic programming and belief propagation approaches of the previous chapter are no longer applicable.
These grid models are predicated on the idea that the pixel provides only very ambiguous information about the associated label. However, certain spatial configurations of labels are known to be more common than others, and we aim to exploit this knowledge to resolve the ambiguity. In this chapter, we describe the relative preference for different configurations of labels with a pairwise Markov random field or MRF. As we shall see, maximum a posteriori inference for pairwise MRFs is tractable in some circumstances using a family of approaches known collectively as graph cuts.
To motivate the grid models, we introduce a representative application. In image denoising we observe a corrupted image in which the intensities at a certain proportion of pixels have been randomly changed to another value according to a uniform distribution (Figure 12.1).
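A pairwise MRF on a four-connected grid can be summarized by a cost with one unary term per pixel and one pairwise term per pair of neighboring pixels. The sketch below evaluates such a cost for denoising, assuming binary labels and a Potts-style smoothness penalty for brevity. The weights, the function name, and this particular choice of terms are illustrative assumptions, and the minimization itself (for example by graph cuts) is not shown.

```python
def mrf_cost(labels, observed, unary_weight=1.0, pairwise_weight=0.5):
    """Cost of a labeling under a simple pairwise MRF on a 4-connected grid.

    Unary term: penalty when a label disagrees with the observed pixel.
    Pairwise term: penalty when horizontally or vertically adjacent
    labels differ (each neighboring pair is counted once).
    """
    rows, cols = len(labels), len(labels[0])
    cost = 0.0
    for i in range(rows):
        for j in range(cols):
            if labels[i][j] != observed[i][j]:
                cost += unary_weight
            if j + 1 < cols and labels[i][j] != labels[i][j + 1]:
                cost += pairwise_weight
            if i + 1 < rows and labels[i][j] != labels[i + 1][j]:
                cost += pairwise_weight
    return cost


# A 3x3 binary image of zeros with one corrupted pixel in the center.
noisy = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
clean = [[0, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]
cost_keep_noise = mrf_cost(noisy, noisy)   # agrees with data, not smooth
cost_denoised = mrf_cost(clean, noisy)     # one disagreement, fully smooth
```

With these weights, the smooth labeling has the lower cost even though it disagrees with one observed pixel, which is how the prior over label configurations resolves the ambiguity in the data.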

16 - Multiple cameras
from V - Models for geometry
Simon J. D. Prince, University College London
Book:

Computer Vision

Published online:

05 August 2012

Print publication:

18 June 2012, pp 354-384
- Chapter

2 - Introduction to probability
from I - Probability
Simon J. D. Prince, University College London
Book:

Computer Vision

Published online:

05 August 2012

Print publication:

18 June 2012, pp 9-16
- Chapter

11 - Models for chains and trees
from III - Connecting local models
Simon J. D. Prince, University College London
Book:

Computer Vision

Published online:

05 August 2012

Print publication:

18 June 2012, pp 195-226
- Chapter

15 - Models for transformations
from V - Models for geometry
Simon J. D. Prince, University College London
Book:

Computer Vision

Published online:

05 August 2012

Print publication:

18 June 2012, pp 323-353
- Chapter
Summary

In this chapter, we consider a pinhole camera viewing a plane in the world. In these circumstances, the camera equations simplify to reflect the fact that there is a one-to-one mapping between points on this plane and points in the image.
Mappings between the plane and the image can be described using a family of 2D geometric transformations. In this chapter, we characterize these transformations and show how to estimate their parameters from data. We revisit the three geometric problems from Chapter 14 for the special case of a planar scene.
To motivate the ideas of this chapter, consider an augmented reality application in which we wish to superimpose 3D content onto a planar marker (Figure 15.1). To do this, we must establish the rotation and translation of the plane relative to the camera. We will do this in two stages. First, we will estimate the 2D transformation between points on the marker and points in the image. Second, we will extract the rotation and translation from the transformation parameters.
Two-dimensional transformation models
In this section, we consider a family of 2D transformations, starting with the simplest and working toward the most general. We will motivate each by considering viewing a planar scene under different viewing conditions.
Euclidean transformation model
Consider a calibrated camera viewing a fronto-parallel plane at known distance, D (i.e., a plane whose normal corresponds to the w-axis of the camera).
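A 2D Euclidean transformation consists of a rotation followed by a translation. As a minimal sketch, it can be written as a 3 × 3 matrix acting on homogeneous coordinates and applied to points on the plane. The function names and the matrix layout here are assumptions made for illustration.

```python
import math


def euclidean_matrix(theta, tx, ty):
    """3x3 homogeneous matrix: rotate by theta (radians), then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx],
            [s, c, ty],
            [0.0, 0.0, 1.0]]


def apply_transform(matrix, point):
    """Map a 2D point (x, y) through a 3x3 homogeneous transformation."""
    x, y = point
    xh = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    yh = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    wh = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
    # Dividing by the third coordinate is a no-op for Euclidean transforms
    # but keeps the function usable for more general 3x3 transformations.
    return (xh / wh, yh / wh)


T = euclidean_matrix(math.pi / 2, 1.0, 2.0)
a = apply_transform(T, (1.0, 0.0))
b = apply_transform(T, (0.0, 0.0))
```

A Euclidean transformation preserves distances, so the two transformed points remain one unit apart, just like the original points.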

526 results in Image processing and machine vision
