Example isn't another way to teach, it is the only way to teach
Albert Einstein
Machine vision has found a wide set of applications, from astronomy [17.44] to industrial inspection to automatic target recognition. It would be impossible to cover them all in the detail they deserve. In this chapter, we choose to provide the reader with more of an annotated bibliography than a pedagogical text: we mention a few applications very briefly and provide a few references. In the next chapter, we choose one application discipline, automatic target recognition, to cover in a bit more detail.
Multispectral image analysis
The strategy of multispectral image analysis combines spatial and spectral representations in a representation in which each pixel is a vector, an ordered set of measurements. Color, where the elements of the vector are [r, g, b], is the obvious example, and there is a great deal of work in the literature in color processing. Most of the reported work has been intended for image quality enhancement. Only some recent papers elaborate on the use of color for recognition [17.14, 17.18, 17.53, 17.58].
The methods we have studied for univariate images, for example using Markov random field methods to remove noise, are applicable to multispectral images [17.3]. Often, all that is necessary is to use a vector description instead of scalar pixels.
Optical character recognition (OCR)
Despite our love for this topic and the huge number of papers devoted to it (in this paragraph, we cite only a few references [16.1, 17.32, 17.64]), we cannot take the space to cover it in the kind of detail it deserves.
Computers are useless. They can only give us answers
Pablo Picasso
In this chapter, we describe how images are formed and how they are represented. Representations include both mathematical representations for the information contained in an image and for the ways in which images are stored and manipulated in a digital machine. In this chapter, we also introduce a way of thinking about images – as surfaces with varying height – which we will find to be a powerful way to describe both the properties of images as well as operations on those images.
Image representations
In this section, we discuss several ways to represent the information in an image. These representations include: iconic, functional, linear, probabilistic, spatial frequency, and relational representations.
Iconic representations (an image)
An iconic representation of the information in an image is an image. “Yeah, right; and a rose is a rose is a rose.” When you see what we mean by functional, linear, and relational representations, you will realize we need a word for a representation which is itself a picture. Some examples of iconic representations include the following.
2D brightness images, also called luminance images. The things you are used to calling “images.” These might be color or gray-scale. (Be careful with the words “black and white,” as that might be interpreted as “binary”). We usually denote the brightness at a point 〈x, y〉 as f(x, y). Note: x and y could be integers (in this case, we are referring to discrete points in a sampled image; these points are called “pixels,” short for “picture elements”), or real numbers (in this case, we are thinking of the image as a function).
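A small sketch of both readings of f(x, y) (our own illustrative code, not from the text): integer coordinates index a sampled array of pixels directly, while real-valued coordinates can be handled by interpolating between the surrounding pixels, which is one way to treat the image as a function:

```c
#define W 4
#define H 4

/* A sampled brightness image: f(x, y) at integer pixel
   coordinates, stored row-major as f[y][x]. */
static unsigned char f[H][W] = {
    {  0,  50, 100, 150},
    { 10,  60, 110, 160},
    { 20,  70, 120, 170},
    { 30,  80, 130, 180}
};

/* Treating the image as a function of real-valued (x, y):
   bilinear interpolation between the four surrounding pixels.
   Valid for 0 <= x < W-1 and 0 <= y < H-1. */
double f_continuous(double x, double y) {
    int x0 = (int)x, y0 = (int)y;
    double ax = x - x0, ay = y - y0;
    return (1-ax)*(1-ay)*f[y0][x0]     + ax*(1-ay)*f[y0][x0+1]
         + (1-ax)*ay    *f[y0+1][x0]   + ax*ay    *f[y0+1][x0+1];
}
```

At integer coordinates the interpolation reduces to the stored pixel value, so the two views agree on the sample grid.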
Luke, you've switched off your targeting computer. What's wrong?
George Lucas
This is the principal application chapter of this book. We have selected one application area, automatic target recognition (ATR), and illustrate how the mathematics and algorithms previously covered are used in this application. The point to be made is that almost all applications similarly benefit not from one technique, but from a fusion of most of the techniques previously described. As in previous chapters, we provide the reader with both an explanation of concepts and pointers to more advanced literature. However, since this chapter emphasizes the application, we do not include a “Topics” section.
Automatic target/object recognition (ATR) is the term given to the field of engineering sciences that deals with the study of systems and techniques designed to identify, to locate, and to characterize specific physical objects (referred to as targets) [18.7, 18.9, 18.69], usually in a military environment. Limited surveys of the field are available [18.3, 18.8, 18.21, 18.66, 18.74, 18.79, 18.89]. In this chapter, the only ATR systems considered are those that make use of images. Therefore, our use of terminology (e.g., clutter) will be restricted to terms that make sense in an imaging scenario.
The hierarchy of levels of ATR
In this section, we define a few popularly used terms and acronyms in the ATR [18.57] world, starting with the five levels in the ATR hierarchy.
Detection. Identifying the presence or absence of a target in a given scene.
Classification. This term, at least in Army parlance, originally meant distinguishing between vehicles with tracks and those with wheels.
Statistics are used much like a drunk uses a lamppost: for support, not illumination
Vin Scully
The discipline of statistical pattern recognition can by itself fill textbooks (and in fact, it does). For that reason, no effort is made to cover the topic in detail in this single chapter. However, the student in machine vision needs to know at least something about statistical pattern recognition in order to read the literature and to properly put the other machine vision topics in context. For that reason, a brief overview of the field of statistical methods is included here. To do serious research in machine vision, however, this chapter is not sufficient, and the student must take a full course in statistical pattern recognition. For texts, we recommend several: the original version of the text by Duda and Hart [14.3] included both statistical pattern classification and machine vision; however, the new version [14.4] is pretty much limited to classification, and we recommend it for completeness. The much older text by Fukunaga [14.6] still retains a lot of useful information, and we recommend [14.11] for readability.
Design of a classifier
Recall the example described in section 13.2. In that example, we are given models for axes and hatchets which were derived statistically by computing averages of samples known to be either axes or hatchets. We called these collections “training sets.”
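A minimal sketch of the nearest-mean idea behind such training sets (our own illustrative code; the single feature and its values are hypothetical): compute the average of each class's training samples, then assign a new sample to the class whose mean is closest.

```c
#include <math.h>

/* Average of a training set of a single illustrative feature
   (say, overall length in centimeters). */
double mean(const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i];
    return s / n;
}

/* Nearest-mean classification: returns 0 for class A (axe),
   1 for class B (hatchet). */
int classify(double feature, double mean_a, double mean_b) {
    return fabs(feature - mean_a) <= fabs(feature - mean_b) ? 0 : 1;
}
```

This is the simplest possible classifier built from class averages; the chapter develops classifiers that account for more than the means.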
In this chapter, we discuss a completely different approach to pattern recognition, a methodology based on an analogy to language understanding. These methods have not often been used recently, primarily because they are very sensitive to noise and distortion. However, for certain applications, they may be appropriate, and the student is advised to learn enough about this topic to recognize potential applications.
Consider a boundary segment represented by a chain code. Each step in that chain code is a symbol, an integer between 0 and 7, so that the boundary segment is represented by a string of symbols. What makes syntactic methods work is the analogy between this string of symbols and the strings of symbols which show up in the description of a formal language.
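As a concrete sketch of the string-of-symbols view (our own illustrative code; the direction convention used here, 0 = +x with counterclockwise 45-degree steps, is one common Freeman convention and may differ from the one used elsewhere), a chain code can be walked symbol by symbol:

```c
#include <string.h>

/* Freeman chain code: each symbol 0..7 is one step along the
   boundary. Direction 0 is +x, increasing counterclockwise in
   45-degree increments (an illustrative convention). */
static const int dx[8] = { 1, 1, 0, -1, -1, -1,  0,  1 };
static const int dy[8] = { 0, 1, 1,  1,  0, -1, -1, -1 };

/* Walk the string of symbols, returning the final displacement
   from the starting point. A closed boundary returns to (0, 0). */
void walk_chain(const char *code, int *x, int *y) {
    *x = 0; *y = 0;
    for (size_t i = 0; i < strlen(code); i++) {
        int s = code[i] - '0';
        *x += dx[s];
        *y += dy[s];
    }
}
```

The point of the syntactic approach is that such a string can then be parsed against a grammar, exactly as a sentence is parsed in a formal language.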
Terminology
To make more progress in this area, we need to define some terminology. The definitions are in reference to analysis of strings of symbols, such as occur in language analysis.
A terminal symbol: A word, like “horse,” “aardvark,” “professor,” “runs,” “grades.” Terminal symbols may also be line segments, parts of a picture, or other features. Generally, we denote terminal symbols using lower case. Most often, terminal symbols are denoted by a single symbol, like “a” or “0” but in the example of words from English, the terminal symbols are words, not letters.
Functions are born of functions, and in turn, give birth or death to others. Forms emerge from forms and others arise or descend from these
L. Sullivan
You have already seen the use of graph-theoretic terminology in connected component labeling in Chapter 8. The way we used the term “connected components” there was to consider each pixel as a vertex in a graph, and to think of each vertex as having four, six, or eight edges to other vertices (that is, four-connected neighbors, six neighbors if hexagonal pixels are used, and eight-connected neighbors). However, we did not build elaborate set-theoretic or other data structures there. We will do so in this chapter. The graph-matching techniques discussed in this chapter will be used a great deal in Chapter 13.
Graphs
A graph is a relational data structure. It consists of data elements, referred to as vertices or nodes, and relationships between vertices, referred to as edges.
Graphs may be completely described by sets. The set of vertices is a simple set, and the edges form a set of ordered pairs. For example, let G = 〈V, E〉 represent a graph, where the vertex set V = {a, b, c, d, e, f} and the edge set E = {(a, b), (b, c), (a, c), (b, e), (d, f)}. Graphs may also be represented pictorially.
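The set description above can be transcribed almost directly into code. A minimal sketch (our own; the function name `adjacent` is illustrative) stores V as an array of vertex names and E as an array of ordered pairs, exactly as in the example:

```c
/* The graph G = <V, E> from the text, stored as sets:
   V as an array of vertex names, E as ordered pairs. */
static const char V[] = { 'a', 'b', 'c', 'd', 'e', 'f' };
static const char E[][2] = { {'a','b'}, {'b','c'}, {'a','c'},
                             {'b','e'}, {'d','f'} };
static const int num_edges = 5;

/* Is (u, v) an edge? Reading the graph as undirected,
   we check both orderings of the pair. */
int adjacent(char u, char v) {
    for (int i = 0; i < num_edges; i++)
        if ((E[i][0] == u && E[i][1] == v) ||
            (E[i][0] == v && E[i][1] == u))
            return 1;
    return 0;
}
```

For large graphs one would use an adjacency matrix or adjacency lists instead of a linear scan, but the set-of-pairs form mirrors the mathematical definition most closely.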
Segmentation is the process of separating objects from background. It is the building block for all the subsequent processes like shape analysis, object recognition, etc. In this chapter, we first discuss several popular segmentation algorithms, including threshold-based, region-based (or connected component analysis), edge-based, and surface-based. We also describe some recently developed segmentation algorithms in the topics section.
Segmentation: Partitioning an image
In many machine vision applications, the set of possible objects in the scene is quite limited. For example, if the camera is viewing a conveyer, there may be only one type of part which appears, and the vision task could be to determine the position and orientation of the part. In other applications, the part being viewed may be one of a small set of possible parts, and the objective is to both locate and identify each part. Finally, the camera may be used to inspect parts for quality control.
In this section, we will assume that the parts are fairly simple and can be characterized by their two-dimensional projections, as provided by a single camera view. Furthermore, we will assume that the shape is adequate to characterize the objects. That is, color or variation in brightness is not required. We will first consider dividing the picture into connected regions.
A segmentation of a picture is a partitioning into connected regions, where each region is homogeneous in some sense and is identified by a unique label.
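As a sketch of what assigning unique labels to connected regions looks like in code (our own minimal example using a recursive flood fill; the algorithms developed in the text are more general), each 4-connected foreground region of a small binary image receives its own label:

```c
#define W 6
#define H 4

/* A small binary image: 1 = foreground, 0 = background.
   After labeling, each region holds a unique label >= 2. */
static int img[H][W] = {
    {1, 1, 0, 0, 0, 1},
    {1, 0, 0, 1, 0, 1},
    {0, 0, 1, 1, 0, 0},
    {0, 0, 0, 0, 0, 1}
};

/* Recursive flood fill: spread one label through a
   4-connected region of foreground pixels. */
static void flood(int y, int x, int label) {
    if (y < 0 || y >= H || x < 0 || x >= W || img[y][x] != 1)
        return;
    img[y][x] = label;
    flood(y + 1, x, label); flood(y - 1, x, label);
    flood(y, x + 1, label); flood(y, x - 1, label);
}

/* Scan the image, starting a new label at each unlabeled
   foreground pixel; return the number of regions found. */
int label_components(void) {
    int next = 2;
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            if (img[y][x] == 1)
                flood(y, x, next++);
    return next - 2;
}
```

The result is a partitioning in exactly the sense of the definition: every foreground pixel belongs to one region, and each region carries a unique label.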
A man's discourse is like to a rich Persian carpet, the beautiful figures and patterns of which can be shown only by spreading and extending it out; when it is contracted and folded up, they are obscured and lost
Plutarch
The suffix “-ology” means “study of-,” so obviously, “morphology” is the study of morphs; answering critical questions like: “How come they only come out at night, and then fly toward the light?” and “Why is it that bug zappers only toast the harmless critters, leaving the 'skeeters alone?” and – HOLD IT! That's MORPH-ology, the study of SHAPE, not moths! Try again …
Binary morphology
We begin by considering ONLY BINARY images. That's important, remember it! Only binary! We will discuss a couple of operators first. Then, once you understand how they work, we'll explain how they are used in binary images. As an extension to binary morphology, we also describe gray scale morphology operations and the corresponding operators.
Dilation
First, the intuitive definition: The dilation of a (BINARY) image is that same image with all the foreground regions made just a little bit bigger.
Now, formally: We consider two images, fA and fB, and let A and B be sets of ordered pairs, consisting of the coordinates of each foreground pixel in fA and fB, respectively.
Consider one pixel in fB, and its corresponding element (ordered pair) of B, call that element b ∈ B.
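Before working through the set formalism, an operational sketch may help (our own illustrative code, using a 3×3 cross as the structuring element): the output is foreground wherever the structuring element, centered at that pixel, touches any foreground pixel of the input, which is exactly "making the foreground a little bit bigger."

```c
#define W 5
#define H 5

/* Structuring element: a 3x3 cross (4-neighborhood plus center). */
static const int B[3][3] = { {0, 1, 0},
                             {1, 1, 1},
                             {0, 1, 0} };

/* Binary dilation: out(y, x) is foreground if B, centered at
   (y, x), overlaps any foreground pixel of A. */
void dilate(int A[H][W], int out[H][W]) {
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            out[y][x] = 0;
            for (int j = -1; j <= 1; j++)
                for (int i = -1; i <= 1; i++) {
                    int yy = y + j, xx = x + i;
                    if (B[j + 1][i + 1] && yy >= 0 && yy < H &&
                        xx >= 0 && xx < W && A[yy][xx])
                        out[y][x] = 1;
                }
        }
}
```

Dilating a single foreground pixel by this cross produces the pixel plus its four neighbors, a small "blob" one step bigger in every 4-connected direction.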
To change, and to change for the better are two different things
German proverb
In this chapter, we move toward developing techniques which remove noise and degradations so that features can be derived more cleanly for segmentation. The techniques of a posteriori image restoration and iterative image feature extraction are described and compared. While image restoration methods remove degradations from an image [6.3], image feature extraction methods extract features such as edges from noisy images. Both are shown to perform the same basic operation: image relaxation. In the advanced topics section, image feature extraction methods, known as graduated nonconvexity (GNC) and variable conductance diffusion (VCD), are compared with a restoration/feature extraction method known as mean field annealing (MFA). This equivalence shows the relationship between energy minimization methods and spatial analysis methods and between their respective parameters of temperature and scale. The chapter concludes by discussing the general philosophy of extracting features from images.
Relaxation
The term “relaxation” was originally used to describe a collection of iterative numerical techniques for solving simultaneous nonlinear equations (see [6.18] for a review). The term was extended to a set of iterative classification methods by Rosenfeld and Kak [6.64] because of their similarity. Here, we provide a general definition of the term which will encompass these methods as well as those more recent techniques which are the emphasis of this discussion.
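A minimal numerical sketch of relaxation in its original sense (our own example, not from the text): Jacobi iteration on a 1D array, where each interior value is repeatedly replaced by the average of its neighbors while the endpoints are held fixed. The iteration "relaxes" toward the smooth solution, here a linear ramp between the boundary values.

```c
#define N 5

/* One-dimensional Jacobi relaxation: interior values converge
   to the average of their neighbors; u[0] and u[N-1] are fixed
   boundary conditions. */
void relax(double u[N], int iterations) {
    double v[N];
    for (int it = 0; it < iterations; it++) {
        for (int i = 0; i < N; i++) v[i] = u[i];       /* old values */
        for (int i = 1; i < N - 1; i++)
            u[i] = 0.5 * (v[i - 1] + v[i + 1]);        /* new values */
    }
}
```

The image relaxation methods of this chapter follow the same pattern: a local update rule, applied iteratively over the whole image until a stable (or minimal-energy) state is reached.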
In this chapter, we approach the problem alluded to in Chapter 14 where the training set simply contains points, and those points are not marked in any way to indicate from which class they may have come. As in the previous chapter, we present only a brief overview of the field, and refer the reader to other texts [14.4, 15.7] for more thorough coverage. One very important area which we omit here is the use of biologically inspired models for clustering [15.4, 15.5, 15.6], and the reader is strongly encouraged to look into these.
We will discuss the issues of clustering in a rather general sense, but note one particular application, which is identification of peaks in the Hough transform array.
Consider this example from satellite pattern classification: We imagine a downward-looking satellite orbiting the earth, which, at each observed point, makes a number of measurements of the light emitted/reflected from that point on the earth's surface. Typically, as many as seven different measurements might be taken from a given point, each measurement in a different spectral band. Each “pixel” in the resulting image would then be a 7-vector where the elements of this vector might represent the intensity in the far-infrared, the near-infrared, blue, green, etc.
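As a sketch of the kind of clustering this chapter discusses (our own minimal example: scalar "pixels" and k = 2, whereas the satellite example would cluster 7-vectors), the classic k-means iteration alternates between assigning each sample to the nearest mean and recomputing the means from the assignments:

```c
#include <math.h>

/* Minimal k-means sketch with k = 2 on scalar data. The means
   m0 and m1 must be given distinct initial guesses; each pass
   reassigns samples to the nearest mean and updates the means. */
void kmeans2(const double *x, int n,
             double *m0, double *m1, int iterations) {
    for (int it = 0; it < iterations; it++) {
        double s0 = 0, s1 = 0;
        int n0 = 0, n1 = 0;
        for (int i = 0; i < n; i++) {
            if (fabs(x[i] - *m0) <= fabs(x[i] - *m1)) { s0 += x[i]; n0++; }
            else                                      { s1 += x[i]; n1++; }
        }
        if (n0) *m0 = s0 / n0;   /* guard against empty clusters */
        if (n1) *m1 = s1 / n1;
    }
}
```

For multispectral data, the only change is that the absolute difference becomes a vector distance and the means become vector averages; the alternation itself is identical.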
Computer Science is not about computers any more than astronomy is about telescopes
E. W. Dijkstra
One may take two approaches to writing software for image analysis, depending on what one is required to optimize. One may write in a style which optimizes/minimizes programmer time, or one may write to minimize computer time. In this course, computer time will not be a concern (at least not usually), but your time will be far more valuable. For that reason, we want to follow a programming philosophy which produces correct, operational code in a minimal amount of programmer time.
The programming assignments in this book are specified to be written in C or C++, rather than in MATLAB or JAVA. This is a conscious and deliberate decision. MATLAB in particular hides many of the details of data structures and data manipulation from the user. In the course of teaching variations of this course for many years, the authors have found that many of those details are precisely the details that students need to grasp in order to effectively understand what image processing (particularly at the pixel level) is all about.
Image File System (IFS) software
The objective of quickly writing good software is accomplished by using the image access subroutines in IFS. IFS is a collection of subroutines and applications based on those subroutines which support the development of image processing software. Advantages of IFS include the following.
This textbook covers both fundamentals and advanced topics in computer-based recognition of objects in scenes. It is intended to be both a text and a reference. Almost every chapter has a “Fundamentals” section which is pedagogically structured as a textbook, and a “Topics” section which includes extensive references to the current literature and can be used as a reference. The text is directed toward graduate students and advanced undergraduates in electrical and computer engineering, computer science, or mathematics.
Chapters 4 through 17 cover topics including edge detection, shape characterization, diffusion, adaptive contours, parametric transforms, matching, and consistent labeling. Syntactic and statistical pattern recognition and clustering are introduced. Two recurrent themes are used throughout these chapters: Consistency (a principal philosophical construct for solving machine vision problems) and optimization (the mathematical tool used to implement those methods). These two topics are so pervasive that we conclude each chapter by discussing how they have been reflected in the text. Chapter 18 uses one application area, automatic target recognition, to show how all the topics presented in the previous chapters can be integrated to solve real-world problems.
This text assumes a solid graduate or advanced-undergraduate background including linear algebra and advanced calculus. The student who successfully completes this course can design a wide variety of industrial, medical, and military machine vision systems. Software and data used in the book can be found at www.cambridge.org/9780521830461.