Desire to improve drives many human activities. Optimization can be seen as a means for identifying better solutions by utilizing a scientific and mathematical approach. In addition to its widespread applications, optimization is an amazing subject with very strong connections to many other subjects and deep interactions with many aspects of computation and theory. The main goal of this textbook is to provide an attractive, modern, and accessible route to learning the fundamental ideas in optimization for a large group of students with varying backgrounds and abilities. The only background required for the textbook is a first-year linear algebra course (some readers may even be ready immediately after finishing high school). However, a course based on this book can serve as a header course for all optimization courses. As a result, an important goal is to ensure that the students who successfully complete the course are able to proceed to more advanced optimization courses.
Another goal of ours was to create a textbook that could be used by a large group of instructors, possibly under many different circumstances. To a degree, we tested this over a four-year period. Including the three of us, 12 instructors used the drafts of the book for two different courses. Students in various programs (majors), including accounting, business, software engineering, statistics, actuarial science, operations research, applied mathematics, pure mathematics, computational mathematics, computer science, combinatorics and optimization, have taken these courses. We believe that the book will be suitable for a wide range of students (mathematics, mathematical sciences including computer science, engineering including software engineering, and economics).
In Chapter 2, we have seen how to solve LPs using the simplex algorithm, an algorithm that is still widely used in practice. In Chapter 3, we discussed efficient algorithms to solve the special class of IPs describing the shortest path problem and the minimum cost perfect matching problem in bipartite graphs. In both these examples, it is sufficient to solve the LP relaxation of the problem.
Integer programming is widely believed to be a difficult problem (see Appendix A). Nonetheless, we will present algorithms that are guaranteed to solve IPs in finite time. The drawback of these algorithms is that the running time may be exponential in the worst case. However, they can be quite fast for many instances, and are capable of solving many large-scale, real-life problems.
These algorithms follow two general strategies. The first attempts to reduce IPs to LPs; this is known as the cutting plane approach and will be described in Section 6.2. The other strategy is a divide and conquer approach known as branch and bound, discussed in Section 6.3. In practice, both strategies are combined under the heading of branch and cut. This remains the preferred approach for all general purpose commercial codes.
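To make the branch and bound strategy concrete, here is a minimal sketch in Python, not the book's own pseudocode: it repeatedly solves LP relaxations with scipy.optimize.linprog and branches on a fractional variable. The instance at the bottom and all helper names are illustrative assumptions, not taken from the text.

```python
# A minimal branch-and-bound sketch for a pure IP:
#   maximize c^T x  subject to  A x <= b,  x >= 0,  x integer.
# Each subproblem's LP relaxation is solved with scipy.optimize.linprog.
import math
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A, b, bounds, tol=1e-6):
    """Return (best_value, best_x), or (None, None) if no integer solution exists."""
    best_val, best_x = -math.inf, None
    stack = [bounds]                          # subproblems, each given by variable bounds
    while stack:
        bnds = stack.pop()
        # Solve the LP relaxation (linprog minimizes, so negate c).
        res = linprog(-np.asarray(c), A_ub=A, b_ub=b, bounds=bnds, method="highs")
        if not res.success:
            continue                          # relaxation infeasible: prune this branch
        x, val = res.x, -res.fun
        if val <= best_val + tol:
            continue                          # bound: relaxation cannot beat the incumbent
        # Find a fractional coordinate to branch on.
        frac = [i for i, xi in enumerate(x) if abs(xi - round(xi)) > tol]
        if not frac:
            best_val, best_x = val, np.round(x)   # integer solution: update incumbent
            continue
        i = frac[0]
        lo, hi = bnds[i]
        # Branch: x_i <= floor(x_i)  and  x_i >= ceil(x_i).
        left, right = list(bnds), list(bnds)
        left[i] = (lo, math.floor(x[i]))
        right[i] = (math.ceil(x[i]), hi)
        stack.extend([left, right])
    return (best_val, best_x) if best_x is not None else (None, None)

# Illustrative instance: maximize 5x1 + 4x2 s.t. 6x1 + 4x2 <= 24, x1 + 2x2 <= 6, x >= 0 integer.
val, x = branch_and_bound([5, 4], [[6, 4], [1, 2]], [24, 6], [(0, None), (0, None)])
print(val, x)   # 20.0, [4. 0.]
```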
In this chapter, in the interest of simplicity we will restrict our attention to pure IPs where all the variables are required to be integer. The theory developed here extends to mixed IPs where only some of the variables are required to be integer, but the material is beyond the scope of this book.
Consider an LP (P) with variables x1, …, xn. Recall that an assignment of values to each of x1, …, xn is a feasible solution if the constraints of (P) are satisfied. We can view a feasible solution to (P) as a vector x = (x1, …, xn)T. Given a vector x, by the value of x we mean the value of the objective function of (P) for x. Suppose (P) is a maximization problem. Then recall that we call a vector x an optimal solution if it is a feasible solution and no feasible solution has larger value. The value of the optimal solution is the optimal value. By definition, an LP has only one optimal value; however, it may have many optimal solutions. When solving an LP, we will be satisfied with finding any optimal solution. Suppose (P) is a minimization problem. Then a vector x is an optimal solution if it is a feasible solution and no feasible solution has smaller value.
If an LP (P) has a feasible solution, then it is said to be feasible; otherwise, it is infeasible. Suppose (P) is a maximization problem. If for every real number α there is a feasible solution to (P) with value greater than α, then we say that (P) is unbounded. In other words, (P) is unbounded if we can find feasible solutions of arbitrarily high value. Similarly, if (P) is a minimization problem and for every real number α there is a feasible solution to (P) with value smaller than α, then we say that (P) is unbounded.
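As an illustration of these definitions, the small maximization LP below (our own instance, not one from the text) can be solved with scipy.optimize.linprog; negating the objective converts it to the minimization form that linprog expects.

```python
# Illustrative LP (not from the text):
#   maximize  x1 + x2   subject to  x1 + 2*x2 <= 4,  x1 >= 0,  x2 >= 0.
# linprog solves minimization problems, so we minimize -(x1 + x2).
from scipy.optimize import linprog

res = linprog(c=[-1, -1], A_ub=[[1, 2]], b_ub=[4],
              bounds=[(0, None), (0, None)], method="highs")
print(res.x)     # an optimal solution, here (4, 0)
print(-res.fun)  # the optimal value, here 4
# (0, 0) and (0, 2) are also feasible solutions, but their values 0 and 2 are smaller,
# so they are not optimal. Dropping the constraint x1 + 2*x2 <= 4 would make this LP unbounded.
```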
In this chapter, we revisit the shortest path and minimum-cost matching problems. Both were first introduced in Chapter 1, where we discussed practical example applications. We further showed that these problems can be expressed as IPs. The focus in this chapter will be on solving instances of the shortest path and matching problems. Our starting point will be to use the IP formulation we introduced in Section 1.5. We will show that studying the two problems through the lens of linear programming duality will allow us to design efficient algorithms. We develop this theory further in Chapter 4.
The shortest path problem
Recall the shortest path problem from Section 1.4.1. We are given a graph G = (V, E), nonnegative lengths c_e for all edges e ∈ E, and two distinct vertices s, t ∈ V. The length c(P) of a path P is the sum of the lengths of its edges, i.e. Σ(c_e : e ∈ P). We wish to find, among all possible st-paths, one of minimum length.
Example 7 In the following figure, we show an instance of this problem. Each of the edges in the graph is labeled by its length. The thick black edges in the graph form an st-path P = sa, ac, cb, bt of total length 3 + 1 + 2 + 1 = 7. This st-path is of minimum length, hence is a solution to our problem.
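For readers who want to experiment with such instances, here is a small sketch using Dijkstra's algorithm, a standard method for nonnegative edge lengths; it is not necessarily the algorithm developed in Section 3.1. Since the figure is not reproduced here, only the edges of the path from Example 7 are listed; extend the edge list to recreate the full instance.

```python
# A minimal Dijkstra sketch for the shortest st-path problem with nonnegative lengths.
import heapq

def dijkstra(edges, s, t):
    """edges: list of (u, v, length) for an undirected graph with nonnegative lengths."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist, prev = {s: 0}, {}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                           # stale heap entry
        if u == t:
            break
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if t not in dist:
        return None, None                      # no st-path exists
    path, u = [t], t
    while u != s:
        u = prev[u]
        path.append(u)
    return dist[t], list(reversed(path))

# Only the edges on the path of Example 7; the remaining edges of the figure are omitted.
edges = [("s", "a", 3), ("a", "c", 1), ("c", "b", 2), ("b", "t", 1)]
print(dijkstra(edges, "s", "t"))   # (7, ['s', 'a', 'c', 'b', 't'])
```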
An algorithm is a formal procedure that describes how to solve a problem. For instance, the simplex algorithm in Chapter 2 takes as input a linear program in standard equality form and either returns an optimal solution, or detects that the linear program is infeasible or unbounded. Another example is the shortest path algorithm in Section 3.1. It takes as input a graph with distinct vertices s, t and nonnegative integer edge lengths, and returns an st-path of shortest length (if one exists).
The two basic properties we require for an algorithm are: correctness and termination. By correctness, we mean that the algorithm is always accurate when it claims that we have a particular outcome. One way to ensure this is to require that the algorithm provides a certificate, i.e. a proof, to justify its answers. By termination, we mean that the algorithm will stop after a finite number of steps.
In Section A.1, we will define the running time of an algorithm; we will formalize the notions of slow and fast algorithms. Section A.2 reviews the algorithms presented in this book and discusses which ones are fast and which ones are slow. In Sections A.3 and A.4 we discuss the inherent complexity of various classes of optimization problems and discuss the possible existence of classes of problems for which it is unlikely that any fast algorithm exists. We explain how an understanding of computational complexity can guide us in the design of algorithms.
Broadly speaking, optimization is the problem of minimizing or maximizing a function subject to a number of constraints. Optimization problems are ubiquitous. Every chief executive officer (CEO) is faced with the problem of maximizing profit given limited resources. In general, this is too general a problem to be solved exactly; however, many aspects of decision making can be successfully tackled using optimization techniques. This includes, for instance, production, inventory, and machine-scheduling problems. Indeed, the overwhelming majority of Fortune 500 companies make use of optimization techniques. However, optimization problems are not limited to the corporate world. Every time you use your GPS, it solves an optimization problem, namely how to minimize the travel time between two different locations. Your hometown may wish to minimize the number of trucks it requires to pick up garbage by finding the most efficient route for each truck. City planners may need to decide where to build new fire stations in order to efficiently serve their citizens. Other examples include: how to construct a portfolio that maximizes its expected return while limiting volatility; how to build a resilient telecommunications network as cheaply as possible; how to schedule flights in a cost-effective way while meeting the demand for passengers; or how to schedule final exams using as few classrooms as possible.
Suppose that you are a consultant hired by the CEO of the WaterTech company to solve an optimization problem.
The compression techniques presented so far have tried to exploit the fundamental mathematical properties of information (lossless coding), to model and approximate the properties of the signal directly (differential coding), and to model the creation of the signal (source coding, such as in speech compression). We also presented simple perceptual methods, such as the µ-law encoder.
The methods presented in this chapter use transformations modeled after how human sensory perception works, at a much greater level of sophistication. These perceptual coders are so effective that they are used in virtually every device today that handles images or sound, from photo cameras to mobile phones to DVD players to mobile digital music players.
Before we introduce them, we recapitulate two fundamental signal transformations that are an important prerequisite for all the algorithms presented in this chapter, as well as for many of the analysis algorithms presented later. When explaining perceptual compression, two transformations are very important: the Discrete Fourier Transform (DFT) and the Discrete Cosine Transform (DCT), which are described in the following sections. Other transforms, such as the Discrete Wavelet Transform (DWT), which is a generalization of the transforms mentioned, are also used in multimedia signal processing, and the references cited at the end of this chapter are well worth looking up.
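As a quick reference, the DFT and the type-II DCT of a short test signal can be computed with NumPy and SciPy as follows; the signal values and variable names are our own illustration, not an example from the chapter.

```python
# Computing the DFT and the (type-II, orthonormal) DCT of a short test signal.
import numpy as np
from scipy.fft import dct

x = np.array([1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0])

X_dft = np.fft.fft(x)                 # complex DFT coefficients
X_dct = dct(x, type=2, norm="ortho")  # real DCT-II coefficients, the variant used in JPEG's 8x8 blocks

# For a real signal the DFT is conjugate-symmetric, so only half the spectrum is informative,
# while the DCT concentrates most of the energy in a few low-frequency coefficients; this is
# the property perceptual coders exploit when they quantize and discard coefficients.
print(np.round(np.abs(X_dft), 3))
print(np.round(X_dct, 3))
```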
The project to write a textbook on multimedia computing started a few years ago, when the coauthors independently realized that a book was needed that addresses the basic concepts related to the increasing volume of multimedia data in different aspects of communications in the computing age.
Digital computing started with processing numbers, but it soon began dealing with symbols of other kinds and developed computational approaches for handling alphanumeric data. Efforts to use computers for audio, visual, and other perceptual information did not become successful enough for practical applications until about the 1980s. Only slowly did computer graphics, audio processing, and visual analysis become feasible. First came the ability to store large volumes of audiovisual data, then to display or render it, then to distribute it, and later to process and analyze it. For that reason, it took until about the 1990s for the term multimedia to grow popular in computing.
While different fields have emerged around acoustic, visual, and natural text content, each specializing in one of these data types, multimedia computing deals with documents holistically, taking into account all media available. Driven by the availability of electronic sensors, multimedia communication currently focuses on visual and audio content, followed by metadata (such as GPS) and touch. Multimedia computing deals with multiple media holistically because the purpose of documents that contain multiple media is to communicate information. Information about almost all real-world events and objects must be captured using multiple sensors, as each sensor only captures one aspect of the information of interest. The challenge for multimedia computing systems is to integrate the different information streams into a coherent view. Humans do this every moment of their lives from birth and are therefore often used as a baseline when building multimedia systems. It is thus not surprising that, slowly but surely, all computing is becoming multimedia. The use of multimedia data in computing has grown even more rapidly than imagined just a few years ago, with the installation of cameras in cell phones in combination with the ability to easily share multimedia documents in social networks.
In the previous chapters, we described many signal processing and content analysis techniques. However, the content of an image or audio file does not alone determine its meaning and impression on the user. In this chapter, we will therefore describe another factor that is very important to consider in multimedia computing: the set of surrounding circumstances in which the content is presented, otherwise known as context. Context is often neglected in academic work, even though it can be leveraged in many ways in multimedia systems and is often so effective that the content analysis approach becomes secondary. So let us first find out what context really is.
Nearly three centuries ago, George Berkeley asked: If a tree falls in a forest and no one is around to hear it, does it make a sound? Sound is often defined as the sensation excited in the ear when the air or other medium is set in motion. Thus, if there is no receiving ear, then there is no sound. In other words, perception is not only data; it is a close interaction between the data, the transmission medium, and the interpreter. This is shown in Figure 19.1.
The data acquired from an environment
The medium used to transmit physical attributes to the perceiver
The perceiver
Characteristics of each of these must be considered in designing and developing a multimedia system. It is well established, and has been rigorously articulated, that we understand the world based on the sensory data we receive through our sensors and the knowledge about the world that we have accumulated since birth. Both the data and the knowledge are integral components of understanding.
Only a few inventions in the history of civilization have had the same impact on society in so many ways and at so many levels as computers. Where once we used computers for computing with simple alphanumeric data, we now use them primarily to exchange information, to communicate, and to share experiences. Computers are rapidly evolving as a means for gaining insights and sharing ideas across distance and time.
Multimedia computing started gaining serious attention from researchers and practitioners during the 1990s. Before 1991, people talked about multimedia, but the computing power, storage, bandwidth, and processing algorithms were not advanced enough to deal with audio and video. With the increasing availability and popularity of CDs, people became excited about creating documents that could include not only text, but also images, audio, and even video. That decade saw explosive growth in all aspects of hardware and software technology related to multimedia computing and communication. In the early 1990s, PC manufacturers labeled their high-end units containing advanced graphics as "multimedia PCs". That trend disappeared a few years later because every new computer became a multimedia computer.
Information about the environment is always obtained through sensors. To understand the handling of perceptual information, we must first start with an understanding of the types and properties of sensors and the nature of data they produce.
Types of Sensors
In general, a sensor is a device that measures a physical quantity and converts it into a signal that a human or a machine can use. Whether the sensor is human-made or natural does not matter. Sensors for sound and light have been the most important for multimedia computing during the past decades because audio and video are best suited to communicating information for the tasks humans typically perform with or without a computer. That is, most people prefer to communicate through sound, while light serves illustrative purposes, supplementing language-based descriptions of the state of the world. New or different tasks might use different sensors, however. For example, in real-world dating (as opposed to current implementations of online dating), communication occurs on many other levels, such as scent, touch, and taste (e.g., when kissing). New artificial sensors are therefore being invented even as you read this chapter. Let's start with a rough taxonomy of current sensors relevant to multimedia computing.
In Chapters 5 and 6, we described sound and light and their physical properties. In this chapter, we will discuss basic signal processing operations that are common initial steps of many algorithms for audio and video enhancement and content analysis.
Sampling and Quantization
As we explained in Chapter 4, a continuous function must be converted to a discrete form for representation and processing on a digital computer. The interface between the optical system that projects a scene onto the image plane and the computer must sample the image at a finite number of points and represent each sample within the finite word size of the computer. Likewise, the sound card samples the microphone output into a stream of numbers. In other words, in both cases the material we work with in computational audio and video processing is a stream of numbers representing the signal at certain spatial or temporal points.
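A small sketch of this sampling-and-quantization step for a one-dimensional audio signal; the sample rate, word size, and test tone below are made-up illustrative values, not parameters from the text.

```python
# Sampling a continuous signal (here an ideal 440 Hz sine) and quantizing each
# sample to a finite word size, as a sound card's analog-to-digital converter does.
import numpy as np

sample_rate = 8000            # samples per second (illustrative value)
bits = 8                      # word size per sample
duration = 0.01               # seconds of signal

t = np.arange(0, duration, 1.0 / sample_rate)       # sampling: a finite set of time points
x = np.sin(2 * np.pi * 440 * t)                      # the "continuous" signal evaluated at those points

levels = 2 ** bits
quantized = np.round((x + 1.0) / 2.0 * (levels - 1))    # map [-1, 1] onto {0, ..., 255}
reconstructed = quantized / (levels - 1) * 2.0 - 1.0    # back to [-1, 1] to measure the error

print("max quantization error:", np.max(np.abs(reconstructed - x)))
```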