Light is one of the most basic phenomena in the universe. Among the first words in the Bible are, “Let there be light!” A large part of the human brain is dedicated to translating the light reflected off objects and into our eyes to form an image of our surroundings. As discussed in Chapter 2, many human innovations have evolved around capturing and storing that image, mostly because of its use for communication purposes: first were the Stone Age cave painters; then followed the painters and sculptors of the Middle Ages and the Renaissance; then came photography, film, and digital storage of movies and photographs. Most recently, a computer science discipline evolved around computer-based interpretation of images, called computer vision. Recent years have brought rapid progress in the use of photography and movies through the popularity of digital cameras in cell phones. Many people now carry a device for capturing and sharing visual information and use it on a daily basis.
In this chapter, we introduce the basic properties of light and discuss how it is stored and reproduced. We examine basic image processing and introductory computer vision techniques in later chapters.
A major difference between multimedia data and most other data is its size. Images and audio files take much more space than text, for example. Video data is currently the single largest network bandwidth and hard disk space consumer. Compression was, therefore, among the first issues researchers in the emerging multimedia field sought to address. In fact, multimedia’s history is closely connected to different compression algorithms because they served as enabling technologies for many applications. Even today, multimedia signal processing would not be possible without compression methods. A Blu-ray disc can currently store 50 Gbytes, but a ninety-minute movie in 1,080p HDTV format takes about 800 Gbytes (without audio). So how does it fit on the disc? The answer to many such problems is compression.
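The 800-Gbyte figure can be checked with a back-of-the-envelope calculation. The sketch below assumes 1,920 × 1,080 pixels per frame, 3 bytes (24 bits) per pixel, and 24 frames per second; the color depth and frame rate are assumptions made for illustration, not values stated in the text.

```python
# Back-of-the-envelope size of an uncompressed 1,080p movie (video only).
# Color depth and frame rate are assumptions chosen for illustration.
WIDTH, HEIGHT = 1920, 1080      # 1,080p frame dimensions in pixels
BYTES_PER_PIXEL = 3             # assumed: 8 bits each for red, green, blue
FRAMES_PER_SECOND = 24          # assumed: a common film frame rate
DURATION_SECONDS = 90 * 60      # ninety-minute movie
BLU_RAY_BYTES = 50 * 10**9      # 50 Gbytes, the disc capacity quoted in the text


def raw_video_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Return the number of bytes needed to store every frame uncompressed."""
    return width * height * bytes_per_pixel * fps * seconds


total = raw_video_bytes(
    WIDTH, HEIGHT, BYTES_PER_PIXEL, FRAMES_PER_SECOND, DURATION_SECONDS
)
ratio = total / BLU_RAY_BYTES   # how much the video must shrink to fit the disc

print(f"Uncompressed size: {total / 10**9:.0f} Gbytes")
print(f"Required compression ratio: about {ratio:.0f}:1")
```

Under these assumptions the raw video occupies roughly 806 Gbytes, so fitting it on a 50-Gbyte disc requires a compression ratio of about 16:1 before audio is even considered.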
This chapter discusses the underlying mathematical principles of compression algorithms, from the basics to advanced techniques. However, all the techniques outlined in this chapter belong to the family of lossless compression techniques; that is, the original data can be reconstructed bit by bit. Lossless compression techniques are applicable to all kinds of data, including non-multimedia data. However, these techniques are not always effective with all types of data. Therefore, subsequent chapters will introduce lossy compression techniques that are usually tailored to a specific type of data, for example, image or sound files.
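Both claims, bit-exact reconstruction and uneven effectiveness across data types, can be illustrated with a minimal sketch. It uses Python's standard `zlib` module (DEFLATE) as a stand-in lossless compressor; this choice and the two sample inputs are assumptions for illustration, not the specific algorithms covered in the chapter.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, 9))


# Highly redundant input: the same short phrase repeated many times.
redundant = b"multimedia " * 1000

# Input with no exploitable structure: pseudo-random bytes (fixed seed
# so that the run is repeatable).
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(redundant)))

for name, data in (("redundant", redundant), ("noise", noise)):
    restored = zlib.decompress(zlib.compress(data, 9))
    # Lossless: the round trip must reproduce the input exactly.
    assert restored == data
    print(f"{name}: compression ratio {compression_ratio(data):.2f}")
```

The repeated phrase shrinks dramatically, whereas the pseudo-random bytes do not shrink at all and may even grow slightly because of container overhead. Both inputs are nevertheless restored exactly.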
As discussed in the previous chapter, multimedia is closely related to how humans experience the world. In this chapter, we first introduce the role of different sensory signals in human perception for understanding and functioning in various environments and for communicating and sharing experiences. A very important lesson for multimedia technologists is that each sense provides only partial information about the world. One sense alone, even the very powerful sense of vision, is not enough to understand the world. Data and information from different sensors must be combined with other senses and prior knowledge to understand the world – and even then we only obtain a partial model of the world. Therefore, different sensory modalities should be combined with other knowledge sources to interpret the situation. Multimedia computing and communication is fundamentally about combining information from multiple sources in the context of the problem being solved. This is what distinguishes multimedia from several other disciplines, including computer vision and audio processing, where the focus is on analyzing one medium to extract as much information as possible from it.
In multimedia systems, different types of data streams simultaneously exist, and the system must process them not as separate streams, but as one correlated set of streams that represent information and knowledge of interest for solving a problem. The challenge for a multimedia system is to discover correlations that exist in this set of multimedia data and combine partial information from disparate sources to build the holistic information in a given context.
Clearly everybody knows the word “multimedia,” yet when people think of it, they usually think of different things. For some people, multimedia equals entertainment. For other people, multimedia equals Web design. For many computer scientists, multimedia often means video in a computing environment. All these are narrow perspectives on multimedia. For example, visual information definitely dominates human activities because of the powerful visual machinery that we are equipped with. In the end, however, humans use all five senses effectively, opportunistically, and judiciously. Therefore, multimedia computing should utilize signals from multifarious sensors and present to users only the relevant information in the appropriate sensory modality.
This book takes an integrative systems approach to multimedia. Integrated multimedia systems receive input from different sensory and symbolic sources in different forms and representations. Users ideally access this information in experiential environments. Early techniques dealt with individual media more effectively than with integrated media and focused on developing efficient techniques for separate individual media, for example, MPEG video compression. During the past few years, issues that span multimedia have received more central attention. Many researchers now recognize that most of the difficult semantic issues become easier to solve when considering integrated multimedia rather than separate individual media.
So far, we have mostly described ideal and typical environments. In this chapter, we will discuss some issues that designers need to consider when building multimedia systems. We call this chapter “The Human Factor” because the content of this chapter deals with effects that can be observed when multimedia systems are exposed to human beings. Of course, ultimately, all computer systems are made to be used by us human beings.
Principles of User Interface Design
Most of today’s applications, especially ones that support multimedia in any way, use a graphical user interface (GUI), that is, an interface that is controlled through clicks, touch, and/or gestures and that allows for the display of arbitrary image and video data. Therefore, knowing how to design GUI-based applications in a user-friendly manner is an important skill for everybody working in multimedia computing. Unfortunately, with the many factors that go into the behavior of a program and the perceptual requirements of the user, there is no unique path or definite set of guidelines to follow. Here is an example: Is it better to have the menu bar inside the window of an application, or is it better to have one menu bar that is always at the same place and changes with the application? As we assume the reader knows, this is one fundamental difference between Apple’s and Microsoft’s operating systems – and it is hard to say one or the other is right or wrong. However, some standards have evolved over many years, using research results and feedback from many users. These standards can be seen in many places today in desktop environments, smartphones, DVD players, and other devices.
As discussed in the previous chapter, hearing and vision are the two most important sensor inputs that humans have. Many parallels exist between visual signal processing and acoustic signal processing, but sound has unique properties – often complementary to those of visual signals. In fact, this is why nature gave animals both visual and acoustic sensors: to gather complementary and correlated information about the happenings in an environment. Many species use sound to detect danger, navigate, locate prey, and communicate. Virtually all physical phenomena – fire, rain, wind, surf, earthquake, and so on – produce unique sounds. Species of all kinds living on land, in the air, or in the sea have developed organs to produce specific sounds. In humans, these have evolved to produce singing and speech.
In this chapter, we introduce the basic properties of sound, sound production, and sound perception. More details of audio and audio processing are covered later in this book.
A multimedia computing system is designed to facilitate natural communication among people, that is, communication on the basis of perceptually encoded data. Such a system may be used synchronously or asynchronously for remote communication, using remote presence, or for facilitating better communication in the same environment. These interactions could also allow users to communicate with people across different time periods or to share knowledge gleaned over a long period. Video conferencing, video on demand, augmented reality, immersive video, or immersive telepresence systems represent different stages of technology enhancing natural communication environments. The basic goal of a multimedia system is to communicate information and experiences to other humans. Because humans sense the world using their five senses and communicate their experiences using these senses and their lingual abstractions of the world, a multimedia system should use the same senses and abstractions in communications.
Multimedia systems combine communication and computing systems. In multimedia systems, the notions of computing systems and communication systems basically become so intertwined that any efforts to distinguish them as computing and communications result in a difficult and meaningless exercise. In this chapter, we discuss basic components of a multimedia system. Where appropriate, differences from a traditional computing system will be pointed out explicitly along with the associated challenges.
While the compression techniques presented so far have assumed generic acoustic or visual content, this chapter presents lossy compression techniques especially designed for a particular type of acoustic data: human speech. Almost every human being on earth talks virtually every day – needless to say, there is a lot of captured digital speech content. Every movie or TV show contains an audio track, most of which usually consists of spoken language. The most important use of captured speech, however, is for communication, such as in cell phones, voice-over IP applications, or as part of video conferencing and meeting recordings. Most of the compression concepts discussed so far will also work on speech. The algorithms presented in this chapter were developed to achieve a higher compression ratio while preserving higher perceptual quality by exploiting speech-specific properties of the audio signal. We discussed human speech in Chapter 5. This chapter will directly dig into the algorithmic part using that knowledge.
Properties of a Speech Coder
As explained in Chapter 5, the properties of every sound are defined by the properties of the objects that create the sounds, by the environment that the sound waves travel in, and by the characteristics of the receiver and/or capturing device. The object that creates human speech is the vocal tract. Vocal tracts also exist in animals, such as birds or cats. As we all know, the sounds they produce differ substantially from average human speech, so a birdsong compression or cat’s-meow encoding algorithm would also be substantially different. The following algorithms all try to exploit the characteristics of speech and have very limited applicability to music or other nonspeech audio. However, all of them are of importance to multimedia computing because millions of people use them in everyday life.
Much of what is commonly known about South Africa centers on a single reality: this is probably one of the most racialized societies on earth. South African society is notorious for how, over the course of a century and a half, its economic, social, and political system systematically structured identities, interests, and institutions in that country on an explicitly racialized basis. The country is also rightly famous, however, for the courageous, multifaceted, and long-standing attempts to oppose that system, for the struggle to build a more just and free society, and for its apparently miraculous transition to a multiracial democracy in the mid-1990s. This struggle in turn has constructed an alternate set of identities, interests, and institutions. More recently, South Africa has continued to capture headlines for the challenges and difficulties associated with the attempts to consolidate democracy in a society that remains, socially and economically speaking, radically unequal – albeit now increasingly in terms of class rather than, as it was previously, in terms of race.
For much of the country’s history, the dominant rules of the game that governed both political and economic life were built around the deliberate construction of race – or attempts to erode and challenge it. From at least the 1700s until 1994, those rules overwhelmingly favored white settlers and their descendants while other South Africans were systematically excluded from access to political and economic power. Throughout, this exclusion was most forcefully directed at Africans. Racialized identities shaped how most South Africans came to identify their interests, often overriding (if never quite eliminating) the importance of other identities such as class and gender; they structured also the society’s key institutions, both formal and informal, political and economic. In particular, the racialization of politics built a very particular kind of state – one of the strongest and richest on the continent, but one that explicitly sought to meet the needs only of a small fraction of the society. This was especially true under the National Party (NP) government.
Japan is a fascinating country for political scientists to study. Japan’s interests, identities, and institutions have both been shaped by and played a significant role in shaping the global order.
Within a little more than a century from its opening to the West in 1853, Japan went from being an isolated feudal society to one of the world’s richest economies and most stable democracies. In the transition, Japan experienced a wide range of political forms, including a shogunate, imperial rule with an emperor supported (or controlled) by oligarchs, a kind of imperial democracy, military (some argue fascist) rule, and empire.
From 1945 until 1952, Japan was occupied by the U.S.-led Allied forces. The postwar Japanese Constitution, which was heavily influenced by the occupying forces, renounces war as a sovereign right of the nation and places sovereignty in the hands of the people. Japan’s subsequent political transition to democracy is one of the most successful cases in the world of democratic consolidation. It is also a case where democracy was largely imposed from outside. By the end of the Occupation, Japan was viewed as an important U.S. ally in the Pacific. Japan’s security has been guaranteed by the U.S.-Japan Security Treaty ever since. Because Japan does not function as a “normal” state in the realpolitik sense of the word (that is, as a state that wields power through military strength), Japan has had to rely on economic and “soft” power in its foreign relations. Some in Japan are eager to rewrite the constitution while others wish to retain its pacifist core.
Britain is widely regarded as having a political system that is a model for the rest of the world. It is a vigorously competitive democracy in which the rule of law is firmly established and individual freedoms are well protected. The constitutional order has been functioning for centuries, undisturbed by wars or revolutions. The experience of countries in the “third wave” of democratization, during the 1970s and 1980s, seems to confirm that parliamentary systems are more successful than presidential systems in reconciling conflicting interests in society and, hence, promoting less violence and greater stability.
However, the British system entered into a profound crisis in the 1970s, from which it has not yet emerged. Social changes have eroded the class structure that was the foundation of the two-party system. No party won a clear majority in the 2010 general election, resulting in the first coalition government in sixty-five years. Parliamentary sovereignty has been weakened by the need to conform to the laws of the European Union (EU), which Britain joined in 1973. Other important constitutional developments since the 1990s include a stronger role for the judiciary as a check on executive power, and the introduction of parliaments for Scotland and Wales for the first time in three hundred years. However, successive governments have been unable to come up with a viable plan to reform the unelected upper chamber of Parliament, the House of Lords. Britain finds itself headed into the twenty-first century with a system whose basic features were laid down in the nineteenth century.
China has one of the world’s most ancient civilizations, dating back more than 3,000 years. It is easy for political scientists studying China to emphasize its uniqueness, as Chinese culture, language, political thought, and history appear quite different from those of any of the major Western countries. Modern Chinese history was obviously punctuated with decisive Western impacts, but the way China responded to those impacts is often considered to be uniquely Chinese. Furthermore, Chinese political leaders themselves frequently stress that they represent movements that carry uniquely Chinese characteristics. China, it seems, can only be understood in its own light.
When put in a global and comparative context, however, China loses many of its unique features. Imperial China, or the Qing dynasty, was an agricultural empire when it met the first serious wave of challenges from the West during the middle of the nineteenth century. The emperor and the mandarins (high-ranking Chinese officials) were forced to give up their treasured institutions grudgingly after a series of humiliating defeats at the hands of the Westerners. This pattern resembled what occurred in many traditional political systems when confronted with aggression from the West. From that time on, the momentum for political development in China was driven by global competition and the need for national survival. China differed from other cases in the developing world mainly in the immense dimensions of the country, not in the nature of its response.
For most of the period since its independence in 1947, India has been led by the Congress Party. This party, descended from the independence movement once led by Mahatma Gandhi, has been generally committed to a secular, democratic Indian nation in which members of ethnic or religious groups would equally enjoy the benefits of citizenship. Congress’s main competitor, the Bharatiya Janata Party (BJP, or the Indian People’s Party), descended from political and social organizations committed to redefining Indian national identity in terms of Hindutva (Hindu culture and civilization). Although these parties have been sharply critical of each other and have embraced quite different stances on identity politics, both have been committed to boosting India’s economic growth, facilitating India’s integration into global markets, and increasing India’s military capabilities and geopolitical influence. Although it was the BJP that was in power when India joined the “nuclear club” in 1998, the underground nuclear tests required years of preparation during the years when Congress had been in power. After 9/11, both the BJP (until 2004) and Congress (from 2004 to 2014), despite some differences in rhetoric, supported cooperation with the United States in the “war on terror.” Although Congress was more attentive to the problems of rural poverty and food security, it was difficult to discern a major shift in economic policy or in the general commitment to liberalization when the Congress-led government took over in 2004. A decade later, mounting frustration over slower growth and persistent corruption helped the BJP and its allies trounce Congress on a platform of better governance and higher efficiency. There is little reason, however, to expect any dramatic changes in policy.