The concept of evolution is of fundamental importance in genomics and bioinformatics, as already emphasized in Chapter 1. Over the next three chapters we will be looking at questions related to the evolution of species, as well as the evolution of individual genes and proteins. First, we will specifically address the question of how humans are genetically different to other animals, such as our close relative the chimpanzee. At the level of bioinformatics and computational tools we will introduce multiple sequence alignments and show how to analyse such alignments with programming tools.
Genetic differences between humans and chimpanzees
Now that the complete genome sequences of humans and many other animals are known, we are in a position to address interesting questions about similarities and differences between animals in terms of genetic information. Humans clearly have functions that other animals seem to be lacking, such as advanced spoken and written languages and abstract-level thinking. Many cultural activities such as music and painting also seem to be specific to humans. We are definitely distinct from other animals in these respects, even when considering our close primate relatives. With the genome sequences now available, there is an obvious question to answer: what are the significant genetic differences between us and other primates such as the chimpanzee? It would be particularly interesting to identify the genetic elements that are likely to affect brain function. First of all, it should be kept in mind that the identity between the human and chimpanzee genomes is quite extensive. If we consider the portions of the human and chimpanzee genomes that can be aligned to each other, they differ at approximately 35 million single-nucleotide positions (Chimpanzee Sequencing and Analysis Consortium, 2005). This amounts to only about 1% of the aligned sequence; in this respect the genomes are in fact roughly 99% identical. In addition, there are about five million insertion/deletion events, accounting for another 3% difference between humans and chimpanzees.
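The 99% figure is simple per-column arithmetic over an alignment. As a minimal sketch (the `identity_stats` helper and the toy alignment are illustrative, not from the chapter), substitution and indel columns of a pairwise alignment can be tallied like this:

```python
# Toy per-column tally for a pairwise alignment; '-' marks an indel column.
def identity_stats(seq_a, seq_b):
    """Return (aligned columns, substitutions, indel columns)."""
    assert len(seq_a) == len(seq_b), "sequences must be pre-aligned"
    aligned = subs = indels = 0
    for a, b in zip(seq_a, seq_b):
        if a == '-' or b == '-':
            indels += 1                  # insertion/deletion column
        else:
            aligned += 1
            if a != b:
                subs += 1                # single-nucleotide difference
    return aligned, subs, indels

aligned, subs, indels = identity_stats("ACGT-TGCAA", "ACCTATGCAA")
print(f"identity over aligned bases: {1 - subs / aligned:.1%}")   # 88.9%
```

On the genome scale the same arithmetic gives the chapter's figure: roughly 35 million substitutions over the alignable portion of the genomes comes to about 1%, hence about 99% identity.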
One of the largest and most important parts of the original AMI project was the collection of a multimodal corpus that could be used to underpin the project research. The AMI Meeting Corpus contains 100 hours of synchronized recordings collected using special instrumented meeting rooms. As well as the base recordings, the corpus has been transcribed orthographically, and large portions of it have been annotated for everything from named entities, dialogue acts, and summaries to simple gaze and head movement behaviors. The AMIDA Corpus adds around 10 hours of recordings in which one person uses desktop videoconferencing to participate from a separate, “remote” location.
Many researchers think of these corpora simply as providing the training and test material for speech recognition or for one of the many language, video, or multimodal behaviors that they have been used to model. However, providing material for machine learning was only one of our concerns. In designing the corpus, we wished to ensure that the data was coherent, realistic, useful for some actual end applications of commercial importance, and equipped with high-quality annotations. That is, we set out to provide a data resource that might bias the research towards the basic technologies that would result in useful software components. In addition, we set out to create a resource that would be used not just by computationally oriented researchers, but by other disciplines as well. For instance, corpus linguists need naturalistic data for studying many different aspects of human communication.
Automatic summarization has traditionally concerned itself with textual documents. Research on that topic began in the late 1950s with the automatic generation of summaries for technical papers and magazine articles (Luhn, 1958). Some 30 years later, research advanced into the field of speech-based summarization, working on dialogues (Kameyama et al., 1996, Reithinger et al., 2000, Alexandersson, 2003) and multi-party interactions (Zechner, 2001b, Murray et al., 2005a, Kleinbauer et al., 2007).
Spärck Jones (1993, 1999) argues that the summarizing process can be described as consisting of three steps (Figure 10.1): interpretation (I), transformation (T), and generation (G). The interpretation step analyzes the source, i.e., the input that is to be summarized, and derives from it a representation on which the next step, transformation, operates. In the transformation step the source content is condensed to the most relevant points. The final generation step verbalizes the transformation result into a summary document. This model is a high-level view of the summarization process that abstracts away from the details a concrete implementation has to face.
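The three steps can be made concrete as a pipeline of composable functions. In this minimal sketch the chosen representation (a sentence list plus word counts) and all function bodies are our own illustrative assumptions; Spärck Jones's model prescribes only the I-T-G decomposition itself:

```python
# Minimal sketch of the interpretation-transformation-generation (I-T-G)
# pipeline. The word-frequency representation is an illustrative assumption,
# not part of Spärck Jones's model.

def interpret(source_text):
    """I: derive a representation of the source (sentences + word counts)."""
    sentences = [s.strip() for s in source_text.split('.') if s.strip()]
    counts = {}
    for s in sentences:
        for w in s.lower().split():
            counts[w] = counts.get(w, 0) + 1
    return sentences, counts

def transform(representation, keep=1):
    """T: condense the source content to the most relevant points."""
    sentences, counts = representation
    scored = sorted(sentences,
                    key=lambda s: sum(counts.get(w, 0) for w in s.lower().split()),
                    reverse=True)
    return scored[:keep]

def generate(condensed):
    """G: verbalize the transformation result as a summary document."""
    return '. '.join(condensed) + '.'

summary = generate(transform(interpret(
    "The cat sat. The cat ate. Dogs bark."), keep=1))
```

Each stage only depends on the previous stage's output, which is exactly the abstraction the model is after: a concrete system may swap in richer representations without changing the overall shape.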
We distinguish between two general methods for generating an automatic summary: extractive and abstractive. The extractive approach generates a summary by identifying the most salient parts of the source and concatenating these parts to form the actual summary. For generic summarization, these salient parts are sentences that together convey the gist of the document's content. In that sense, extractive summarization becomes a binary decision process for every sentence from the source: should it be part of the summary or not? The selected sentences together then constitute the extractive summary, with some optional post-processing, such as sentence compression, as the final step.
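The binary decision view can be sketched directly: decide for each source sentence whether to include it, then concatenate the kept sentences in source order. The salient-term matching below is an illustrative stand-in for a real salience model:

```python
# Extractive summarization as a per-sentence include/exclude decision.
# Matching against a set of salient terms stands in for a real salience model.
def extract(sentences, salient_terms):
    decisions = [any(t in s.lower() for t in salient_terms) for s in sentences]
    return [s for s, keep in zip(sentences, decisions) if keep]

meeting = ["The budget rose by ten percent.",
           "Lunch was served at noon.",
           "The budget vote finally passed."]
summary = extract(meeting, {"budget"})   # keeps sentences 1 and 3, in order
```

Note that the selected sentences keep their original order, which is what makes the concatenation read as a (rough) summary rather than a ranked list.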
The boy is in such indescribable pain day and night that no one from among
his closest relatives, though they do not spare themselves, has the strength
to bear looking after him too long, not to mention his mother, with her
chronically ill heart.… His attendant…after a few sleepless
nights filled with agony, becomes totally worn out and wouldn't be able to
take it at all.
(Steinberg and Khrustalëv, 1995)
These words are from a letter written by Yevgeny Botkin, the family doctor of the
Russian Tsar Nicholas II and his wife Alexandra (Steinberg and
Khrustalëv, 1995). The suffering boy is Alexei, the son of the couple,
born in 1904 and the successor to the throne of all Russians. He suffered from a
congenital bleeding disease known as haemophilia, in which the
defect is a missing or malfunctioning blood-clotting factor. The particular
haemophilia that affected Alexei is a genetic disease linked to the X
chromosome, and for this reason it is more prevalent in males. Females rarely
show symptoms of the disease but may carry the gene on to the next
generation.
Royal disease
Alexei inherited his disease gene from his great grandmother, Queen Victoria of
England (Fig. 12.1). In fact, the queen transmitted her gene to a number of
members of European royalty and thus decimated the thrones of Britain, Germany,
Russia and Spain. Typically the affected males are exceptionally sensitive to
injuries – for example, both Infante Gonzalo of Spain and Prince Rupert of Teck
died young as a result of bleeding following car accidents. The last member of
the royal family to suffer from the disease was Prince Waldemar of Prussia, who
died in 1945.
In short, when the genome project was foundering in a sea of incompatible data formats, rapidly changing techniques, and monolithic data-analysis programs, Perl saved the day. Though it's not perfect, Perl seems to meet the needs of the genome centers remarkably well, and is usually the first tool we turn to when we have a problem to solve.
(Lincoln Stein, from ‘How Perl saved the genome project’)
The Perl programming language was invented by Larry Wall; his version 1.000 was presented in 1987. Perl is officially not an acronym, although you occasionally see it expanded as ‘Practical Extraction and Report Language’ or ‘Pathologically Eclectic Rubbish Lister’. The word ‘Perl’ (with a capital P) refers to the programming language as such, whereas ‘perl’ refers to the interpreter (implementation). Larry Wall originally invented the language to help out in system administration and in the analysis of huge text files, but through the years Perl has been continuously developed and has for many years been widely used in areas such as web programming and bioinformatics.
The information found in this appendix is very far from a complete description of Perl. The focus here is on the features of Perl that are mentioned in this book. There are certain important elements that are left out, such as variable references and object-oriented approaches. For more extensive information the reader is referred to other sources as listed towards the end of this appendix – for example, Programming Perl, of which Larry Wall is one of the coauthors.
The analysis of conversational dynamics in small groups, like the one illustrated in Figure 9.1, is a fundamental area in social psychology and nonverbal communication (Goodwin, 1981, Clark and Carlson, 1982). Conversational patterns exist at multiple time scales, from knowing how and when to address or interrupt somebody, to gaining or holding the floor of a conversation, to making transitions in discussions. Most of these mechanisms are multimodal, involving multiple verbal and nonverbal cues for their display and interpretation (Knapp and Hall, 2006), and have an important effect on how people are socially perceived, e.g., whether they are dominant, competent, or extraverted (Knapp and Hall, 2006, Pentland, 2008).
This chapter introduces some of the basic problems related to the automatic understanding of conversational group dynamics. Using low-level cues produced by audio, visual, and audio-visual perceptual processing components like the ones discussed in previous chapters, here we present techniques that aim at answering questions like: Who are the people being addressed or looked at? Are the involved people attentive? What conversational state is a group conversation currently in? Is a particular person likely to be perceived as dominant based on how they interact? As shown later in the book, obtaining answers to these questions is very useful for inferring, through further analysis, even higher-level aspects of a group conversation and its participants.
The chapter is organized as follows. Section 9.2 provides the basic definitions of three conversational phenomena discussed in this chapter: attention, turn-taking, and addressing.
Money has been spent: about 20 million euros over six years through the European AMI and AMIDA projects, complemented by a number of satellite projects and national initiatives, including the large IM2 Swiss NSF National Center of Competence in Research. This book has provided a unique opportunity to review this research, and we conclude by attempting to make a fair assessment of what has been achieved compared to the initial vision and goals.
Our vision was to develop multimodal signal processing technologies to capture, analyze, understand, and enhance human interactions. Although we had the overall goal of modeling communicative interactions in general, we focused our efforts on enhancing the value of multimodal meeting recordings and on the development of real-time tools to enhance human interaction in meetings. We pursued these goals through the development of smart meeting rooms and new tools for computer-supported cooperative work and communication, and through the design of new ways to search and browse meetings.
The dominant multimodal research paradigm in the late 1990s was centered on the design of multimodal human-computer interfaces. In the AMI and AMIDA projects we switched the focus to multimodal interactions between people, partly as a way to develop more natural communicative interfaces for human-computer interaction. As discussed in Chapter 1, and similar to what was done in some other projects around the same time, our main idea was to put the computer within the human interaction loop (as explicitly referred to by the EU CHIL project), where computers are primarily used as a mediator to enhance human communication and collaborative potential.
This approach raised a number of major research challenges, while also offering application opportunities. Human communication is one of the most complex processes we know, characterized by fast, highly sensitive multimodal processing in which information is received and analyzed from multiple simultaneous inputs in real time, with little apparent effort.
Customarily, hypothesis testing is about the problem of finding out whether some effect of interest, like the curing power of a drug, is supported by observed data, or whether it is a random “fluke” rather than a real effect of the drug. To test this formally, a probability hypothesis is set up to represent the effect. This is done by selecting a function of the data, s(x^n), called a test statistic, whose distribution can be calculated under the hypothesis. Parametric probability distributions frequently have a special parameter value representing randomness, such as 1/2 in Bernoulli models for binary data or 0 for the mean of a normal distribution, and the no-effect case can then be represented by a single model f(s(x^n); θ0), called the null hypothesis. The test statistic is, or should be, in effect an estimator of the parameter, and if the data cause the test statistic to fall in the tail of this distribution, into a so-called critical region of small probability, say 0.05, the null hypothesis is rejected, indicating by the double negation that the real effect is not ruled out. Fisher called the amount of evidence for the rejection of the null hypothesis the statistical significance at the selected level 0.05, which is frequently misused to mean that there is 95 percent certainty that the real effect holds.
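For binary data the recipe above can be written out directly. A minimal sketch, assuming the number of successes as the test statistic and θ0 = 1/2 as the null hypothesis; the counts are toy numbers chosen for illustration:

```python
# Sketch of the testing recipe for binary data: under the null hypothesis
# theta_0 = 1/2, compute the probability of a test statistic (the number of
# successes) at least as extreme as the one observed, and reject the null
# when that tail probability falls below the chosen level.
from math import comb

def binomial_tail(k, n, p=0.5):
    """P(S >= k) for S ~ Binomial(n, p): the upper-tail probability."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, k, level = 20, 16, 0.05          # 16 successes in 20 trials (toy numbers)
p_value = binomial_tail(k, n)       # about 0.006
reject_null = p_value < level       # falls in the critical region: reject
```

Rejection here says only that the data would be improbable if θ0 = 1/2 were true; as the text warns, it does not assign 95 percent certainty to the effect being real.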
In the two preceding chapters a number of diseases with a genetic background have been discussed. Many cancer diseases also have a genetic component; this will be the topic of this chapter. We will focus on a chromosomal aberration that gives rise to a specific type of cancer. From a bioinformatics perspective we will approach the problem of sequence alignments and make use of BLAST to study the effect of the chromosomal aberration.
Cancer as a genetic disease
Characteristic of cancer cells is that they divide in an uncontrolled manner (Hanahan and Weinberg, 2000, 2011). This behaviour may be achieved by an overproduction of proteins that stimulate cell growth, or by the inactivation of functions that normally restrict growth. Cancer is mostly a disease developed as a result of environmental and lifestyle factors. For instance, tobacco and obesity are two significant factors. However, there are also important genetic factors. There are changes present in germline (reproductive) cells which may result in an inherited predisposition to cancer; for example, there are certain inherited mutations in the BRCA1 and BRCA2 genes that are associated with an elevated risk of breast cancer. However, tumours are more commonly the result of mutations in somatic cells, i.e. cells that are not germline cells.
The word “information” has several meanings, the simplest of which has been formalized for communication by Hartley [17] as the logarithm of the number of elements in a finite set. Hence the information in the set A = {a, b, c, d} is log 4 = 2 (bits) in the base 2 logarithm. Only two types of logarithm are used in this book: the base 2 logarithm, which we write simply as log, and the natural logarithm, written as ln. The amount of information in the set A = {a, b, c, d, e, f, g, h} is three bits, and so on. Hence such a formalization has nothing to do with any other meaning of the information communicated, such as its utility or quality. If we were asked to describe one element, say c in the second set, we could do it by saying that it is the third element, which could be done with the binary number 11 in both sets, but the element f in the second set would require three bits, namely 110. So we see that if the number of elements in a set is not a power of 2, we need at most ⌈log ∣A∣⌉ bits, or one less, as will be explained in the next section. Hence we start seeing that “information,” relative to a set, can be formalized and measured by the shortest code length with which any element in the set can be described.
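As a quick check of the arithmetic (a sketch; the helper names are our own):

```python
# Hartley information: log2 of the set size is the number of bits needed to
# name any single element; round up when the size is not a power of two.
from math import ceil, log2

def hartley_bits(set_size):
    return log2(set_size)

def fixed_code_length(set_size):
    """Fixed-length binary names need ceil(log2 |A|) bits."""
    return ceil(log2(set_size))

print(hartley_bits(4))          # 2.0 bits for A = {a, b, c, d}
print(hartley_bits(8))          # 3.0 bits for the eight-element set
print(fixed_code_length(5))     # 3 bits: 5 is not a power of two
```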
My little silver Honda's front tires pulled us through the mountains. My
hands felt the road and the turns. My mind drifted back into the lab. DNA
chains coiled and floated. Lurid blue and pink images of electric molecules
injected themselves somewhere between the mountain road and my eyes.
(Kary Mullis about his invention of PCR in Dancing Naked in the
Mind Field; Mullis, 1998)
In order to work with DNA in the laboratory we often need to produce that DNA in
a larger quantity. In addition, a longer DNA molecule like a whole human
chromosome is difficult to work with; instead we often want to focus on a smaller
region of DNA. In a classic cloning experiment a smaller DNA fragment is
introduced into a circular plasmid DNA, which is allowed to replicate within
bacterial cells. Once the bacterial cells have grown to a certain density, these
cells may be isolated by centrifugation and the plasmid DNA purified from the
bacterial cells. This is a somewhat time-consuming procedure, but there is an
alternative method in gene technology that is more convenient. Thus, our third
example of gene technology is the polymerase chain reaction
(PCR), invented by Kary Mullis in 1984 (Mullis et al.,
1986).
What is PCR?
PCR is one of the most widely used techniques in DNA technology. As in
traditional cloning it is a means to amplify a shorter region of DNA, i.e. to
produce a distinct region of DNA in a large quantity. However, PCR is carried
out in a simple test tube reaction where DNA is amplified with the help of the
enzyme DNA polymerase. Another technical advantage of PCR compared to
conventional cloning in some bacterial host is that the DNA product is
relatively pure.
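The amplification itself is simple exponential arithmetic: each thermal cycle at most doubles the number of copies of the target region. A toy sketch (the efficiency parameter is an idealization we add for illustration; real reactions fall short of perfect doubling):

```python
# Idealized PCR arithmetic: each cycle at most doubles the number of copies
# of the target region, so n cycles turn one template molecule into up to
# 2**n copies. Real reactions have per-cycle efficiency below 100%.

def pcr_copies(templates, cycles, efficiency=1.0):
    """Copies after a given number of cycles at a per-cycle efficiency."""
    return templates * (1 + efficiency) ** cycles

print(pcr_copies(1, 30))        # ideal: 2**30, about a billion copies
print(pcr_copies(1, 30, 0.9))   # 90% efficiency: noticeably fewer
```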
A primary consideration when designing a room or system for meeting data capture is of course how to best capture audio of the conversation. Technology systems requiring voice input have traditionally relied on close-talking microphones for signal acquisition, as they naturally provide a higher signal-to-noise ratio (SNR) than single distant microphones. This mode of acquisition may be acceptable for applications such as dictation and single-user telephony; however, as technology heads towards more pervasive applications, less constraining solutions are required to capture natural spoken interactions.
In the context of group interactions in meeting rooms, microphone arrays (or more generally, multiple distant microphones) present an important alternative to close-talking microphones. By enabling spatial filtering of the sound field, arrays allow for location-based speech enhancement, as well as automatic localization and tracking of speakers. The primary benefit of this is to enable non-intrusive hands-free operation: that is, users are not constrained to wear headset or lapel microphones, nor do they need to speak directly into a particular fixed microphone.
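The localization idea can be sketched in its simplest form: with two microphones, the lag at which the cross-correlation of the two signals peaks estimates the time difference of arrival (TDOA), from which a source direction can be derived given the microphone spacing and sample rate. This is a pure-Python toy on a synthetic signal; practical systems use more robust estimators such as GCC-PHAT over calibrated arrays:

```python
# Toy time-difference-of-arrival (TDOA) estimate for a two-microphone array:
# find the lag that maximizes the cross-correlation of the two signals.

def cross_corr(a, b, lag):
    """Correlation of a with b shifted left by `lag` samples (lag >= 0)."""
    return sum(x * y for x, y in zip(a, b[lag:]))

def estimate_delay(a, b, max_lag):
    """Sample delay of b relative to a, found as the cross-correlation peak."""
    return max(range(max_lag + 1), key=lambda lag: cross_corr(a, b, lag))

signal = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
mic_a = signal
mic_b = [0, 0, 0] + signal[:-3]        # same signal arriving 3 samples later
delay = estimate_delay(mic_a, mic_b, max_lag=5)
```

Converting the delay to an angle is then straightforward geometry: with sound speed c and microphone spacing d, the direction follows from arcsin(c·τ/d), where τ is the delay in seconds.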
Beyond just being a front-end voice enhancement method for automatic speech recognition, the audio localization capability of arrays offers an important cue that can be exploited in systems for speaker diarization, joint audio-visual person tracking, analysis of conversational dynamics, as well as in user interface elements for browsing meeting recordings.
For all these reasons, microphone arrays have become an important enabling technology in academic and commercial research projects studying multimodal signal processing of human interactions over the past decade (three examples of products are shown in Figure 3.1).