In this chapter we will discuss inheritance within a family and address questions about the parental origin of specific sites in a genome. First we need to understand a few basic elements of genetics. All human cells contain two copies of each chromosome, one of paternal and one of maternal origin. Exceptions are the reproductive cells or gametes, i.e. eggs in females and sperm in males, which have only one copy of each chromosome. The gametes are formed in the process of meiosis (Figs. 20.1 and 20.2). Meiosis starts out with a cell having two copies of each chromosome (one from the father and one from the mother; two such chromosomes are said to be homologous). The DNA is first copied in order to generate two copies each of the paternal and maternal chromosomes. Through crossing-over, which occurs by homologous recombination, segments are occasionally interchanged between paternal and maternal chromosomes (Fig. 20.2). Such recombination occurs about 50–100 times during meiosis, and there is evidence that in humans recombination in maternal meiosis is about 1.7 times more frequent than in paternal meiosis (Petkov et al., 2007; Roach et al., 2010).
This book is an introduction to multimodal signal processing. In it, we use the goal of building applications that can understand meetings as a way to focus and motivate the processing we describe. Multimodal signal processing takes the outputs of capture devices running at the same time – primarily cameras and microphones, but also electronic whiteboards and pens – and automatically analyzes them to make sense of what is happening in the space being recorded. For instance, these analyses might indicate who spoke, what was said, whether there was an active discussion, and who was dominant in it. These analyses require the capture of multimodal data using a range of signals, followed by a low-level automatic annotation of them, gradually layering up annotation until information that relates to user requirements is extracted.
Multimodal signal processing can be done in real time, that is, fast enough to build applications that influence the group while they are together, or offline – not always but often at higher quality – for later review of what went on. It can also be done for groups that are all together in one space, typically an instrumented meeting room, or for groups that are in different spaces but use technology such as videoconferencing to communicate. The book thus introduces automatic approaches to capturing, processing, and ultimately understanding human interaction in meetings, and describes the state of the art for all technologies involved.
We begin with a description of the most basic and primitive codes. Let A = {a1, …, ak} be a finite set: the set is called an alphabet, and its elements are called symbols. The main interest is in finite sequences sn = aiaj … of the symbols, of some length n, called messages or, for us, just data. The problem is to send or store them in a manner that costs the sending device little time and storage space. For practical reasons these devices use the binary symbols 0 and 1, and the original symbols are represented as sequences of binary symbols, such as the eight-bit “bytes.” For each symbol a in A, let C : a ↦ C(a) be a one-to-one map, called a code, from the alphabet into the set of binary strings. It is extended to messages by concatenation: C : aiaj … an ↦ C(ai)C(aj) … C(an). Both the binary strings C(a) and the concatenations C(ai)C(aj) … C(an) are called codewords.
This gives us an upside-down binary tree, with the root on top, whose nodes are the codewords: first those of the symbols, for n = 1, then those of the length-2 messages, and so on. The left-hand tree in Figure 2.1 illustrates the code for the symbols of the alphabet A = {a, b, c}.
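These definitions can be sketched in a few lines of Python. The particular codewords below are hypothetical, chosen only to illustrate a one-to-one symbol code and its extension by concatenation; they are not meant to reproduce those of Figure 2.1.

```python
# A hypothetical binary symbol code for the alphabet A = {a, b, c}.
# Each symbol maps one-to-one to a binary string, its codeword.
C = {"a": "0", "b": "10", "c": "11"}

def encode(message):
    """Extend the symbol code to messages by concatenation:
    C(a_i a_j ... a_n) = C(a_i) C(a_j) ... C(a_n)."""
    return "".join(C[symbol] for symbol in message)

print(encode("abc"))  # -> 01011
```

The concatenated string is itself a codeword of the extended code, corresponding to one node of the tree described above.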
The concept of evolution is of fundamental importance in genomics and bioinformatics, as already emphasized in Chapter 1. Over the next three chapters we will be looking at questions related to the evolution of species, as well as the evolution of individual genes and proteins. First, we will specifically address the question of how humans are genetically different to other animals, such as our close relative the chimpanzee. At the level of bioinformatics and computational tools we will introduce multiple sequence alignments and show how to analyse such alignments with programming tools.
Genetic differences between humans and chimpanzees
Now that the complete genome sequences of humans and many other animals are known, we are in a position to address interesting questions about similarities and differences between animals in terms of genetic information. Humans clearly have functions that other animals seem to be lacking, such as advanced spoken and written languages and abstract-level thinking. Many cultural activities such as music and painting also seem to be specific to humans. We are definitely distinct from other animals in these respects, even when considering our close primate relatives. With the genome sequences now available, there is an obvious question to answer: what are the significant genetic differences between us and other primates such as the chimpanzee? It would be particularly interesting to identify the genetic elements that are likely to affect brain function. First of all, it should be kept in mind that the identity between the human and chimpanzee genomes is quite extensive. Thus, if we consider the portions of the human and chimpanzee genomes that may be aligned to each other, there are approximately 35 million single nucleotide differences (Chimpanzee Sequencing and Analysis Consortium, 2005). That is to say, in this respect the genomes are in fact about 99% identical. In addition, there are about five million insertion/deletion events, accounting for another 3% difference between humans and chimpanzees.
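The ~99% figure follows directly from the numbers quoted above. As a back-of-envelope check (the alignable genome size of roughly 2.9 billion bases is an assumption for illustration, not a figure from the text):

```python
# Back-of-envelope check of the single-nucleotide identity figure.
# The alignable genome size (~2.9 Gb) is an assumed round number.
alignable_bases = 2.9e9
snp_differences = 35e6

identity = 1 - snp_differences / alignable_bases
print(f"{identity:.1%}")  # roughly 98.8%, i.e. ~99% identical
```

Note that this counts only single nucleotide differences; the insertion/deletion events mentioned above are a separate contribution.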
One of the largest and most important parts of the original AMI project was the collection of a multimodal corpus that could be used to underpin the project research. The AMI Meeting Corpus contains 100 hours of synchronized recordings collected using special instrumented meeting rooms. As well as the base recordings, the corpus has been transcribed orthographically, and large portions of it have been annotated for everything from named entities, dialogue acts, and summaries to simple gaze and head movement behaviors. The AMIDA Corpus adds around 10 hours of recordings in which one person uses desktop videoconferencing to participate from a separate, “remote” location.
Many researchers think of these corpora simply as providing the training and test material for speech recognition or for one of the many language, video, or multimodal behaviors that they have been used to model. However, providing material for machine learning was only one of our concerns. In designing the corpus, we wished to ensure that the data was coherent, realistic, useful for some actual end applications of commercial importance, and equipped with high-quality annotations. That is, we set out to provide a data resource that might bias the research towards the basic technologies that would result in useful software components. In addition, we set out to create a resource that would be used not just by computationally oriented researchers, but by other disciplines as well. For instance, corpus linguists need naturalistic data for studying many different aspects of human communication.
Automatic summarization has traditionally concerned itself with textual documents. Research on that topic began in the late 1950s with the automatic generation of summaries for technical papers and magazine articles (Luhn, 1958). Some 30 years later, summarization advanced into the field of speech-based summarization, working on dialogues (Kameyama et al., 1996, Reithinger et al., 2000, Alexandersson, 2003) and multi-party interactions (Zechner, 2001b, Murray et al., 2005a, Kleinbauer et al., 2007).
Spärck Jones (1993, 1999) argues that the summarizing process can be described as consisting of three steps (Figure 10.1): interpretation (I), transformation (T), generation (G). The interpretation step analyzes the source, i.e., the input that is to be summarized, and derives from it a representation on which the next step, transformation, operates. In the transformation step the source content is condensed to the most relevant points. The final generation step verbalizes the transformation result into a summary document. This model is a high-level view of the summarization process that abstracts away from the details a concrete implementation has to face.
We distinguish between two general methods for generating an automatic summary: extractive and abstractive. The extractive approach generates a summary by identifying the most salient parts of the source and concatenating these parts to form the actual summary. For generic summarization, these salient parts are sentences that together convey the gist of the document's content. In that sense, extractive summarization becomes a binary decision process for every sentence from the source: should it be part of the summary or not? The selected sentences together then constitute the extractive summary, with some optional post-processing, such as sentence compression, as the final step.
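The binary include/exclude decision can be sketched with a minimal frequency-based scorer in the spirit of Luhn (1958). The heuristic below (average word frequency per sentence) is purely illustrative, not a system described in this book:

```python
from collections import Counter

def extractive_summary(sentences, k=2):
    """Score each sentence by the average document-wide frequency of
    its words, then make the binary include/exclude decision by
    keeping the k highest-scoring sentences in their original order."""
    words = [w.lower() for s in sentences for w in s.split()]
    freq = Counter(words)
    scores = [sum(freq[w.lower()] for w in s.split()) / len(s.split())
              for s in sentences]
    # indices of the top-k scores, restored to source order
    top = sorted(sorted(range(len(sentences)),
                        key=lambda i: scores[i], reverse=True)[:k])
    return [sentences[i] for i in top]

sentences = ["the cat sat on the mat",
             "dogs bark",
             "the cat ate the fish"]
print(extractive_summary(sentences, k=2))
```

Concatenating the selected sentences yields the extractive summary; optional post-processing such as sentence compression would operate on this output.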
The boy is in such indescribable pain day and night that no one from among
his closest relatives, though they do not spare themselves, has the strength
to bear looking after him too long, not to mention his mother, with her
chronically ill heart.… His attendant…after a few sleepless
nights filled with agony, becomes totally worn out and wouldn't be able to
take it at all.
(Steinberg and Khrustalëv, 1995)
These words are from a letter written by Yevgeny Botkin, the family doctor of the
Russian Tsar Nicholas II and his wife Alexandra (Steinberg and
Khrustalëv, 1995). The suffering boy is Alexei, the son of the couple,
born in 1904 and the successor to the throne of all Russians. He suffered from a
congenital bleeding disease known as haemophilia, in which the
defect is a missing or malfunctioning blood-clotting factor. The particular
haemophilia that affected Alexei is a genetic disease linked to the X
chromosome, and for this reason it is more prevalent in males. Females rarely
show symptoms of the disease but may pass the gene on to the next
generation.
Royal disease
Alexei inherited his disease gene from his great-grandmother, Queen Victoria of
England (Fig. 12.1). In fact, the queen transmitted her gene to a number of
members of European royalty and thus decimated the thrones of Britain, Germany,
Russia and Spain. Typically the affected males are exceptionally sensitive to
injuries – for example, both Infante Gonzalo of Spain and Prince Rupert of Teck
died young as a result of bleeding following car accidents. The last carrier of
the disease in the royal family was Prince Waldemar of Prussia, who died in
1945.
In short, when the genome project was foundering in a sea of incompatible data formats, rapidly changing techniques, and monolithic data-analysis programs, Perl saved the day. Though it's not perfect, Perl seems to meet the needs of the genome centers remarkably well, and is usually the first tool we turn to when we have a problem to solve.
(Lincoln Stein, from ‘How Perl saved the genome project’)
The Perl programming language was invented by Larry Wall; version 1.000 was released in 1987. Perl is said not to be an acronym, but you will still occasionally see it expanded as ‘Practical Extraction and Report Language’ or ‘Pathologically Eclectic Rubbish Lister’. The word ‘Perl’ (with a capital P) refers to the programming language as such, whereas ‘perl’ refers to the interpreter (implementation). Larry Wall originally invented the language to help out in system administration and in the analysis of huge text files, but through the years Perl has been continuously developed and has for many years been widely used in areas such as web programming and bioinformatics.
The information found in this appendix is very far from a complete description of Perl. The focus here is on the features of Perl that are mentioned in this book. Certain important elements are left out, such as variable references and object-oriented approaches. For more extensive information the reader is referred to the other sources listed towards the end of this appendix – for example, Programming Perl, of which Larry Wall is one of the co-authors.
The analysis of conversational dynamics in small groups, like the one illustrated in Figure 9.1, is a fundamental area in social psychology and nonverbal communication (Goodwin, 1981, Clark and Carlson, 1982). Conversational patterns exist at multiple time scales, ranging from knowing how and when to address or interrupt somebody, to how to gain or hold the floor of a conversation, to how to make transitions in discussions. Most of these mechanisms are multimodal, involving multiple verbal and nonverbal cues for their display and interpretation (Knapp and Hall, 2006), and have an important effect on how people are socially perceived, e.g., whether they are dominant, competent, or extraverted (Knapp and Hall, 2006, Pentland, 2008).
This chapter introduces some of the basic problems related to the automatic understanding of conversational group dynamics. Using low-level cues produced by audio, visual, and audio-visual perceptual processing components like the ones discussed in previous chapters, here we present techniques that aim at answering questions like: Who are the people being addressed or looked at? Are the involved people attentive? What conversational state is a group conversation currently in? Is a particular person likely to be perceived as dominant based on how they interact? As shown later in the book, obtaining answers to these questions is very useful to infer, through further analysis, even higher-level aspects of a group conversation and its participants.
The chapter is organized as follows. Section 9.2 provides the basic definitions of three conversational phenomena discussed in this chapter: attention, turn-taking, and addressing.
Money has been spent: about 20 million euros over six years through the European AMI and AMIDA projects, complemented by a number of satellite projects and national initiatives, including the large IM2 Swiss NSF National Center of Competence in Research. This book has provided a unique opportunity to review this research, and we conclude by attempting to make a fair assessment of what has been achieved compared to the initial vision and goals.
Our vision was to develop multimodal signal processing technologies to capture, analyze, understand, and enhance human interactions. Although we had the overall goal of modeling communicative interactions in general, we focused our efforts on enhancing the value of multimodal meeting recordings and on the development of real-time tools to enhance human interaction in meetings. We pursued these goals through the development of smart meeting rooms and new tools for computer-supported cooperative work and communication, and through the design of new ways to search and browse meetings.
The dominant multimodal research paradigm in the late 1990s was centered on the design of multimodal human-computer interfaces. In the AMI and AMIDA projects we switched the focus to multimodal interactions between people, partly as a way to develop more natural communicative interfaces for human-computer interaction. As discussed in Chapter 1, and similar to what was done in some other projects around the same time, our main idea was to put the computer within the human interaction loop (a phrase used explicitly by the EU CHIL project), where computers are primarily used as a mediator to enhance human communication and collaborative potential.
This approach raised a number of major research challenges, while also offering application opportunities. Human communication is one of the most complex processes we know, characterized by fast, highly sensitive multimodal processing in which information is received and analyzed from multiple simultaneous inputs in real time, with little apparent effort.
Customarily, hypothesis testing is about the problem of finding out whether some effect of interest, like the curing power of a drug, is supported by observed data, or whether it is a random “fluke” rather than a real effect of the drug. To test this formally, a probability hypothesis is set up to represent the effect. This can be done by selecting a function of the data, s(xn), called a test statistic, whose distribution can be calculated under the hypothesis. Frequently parametric probability distributions have a special parameter value that represents randomness, like 1/2 in Bernoulli models for binary data and 0 for the mean of normal distributions, and the no-effect case can then be represented by a single model f(s(xn); θ0), called the null hypothesis. The test statistic is, or should be, in effect an estimator of the parameter, and if the data cause the test statistic to fall in the tail, into a so-called critical region of small probability, say 0.05, the null hypothesis is rejected, indicating by the double negation that the real effect is not ruled out. Fisher called the amount of evidence for the rejection of the null hypothesis the statistical significance at the selected level 0.05, which is frequently misused to mean that there is 95 percent statistical significance for accepting the real effect.
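The Bernoulli case can be sketched concretely. Below, the trial data (60 cures out of 100 patients) are hypothetical, chosen only to illustrate a test statistic falling in the upper tail of the null distribution θ0 = 1/2:

```python
from math import comb

def binomial_upper_tail(k, n, p=0.5):
    """P(S >= k) for the test statistic S = number of successes,
    computed under the null hypothesis theta_0 = p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Hypothetical trial: 60 cures out of 100 patients. Does the
# statistic fall in the 0.05 critical region of the null?
p_value = binomial_upper_tail(60, 100)
print(p_value < 0.05)  # True: the null hypothesis is rejected
```

Here the tail probability is about 0.028, below the 0.05 level, so the null hypothesis of pure randomness is rejected; as the text stresses, this does not by itself establish the real effect with 95 percent certainty.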
In the two preceding chapters a number of diseases with a genetic background have been discussed. Many cancer diseases also have a genetic component; this will be the topic of this chapter. We will focus on a chromosomal aberration that gives rise to a specific type of cancer. From a bioinformatics perspective we will approach the problem of sequence alignments and make use of BLAST to study the effect of the chromosomal aberration.
Cancer as a genetic disease
Characteristic of cancer cells is that they divide in an uncontrolled manner (Hanahan and Weinberg, 2000, 2011). This behaviour may be achieved by an overproduction of proteins that stimulate cell growth, or by the inactivation of functions that normally restrict growth. Cancer is mostly a disease developed as a result of environmental and lifestyle factors. For instance, tobacco and obesity are two significant factors. However, there are also important genetic factors. Changes present in germline (reproductive) cells may result in an inherited predisposition to cancer; for example, there are certain inherited mutations in the BRCA1 and BRCA2 genes that are associated with an elevated risk of breast cancer. However, tumours are more commonly the result of mutations in somatic cells, i.e. cells that are not germline cells.