The formulas Eq. (1.33) of Chapter 1 represent the solution to the radiation problem in a non-dispersive medium governed by the wave equation; i.e., they give the radiated field u+(r, t) in terms of a known source q(r, t). These formulas were generalized to dispersive media in Chapter 2, where the radiation problem was solved directly in the frequency domain for a known source embedded in a uniform dispersive background medium. The inverse source problem (ISP), as its name indicates, is the inverse to the radiation problem, and in this problem one seeks the source q(r, t) from knowledge of its radiated field u+(r, t). The question of what applications require a solution to an inverse source problem naturally arises. There are basically two such applications that consist of (i) imaging (reconstructing) the interior of a volume source from observations of the field radiated by the source and (ii) designing a volume source to act as a multi-dimensional antenna to radiate a prescribed field. In the first application actual field measurements are employed, thereby generating data that are then used to “solve” the ISP and thus “reconstruct” the interior of the source, whereas in the second application desired field data are used to “design” a source that will generate those data. Regarding the ISP, the two applications are essentially identical, differing only in emphasis; in application (i) we have to contend with measurement error and noisy data, whereas in application (ii) we have to contend with inconsistencies between the desired data and the constraints required of the source (antenna).
In the radiation problem treated in Chapters 1 and 2 a “source” q(r, t) in the time domain or Q(r, ω) in the frequency domain radiated a wavefield that satisfied either the inhomogeneous wave equation in the time domain or the inhomogeneous Helmholtz equation in the frequency domain. In either case the solution to the radiation problem was easily obtained in the form of a convolution of the given source function with the causal Green function of the wave or Helmholtz equation. A key point concerning the radiation problem is that the source to the radiated field is assumed to be known (specified) and is assumed to be independent of the field that it radiates. Such sources are sometimes referred to as “primary” sources since the mechanism or process that created them is unknown or, at least, unimportant as regards the field that they radiate.
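The convolutional form of the solution described above can be written out explicitly. The block below is a sketch under assumed conventions that are not stated in this summary: the source appears on the right-hand side of the wave and Helmholtz equations with a positive sign, the time dependence is exp(-iωt), and k = ω/c in the non-dispersive case. Other sign and normalization conventions change the Green functions by a sign or a factor.

```latex
% Assumed conventions: source on the right-hand side, time dependence e^{-i\omega t},
% k = \omega/c for a non-dispersive background, R = |\mathbf{r}-\mathbf{r}'|.
\begin{align}
\Bigl[\nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\Bigr] u_+(\mathbf{r},t) &= q(\mathbf{r},t),
&
u_+(\mathbf{r},t) &= \int dt' \int d^3r'\, g_+(\mathbf{r}-\mathbf{r}',t-t')\, q(\mathbf{r}',t'),
\\
\bigl[\nabla^2 + k^2\bigr] U_+(\mathbf{r},\omega) &= Q(\mathbf{r},\omega),
&
U_+(\mathbf{r},\omega) &= \int d^3r'\, G_+(\mathbf{r}-\mathbf{r}',\omega)\, Q(\mathbf{r}',\omega),
\end{align}
% Causal (outgoing-wave) Green functions under the same conventions:
\begin{equation}
g_+(\mathbf{R},\tau) = -\frac{1}{4\pi}\,\frac{\delta(\tau - R/c)}{R},
\qquad
G_+(\mathbf{R},\omega) = -\frac{1}{4\pi}\,\frac{e^{ikR}}{R}.
\end{equation}
```

In both domains the source enters only as a known weight inside the integral, which is what makes the radiation problem "easy" compared with the inverse problems treated later.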
In this chapter we will also encounter the radiation problem, but with sources that are created by the interaction of a propagating wave incident on a physical obstacle or inhomogeneous region of space. These new types of sources are referred to as “induced” or “secondary” sources and the problem of computing the field that they radiate given the incident wave and a model for the field-obstacle interaction is called the scattering problem. We deal with two classes of scattering problem in this book: (i) scattering from so-called “penetrable” scatterers, where the incident wave penetrates into the interior of the obstacle so that the resulting induced source radiates as a conventional volume source of the type treated in earlier chapters; and (ii) scattering from non-penetrable scatterers, where the interaction of the incident wave with the obstacle occurs only over the object's surface.
I started this book roughly 20 years ago with the intention of producing a finished product within a year or so. But reality in the form of government research grants and “publish or perish” soon set in and so now, at long last, I have finally finished. The final product has of course changed significantly over these intervening years, both in content and in breadth. My original plan was to put together a six- or seven-chapter treatise on basic “Fourier-based” coherent imaging and diffraction tomography complete with Matlab codes implementing the imaging and inversion algorithms presented in the text. The current book certainly includes this material, but also includes a host of other material such as the chapter on time-reversal imaging and the four chapters on the propagation and scattering of waves in homogeneous and inhomogeneous backgrounds. More importantly, the “Fourier-based” inversion schemes originally used to develop much of coherent imaging and linearized inverse scattering (diffraction tomography) have been replaced by the much more powerful singular value decomposition (SVD). This approach allows virtually all of the linearized inverse problems associated with the wave and Helmholtz equation both in homogeneous and in inhomogeneous backgrounds to be treated in a uniform “turn the crank” manner.
My work on imaging and wavefield inversion began as a graduate student under Professor Emil Wolf at the University of Rochester. Originally I had intended to pursue my Ph.D. in quantum optics, but had my plans changed significantly by an off-hand remark by Professor Wolf during one of our meetings.
The “direct” or “forward” scattering problem was treated in the preceding two chapters, where the goal was the computation of a scattered field given knowledge of the scattering object and the incident wavefield. In the “inverse scattering problem” (ISCP) the goal is the determination of the scattering object given knowledge of the incident wave and the scattered wave over some restricted region of space. In Chapter 6 we treated so-called “penetrable” scatterers, where the incident wave penetrates into the interior of the obstacle, thus creating an “induced volume source” that then radiates as a conventional volume source of the type treated in earlier chapters. In Chapter 7 we treated non-penetrable scatterers, where the interaction of the incident wave with the obstacle occurs only over the object's surface. We also treated certain inverse problems associated with non-penetrable scatterers in that chapter that included inverse diffraction and the ISCP of determining the shape of a Dirichlet or Neumann scatterer from its scattering amplitude. In this chapter we will treat the ISCP for penetrable scatterers. We will also make the simplifying assumption that the scattering object is embedded in a uniform lossless medium. This assumption will be discarded in the next chapter, where we will treat scatterers embedded in non-uniform and dispersive media.
We pointed out in Chapter 5 that the difficulty of the “inverse source problem” (ISP) lies in the fact that the radiated field from which the source is to be determined is known only over space points that lie in some restricted region of space that is outside the support of the (unknown) source.
The Green-function solution to the radiation problem given in Eq. (2.23) of Chapter 2 represents the radiated field as a superposition of outgoing spherical waves, each weighted by the source amplitude at the point from which it emanates. This solution was derived starting from the fact that the Helmholtz equation is linear and, hence, that its solution can be represented as a superposition of elementary solutions to the equation when excited by delta functions; i.e., as a convolution of the source term with a Green function that satisfies the same outgoing-wave condition, namely the Sommerfeld radiation condition (SRC), as is satisfied by the radiated field. Alternative representations of the field can also be obtained by making use of the linearity of the Helmholtz equation and the fact that the radiated field satisfies the homogeneous Helmholtz equation everywhere outside the source region τ0. In particular, as we have seen in the last chapter, it is possible to represent the field in such regions in terms of an expansion in eigenfunctions of the homogeneous Helmholtz equation such as the plane waves or multipole fields. Indeed, in Examples 3.3 and 3.5 of Chapter 3 we expanded outgoing-wave fields such as the radiated field in a plane-wave expansion and a multipole expansion, respectively, with the expansion coefficients (plane-wave amplitudes and multipole moments) determined directly from boundary values of the field. We continue with this task in this chapter, where we develop plane-wave and multipole expansions for the radiated field directly in terms of the source Q rather than in terms of the boundary value of the radiated field.
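The superposition of outgoing spherical waves described above can be sketched numerically. This is a minimal illustration, not the book's code: the function names `green_outgoing` and `radiated_field` are hypothetical, the source is assumed to be sampled at discrete points with weights Q(r') times the cell volume, and the sign convention (∇² + k²)G = δ with time dependence exp(-iωt) is an assumption.

```python
import numpy as np

def green_outgoing(r_obs, r_src, k):
    """Outgoing-wave Green function G = -exp(ikR) / (4*pi*R).

    Assumed convention: (laplacian + k^2) G = delta, time dependence exp(-i*omega*t).
    r_src may be a single point of shape (3,) or many points of shape (n, 3).
    """
    diff = np.asarray(r_obs, dtype=float) - np.asarray(r_src, dtype=float)
    R = np.linalg.norm(diff, axis=-1)  # distance from each source point to r_obs
    return -np.exp(1j * k * R) / (4.0 * np.pi * R)

def radiated_field(r_obs, src_points, src_weights, k):
    """Sum of spherical waves, one per source sample, weighted by the source.

    src_weights[i] plays the role of Q(r_i) * dV for the i-th sample, so the sum
    is a simple quadrature of the convolution of Q with the Green function.
    r_obs must lie outside the sampled source points (R > 0).
    """
    G = green_outgoing(r_obs, src_points, k)
    return np.sum(G * np.asarray(src_weights))
```

For a single unit-weight sample the result reduces to the Green function itself, and the field of several samples is the sum of their individual fields, which is the linearity the text relies on.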
In this chapter we turn our attention to scattering from non-penetrable objects, or “surface scattering,” and “diffraction” from planar apertures. As was mentioned in the introduction to the previous chapter, the interaction of an incident wave with a non-penetrable scatterer occurs over the surface of the scattering obstacle and is thus defined by some type of boundary condition over this surface. In a similar vein diffraction of an incident wave from apertures cut into non-penetrable surfaces is also defined by some type of boundary condition over the aperture plus surface and thus can, in a certain sense, be considered to be a type of surface scattering. The formal solution to both types of problems is thus obtained in an identical fashion by converting the problem into a boundary-value problem, which is then easily solved using the theory developed in Chapter 2.
The above prescription for “solving” surface scattering and aperture diffraction problems has one missing ingredient: determination of the boundary values required in the solution of the scattering or diffraction problem. This is the ingredient that distinguishes a scattering or diffraction problem from the purely mathematical boundary-value problem. In this chapter we will restrict our attention to non-penetrable objects over which the total field (incident plus scattered) satisfies homogeneous Dirichlet or Neumann conditions. By invoking this condition it is possible to represent the scattered field in terms of either the value of the normal derivative of the total field (the homogeneous Dirichlet case) or the total field itself (the homogeneous Neumann case) over the scatterer surface.
We return to the problem of computing the field u+(r, t) radiated by a real-valued space- and time-varying source q(r, t) embedded in an infinite homogeneous medium such as free space. As in Chapter 1 we will assume here that the time-dependent source q(r, t) is compactly supported in the space-time region S0 = {(r, t) : r ∈ τ0, t ∈ [0, T0]}, where τ0 is its spatial volume and [0, T0] the interval of time over which the source is turned on. In the case in which the medium is non-dispersive the radiated wavefield satisfies the inhomogeneous scalar wave equation Eq. (1.1). More generally, if the background medium is dispersive it is necessary to replace the second time derivative in this equation by an integral (convolutional) operator, so that the wave equation is actually an integro-differential equation. In this chapter we will treat the radiation problem in the frequency domain so that this complication is avoided and our results apply both to dispersive and to non-dispersive backgrounds.
In addition to treating the radiation problem we also treat the classical boundary-value problem for the scalar Helmholtz equation in a (possibly dispersive) uniform background medium. Special attention is devoted to the famous Rayleigh–Sommerfeld boundary-value problem, which consists of computing a radiated field throughout a half-space that is exterior to the source region τ0 from Dirichlet or Neumann conditions prescribed over an infinite plane bounding the source.
The previous chapter (Chapter 11) explained how user requirements directed our development of meeting support technology, more specifically meeting browsers and assistants. Chapters 3 to 9 discussed the enabling components, i.e. the multimodal signal processing necessary to build meeting support technology. In the following, we will present an overview of the meeting browsers and assistants developed both in AMI and related projects, as well as outside this consortium.
Introduction
Face-to-face meetings are a key method by which organizations create and share knowledge, and the last 20 years have seen the development of new computational technology to support them.
Early research on meeting support technology focused on group decision support systems (Poole and DeSanctis, 1989), and on shared whiteboards and large displays to promote richer forms of collaboration (Mantei, 1988, Moran et al., 1998, Olson et al., 1992, Whittaker and Schwarz, 1995, Whittaker et al., 1999). There were also attempts at devising methods for evaluating these systems (Olson et al., 1992). Subsequent research was inspired by ubiquitous computing (Streitz et al., 1998, Yu et al., 2000), focusing on direct integration of collaborative computing into existing work practices and artifacts. While much of this prior work has addressed support for real-time collaboration by providing richer interaction resources, another important research area is interaction capture and retrieval.
Interaction capture and retrieval is motivated by the observation that much valuable information exchanged in workplace interactions is never recorded, leading people to forget key decisions or repeat prior discussions.
While the meeting setting creates many challenges just in terms of recognizing words and who is speaking them, once we have the words, there is still much to be done if the goal is to be able to understand the conversation. To do this, we need to be able to understand the language and the structure of the language being used.
The structure of language is multilayered. At a fine-grained, detailed level, we can look at the structure of the spoken utterances themselves. Dialogue acts which segment and label the utterances into units with one core intention are one type of structure at this level. Another way of looking at understanding language at this level is by focusing on the subjective language being used to express internal mental states, such as opinions, (dis-)agreement, sentiments, and uncertainty.
At a coarser level, language can be structured by the topic of conversation. Finally, within a given topic, there is a structure to the language used to make decisions. Language understanding is sufficiently advanced to capture the content of the conversation for specific phenomena like decisions based on elaborate domain models. This allows an indexing and summarization of meetings at a very high degree of understanding.
Finally, the language of spoken conversation differs significantly from written language. Frequent types of speech disfluencies can be detected and removed with techniques similar to those used for understanding language structure as described above.
Segmenting multi-party conversations into homogeneous speaker regions is a fundamental step towards automatic understanding of meetings. This information is used for multiple purposes, such as adaptation for speaker and speech recognition, metadata extraction for navigating meetings, and input to automatic interaction analysis.
This task is referred to as speaker diarization and aims at inferring “who spoke when” in an audio stream. It involves two simultaneous goals: (1) estimating the number of speakers in the audio stream and (2) associating each speech segment with a speaker.
Diarization algorithms have been developed extensively for broadcast data, characterized by regular speaker turns, prompted speech, and high-quality audio, while processing meeting recordings presents different needs and additional challenges. On the one hand, the conversational nature of the speech involves very short turns and large amounts of overlapping speech; on the other hand, the audio is acquired in a nonintrusive way using far-field microphones and is thus corrupted with ambient noise and reverberation. Furthermore, real-time and online processing are often required in order to enable the use of many applications while the meeting is actually going on. The next section briefly reviews the state of the art in the field.
State of the art in speaker diarization
Conventional speaker diarization systems are composed of the following steps: a feature extraction module that extracts acoustic features such as mel-frequency cepstral coefficients (MFCCs) from the audio stream; a speech/non-speech detection module that keeps only the speech regions and discards silence; an optional speaker change detection module that divides the input stream into small homogeneous segments, each uttered by a single speaker; and an agglomerative hierarchical clustering step that groups the speech segments belonging to the same speaker into the same cluster.
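The final clustering step of the pipeline above can be sketched as follows. This is a deliberately simplified illustration, not the method of any particular system: it assumes each speech segment has already been reduced to one mean feature vector (for example, averaged MFCCs), and it swaps in Euclidean distance between cluster centroids with a fixed stopping threshold in place of the model-based merge and stopping criteria that real diarization systems typically use. The function name `agglomerative_cluster` and the parameter `stop_distance` are hypothetical.

```python
import numpy as np

def agglomerative_cluster(segment_feats, stop_distance):
    """Bottom-up clustering of speech segments into speakers.

    segment_feats: array of shape (n_segments, dim), one feature vector per segment.
    stop_distance: merging stops once the two closest clusters are farther apart
                   than this value (an assumed, hand-set threshold).
    Returns a list with one integer cluster label per segment.
    """
    feats = np.asarray(segment_feats, dtype=float)
    clusters = [[i] for i in range(len(feats))]  # start with one cluster per segment

    while len(clusters) > 1:
        # Centroid of each current cluster and all pairwise centroid distances.
        centroids = np.array([feats[c].mean(axis=0) for c in clusters])
        dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
        np.fill_diagonal(dist, np.inf)  # never merge a cluster with itself

        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        if dist[i, j] > stop_distance:
            break  # remaining clusters are treated as distinct speakers

        i, j = int(min(i, j)), int(max(i, j))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        clusters.pop(j)

    labels = [0] * len(feats)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels
```

The number of distinct labels returned serves as the estimate of the number of speakers, and the label of each segment answers "who spoke when" for that segment, which mirrors the two goals of diarization.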
The basic modeling problem begins with a set of observed data y^n = {y_t : t = 1, 2, …, n}, generated by some physical machinery, where the elements y_t may be of any kind. Since, no matter what they are, they can be encoded as numbers, we take them as such, i.e. natural numbers with or without the order if the data come from finite or countable sets, and real numbers otherwise. Often each number y_t is observed together with others x_{1,t}, x_{2,t}, …, called explanatory data, written collectively as a K × n matrix X = {x_{i,j}}, and the data then are written as y^n ∣ X. It is convenient to use the terminology “variables” for the source of these data. Hence, we say that the data {y_t} come from the variable Y, and the explanatory data are generated by variables X_1, X_2, and so on.
In physics the explanatory data often determine the data y^n of interest through a relationship called a “law,” but not so in statistical problems. By taking sufficiently many explanatory data we may also fit a function to the given set of observed data, but this is not a “law,” since if the same machinery were to generate additional data y_{n+1}, x_{1,n+1}, x_{2,n+1}, … the function would not give y_{n+1}. This is the reason the objective is to learn the statistical properties of the data y^n, possibly in the context of the explanatory data.
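The distinction between an exact fit and a "law" can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: the data are synthetic Gaussian draws, the fitted function is linear in the explanatory data, and the number of explanatory variables K is set equal to n so that an exact fit exists. The function name `exact_fit_demo` is hypothetical.

```python
import numpy as np

def exact_fit_demo(n=6, seed=0):
    """Fit n observations exactly with K = n explanatory variables, then predict.

    Returns (fit_error, pred_error): the largest residual on the observed data
    and the absolute error on one additional observation from the same machinery.
    """
    rng = np.random.default_rng(seed)
    K = n
    X = rng.normal(size=(K, n))        # K x n matrix of explanatory data
    y = X[0] + rng.normal(size=n)      # y_t depends on x_{1,t} plus noise

    # With K = n the linear system X^T beta = y has an exact solution,
    # so the fitted function reproduces every observed y_t.
    beta = np.linalg.solve(X.T, y)
    fit_error = np.max(np.abs(X.T @ beta - y))

    # One more observation generated by the same machinery.
    x_new = rng.normal(size=K)
    y_new = x_new[0] + rng.normal()
    pred_error = abs(x_new @ beta - y_new)
    return fit_error, pred_error
```

The fit error is at the level of floating-point round-off, while the prediction error for the new observation is not: the fitted coefficients have absorbed the noise in the observed data rather than a relationship that carries over to y_{n+1}.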