
Digital acoustics: processing wave fields in space and time using DSP tools

Published online by Cambridge University Press:  22 December 2014

Francisco Pinto*, Mihailo Kolundžija, and Martin Vetterli

Affiliation: School for Computer and Communication Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland

*Corresponding author: Francisco Pinto. Email: francisco.pinto@epfl.ch

Abstract

Systems with hundreds of microphones for acoustic field acquisition, or hundreds of loudspeakers for rendering, have been proposed and built. To analyze, design, and apply such systems requires a framework that allows us to leverage the vast set of tools available in digital signal processing in order to achieve intuitive and efficient algorithms. We thus propose a discrete space–time framework, grounded in classical acoustics, which addresses the discrete nature of the spatial and temporal sampling. In particular, a short-space/time Fourier transform is introduced, which is the natural extension of the localized or short-time Fourier transform. Processing in this intuitive domain allows us to easily devise algorithms for beamforming, source separation, and multi-channel compression, among other useful tasks. The essential spatial band-limitedness of the Fourier spectrum is also used to solve the spatial equalization task required for sound field rendering in a region of interest. Examples of applications are shown.
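
As a purely illustrative sketch of the short-space/time Fourier transform described above: window the array signal along the spatial axis, then take a 2D DFT (space × time) of each block. The array geometry, window, and plane-wave scene below are hypothetical choices, not parameters from the paper.

```python
import numpy as np

def short_space_time_ft(p, win_x, hop_x):
    """Short space/time Fourier transform of p with shape (num_mics, num_samples):
    apply a spatial analysis window, then a 2D DFT per spatial block."""
    w = np.hanning(win_x)[:, None]                  # spatial analysis window
    starts = range(0, p.shape[0] - win_x + 1, hop_x)
    return np.array([np.fft.fft2(p[s:s + win_x, :] * w) for s in starts])

# Hypothetical scene: far-field plane wave hitting a uniform line array
c, fs, d = 343.0, 8000.0, 0.04                      # sound speed, sample rate, spacing
alpha, f0 = np.pi / 3, 1000.0                       # incidence angle, source frequency
n_x, n_t = 64, 256
x = np.arange(n_x)[:, None] * d
t = np.arange(n_t)[None, :] / fs
p = np.cos(2 * np.pi * f0 * (t - np.cos(alpha) * x / c))

P = short_space_time_ft(p, win_x=32, hop_x=16)
print(P.shape)                                      # (3, 32, 256): blocks x phi x f
```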

Information

Type
Original Paper
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Authors, 2014

Fig. 1. An illustration of the three physical principles: (i) a point source generates a spherical wave front, which becomes increasingly flat in the far-field; (ii) as the distance increases, the ratio between evanescent energy (E) and propagating energy (P) decays to zero; (iii) the Huygens principle implies that the wave front is a continuum of secondary sources that generate every “step” in its propagation.

Fig. 2. 2D Fourier transform of a Dirac source in the near-field (top), and a sinusoidal source and a Dirac source in the far-field (bottom). ϕ represents the spatial frequency along the x-axis, and the third dimension is the magnitude of the sound pressure as a function of spatial and temporal frequencies.
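
A quick numerical illustration of the spectral pattern the caption describes, under hypothetical array parameters: the 2D DFT of a far-field plane wave concentrates along a line in the $(\phi, f)$ plane whose slope is set by the angle of incidence, $|f/\phi| = c/\cos\alpha$.

```python
import numpy as np

# Hypothetical uniform line array observing a single far-field plane wave.
c, fs, d = 343.0, 8000.0, 0.02     # sound speed (m/s), sample rate (Hz), spacing (m)
alpha, f0 = np.pi / 4, 1500.0      # incidence angle, source frequency (Hz)
n_x, n_t = 128, 128
x = np.arange(n_x)[:, None] * d
t = np.arange(n_t)[None, :] / fs
p = np.cos(2 * np.pi * f0 * (t - np.cos(alpha) * x / c))

# The 2D spectrum peaks along the line phi = (f / c) * cos(alpha).
P = np.abs(np.fft.fftshift(np.fft.fft2(p)))
ix, it = np.unravel_index(np.argmax(P), P.shape)
phi = np.fft.fftshift(np.fft.fftfreq(n_x, d))[ix]       # spatial freq (cycles/m)
f = np.fft.fftshift(np.fft.fftfreq(n_t, 1.0 / fs))[it]  # temporal freq (Hz)
print(abs(f / phi), c / np.cos(alpha))                  # agree up to DFT leakage
```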

Fig. 3. Effects of sampling in space and time: (a) non-band-limited spectrum; (b) band-limited spectrum; (c) (aliasing-free) temporal sampling; (d) (aliasing-free) spatial sampling; (e) temporal aliasing; (f) spatial aliasing.
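
The spatial-aliasing condition in panel (f) can be stated compactly: a plane wave of temporal frequency $f$ occupies spatial frequencies up to $f/c$ cycles per meter, so a uniform array with spacing $d$ avoids spatial aliasing for all incidence angles only while $d \le c/(2 f_{\max})$. A small sketch with illustrative numbers:

```python
import numpy as np

def max_mic_spacing(f_max, c=343.0):
    """Largest uniform microphone spacing (m) that avoids spatial aliasing for
    all incidence angles: the spatial frequency of a plane wave is at most
    f/c cycles per meter, so spatial sampling needs 1/d >= 2 * f_max / c."""
    return c / (2.0 * f_max)

def aliases(f_max, d, c=343.0):
    """True if a field band-limited to f_max can spatially alias at spacing d."""
    return d > max_mic_spacing(f_max, c)

print(max_mic_spacing(4000.0))   # ~0.0429 m for content up to 4 kHz
print(aliases(4000.0, 0.05))     # True: 5 cm spacing aliases above ~3.4 kHz
```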

Fig. 4. Windowed Fourier transform of a Dirac source in the far-field (top) and the near-field (bottom). The triangular pattern opens as the source gets closer to the microphone array, closes as the source gets farther away, and skews left and right according to the minimum and maximum angles of incidence of the wave front. The ripples on the outside of the triangle are caused by the sinc-function effect of spatial windowing, and are directed toward the average direction of incidence of the wave front.

Fig. 5. Typical structure of a multidimensional filter bank. (a) The filter bank structure is similar to the 1D case, except that the filters and rate converters are multidimensional. The z-transform vector is defined such that, in the 2D spatiotemporal domain, ${\bf z} = (z_x, z_t)$, and ${\bf N}$ is a diagonal resampling matrix given by ${\bf N} = \left[\begin{matrix} N_x & 0 \\ 0 & N_t \end{matrix}\right]$. The number of filters is determined by the size of the space of coset vectors ${\mathbb K}^{2} \subset {\mathbb Z}^{2}$ (assuming m = 2 from the figure), which is essentially the space of all combinations of integer vectors ${\bf k} = \left[\begin{matrix} k_x \\ k_t \end{matrix}\right]$ from ${\bf k} = \left[\begin{matrix} 0 \\ 0 \end{matrix}\right]$ to ${\bf k} = \left[\begin{matrix} N_x - 1 \\ N_t - 1 \end{matrix}\right]$. (b) The equivalent polyphase representation is characterized by a delay chain composed of vector delay factors ${\bf z}^{\bf k} = z_x^{k_x} z_t^{k_t}$ and the resampling matrix ${\bf N}$, which generate 2D sample blocks of size $N_x \times N_t$ from the input signal and vice versa. If the filter bank is separable, the filtering operations can be expressed as a product between transform matrices associated with each dimension.
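
The polyphase blocking and the separable-transform remark at the end of the caption can be sketched as follows; the block sizes and the choice of a DFT as the separable transform are illustrative assumptions.

```python
import numpy as np

def polyphase_blocks(p, n_x, n_t):
    """Rearrange a 2D spatiotemporal signal into non-overlapping n_x-by-n_t
    blocks, as the delay chain and resampling matrix N do in polyphase form."""
    X, T = p.shape
    assert X % n_x == 0 and T % n_t == 0
    return p.reshape(X // n_x, n_x, T // n_t, n_t).transpose(0, 2, 1, 3)

# For a separable filter bank, filtering each block is a pair of matrix
# products; with DFT matrices this reduces to a per-block 2D DFT.
p = np.arange(64.0).reshape(8, 8)
blocks = polyphase_blocks(p, 4, 4)                   # shape (2, 2, 4, 4)
F = np.fft.fft(np.eye(4))                            # 4-point DFT matrix
coeffs = np.einsum('ij,abjk,kl->abil', F, blocks.astype(complex), F.T)
print(np.allclose(coeffs[0, 0], np.fft.fft2(blocks[0, 0])))  # True
```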

Fig. 6. Far-field and intermediate-field sources driven by a Dirac pulse, observed on a linear microphone array. (a) Acoustic scene; (b) spatiotemporal signal $p[{\bf n}]$; (c) spatiotemporal DFT $P[{\bf b}]$; (d) short spatiotemporal Fourier transform $P_i[{\bf b}]$.

Fig. 7. Intermediate-field source driven by a Dirac pulse and observed on a curved microphone array. (a) Acoustic scene; (b) spatiotemporal signal $p[{\bf n}]$; (c) spatiotemporal DFT $P[{\bf b}]$; (d) short spatiotemporal Fourier transform $P_i[{\bf b}]$.

Fig. 8. Example of filtering directly in the spatiotemporal Fourier domain (with no lapped transform). (a) The acoustic scene consists of two Dirac sources in the intermediate-field. The goal is to suppress the dashed source. (b) DFT along the entire spatial axis. (c) Filter input. (d) Filter output.

Fig. 9. Example of filtering directly in the spatiotemporal Fourier domain (with no lapped transform). (a) The acoustic scene consists of two Dirac sources, where one is in the intermediate-field and the other in the far-field. The goal is to suppress the dashed source. (b) DFT along the entire spatial axis. (c) Filter input. (d) Filter output.

Fig. 10. Example of filtering in the short spatiotemporal Fourier domain. (a) The acoustic scene consists of two Dirac sources in the intermediate-field. The goal is to suppress the dashed source. (b) DFT along the entire spatial axis. (c) Filter input. (d) Filter output. (e) Short spatiotemporal Fourier transform.

Fig. 11. Example of filtering in the short spatiotemporal Fourier domain. (a) The acoustic scene consists of three Dirac sources in the intermediate-field, and a curved microphone array. The goal is to suppress the dashed sources. (b) DFT along the entire spatial axis. (c) Filter input. (d) Filter output. (e) Short spatiotemporal Fourier transform.

Fig. 12. Design steps of a spatiotemporal filter. The pass-band region of the filter should enclose the triangular pattern that characterizes the spectrum of a point source.
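
A pass-band enclosing the triangular pattern can be sketched as a binary mask over the 2D DFT bins; the array parameters and angular range below are illustrative, and sign conventions for the direction of propagation vary.

```python
import numpy as np

def fan_mask(n_x, n_t, d, fs, alpha_min, alpha_max, c=343.0):
    """Binary pass-band for the 2D DFT that encloses the triangular support of
    a point source seen under incidence angles in [alpha_min, alpha_max].
    A bin (phi, f) is kept when c * phi / f lies between the two cosines; the
    condition is written so the mask is symmetric under (phi, f) -> (-phi, -f),
    which keeps the filtered signal real."""
    phi = np.fft.fftfreq(n_x, d)[:, None]        # spatial frequency (cycles/m)
    f = np.fft.fftfreq(n_t, 1.0 / fs)[None, :]   # temporal frequency (Hz)
    lo = min(np.cos(alpha_min), np.cos(alpha_max))
    hi = max(np.cos(alpha_min), np.cos(alpha_max))
    s = c * phi * np.sign(f)                     # c * phi with f's sign folded in
    return (s >= lo * np.abs(f)) & (s <= hi * np.abs(f))

M = fan_mask(64, 256, 0.04, 8000.0, np.pi / 3, 2 * np.pi / 3)
print(M.shape, M.dtype)
# Filtering a real signal p: np.real(np.fft.ifft2(np.fft.fft2(p) * M))
```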

Fig. 13. Experimental rate-distortion curves for white-noise sources in the far-field observed on a straight line. On the left, the $D(R)$ curves are shown for one source encoded in the short spatiotemporal Fourier domain (Gabor domain) with no overlapping. The source is fixed at $\alpha = \pi/3$ and the number of spatial points $N_x$ is variable. On the right, the $D(R)$ curves are shown for multiple sources encoded in the short spatiotemporal Fourier domain. The number of spatial points is fixed at $N_x = 64$, and the sources are placed at random angles. The black circle in each plot marks $R = 2.6$ bits/time-sample, which is the average rate of state-of-the-art perceptual coders.

Fig. 14. Overview of sound field reproduction formulated as a MIMO acoustic channel inversion problem.
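
One standard way to set up the channel-inversion problem in the figure is per-frequency regularized least squares: given the matrix of acoustic transfer functions from the loudspeakers to a grid of control points, solve for the driving weights that best reproduce the desired pressure. This is a generic sketch, not necessarily the paper's formulation; the geometry, frequency, and regularization are hypothetical.

```python
import numpy as np

def driving_signals(H, p_des, reg=1e-3):
    """Regularized least-squares MIMO channel inversion at a single frequency.
    H: (n_points, n_speakers) acoustic transfer matrix; p_des: desired complex
    pressures at the control points. Returns the loudspeaker weights."""
    G = H.conj().T
    return np.linalg.solve(G @ H + reg * np.eye(H.shape[1]), G @ p_des)

# Hypothetical free-field scene: 8 point-source loudspeakers on a line,
# 32 control points in the listening area, virtual source behind the array.
rng = np.random.default_rng(0)
k = 2 * np.pi * 1000.0 / 343.0                        # wavenumber at 1 kHz
spk = np.stack([np.linspace(-1.0, 1.0, 8), np.zeros(8)], axis=1)
pts = rng.uniform([-0.5, 0.5], [0.5, 1.5], (32, 2))   # (x, y) positions
dist = np.linalg.norm(pts[:, None] - spk[None], axis=2)
H = np.exp(-1j * k * dist) / (4 * np.pi * dist)       # point-source Green's functions
r = np.linalg.norm(pts - np.array([0.0, -1.0]), axis=1)
p_des = np.exp(-1j * k * r) / (4 * np.pi * r)         # desired point-source field
w = driving_signals(H, p_des)
print(np.linalg.norm(H @ w - p_des) / np.linalg.norm(p_des))  # relative error, < 1
```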

Fig. 15. An illustration of sound field reproduction with an array of loudspeakers, with the listening area to the right (in front) of the array. The figures on the left show snapshots of the desired sound fields, while the figures on the right show the corresponding sound fields reproduced with loudspeaker arrays. The desired sound field emanates from a point source (a) behind the loudspeaker array, at ${\bf r}_s = (0\,{\rm m}, 1\,{\rm m})$, and (b) in front of it, at ${\bf r}_s = (2\,{\rm m}, 1.5\,{\rm m})$. In the listening area, the reproduced sound fields closely match the desired ones.

Fig. 16. The principle of reproducing sound fields in the half-space z > 0 using a planar distribution of secondary point sources in the xy-plane.