We briefly review the mathematics in the coding engine of JPEG 2000, a state-of-the-art image compression system. We focus in depth on the transform, entropy coding and bitstream assembler modules. Our goal is to present a general overview of the mathematics underlying a state-of-the-art scalable image compression technology.
1. Introduction
Data compression is a process that creates a compact data representation from a raw data source, usually with an end goal of facilitating storage or transmission. Broadly speaking, compression takes two forms, either lossless or lossy, depending on whether or not it is possible to reconstruct exactly the original data stream from its compressed version. For example, a data stream that consists of long runs of 0s and 1s (such as that generated by a black and white fax) would possibly benefit from simple run-length encoding, a lossless technique replacing the original data stream by a sequence of counts of the lengths of the alternating substrings of 0s and 1s. Lossless compression is necessary for situations in which changing a single bit can have catastrophic effects, such as in the machine code of a computer program.
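As a concrete illustration of the run-length idea just described, here is a minimal sketch in Python (the sample bit string and the function names are invented for the example):

```python
from itertools import groupby

def run_length_encode(bits):
    """Losslessly encode a 0/1 string as (symbol, run length) pairs,
    the simple scheme described above for fax-like data."""
    return [(symbol, sum(1 for _ in run)) for symbol, run in groupby(bits)]

def run_length_decode(pairs):
    """Invert the encoding exactly; no information is lost."""
    return "".join(symbol * count for symbol, count in pairs)

data = "0" * 10 + "1" * 3 + "0" * 12 + "1"
encoded = run_length_encode(data)          # [('0', 10), ('1', 3), ('0', 12), ('1', 1)]
assert run_length_decode(encoded) == data  # exact reconstruction: lossless
```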
While it might seem as though we should always demand lossless compression, there are in fact many venues where exact reproduction is unnecessary. In particular, media compression, which we define to be the compression of image, audio, or video files, presents an excellent opportunity for lossy techniques. For example, not one among us would be able to distinguish between two images which differ in only one of the roughly 25 million bits in a typical 1024 × 1024 color image (at 24 bits per pixel).
In this paper we investigate quadrature rules for functions on compact Lie groups and sections of homogeneous vector bundles associated with these groups. First a general notion of band-limitedness is introduced which generalizes the usual notion on the torus or translation groups. We develop a sampling theorem that allows exact computation of the Fourier expansion of a band-limited function or section from sample values and quantifies the error in the expansion when the function or section is not band-limited. We then construct specific finitely supported distributions on the classical groups which have nice error properties and can also be used to develop efficient algorithms for the computation of Fourier transforms on these groups.
1. Introduction
The Fourier transform of a function on a compact Lie group computes the coefficients (Fourier coefficients) that enable its expression as a linear combination of the matrix elements from a complete set of irreducible representations of the group. In the case of abelian groups, especially the circle and its higher-dimensional products, the tori, this is precisely the expansion of a function on these domains in terms of complex exponentials. This representation is at the heart of classical signal and image processing (see [25; 26], for example).
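In the simplest abelian instance, the circle, this expansion takes the familiar form
\[
f(\theta) = \sum_{n \in \mathbb{Z}} \hat f(n)\, e^{i n \theta},
\qquad
\hat f(n) = \frac{1}{2\pi} \int_0^{2\pi} f(\theta)\, e^{-i n \theta}\, d\theta,
\]
where the one-dimensional characters \( \theta \mapsto e^{i n \theta} \) play the role of the matrix elements of the irreducible representations.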
The successes of abelian Fourier analysis are many, ranging from national defense to personal entertainment, from medicine to finance. The record of achievements is so impressive that it has perhaps sometimes led scientists astray, seducing them to look for ways to use these tools in situations where they are less than appropriate: for example, pretending that a sphere is a torus so as to avoid the use of spherical harmonics in favor of Fourier series - a favored mathematical hammer casting the multitudinous problems of science as a box of nails.
This article presents a simple version of Integrated Sensing and Processing (ISP) for statistical pattern recognition wherein the sensor measurements to be taken are adaptively selected based on task-specific metrics. Thus the measurement space in which the pattern recognition task is ultimately addressed integrates adaptive sensor technology with the specific task for which the sensor is employed. This end-to-end optimization of sensor/processor/exploitation subsystems is a theme of the DARPA Defense Sciences Office Applied and Computational Mathematics Program's ISP program. We illustrate the idea with a pedagogical example and application to the HyMap hyperspectral sensor and the Tufts University “artificial nose” chemical sensor.
1. Introduction
An important activity, common to many fields of endeavor, is the act of refining high order information (detections of events, classification of objects, identification of activities, etc.) from large volumes of diverse data which is increasingly available through modern means of measurement, communication, and processing. This exploitation function winnows the available data concerning an object or situation in order to extract useful and actionable information, quite often through the application of techniques from statistical pattern recognition to the data. This may involve activities like detection, identification, and classification which are applied to the raw measured data, or possibly to partially processed information derived from it.
When new data are sought in order to obtain information about a specific situation, it is now increasingly common to have many different measurement degrees of freedom potentially available for the task.
The classical (scalar-valued) theory of spherical functions, put forward by Cartan and others, unifies under one roof a number of examples that were very well known before the theory was formulated. These examples include special functions such as Jacobi polynomials, Bessel functions, Laguerre polynomials, Hermite polynomials, and Legendre functions, which had been workhorses in many areas of mathematical physics before the appearance of a unifying theory. These and other functions have found interesting applications in signal processing, including specific areas such as medical imaging.
The theory of matrix-valued spherical functions is a natural extension of the well-known scalar-valued theory. Its historical development, however, is different: in this case the theory has gone ahead of the examples. The purpose of this article is to point to some examples and to interest readers in this new aspect in the world of special functions.
We close with a remark connecting the functions described here with the theory of matrix-valued orthogonal polynomials.
1. Introduction and Statement of Results
The theory of matrix-valued spherical functions (see [GV; T]) gives a natural extension of the well-known theory for the scalar-valued case, see [He]. We start with a few remarks about the scalar-valued case.
The classical (scalar-valued) theory of spherical functions (put forward by Cartan and others after him) allows one to unify under one roof a number of examples that were very well known before the theory was formulated. These examples include many special functions like Jacobi polynomials, Bessel functions, Laguerre polynomials, Hermite polynomials, Legendre functions, etc.
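To fix ideas, one standard scalar-valued instance (recalled here only as background) is the pair \( (G, K) = (SO(3), SO(2)) \), for which the zonal spherical functions are
\[
\varphi_\ell(\theta) = P_\ell(\cos\theta), \qquad \ell = 0, 1, 2, \ldots,
\]
where \( P_\ell \) is the Legendre polynomial of degree \( \ell \) and \( \theta \) is the polar angle on the sphere \( S^2 = SO(3)/SO(2) \).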
This paper addresses some of the fundamental problems which have to be solved in order for optical networks to utilize the full bandwidth of optical fibers. It discusses some of the premises for signal processing in optical fibers. It gives a short historical comparison between the development of transmission techniques for radio and microwaves and that for optical fibers. There is also a discussion of bandwidth, with a particular emphasis on the physical interactions that limit the speed in optical fibers. Finally, there is a section on line codes and some recent developments in optical encoding of wavelets.
1. Introduction
When Claude Shannon developed the mathematical theory of communication [1] he knew nothing about lasers and optical fibers. What he was mostly concerned with were communication channels using radio waves and microwaves. Inherently, these channels have a narrower bandwidth than do optical fibers because of the lower carrier frequency (longer wavelength). More serious than this theoretical limitation are the practical bandwidth limitations imposed by weather and other environmental hazards. In contrast, optical fibers are a marvellously stable and predictable medium for transporting information, and the influence of noise from the fiber itself can to a large degree be neglected. So, until recently there was no real need for any advanced signal processing in optical fiber communications systems. This has all changed over the last few years with the development of the internet.
Optical fiber communication became an economic reality in the early 1970s when absorption of less than 20 dB/km was achieved in optical fibers and lifetimes of more than 1 million hours for semiconductor lasers were accomplished.
Underlying many of the current mathematical opportunities in digital signal processing are unsolved analog signal processing problems. For instance, digital signals for communication or sensing must map into an analog format for transmission through a physical layer. In this layer we meet a canonical example of analog signal processing: the electrical engineer's impedance matching problem. Impedance matching is the design of analog signal processing circuits to minimize loss and distortion as the signal moves from its source into the propagation medium. This paper works the matching problem from theory to sampled data, exploiting links between H∞ theory, hyperbolic geometry, and matching circuits. We apply J. W. Helton's significant extensions of operator theory, convex analysis, and optimization theory to demonstrate new approaches and research opportunities in this fundamental problem.
1. The Impedance Matching Problem
Figure 1 shows a twin-whip HF (high-frequency) antenna mounted on a superstructure representative of a shipboard environment. If a signal generator is connected directly to this antenna, not all of the power delivered to the antenna can be radiated. If an impedance mismatch exists between the signal generator and the antenna, some of the signal power is reflected from the antenna back to the generator. To use this antenna effectively, a matching circuit must be inserted between the signal generator and the antenna to minimize this wasted power.
Figure 2 shows the matching circuit connecting the generator to the antenna. Port 1 is the input from the generator. Port 2 is the output that feeds the antenna.
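Behind this setup sits the textbook one-port relation, recalled here only for orientation: if a source with reference impedance \( Z_0 \) drives a load of impedance \( Z_L \), the reflection coefficient is
\[
\Gamma = \frac{Z_L - Z_0}{Z_L + Z_0},
\]
and the fraction \( |\Gamma|^2 \) of the incident power is reflected rather than delivered to the load. A matching circuit therefore tries to keep the impedance seen at port 1 close to \( Z_0 \), and hence \( |\Gamma| \) small, over the frequency band of interest.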
Three-dimensional volumetric data are becoming increasingly available in a wide range of scientific and technical disciplines. With the right tools, we can expect such data to yield valuable insights about many important phenomena in our three-dimensional world.
In this paper, we develop tools for the analysis of 3-D data which may contain structures built from lines, line segments, and filaments. These tools come in two main forms: (a) Monoscale: the X-ray transform, offering the collection of line integrals along a wide range of lines running through the image, at all different orientations and positions; and (b) Multiscale: the (3-D) beamlet transform, offering the collection of line integrals along line segments which, in addition to ranging through a wide collection of locations and orientations, also occupy a wide range of scales.
We describe different strategies for computing these transforms and several basic applications, for example in finding faint structures buried in noisy data.
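As a deliberately restricted illustration of the monoscale idea, the sketch below (in Python; the names and the toy data are ours, and only axis-aligned lines are treated, whereas the transforms above also range over oblique orientations) computes discrete line integrals as sums of voxel values along lines:

```python
import numpy as np

def axis_aligned_xray(volume):
    """Sums (discrete line integrals) of a 3-D array along all lines
    parallel to each coordinate axis.  Oblique orientations, which the
    full X-ray transform also covers, would require interpolation."""
    return {
        "x": volume.sum(axis=0),   # integrate along lines parallel to the x-axis
        "y": volume.sum(axis=1),   # integrate along lines parallel to the y-axis
        "z": volume.sum(axis=2),   # integrate along lines parallel to the z-axis
    }

# A faint filament along the z-axis, buried in noise, shows up clearly
# in the corresponding projection because the integral accumulates it.
vol = np.random.default_rng(0).normal(0.0, 1.0, size=(64, 64, 64))
vol[32, 32, :] += 0.5                       # weak filament at (x, y) = (32, 32)
print(axis_aligned_xray(vol)["z"][32, 32])  # large compared with neighbouring sums
```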
1. Introduction
In field after field, we are currently seeing new initiatives aimed at gathering large high-resolution three-dimensional datasets. While three-dimensional data have always been crucial to understanding the physical world we live in, this transition to ubiquitous 3-D data gathering seems novel. The driving force is undoubtedly the pervasive influence of increasing storage capacity and computer processing power, which affects our ability to create new 3-D measurement instruments, but which also makes it possible to analyze the massive volumes of data that inevitably result when 3-D data are being gathered.
In this chapter we investigate ICA models in which the number of sources, M, may be less than the number of sensors, N: so-called non-square mixing.
The ‘extra’ sensor observations are explained as observation noise. This general approach may be called Probabilistic Independent Component Analysis (PICA) by analogy with the Probabilistic Principal Component Analysis (PPCA) model of Tipping & Bishop [1997]; ICA and PCA don't have observation noise, PICA and PPCA do.
Non-square ICA models give rise to a likelihood model for the data involving an integral which is intractable. In this chapter we build on previous work in which the integral is estimated using a Laplace approximation. By making the further assumption that the unmixing matrix lies on the decorrelating manifold we are able to make a number of simplifications. Firstly, the observation noise can be estimated using PCA methods, and, secondly, optimisation takes place in a space of much reduced dimensionality, with of the order of M² parameters rather than M × N. Again building on previous work, we derive a model order selection criterion for selecting the appropriate number of sources. This is based on the Laplace approximation as applied to the decorrelating manifold, and is compared with PCA model order selection methods on music and EEG datasets.
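As an illustration of the PCA-based noise estimate mentioned above, here is a minimal sketch (the interface is ours): under a PPCA-style model with M sources and N sensors, the maximum-likelihood isotropic noise variance is the average of the N − M smallest eigenvalues of the data covariance.

```python
import numpy as np

def pca_noise_estimate(X, M):
    """Estimate the isotropic observation noise variance from the
    (N - M) smallest eigenvalues of the sample covariance, in the
    spirit of the PPCA maximum-likelihood result.

    X : array of shape (T, N), one observation per row.
    M : assumed number of sources, with M < N.
    """
    Xc = X - X.mean(axis=0)                                  # centre the data
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))   # ascending order
    N = X.shape[1]
    return eigvals[: N - M].mean()                           # discarded eigenvalues
```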
Non-Gaussianity is of paramount importance in ICA estimation. Without non-Gaussianity the estimation is not possible at all (unless the independent components have time-dependences). Therefore, it is not surprising that non-Gaussianity could be used as a leading principle in ICA estimation.
In this chapter, we derive a simple principle of ICA estimation: the independent components can be found as the projections that maximize non-Gaussianity. In addition to its intuitive appeal, this approach allows us to derive a highly efficient ICA algorithm, FastICA. This is a fixed-point algorithm that can be used for estimating the independent components one by one. At the end of the chapter, it will be seen that it is closely connected to maximum likelihood or infomax estimation as well.
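A minimal sketch of the one-unit fixed-point update, in Python, assuming the data have already been whitened (whitening is discussed in the next section); the tanh nonlinearity and all names are illustrative choices rather than the chapter's notation:

```python
import numpy as np

def fastica_one_unit(Z, n_iter=200, tol=1e-8, seed=0):
    """One-unit fixed-point iteration on whitened data Z of shape (n, T).
    Returns a unit vector w such that w @ Z estimates one independent
    component, using the update w <- E{z g(w'z)} - E{g'(w'z)} w."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ Z                                   # current projection
        g, g_prime = np.tanh(y), 1.0 - np.tanh(y) ** 2
        w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < tol:         # converged up to sign
            return w_new
        w = w_new
    return w
```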
Whitening
First, let us consider preprocessing techniques that are essential if we want to develop fast ICA methods.
The rather trivial preprocessing that is used in many cases is to centre x, i.e. subtract its mean vector m = E{x} so as to make x a zero-mean variable. This implies that s is zero-mean as well. This preprocessing is made solely to simplify the ICA algorithms: it does not mean that the mean could not be estimated. After estimating the mixing matrix A with centred data, we can complete the estimation by adding the mean vector of s back to the centred estimates of s.
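The sketch below combines centring with a whitening step based on the eigendecomposition of the covariance matrix, so that the preprocessed data have zero mean and identity covariance (one common choice of whitening; the function name and interface are ours, and a non-singular covariance is assumed):

```python
import numpy as np

def centre_and_whiten(X):
    """Centre X (shape (n, T), one row per sensor) and whiten it so the
    result Z = V (X - mean) has zero mean and identity covariance."""
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                               # centring: subtract E{x}
    d, E = np.linalg.eigh(np.cov(Xc))           # eigenvalues, eigenvectors
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T     # whitening matrix
    return V @ Xc, V, mean
```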
An unsupervised classification algorithm is derived by modelling observed data as a mixture of several mutually exclusive classes that are each described by linear combinations of independent, non-Gaussian densities. The algorithm estimates the density of each class and is able to model class distributions with non-Gaussian structure. It can improve classification accuracy compared with standard Gaussian mixture models. When applied to images, the algorithm can learn efficient codes (basis functions) for images that capture the statistical structure of the images. We applied this method to the problem of unsupervised classification, segmentation and de-noising of images. This method was effective in classifying complex image textures such as trees and rocks in natural scenes. It was also useful for de-noising and filling in missing pixels in images with complex structures. The advantage of this model is that image codes can be learned with increasing numbers of classes, thus providing greater flexibility in modelling structure and in finding more image features than in either Gaussian mixture models or standard ICA algorithms.
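A rough sketch of the kind of class-conditional computation such a mixture involves (our own illustration, assuming square unmixing matrices and Laplacian source densities; it is not the chapter's algorithm): for each class k, the data likelihood is \( p(x \mid k) = |\det W_k| \prod_i p(s_i) \) with \( s = W_k (x - \mu_k) \), and Bayes' rule then gives the class posterior.

```python
import numpy as np

def ica_mixture_log_posteriors(x, Ws, mus, priors):
    """Class log-posteriors for one observation x under an ICA mixture
    with Laplacian sources: log p(k|x) up to normalisation is
    log pi_k + log|det W_k| + sum_i log p(s_i), s = W_k (x - mu_k)."""
    log_post = []
    for W, mu, pi_k in zip(Ws, mus, priors):
        s = W @ (x - mu)                                       # class-conditional sources
        _, logdet = np.linalg.slogdet(W)
        log_lik = logdet + np.sum(-np.abs(s) - np.log(2.0))    # Laplacian density
        log_post.append(np.log(pi_k) + log_lik)
    log_post = np.array(log_post)
    return log_post - np.logaddexp.reduce(log_post)            # normalise over classes
```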
Introduction
Recently, Blind Source Separation by Independent Component Analysis has been applied to signal processing problems including speech enhancement, telecommunications and medical signal processing. ICA finds a linear non-orthogonal coordinate system in multivariate data determined by second- and higher-order statistics. The goal of ICA is to linearly transform the data in such a way that the transformed variables are as statistically independent from each other as possible [Jutten & Herault, 1991, Comon, 1994, Bell & Sejnowski, 1995, Cardoso & Laheld, 1996, Lee et al., 2000b]. ICA generalizes the technique of Principal Component Analysis (PCA) and, like PCA, has proven a useful tool for finding structure in data.
Independent Component Analysis (ICA), as a signal processing tool, has shown great promise in many application domains, two of the most successful being telecommunications and biomedical engineering. With the growing awareness of ICA many other less obvious applications of the transform are starting to appear, for example financial time series prediction [Back & Weigend, 1998] and information retrieval [Isbell & Viola, 1999, Girolami, 2000a, Vinokourov & Girolami, 2000]. In such applications ICA is being used as an unsupervised means of exploring and, hopefully, uncovering meaningful latent traits or structure within the data. In the case of financial time series prediction the factors which drive the evolution of the time series are hopefully uncovered, whereas within information retrieval the latent concepts or topics which generate key-words that occur in the documents are sought after. The strong assumption of independence of the hidden factors in ICA is difficult to argue for when there is limited a priori knowledge of the data; indeed, it is desired that the analysis uncover informative components which may, or may not, be independent. Nevertheless the independence assumption allows analytically tractable statistical models to be developed.
This chapter will consider how the standard ICA model can be extended and used in the unsupervised classification and visualisation of multivariate data. Prior to the formal presentation, to set the context of the remainder of the chapter, a short review of the ICA signal model and the corresponding statistical representation is given. The remaining sections propose ICA inspired techniques for the unsupervised classification and visualisation of multivariate data.
In recent years there has been an explosion of interest in the application and theory of independent component analysis (ICA). This book aims to provide a self-contained introduction to the subject as well as offering a set of invited contributions which we see as lying at the cutting edge of ICA research.
ICA is intimately linked with the problem of blind source separation, that of attempting to recover a set of underlying sources when only a noisy mapping from these sources (the observations) is given, and we regard this as the canonical form of ICA. Until recently this mapping was taken to be linear (but see Chapter 4) and “traditionally” (if tradition is allowed in a field of such recent developments) noiseless, with the number of observations being equal to the number of hypothesised sources. It is surprising that even the simplest of ICA models can be invaluable and offer new insights into data analysis and interpretation. This, at first sight unreasonable, claim may be supported by noting that many observations of physical systems are produced by a linear combination of underlying sources. Furthermore, in many applications, it is an end in itself to produce a set of “sources” which are statistically independent rather than just decorrelated (see Chapter 1), and for this ICA would appear an ideal tool.
One may think of blind source separation as the problem of identifying speakers (sources) in a room given only recordings from a number of microphones, each of which records a linear mixture of the sources, whose statistical characteristics are unknown.
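A toy version of this picture, with two sources, two microphones, and an instantaneous linear mixture x(t) = A s(t), is sketched below in Python; the waveforms and the mixing matrix are arbitrary choices for illustration:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1000)
s = np.vstack([np.sin(2 * np.pi * 5 * t),               # source 1: sinusoid
               np.sign(np.sin(2 * np.pi * 3 * t))])     # source 2: square wave
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                               # unknown mixing matrix
x = A @ s                                                # what the microphones record
```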
Here we consider the blind source separation problem when the mixing of the sources is non-stationary. Pursuing the speakers in a room analogy, we address the problem of identifying the speakers when they (or equivalently, the microphones) are moving. The problem is cast in terms of a hidden state (the mixing proportions of the sources) which we track using particle filter methods, which permit the tracking of arbitrary state densities. Murata et al. [1997] have addressed this problem by adapting the learning rate, and we mention work by Penny et al. [2000] on hidden Markov models for ICA which allows for abrupt changes in the mixing matrix with stationary periods in between.
We first briefly review classical Independent Component Analysis. ICA with non-stationary mixing is then described in terms of a hidden state model, and methods for estimating the sources and the mixing are given. Particle filter techniques are then introduced for the modelling of state densities. Finally, we address the non-stationary mixing problem when the sources are independent, but possess temporal correlations.
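For readers unfamiliar with the machinery, a generic bootstrap (sampling-importance-resampling) particle filter for a scalar random-walk state observed in Gaussian noise is sketched below; it only illustrates the tracking idea and is not the chapter's filter for mixing matrices:

```python
import numpy as np

def bootstrap_particle_filter(obs, n_particles=500, state_std=0.05,
                              obs_std=0.5, seed=0):
    """Track a scalar random-walk state from noisy observations.
    Returns the posterior-mean estimate of the state at each step."""
    rng = np.random.default_rng(seed)
    particles = rng.standard_normal(n_particles)       # initial particle cloud
    estimates = []
    for y in obs:
        particles = particles + state_std * rng.standard_normal(n_particles)  # propagate
        weights = np.exp(-0.5 * ((y - particles) / obs_std) ** 2) + 1e-300     # likelihood
        weights /= weights.sum()
        estimates.append(np.dot(weights, particles))    # posterior mean
        idx = rng.choice(n_particles, size=n_particles, p=weights)             # resample
        particles = particles[idx]
    return np.array(estimates)
```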
This chapter deals with independent component analysis and blind source separation for nonlinear data models. A fundamental difficulty, especially in the nonlinear ICA problem, is that it is highly non-unique without a suitable regularization. After considering this, two methods for solving the nonlinear ICA and BSS problems are presented in more detail. The first one is a maximum likelihood method based on a modified generative topographic mapping. The second approach applies Bayesian ensemble learning to a flexible multi-layer perceptron model for finding the sources and nonlinear mixing mapping that have most probably given rise to the observed mixed data. Finally, other techniques introduced for the nonlinear ICA and BSS problems are briefly reviewed.
Introduction
Independent Component Analysis [Lee, 1998, Oja et al., 1997, Girolami, 1999b] is a statistical technique which tries to represent the observed data in terms of statistically independent component variables. ICA is closely related to the blind source separation (BSS) problem [Cardoso, 1998a, Amari et al., 1996, Lee, 1998, Oja et al., 1997, Girolami, 1999b], where the general goal is to separate mutually independent but otherwise unknown source signals from their observed mixtures without knowing the mixing process.
Independent Component Analysis (ICA) has recently become an important tool for modelling and understanding empirical datasets as it offers an elegant and practical methodology for blind source separation and deconvolution. It is seldom possible to observe a pure unadulterated signal. Instead most observations consist of a mixture of signals usually corrupted by noise, and frequently filtered. The signal processing community has devoted much attention to the problem of recovering the constituent sources from the convolutive mixture; ICA may be applied to this Blind Source Separation (BSS) problem to recover the sources. As the appellation independent suggests, recovery relies on the assumption that the constituent sources are mutually independent.
Finding a natural coordinate system is an essential first step in the analysis of empirical data. Principal component analysis (PCA) has, for many years, been used to find a set of basis vectors which are determined by the dataset itself. The principal components are orthogonal and projections of the data onto them are linearly decorrelated, properties which can be ensured by considering only the second order statistical characteristics of the data. ICA aims at a loftier goal: it seeks a transformation to coordinates in which the data are maximally statistically independent, not merely decorrelated.
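A small numerical illustration of the gap between decorrelation and independence (entirely our own example): rotating two independent uniform sources by 45 degrees produces coordinates that are uncorrelated, and hence invisible to second-order methods such as PCA, yet clearly dependent, as higher-order statistics reveal.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, size=(2, 100_000))      # independent uniform sources
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])    # 45-degree rotation (mixing)
x = R @ s
print(np.corrcoef(x))        # off-diagonal ~ 0: the mixtures are decorrelated
print(np.corrcoef(x ** 2))   # off-diagonal clearly nonzero: they are not independent
```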