Denken ist interessanter als Wissen, aber nicht als Anschauen (Johann Wolfgang von Goethe, Werke – Hamburger Ausgabe Bd. 12, Maximen und Reflexionen, 1749–1832). Thinking is more interesting than knowing, but not more interesting than observing.
Multivariate and High-Dimensional Problems
Early in the twentieth century, scientists such as Pearson (1901), Hotelling (1933) and Fisher (1936) developed methods for analysing multivariate data in order to
• understand the structure in the data and summarise it in simpler ways;
• understand the relationship of one part of the data to another part; and
• make decisions and inferences based on the data.
The early methods these scientists developed are linear; their conceptual simplicity and elegance still strike us today as natural and surprisingly powerful. Principal Component Analysis deals with the first topic in the preceding list, Canonical Correlation Analysis with the second and Discriminant Analysis with the third. As time moved on, more complex methods were developed, often arising in areas such as psychology, biology or economics, but these linear methods have not lost their appeal. Indeed, as we have become more able to collect and handle very large and high-dimensional data, renewed requirements for linear methods have arisen. In these data sets essential structure can often be obscured by noise, and it becomes vital to
reduce the original data in such a way that informative and interesting structure in the data is preserved while noisy, irrelevant or purely random variables, dimensions or features are removed, as these can adversely affect the analysis.
There is no sense in being precise when you don't even know what you're talking about (John von Neumann, 1903–1957).
Introduction
Cluster Analysis is an exploratory technique which partitions observations into different clusters or groupings. In medicine, biology, psychology, marketing or finance, multivariate measurements of objects or individuals are the data of interest. In biology, human blood cells of one or more individuals – such as the HIV flow cytometry data – might be the objects one wants to analyse. Cells with similar multivariate responses are grouped together, and cells whose responses differ considerably from each other are partitioned into different clusters. The analysis of cells from a number of individuals such as HIV+ and HIV− individuals may result in different cluster patterns. These differences are informative for the biologist and might allow him or her to draw conclusions about the onset or progression of a disease or a patient's response to treatment.
Clustering techniques are applicable whenever a mountain of data needs to be grouped into manageable and meaningful piles. In some applications we know that the data naturally fall into two groups, such as HIV+ or HIV−, but in many cases the number of clusters is not known. The goal of Cluster Analysis is to determine
• the cluster allocation for each observation, and
• the number of clusters.
For some clustering methods – such as k-means – the user has to specify the number of clusters prior to applying the method.
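To make the k-means point concrete, the following is a minimal sketch, not taken from the text: it clusters synthetic two-group data with scikit-learn's KMeans, with the number of clusters specified in advance. The data, dimensions and parameter choices are illustrative assumptions, not the HIV flow cytometry example.

```python
# Minimal sketch (illustrative only): k-means on synthetic two-group data,
# with the number of clusters k fixed in advance, as noted above.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two artificial "cell populations" in 3 dimensions (purely synthetic data).
group_a = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
group_b = rng.normal(loc=4.0, scale=1.0, size=(100, 3))
X = np.vstack([group_a, group_b])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:5])              # cluster allocation for each observation
print(kmeans.cluster_centers_.shape)   # (2, 3): one centre per cluster
```

Methods that also estimate the number of clusters, rather than requiring it as an input, are discussed later in the chapter.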
Mathematics, rightly viewed, possesses not only truth, but supreme beauty (Bertrand Russell, Philosophical Essays No. 4, 1910).
Introduction
One of the aims in multivariate data analysis is to summarise the data in fewer than the original number of dimensions without losing essential information. More than a century ago, Pearson (1901) considered this problem, and Hotelling (1933) proposed a solution to it: instead of treating each variable separately, he considered combinations of the variables. Clearly, the average of all variables is such a combination, but many others exist. Two fundamental questions arise:
How should one choose these combinations?
How many such combinations should one choose?
There is no single strategy that always gives the right answer. This book will describe many ways of tackling at least the first problem.
Hotelling's proposal consisted in finding those linear combinations of the variables which best explain the variability of the data. Linear combinations are relatively easy to compute and interpret. Also, linear combinations have nice mathematical properties. Later methods, such as Multidimensional Scaling, broaden the types of combinations, but this is done at a cost: The mathematical treatment becomes more difficult, and the practical calculations will be more complex. The complexity increases with the size of the data, and it is one of the major reasons why Multidimensional Scaling has taken rather longer to regain popularity.
The second question is of a different nature, and its answer depends on the solution to the first.
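As an illustration of Hotelling's proposal, the sketch below, which assumes synthetic data and standard numpy routines rather than anything from the text, computes the linear combinations that best explain the variability as the eigenvectors of the sample covariance matrix, and reports the proportion of variability each combination explains, a quantity that also bears on the second question.

```python
# Minimal sketch (illustrative only): the linear combinations that best explain
# the variability are given by the eigenvectors of the sample covariance matrix.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))          # 200 observations on 5 variables (synthetic)
Xc = X - X.mean(axis=0)                # centre the data
S = np.cov(Xc, rowvar=False)           # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending eigenvalues for symmetric S
order = np.argsort(eigvals)[::-1]      # reorder directions by variance explained
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                  # the linear combinations (principal component scores)
explained = eigvals / eigvals.sum()    # proportion of variability each combination explains
print(explained)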
As we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say, we know there are some things we do not know. But there are also unknown unknowns, the ones we do not know we do not know (Donald Rumsfeld, Department of Defense news briefing, 12 February 2002).
Introduction
The classical or pre-2000 developments in Independent Component Analysis focus on approximating the mutual information by cumulants or moments, and they pursue the relationship between independence and non-Gaussianity. The theoretical framework of these early independent component approaches is accompanied by efficient software, and the FastICA solutions, in particular, have resulted in these approaches being recognised as among the main tools for calculating independent and non-Gaussian directions. The computational ease of FastICA solutions, however, does not detract from the development of other methods that find non-Gaussian or independent components. Indeed, the search for new ways of determining independent components has remained an active area of research.
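For readers who wish to see how a FastICA solution is obtained in practice, here is a hedged sketch using the FastICA implementation in scikit-learn on a synthetic two-source mixture; the sources, mixing matrix and parameter choices are illustrative assumptions, not examples from this chapter.

```python
# Illustrative sketch only: recovering independent, non-Gaussian sources from a
# linear mixture with the FastICA algorithm as implemented in scikit-learn.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
n = 2000
t = np.linspace(0, 8, n)
s1 = np.sign(np.sin(3 * t))            # square wave: a non-Gaussian source
s2 = rng.laplace(size=n)               # Laplace noise: another non-Gaussian source
S = np.column_stack([s1, s2])

A = np.array([[1.0, 0.5], [0.3, 1.0]]) # mixing matrix (unknown in practice)
X = S @ A.T                            # the observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)           # estimated independent components
print(S_hat.shape, ica.mixing_.shape)  # (2000, 2) and (2, 2)
```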
This chapter looks at a variety of approaches which address the independent component problem. It is impossible to do justice to this fast-growing body of research; I aim to give a flavour of the diversity of approaches by introducing the reader to a number of contrasting methods. The methods I describe are based on a theoretical framework, but this does not imply that heuristically based approaches are not worth considering.
‘Which road do I take?’ Alice asked. ‘Where do you want to go?’ responded the Cheshire Cat. ‘I don't know,’ Alice answered. ‘Then,’ said the Cat, ‘it doesn't matter’ (Lewis Carroll, Alice's Adventures in Wonderland, 1865).
Introduction
The name Projection Pursuit highlights a key aspect of the method: the search for projections worth pursuing. Projection Pursuit can be regarded as embracing the classical multivariate methods while at the same time striving to find something ‘interesting’. This invites the question of what we call interesting. For scores in mathematics, language and literature, and for the comprehensive tests that psychologists use to uncover a person's hidden indicators of intelligence, one could attempt to find as many indicators as possible, or one could try to find the single most interesting or most informative indicator. In Independent Component Analysis, one attempts to find all indicators, whereas Projection Pursuit typically searches for the most interesting one.
In Principal Component Analysis, the directions or projections of interest are those which capture the variability in the data. The stress and strain criteria in Multidimensional Scaling variously broaden this set of directions. Of a different nature are the directions of interest in Canonical Correlation Analysis: they focus on the strength of the correlation between different parts of the data. Projection Pursuit covers a rich set of directions and includes those of the classical methods. The directions of interest in Principal Component Analysis, the eigenvectors of the covariance matrix, are obtained by solving linear algebraic equations.
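To give a flavour of the search for ‘interesting’ directions, the following sketch, with synthetic data and a deliberately naive random search, scores candidate unit directions by their absolute excess kurtosis, one simple index of non-Gaussianity; it illustrates the general idea only and is not one of the projection indices treated later.

```python
# Illustrative sketch only: a crude projection pursuit that searches random unit
# directions and keeps the one with the largest absolute excess kurtosis.
import numpy as np

def abs_excess_kurtosis(y):
    y = (y - y.mean()) / y.std()
    return abs(np.mean(y**4) - 3.0)    # 0 for Gaussian data, large for "interesting" projections

rng = np.random.default_rng(3)
# Synthetic data: one non-Gaussian variable hidden among Gaussian noise.
X = rng.normal(size=(500, 4))
X[:, 0] = rng.laplace(size=500)

best_dir, best_index = None, -np.inf
for _ in range(2000):                  # naive random search over directions
    a = rng.normal(size=X.shape[1])
    a /= np.linalg.norm(a)             # unit-length projection direction
    idx = abs_excess_kurtosis(X @ a)
    if idx > best_index:
        best_dir, best_index = a, idx

print(np.round(best_dir, 2), round(best_index, 2))
```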
Get your facts first, and then you can distort them as much as you please (Mark Twain, 1835–1910).
Introduction
The first part of this book dealt with the three classical problems: finding structure within data, determining relationships between different subsets of variables and dividing data into classes. In Part II, we focus on the first of the problems, finding structure – in particular, groups or factors – in data. The three methods we explore, Cluster Analysis, Factor Analysis and Multidimensional Scaling, are classical in their origin and were developed initially in the behavioural sciences. They have since become indispensable tools in diverse areas including psychology, psychiatry, biology, medicine and marketing, as well as having become mainstream statistical techniques. We will see that Principal Component Analysis plays an important role in these methods as a preliminary step in the analysis or as a special case within a broader framework.
Cluster Analysis is similar to Discriminant Analysis in that one attempts to partition the data into groups. In biology, one might want to determine specific cell subpopulations. In archeology, researchers have attempted to establish taxonomies of stone tools or funeral objects by applying cluster analytic techniques. Unlike Discriminant Analysis, however, we do not know the class membership of any of the observations. The emphasis in Factor Analysis and Multidimensional Scaling is on the interpretability of the data in terms of a small number of meaningful descriptors or dimensions.
Alles Gescheite ist schon gedacht worden, man muß nur versuchen, es noch einmal zu denken (Johann Wolfgang von Goethe, Wilhelm Meisters Wanderjahre, 1749–1832). Everything clever has already been thought; one can only try to think it again.
Introduction
In Chapter 2 we represented a random vector as a linear combination of uncorrelated vectors. From one random vector we progress to two vectors, but now we look for correlation between the variables of the first and second vectors, and in particular, we want to find out which variables are correlated and how strong this relationship is.
In medical diagnostics, for example, we may meet multivariate measurements obtained from tissue and plasma samples of patients, and the tissue and plasma variables typically differ. A natural question is: What is the relationship between the tissue measurements and the plasma measurements? A strong relationship between a combination of tissue variables and a combination of plasma variables typically indicates that either set of measurements could be used for a particular diagnosis. A very weak relationship between the plasma and tissue variables tells us that the sets of variables are not equally appropriate for a particular diagnosis.
On the share market, one might want to compare changes in the price of industrial shares and mining shares over a period of time. The time points are the observations, and for each time point, we have two sets of variables: those arising from industrial shares and those arising from mining shares.
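As a concrete, if artificial, illustration of such a relationship, the sketch below uses scikit-learn's CCA on synthetic ‘tissue’ and ‘plasma’ measurements that share a common latent signal; the variable names and data are assumptions made purely for illustration.

```python
# Illustrative sketch only: maximally correlated linear combinations of two sets
# of variables, computed with scikit-learn's CCA on synthetic data.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(4)
n = 300
latent = rng.normal(size=n)            # shared signal linking the two sets of variables
tissue = np.column_stack([latent + rng.normal(scale=0.5, size=n) for _ in range(4)])
plasma = np.column_stack([latent + rng.normal(scale=0.5, size=n) for _ in range(3)])

cca = CCA(n_components=1).fit(tissue, plasma)
u, v = cca.transform(tissue, plasma)   # canonical variates for each set
print(np.corrcoef(u[:, 0], v[:, 0])[0, 1])   # first canonical correlation (close to 1 here)
```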
‘That's not a regular rule: you invented it just now.’ ‘It's the oldest rule in the book,’ said the King. ‘Then it ought to be Number One,’ said Alice (Lewis Carroll, Alice's Adventures in Wonderland, 1865).
Introduction
To discriminate means to single out, to recognise and understand differences and to distinguish. Of special interest is discrimination in two-class problems: A tumour is benign or malignant, and the correct diagnosis needs to be obtained. In the finance and credit-risk area, one wants to assess whether a company is likely to go bankrupt in the next few years or whether a client will default on mortgage repayments. To be able to make decisions in these situations, one needs to understand what distinguishes a ‘good’ client from one who is likely to default or go bankrupt.
Discriminant Analysis starts with data for which the classes are known and finds characteristics of the observations that accurately predict each observation's class. One then combines this information into a rule which leads to a partitioning of the observations into disjoint classes. When using Discriminant Analysis for tumour diagnosis, for example, the first step is to determine the variables which best characterise the difference between the benign and malignant groups – based on data for tumours whose status (benign or malignant) is known – and to construct a decision rule based on these variables.
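The two-step workflow just described can be sketched as follows, using scikit-learn's LinearDiscriminantAnalysis on synthetic stand-ins for benign and malignant observations; the data and settings are illustrative assumptions, not the tumour example itself.

```python
# Illustrative sketch only: learn a discriminant rule from observations with known
# classes, then use it to allocate a new observation to a class.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(5)
# Synthetic stand-ins for "benign" (class 0) and "malignant" (class 1) observations.
class0 = rng.normal(loc=0.0, size=(80, 3))
class1 = rng.normal(loc=2.0, size=(80, 3))
X = np.vstack([class0, class1])
y = np.array([0] * 80 + [1] * 80)

lda = LinearDiscriminantAnalysis().fit(X, y)   # step 1: learn the rule from labelled data
new_obs = np.array([[1.8, 2.1, 1.9]])
print(lda.predict(new_obs))                    # step 2: allocate a new observation to a class
```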