‘Which road do I take?’ Alice asked. ‘Where do you want to go?’ responded the Cheshire Cat. ‘I don't know,’ Alice answered. ‘Then,’ said the Cat, ‘it doesn't matter’ (Lewis Carroll, Alice's Adventures in Wonderland, 1865).
Introduction
The name of the method, Projection Pursuit, highlights a key aspect: the search for projections worth pursuing. Projection Pursuit can be regarded as embracing the classical multivariate methods while at the same time striving to find something ‘interesting’. This invites the question of what we call interesting. Consider scores in mathematics, language and literature, or in comprehensive tests, which psychologists use to uncover a person's hidden indicators of intelligence: one could attempt to find as many indicators as possible, or one could try to find the most interesting or most informative indicator. In Independent Component Analysis, one attempts to find all indicators, whereas Projection Pursuit typically searches for the most interesting one.
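To make the idea concrete, the following minimal sketch (not from the text) searches random unit directions for the one maximising a kurtosis-based projection index. Both the index and the crude random search are illustrative assumptions; practical implementations use more refined indices and numerical optimisation.

```python
import numpy as np

def interestingness(z):
    """Kurtosis-based projection index (illustrative choice): large
    values flag projections that deviate from Gaussianity."""
    z = (z - z.mean()) / z.std()
    return abs(np.mean(z**4) - 3.0)

def projection_pursuit(X, n_candidates=5000, seed=None):
    """Crude random search for the 'most interesting' 1-d projection.

    X : (n_samples, n_features) data matrix, assumed centred.
    Returns the unit direction with the largest index value.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    best_dir, best_val = None, -np.inf
    for _ in range(n_candidates):
        a = rng.standard_normal(d)
        a /= np.linalg.norm(a)          # unit-length candidate direction
        val = interestingness(X @ a)
        if val > best_val:
            best_dir, best_val = a, val
    return best_dir, best_val
```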
In Principal Component Analysis, the directions or projections of interest are those which capture the variability in the data. The stress and strain criteria in Multidimensional Scaling variously broaden this set of directions. Of a different nature are the directions of interest in Canonical Correlation Analysis: they focus on the strength of the correlation between different parts of the data. Projection Pursuit covers a rich set of directions and includes those of the classical methods. The directions of interest in Principal Component Analysis, the eigenvectors of the covariance matrix, are obtained by solving linear algebraic equations.
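As an illustration of that last remark, a minimal numpy sketch of the eigendecomposition route to the principal component directions might look as follows; the function name is ours, not the text's.

```python
import numpy as np

def principal_components(X):
    """Principal component directions of the data matrix X
    (rows = observations), via the eigendecomposition the text
    refers to: eigenvectors of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                 # centre the data
    S = np.cov(Xc, rowvar=False)            # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
    return eigvals[order], eigvecs[:, order]
```

The principal component scores are then `Xc @ eigvecs`, with the first column capturing the largest share of the variability.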
Get your facts first, and then you can distort them as much as you please (Mark Twain, 1835–1910).
Introduction
The first part of this book dealt with the three classical problems: finding structure within data, determining relationships between different subsets of variables and dividing data into classes. In Part II, we focus on the first of the problems, finding structure – in particular, groups or factors – in data. The three methods we explore, Cluster Analysis, Factor Analysis and Multidimensional Scaling, are classical in their origin and were developed initially in the behavioural sciences. They have since become indispensable tools in diverse areas including psychology, psychiatry, biology, medicine and marketing, as well as having become mainstream statistical techniques. We will see that Principal Component Analysis plays an important role in these methods as a preliminary step in the analysis or as a special case within a broader framework.
Cluster Analysis is similar to Discriminant Analysis in that one attempts to partition the data into groups. In biology, one might want to determine specific cell subpopulations. In archaeology, researchers have attempted to establish taxonomies of stone tools or funeral objects by applying cluster analytic techniques. Unlike Discriminant Analysis, however, we do not know the class membership of any of the observations. The emphasis in Factor Analysis and Multidimensional Scaling is on the interpretability of the data in terms of a small number of meaningful descriptors or dimensions.
Alles Gescheite ist schon gedacht worden, man muß nur versuchen, es noch einmal zu denken (Johann Wolfgang von Goethe, Wilhelm Meisters Wanderjahre, 1749–1832). Everything clever has already been thought; one must only try to think it again.
Introduction
In Chapter 2 we represented a random vector as a linear combination of uncorrelated vectors. Here we progress from one random vector to two, but now we look for correlations between the variables of the first vector and those of the second; in particular, we want to find out which variables are correlated and how strong this relationship is.
In medical diagnostics, for example, we may encounter multivariate measurements obtained from tissue and plasma samples of patients, and the tissue and plasma variables typically differ. A natural question is: What is the relationship between the tissue measurements and the plasma measurements? A strong relationship between a combination of tissue variables and a combination of plasma variables typically indicates that either set of measurements could be used for a particular diagnosis. A very weak relationship between the plasma and tissue variables tells us that the sets of variables are not equally appropriate for a particular diagnosis.
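A minimal sketch of the standard Canonical Correlation Analysis construction, assuming centred data matrices observed on the same cases and ignoring the regularisation one would need for high-dimensional data:

```python
import numpy as np

def inv_sqrt(S, eps=1e-10):
    """Inverse matrix square root of a symmetric PSD matrix."""
    vals, vecs = np.linalg.eigh(S)
    vals = np.clip(vals, eps, None)
    return vecs @ np.diag(vals**-0.5) @ vecs.T

def cca(X, Y):
    """Canonical correlations between two data matrices X (n x p)
    and Y (n x q) observed on the same n cases.  Returns the
    correlations and the canonical direction vectors."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    Sxx, Syy = Xc.T @ Xc / n, Yc.T @ Yc / n
    Sxy = Xc.T @ Yc / n
    K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)   # whitened cross-covariance
    U, rho, Vt = np.linalg.svd(K)
    A = inv_sqrt(Sxx) @ U      # directions for the first set of variables
    B = inv_sqrt(Syy) @ Vt.T   # directions for the second set
    return rho, A, B
```

The singular values `rho` measure how strongly the best linear combination of one set of variables (tissue, say) correlates with the best combination of the other (plasma).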
On the share market, one might want to compare changes in the price of industrial shares and mining shares over a period of time. The time points are the observations, and for each time point, we have two sets of variables: those arising from industrial shares and those arising from mining shares.
‘That's not a regular rule: you invented it just now.’ ‘It's the oldest rule in the book,’ said the King. ‘Then it ought to be Number One,’ said Alice (Lewis Carroll, Alice's Adventures in Wonderland, 1865).
Introduction
To discriminate means to single out, to recognise and understand differences and to distinguish. Of special interest is discrimination in two-class problems: A tumour is benign or malignant, and the correct diagnosis needs to be obtained. In the finance and credit-risk area, one wants to assess whether a company is likely to go bankrupt in the next few years or whether a client will default on mortgage repayments. To be able to make decisions in these situations, one needs to understand what distinguishes a ‘good’ client from one who is likely to default or go bankrupt.
Discriminant Analysis starts with data for which the classes are known and finds characteristics of the observations that accurately predict each observation's class. One then combines this information into a rule which leads to a partitioning of the observations into disjoint classes. When using Discriminant Analysis for tumour diagnosis, for example, the first step is to determine the variables which best characterise the difference between the benign and malignant groups – based on data for tumours whose status (benign or malignant) is known – and to construct a decision rule based on these variables.
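For the two-class problem, a minimal sketch of one such rule, Fisher's linear discriminant, assuming equal costs and priors; the function name is illustrative:

```python
import numpy as np

def fisher_rule(X0, X1):
    """Fisher's linear discriminant for a two-class problem.

    X0, X1 : data matrices for the two known classes (rows =
    observations).  Returns a direction w and a threshold c; a new
    observation x is assigned to class 1 when w @ x > c.
    """
    m0, m1 = X0.mean(0), X1.mean(0)
    # pooled within-class covariance
    S0, S1 = np.cov(X0, rowvar=False), np.cov(X1, rowvar=False)
    n0, n1 = len(X0), len(X1)
    Sw = ((n0 - 1) * S0 + (n1 - 1) * S1) / (n0 + n1 - 2)
    w = np.linalg.solve(Sw, m1 - m0)   # discriminant direction
    c = w @ (m0 + m1) / 2              # midpoint threshold
    return w, c
```

The direction w weights the variables by how well they separate the two groups, which is exactly the combination of characteristics into a rule described above.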
This book is about data in many – and sometimes very many – variables and about analysing such data. The book attempts to integrate classical multivariate methods with contemporary methods suitable for high-dimensional data and to present them in a coherent and transparent framework. Writing about ideas that emerged more than a hundred years ago and that have become increasingly relevant again in the last few decades is exciting and challenging. With hindsight, we can reflect on the achievements of those who paved the way, whose methods we apply to ever bigger and more complex data and who will continue to influence our ideas and guide our research. Renewed interest in the classical methods and their extension has led to analyses that give new insight into data and apply to bigger and more complex problems.
There are two players in this book: Theory and Data. Theory advertises its wares to lure Data into revealing its secrets, but Data has its own ideas. Theory wants to provide elegant solutions which answer many but not all of Data's demands, but these lead Data to pose new challenges to Theory. Statistics thrives on interactions between theory and data, and we develop better theory when we ‘listen’ to data. Statisticians often work with experts in other fields and analyse data from many different areas. We, the statisticians, need and benefit from the expertise of our colleagues in the analysis of their data and interpretation of the results of our analysis.
I am not bound to please thee with my answer (William Shakespeare, The Merchant of Venice, 1596–1598).
Introduction
It is not always possible to measure the quantities of interest directly. In psychology, intelligence is a prime example; scores in mathematics, language and literature, or comprehensive tests are used to describe a person's intelligence. From these measurements, a psychologist may want to derive a person's intelligence. Behavioural scientist Charles Spearman is credited with being the originator and pioneer of the classical theory of mental tests, the theory of intelligence and what is now called Factor Analysis. In 1904, Spearman proposed a two-factor theory of intelligence which he extended over a number of decades (see Williams et al., 2003). Since its early days, Factor Analysis has enjoyed great popularity and has become a valuable tool in the analysis of complex data in areas as diverse as behavioural sciences, health sciences and marketing. The appeal of Factor Analysis lies in its ease of use and in the recognition that there is an association between the hidden quantities and the measured quantities.
The aim of Factor Analysis is
• to exhibit the relationship between the measured and the underlying variables, and
• to estimate the underlying variables, called the hidden or latent variables.
Although many of the key developments have arisen in the behavioural sciences, Factor Analysis has an important place in statistics. Its model-based nature has invited, and resulted in, many theoretical and statistical advances.
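A brief sketch of both aims using scikit-learn's FactorAnalysis, on simulated test scores driven by a single latent factor; the data, loadings and noise level are invented purely for illustration:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulate scores driven by one hidden 'intelligence' factor
# (illustrative data, not from the text).
rng = np.random.default_rng(0)
n = 500
f = rng.standard_normal(n)                 # hidden (latent) factor
loadings = np.array([0.9, 0.8, 0.7])       # maths, language, literature
X = np.outer(f, loadings) + 0.4 * rng.standard_normal((n, 3))

fa = FactorAnalysis(n_components=1)
scores = fa.fit_transform(X)   # aim 2: estimates of the latent variable
print(fa.components_)          # aim 1: estimated loadings (up to sign),
                               # relating measured to latent variables
```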
Den Samen legen wir in ihre Hände! Ob Glück, ob Unglück aufgeht, lehrt das Ende (Friedrich von Schiller, Wallensteins Tod, 1799). We lay the seed in their hands! Whether fortune or misfortune grows from it, only the end will teach us.
Introduction
In the beginning – in 1901 – there was Principal Component Analysis. On our journey through this book we have encountered many different methods for analysing multidimensional data, and many times on this journey, Principal Component Analysis reared its – some might say, ugly – head. About a hundred years after its birth, a renaissance of Principal Component Analysis (PCA) has led to new theoretical and practical advances for high-dimensional data and to SPCA, where S variously refers to simple, supervised and sparse. It seems appropriate, at the end of our journey, to return to where we started and take a fresh look at the developments which have revitalised Principal Component Analysis. These include the availability of high-dimensional and functional data, the necessity for dimension reduction and feature selection, and new and sparse ways of representing data.
Exciting developments in the analysis of high-dimensional data have been interacting with similar ones in Statistical Learning. It is not clear where analysis of data stops and learning from data starts. An essential part of both is the selection of ‘important’ and ‘relevant’ features or variables.
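As an illustration of sparse representation acting as built-in feature selection, the following sketch contrasts ordinary and sparse principal component loadings using scikit-learn; the data and penalty value are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 30))   # placeholder data matrix

pca = PCA(n_components=3).fit(X)
spca = SparsePCA(n_components=3, alpha=1.0).fit(X)

# The l1 penalty in SparsePCA sets many loadings exactly to zero,
# so each component involves only a few of the original variables:
# variable selection built into the representation itself.
print(np.sum(pca.components_ != 0), "nonzero PCA loadings")
print(np.sum(spca.components_ != 0), "nonzero sparse-PCA loadings")
```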
The term optimal filtering traditionally refers to a class of methods that can be used for estimating the state of a time-varying system which is indirectly observed through noisy measurements. The term optimal in this context refers to statistical optimality. Bayesian filtering refers to the Bayesian way of formulating optimal filtering. In this book we use these terms interchangeably and always mean Bayesian filtering.
In optimal, Bayesian, and Bayesian optimal filtering, the state of the system refers to the collection of dynamic variables such as position, velocity, orientation, and angular velocity, which fully describe the system. The noise in the measurements means that they are uncertain; even if we knew the true system state, the measurements would not be deterministic functions of the state, but would have a distribution of possible values. The time evolution of the state is modeled as a dynamic system which is perturbed by a certain process noise. This noise is used for modeling the uncertainties in the system dynamics. In most cases the system is not truly stochastic, but stochasticity is used for representing the model uncertainties.
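In the linear-Gaussian special case, the Bayesian filtering recursion has a closed form: the Kalman filter. A minimal sketch, with model matrices A, Q, H, R assumed known (the notation is ours):

```python
import numpy as np

def kalman_filter(y, A, Q, H, R, m0, P0):
    """Bayesian (optimal) filter for the linear-Gaussian model
        x_k = A x_{k-1} + q_k,  q_k ~ N(0, Q)  (dynamics + process noise)
        y_k = H x_k + r_k,      r_k ~ N(0, R)  (noisy measurements)
    Returns the filtering means and covariances of p(x_k | y_1..k).
    """
    m, P = m0, P0
    means, covs = [], []
    for yk in y:
        # Prediction: propagate the state through the dynamic model
        m = A @ m
        P = A @ P @ A.T + Q
        # Update: condition on the new measurement
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
        m = m + K @ (yk - H @ m)
        P = P - K @ S @ K.T
        means.append(m)
        covs.append(P)
    return np.array(means), np.array(covs)
```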
Bayesian smoothing (or optimal smoothing) is often considered to be a class of methods within the field of Bayesian filtering. While Bayesian filters in their basic form only compute estimates of the current state of the system given the history of measurements, Bayesian smoothers can be used to reconstruct states that happened before the current time.
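Continuing the sketch above, one such smoother for the linear-Gaussian model is the Rauch-Tung-Striebel smoother, a backward pass over the Kalman filter output:

```python
import numpy as np

def rts_smoother(means, covs, A, Q):
    """Rauch-Tung-Striebel smoother.  Takes the filtering means and
    covariances from kalman_filter above and reconstructs
    p(x_k | all measurements), i.e. estimates of states that
    happened before the current time."""
    ms, Ps = means.copy(), covs.copy()
    for k in range(len(means) - 2, -1, -1):
        m_pred = A @ means[k]                        # one-step prediction
        P_pred = A @ covs[k] @ A.T + Q
        G = covs[k] @ A.T @ np.linalg.inv(P_pred)    # smoother gain
        ms[k] = means[k] + G @ (ms[k + 1] - m_pred)
        Ps[k] = covs[k] + G @ (Ps[k + 1] - P_pred) @ G.T
    return ms, Ps
```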