Denken ist interessanter als Wissen, aber nicht als Anschauen (Johann Wolfgang von Goethe, Werke – Hamburger Ausgabe Bd. 12, Maximen und Reflexionen, 1749–1832). Thinking is more interesting than knowing, but not more interesting than observing.
Multivariate and High-Dimensional Problems
Early in the twentieth century, scientists such as Pearson (1901), Hotelling (1933) and Fisher (1936) developed methods for analysing multivariate data in order to
• understand the structure in the data and summarise it in simpler ways;
• understand the relationship of one part of the data to another part; and
• make decisions and inferences based on the data.
The early methods these scientists developed are linear; their conceptual simplicity and elegance still strike us today as natural and surprisingly powerful. Principal Component Analysis deals with the first topic in the preceding list, Canonical Correlation Analysis with the second and Discriminant Analysis with the third. As time moved on, more complex methods were developed, often arising in areas such as psychology, biology or economics, but these linear methods have not lost their appeal. Indeed, as we have become more able to collect and handle very large and high-dimensional data, renewed requirements for linear methods have arisen. In these data sets essential structure can often be obscured by noise, and it becomes vital to
reduce the original data in such a way that informative and interesting structure in the data is preserved while noisy, irrelevant or purely random variables, dimensions or features are removed, as these can adversely affect the analysis.
There is no sense in being precise when you don't even know what you're talking about (John von Neumann, 1903–1957).
Introduction
Cluster Analysis is an exploratory technique which partitions observations into different clusters or groupings. In medicine, biology, psychology, marketing or finance, multivariate measurements of objects or individuals are the data of interest. In biology, human blood cells of one or more individuals – such as the HIV flow cytometry data – might be the objects one wants to analyse. Cells with similar multivariate responses are grouped together, and cells whose responses differ considerably from each other are partitioned into different clusters. The analysis of cells from a number of individuals such as HIV+ and HIV− individuals may result in different cluster patterns. These differences are informative for the biologist and might allow him or her to draw conclusions about the onset or progression of a disease or a patient's response to treatment.
Clustering techniques are applicable whenever a mountain of data needs to be grouped into manageable and meaningful piles. In some applications we know that the data naturally fall into two groups, such as HIV+ or HIV−, but in many cases the number of clusters is not known. The goal of Cluster Analysis is to determine
• the cluster allocation for each observation, and
• the number of clusters.
For some clustering methods – such as k-means – the user has to specify the number of clusters prior to applying the method.
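To make the k-means point concrete, the following is a minimal sketch, not taken from the text: it clusters synthetic two-group data with scikit-learn's KMeans, with the number of clusters specified in advance. The data, dimensions and parameter choices are illustrative assumptions, not the HIV flow cytometry example.

```python
# Minimal sketch (illustrative only): k-means on synthetic two-group data,
# with the number of clusters k fixed in advance, as noted above.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two artificial "cell populations" in 3 dimensions (purely synthetic data).
group_a = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
group_b = rng.normal(loc=4.0, scale=1.0, size=(100, 3))
X = np.vstack([group_a, group_b])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:5])              # cluster allocation for each observation
print(kmeans.cluster_centers_.shape)   # (2, 3): one centre per cluster
```

Methods that also estimate the number of clusters, rather than requiring it as an input, are discussed later in the chapter.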
Mathematics, rightly viewed, possesses not only truth, but supreme beauty (Bertrand Russell, Philosophical Essays No. 4, 1910).
Introduction
One of the aims in multivariate data analysis is to summarise the data in fewer than the original number of dimensions without losing essential information. More than a century ago, Pearson (1901) considered this problem, and Hotelling (1933) proposed a solution to it: instead of treating each variable separately, he considered combinations of the variables. Clearly, the average of all variables is such a combination, but many others exist. Two fundamental questions arise:
How should one choose these combinations?
How many such combinations should one choose?
There is no single strategy that always gives the right answer. This book will describe many ways of tackling at least the first problem.
Hotelling's proposal consisted in finding those linear combinations of the variables which best explain the variability of the data. Linear combinations are relatively easy to compute and interpret. Also, linear combinations have nice mathematical properties. Later methods, such as Multidimensional Scaling, broaden the types of combinations, but this is done at a cost: The mathematical treatment becomes more difficult, and the practical calculations will be more complex. The complexity increases with the size of the data, and it is one of the major reasons why Multidimensional Scaling has taken rather longer to regain popularity.
The second question is of a different nature, and its answer depends on the solution to the first.
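As an illustration of Hotelling's proposal, the sketch below, which assumes synthetic data and standard numpy routines rather than anything from the text, computes the linear combinations that best explain the variability as the eigenvectors of the sample covariance matrix, and reports the proportion of variability each combination explains, a quantity that also bears on the second question.

```python
# Minimal sketch (illustrative only): the linear combinations that best explain
# the variability are given by the eigenvectors of the sample covariance matrix.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))          # 200 observations on 5 variables (synthetic)
Xc = X - X.mean(axis=0)                # centre the data
S = np.cov(Xc, rowvar=False)           # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending eigenvalues for symmetric S
order = np.argsort(eigvals)[::-1]      # reorder directions by variance explained
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                  # the linear combinations (principal component scores)
explained = eigvals / eigvals.sum()    # proportion of variability each combination explains
print(explained)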
As we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say, we know there are some things we do not know. But there are also unknown unknowns, the ones we do not know we do not know (Donald Rumsfeld, Department of Defense news briefing, 12 February 2002).
Introduction
The classical or pre-2000 developments in Independent Component Analysis focus on approximating the mutual information by cumulants or moments, and they pursue the relationship between independence and non-Gaussianity. The theoretical framework of these early independent component approaches is accompanied by efficient software, and the FastICA solutions, in particular, have resulted in these approaches being recognised as among the main tools for calculating independent and non-Gaussian directions. The computational ease of FastICA solutions, however, does not detract from the development of other methods that find non-Gaussian or independent components. Indeed, the search for new ways of determining independent components has remained an active area of research.
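For readers who wish to see how a FastICA solution is obtained in practice, here is a hedged sketch using the FastICA implementation in scikit-learn on a synthetic two-source mixture; the sources, mixing matrix and parameter choices are illustrative assumptions, not examples from this chapter.

```python
# Illustrative sketch only: recovering independent, non-Gaussian sources from a
# linear mixture with the FastICA algorithm as implemented in scikit-learn.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
n = 2000
t = np.linspace(0, 8, n)
s1 = np.sign(np.sin(3 * t))            # square wave: a non-Gaussian source
s2 = rng.laplace(size=n)               # Laplace noise: another non-Gaussian source
S = np.column_stack([s1, s2])

A = np.array([[1.0, 0.5], [0.3, 1.0]]) # mixing matrix (unknown in practice)
X = S @ A.T                            # the observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)           # estimated independent components
print(S_hat.shape, ica.mixing_.shape)  # (2000, 2) and (2, 2)
```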
This chapter looks at a variety of approaches which address the independent component problem. It is impossible to do justice to this fast-growing body of research; I aim to give a flavour of the diversity of approaches by introducing the reader to a number of contrasting methods. The methods I describe are based on a theoretical framework, but this does not imply that heuristically based approaches are not worth considering.
‘Which road do I take?’ Alice asked. ‘Where do you want to go?’ responded the Cheshire Cat. ‘I don't know,’ Alice answered. ‘Then,’ said the Cat, ‘it doesn't matter’ (Lewis Carroll, Alice's Adventures in Wonderland, 1865).
Introduction
The name Projection Pursuit highlights a key aspect of the method: the search for projections worth pursuing. Projection Pursuit can be regarded as embracing the classical multivariate methods while at the same time striving to find something ‘interesting’. This invites the question of what we call interesting. For scores in mathematics, language and literature, and for the comprehensive tests that psychologists use to uncover a person's hidden indicators of intelligence, one could attempt to find as many indicators as possible, or one could try to find the single most interesting or most informative indicator. In Independent Component Analysis, one attempts to find all indicators, whereas Projection Pursuit typically searches for the most interesting one.
In Principal Component Analysis, the directions or projections of interest are those which capture the variability in the data. The stress and strain criteria in Multidimensional Scaling variously broaden this set of directions. Of a different nature are the directions of interest in Canonical Correlation Analysis: they focus on the strength of the correlation between different parts of the data. Projection Pursuit covers a rich set of directions and includes those of the classical methods. The directions of interest in Principal Component Analysis, the eigenvectors of the covariance matrix, are obtained by solving linear algebraic equations.
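To give a flavour of the search for ‘interesting’ directions, the following sketch, with synthetic data and a deliberately naive random search, scores candidate unit directions by their absolute excess kurtosis, one simple index of non-Gaussianity; it illustrates the general idea only and is not one of the projection indices treated later.

```python
# Illustrative sketch only: a crude projection pursuit that searches random unit
# directions and keeps the one with the largest absolute excess kurtosis.
import numpy as np

def abs_excess_kurtosis(y):
    y = (y - y.mean()) / y.std()
    return abs(np.mean(y**4) - 3.0)    # 0 for Gaussian data, large for "interesting" projections

rng = np.random.default_rng(3)
# Synthetic data: one non-Gaussian variable hidden among Gaussian noise.
X = rng.normal(size=(500, 4))
X[:, 0] = rng.laplace(size=500)

best_dir, best_index = None, -np.inf
for _ in range(2000):                  # naive random search over directions
    a = rng.normal(size=X.shape[1])
    a /= np.linalg.norm(a)             # unit-length projection direction
    idx = abs_excess_kurtosis(X @ a)
    if idx > best_index:
        best_dir, best_index = a, idx

print(np.round(best_dir, 2), round(best_index, 2))
```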
Get your facts first, and then you can distort them as much as you please (Mark Twain, 1835–1910).
Introduction
The first part of this book dealt with the three classical problems: finding structure within data, determining relationships between different subsets of variables and dividing data into classes. In Part II, we focus on the first of the problems, finding structure – in particular, groups or factors – in data. The three methods we explore, Cluster Analysis, Factor Analysis and Multidimensional Scaling, are classical in their origin and were developed initially in the behavioural sciences. They have since become indispensable tools in diverse areas including psychology, psychiatry, biology, medicine and marketing, as well as having become mainstream statistical techniques. We will see that Principal Component Analysis plays an important role in these methods as a preliminary step in the analysis or as a special case within a broader framework.
Cluster Analysis is similar to Discriminant Analysis in that one attempts to partition the data into groups. In biology, one might want to determine specific cell subpopulations. In archeology, researchers have attempted to establish taxonomies of stone tools or funeral objects by applying cluster analytic techniques. Unlike Discriminant Analysis, however, we do not know the class membership of any of the observations. The emphasis in Factor Analysis and Multidimensional Scaling is on the interpretability of the data in terms of a small number of meaningful descriptors or dimensions.
Alles Gescheite ist schon gedacht worden, man muß nur versuchen, es noch einmal zu denken (Johann Wolfgang von Goethe, Wilhelm Meisters Wanderjahre, 1749–1832). Everything clever has already been thought; one can only try to think it again.
Introduction
In Chapter 2 we represented a random vector as a linear combination of uncorrelated vectors. From one random vector we progress to two vectors, but now we look for correlation between the variables of the first and second vectors, and in particular, we want to find out which variables are correlated and how strong this relationship is.
In medical diagnostics, for example, we may meet multivariate measurements obtained from tissue and plasma samples of patients, and the tissue and plasma variables typically differ. A natural question is: What is the relationship between the tissue measurements and the plasma measurements? A strong relationship between a combination of tissue variables and a combination of plasma variables typically indicates that either set of measurements could be used for a particular diagnosis. A very weak relationship between the plasma and tissue variables tells us that the sets of variables are not equally appropriate for a particular diagnosis.
On the share market, one might want to compare changes in the price of industrial shares and mining shares over a period of time. The time points are the observations, and for each time point, we have two sets of variables: those arising from industrial shares and those arising from mining shares.
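As a concrete, if artificial, illustration of such a relationship, the sketch below uses scikit-learn's CCA on synthetic ‘tissue’ and ‘plasma’ measurements that share a common latent signal; the variable names and data are assumptions made purely for illustration.

```python
# Illustrative sketch only: maximally correlated linear combinations of two sets
# of variables, computed with scikit-learn's CCA on synthetic data.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(4)
n = 300
latent = rng.normal(size=n)            # shared signal linking the two sets of variables
tissue = np.column_stack([latent + rng.normal(scale=0.5, size=n) for _ in range(4)])
plasma = np.column_stack([latent + rng.normal(scale=0.5, size=n) for _ in range(3)])

cca = CCA(n_components=1).fit(tissue, plasma)
u, v = cca.transform(tissue, plasma)   # canonical variates for each set
print(np.corrcoef(u[:, 0], v[:, 0])[0, 1])   # first canonical correlation (close to 1 here)
```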
‘That's not a regular rule: you invented it just now.’ ‘It's the oldest rule in the book,’ said the King. ‘Then it ought to be Number One,’ said Alice (Lewis Carroll, Alice's Adventures in Wonderland, 1865).
Introduction
To discriminate means to single out, to recognise and understand differences and to distinguish. Of special interest is discrimination in two-class problems: A tumour is benign or malignant, and the correct diagnosis needs to be obtained. In the finance and credit-risk area, one wants to assess whether a company is likely to go bankrupt in the next few years or whether a client will default on mortgage repayments. To be able to make decisions in these situations, one needs to understand what distinguishes a ‘good’ client from one who is likely to default or go bankrupt.
Discriminant Analysis starts with data for which the classes are known and finds characteristics of the observations that accurately predict each observation's class. One then combines this information into a rule which leads to a partitioning of the observations into disjoint classes. When using Discriminant Analysis for tumour diagnosis, for example, the first step is to determine the variables which best characterise the difference between the benign and malignant groups – based on data for tumours whose status (benign or malignant) is known – and to construct a decision rule based on these variables.
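The two-step workflow just described can be sketched as follows, using scikit-learn's LinearDiscriminantAnalysis on synthetic stand-ins for benign and malignant observations; the data and settings are illustrative assumptions, not the tumour example itself.

```python
# Illustrative sketch only: learn a discriminant rule from observations with known
# classes, then use it to allocate a new observation to a class.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(5)
# Synthetic stand-ins for "benign" (class 0) and "malignant" (class 1) observations.
class0 = rng.normal(loc=0.0, size=(80, 3))
class1 = rng.normal(loc=2.0, size=(80, 3))
X = np.vstack([class0, class1])
y = np.array([0] * 80 + [1] * 80)

lda = LinearDiscriminantAnalysis().fit(X, y)   # step 1: learn the rule from labelled data
new_obs = np.array([[1.8, 2.1, 1.9]])
print(lda.predict(new_obs))                    # step 2: allocate a new observation to a class
```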