Optimization is an important operation in many domains of science and technology. Local optimization techniques typically employ some form of iterative procedure, based on derivatives of the function to be optimized (the objective function). These techniques normally involve parameters that must be set by the user, often by trial and error, and those parameters can have a strong influence on the convergence speed of the optimization. In several cases a significant speed advantage could be gained by varying these parameters during the optimization, to reflect the local characteristics of the function being optimized. Some parameter adaptation methods have been proposed for this purpose in deterministic optimization settings. For stochastic (also called on-line) optimization, however, there appears to be no simple and effective parameter adaptation method.
This paper proposes a new method for parameter adaptation in stochastic optimization. The method is applicable to a wide range of objective functions, as well as to a large set of local optimization techniques. We present the derivation of the method, details of its application to gradient descent and to some of its variants, and examples of its use in the gradient optimization of several functions, as well as in the training of a multilayer perceptron by on-line backpropagation.
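To make the setting concrete, here is a minimal sketch of per-parameter step-size adaptation for stochastic gradient descent. It is illustrative only: the function names and constants are hypothetical, and the rule shown (grow a step size when the current noisy gradient agrees in sign with a smoothed average of past gradients, shrink it otherwise) captures the general flavour of such schemes rather than the exact method derived in the paper.

```python
import numpy as np

def adaptive_sgd(grad, w0, n_steps, eta0=0.01, up=1.2, down=0.5, rho=0.9):
    """Stochastic gradient descent with one adaptive step size per parameter.

    Hypothetical sketch, not the paper's exact rule.
    """
    w = np.asarray(w0, dtype=float).copy()
    eta = np.full_like(w, eta0)      # per-parameter step sizes
    g_bar = np.zeros_like(w)        # smoothed past gradient (tames noise)
    for _ in range(n_steps):
        g = grad(w)                  # noisy gradient from a single example
        prod = g * g_bar             # sign agreement with the past, per parameter
        eta = np.where(prod > 0, eta * up,
                       np.where(prod < 0, eta * down, eta))
        w -= eta * g
        g_bar = rho * g_bar + (1 - rho) * g
    return w
```

The smoothing constant rho matters here: comparing against a raw previous gradient would make the adaptation itself a victim of the sampling noise that stochastic optimization introduces.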
Introduction
Optimization is an operation that is used in many different domains of science and technology. It normally consists of maximizing or minimizing a given function (the objective function), chosen to represent the quality of a given system. The system may be physical (mechanical, chemical, etc.), a mathematical model, a computer program, or even a mixture of several of these.
By
Sara Solla, Physics and Astronomy, Northwestern University, Evanston, IL 60208; Physiology, Northwestern University Medical School, Chicago, IL 60611, USA,
Ole Winther, CONNECT, The Niels Bohr Institute, 2100 Copenhagen Ø, Denmark; Theoretical Physics II, Lund University, S-223 62 Lund, Sweden
The recently proposed Bayesian approach to online learning is applied to learning a rule defined as a noisy single layer perceptron with either continuous or binary weights. In the Bayesian online approach the exact posterior distribution is approximated by a simpler parametric posterior that is updated online as new examples are incorporated into the dataset. In the case of continuous weights, the approximate posterior is chosen to be Gaussian. The computational complexity of the resulting online algorithm is found to be at least as high as that of the Bayesian offline approach, making the online approach less attractive. A Hebbian approximation based on casting the full covariance matrix into an isotropic diagonal form significantly reduces the computational complexity and yields a previously identified optimal Hebbian algorithm. In the case of binary weights, the approximate posterior is chosen to be a biased binary distribution. The resulting online algorithm is derived and shown to outperform several other online approaches to this problem.
Introduction
Neural networks are adaptive systems characterized by a set of parameters w, the weights and biases that specify the connectivity among the neuronal computational elements. Of particular interest is the ability of these systems to learn from examples. Traditional formulations of the learning problem are based on a dynamical prescription for the adaptation of the parameters w. The learning process thus generates a trajectory in w space that starts from a random initial assignment w0 and leads to a specific w* that is in some sense optimal.
By
Yoshiyuki Kabashima, Dept. of Comp. Intelligence and Systems Science, Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama 226, Japan,
Shigeru Shinomoto, Dept. of Physics, Kyoto University, Sakyo-ku, Kyoto 606-8502, Japan
By
A. C. C. Coolen, Department of Mathematics, King's College, University of London, Strand, London WC2R 2LS, U.K.,
D. Saad, Department of Computer Science and Applied Mathematics, Aston University, Aston Triangle, Birmingham B4 7ET, U.K.
We study the dynamics of supervised learning in layered neural networks, in the regime where the size p of the training set is proportional to the number N of inputs. Here the local fields are no longer described by Gaussian distributions. We show how dynamical replica theory can be used to predict the evolution of macroscopic observables, including the relevant performance measures, incorporating the theory of complete training sets in the limit p/N → ∞ as a special case. For simplicity we restrict ourselves here to single-layer networks and realizable tasks.
Introduction
In the last few years much progress has been made in the analysis of the dynamics of supervised learning in layered neural networks, using the strategy of statistical mechanics: by deriving from the microscopic dynamical equations a set of closed laws describing the evolution of suitably chosen macroscopic observables (dynamic order parameters) in the limit of an infinite system size [e.g. Kinzel & Rujan (1990), Kinouchi & Caticha (1992), Biehl & Schwarze (1992, 1995), Saad & Solla (1995)]. A recent review and more extensive guide to the relevant references can be found in Mace & Coolen (1998a).
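For orientation, the macroscopic observables conventionally chosen in this line of work, for a single-layer student with weights w learning a teacher with weights B over N inputs, are the overlaps below; for a boolean perceptron with isotropic Gaussian inputs (the standard complete-training-set setting that this chapter recovers as a special case) they determine the generalization error in closed form. This is background notation, not a result specific to the chapter.

```latex
R = \frac{1}{N}\,\mathbf{w}\cdot\mathbf{B}, \qquad
Q = \frac{1}{N}\,\mathbf{w}\cdot\mathbf{w}, \qquad
T = \frac{1}{N}\,\mathbf{B}\cdot\mathbf{B}, \qquad
\epsilon_g = \frac{1}{\pi}\arccos\!\left(\frac{R}{\sqrt{QT}}\right).
```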
By
Magnus Rattray, Neural Computing Research Group, Aston University, Birmingham B4 7ET, UK,
David Saad, Neural Computing Research Group, Aston University, Birmingham B4 7ET, UK
We analyse the dynamics of a number of second order on-line learning algorithms training multi-layer neural networks, using the methods of statistical mechanics. We first consider on-line Newton's method, which is known to provide optimal asymptotic performance. We determine the asymptotic generalization error decay for a soft committee machine, which is shown to compare favourably with the result for standard gradient descent. Matrix momentum provides a practical approximation to this method by allowing an efficient inversion of the Hessian. We consider an idealized matrix momentum algorithm which requires access to the Hessian and find close correspondence with the dynamics of on-line Newton's method. In practice, the Hessian will not be known on-line and we therefore consider matrix momentum using a single example approximation to the Hessian. In this case good asymptotic performance may still be achieved, but the algorithm is now sensitive to parameter choice because of noise in the Hessian estimate. On-line Newton's method is not appropriate during the transient learning phase, since a suboptimal unstable fixed point of the gradient descent dynamics becomes stable for this algorithm. A principled alternative is to use Amari's natural gradient learning algorithm and we show how this method provides a significant reduction in learning time when compared to gradient descent, while retaining the asymptotic performance of on-line Newton's method.
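As a point of reference, one idealized on-line Newton step on a single example might look like the sketch below (hypothetical names; it assumes oracle access to the Hessian, which, as the chapter stresses, is exactly what is unavailable on-line and what matrix momentum approximates).

```python
import numpy as np

def online_newton_step(w, grad_fn, hess_fn, example, eta=1.0, eps=1e-4):
    """One idealized on-line Newton update on a single training example.

    grad_fn and hess_fn return the single-example gradient and an
    estimate of the Hessian; eps regularizes the inversion.
    """
    g = grad_fn(w, example)
    H = hess_fn(w, example)
    H_reg = H + eps * np.eye(len(w))        # keep the Hessian invertible
    return w - eta * np.linalg.solve(H_reg, g)
```

Replacing the single-example Hessian estimate with an exact Hessian or a Fisher-matrix preconditioner turns this same update shape into the idealized Newton and natural-gradient variants the chapter compares.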
Introduction
On-line learning is a popular method for training multi-layer feed-forward neural networks, especially for large systems and for problems requiring rapid and adaptive data processing. Under the on-line learning framework, network parameters are updated according to only the latest in a sequence of training examples.
Online learning is discussed from the viewpoint of Bayesian statistical inference. By replacing the true posterior distribution with a simpler parametric distribution, one can define an online algorithm by a repetition of two steps: an update of the approximate posterior when a new example arrives, and an optimal projection into the parametric family. Choosing this family to be Gaussian, we show that the algorithm achieves asymptotic efficiency. An application to learning in single layer neural networks is given.
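In one dimension the two steps can be written down directly. The sketch below is illustrative only, with a grid-based projection that would not scale to real networks: it multiplies the current Gaussian approximation by the likelihood of the new example, then projects back onto the Gaussian family by matching mean and variance.

```python
import numpy as np

def bayes_online_step(mu, var, likelihood, n_grid=2001, width=8.0):
    """One step of Bayesian online learning with a Gaussian approximation.

    Step 1 (update): multiply the Gaussian posterior by the likelihood
    of the new example.  Step 2 (projection): match mean and variance
    to return to the Gaussian parametric family.
    """
    s = np.sqrt(var)
    w = np.linspace(mu - width * s, mu + width * s, n_grid)
    post = np.exp(-0.5 * (w - mu) ** 2 / var) * likelihood(w)
    post /= np.trapz(post, w)                        # normalize on the grid
    new_mu = np.trapz(w * post, w)                   # matched mean
    new_var = np.trapz((w - new_mu) ** 2 * post, w)  # matched variance
    return new_mu, new_var
```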
Introduction
Neural networks have the ability to learn from examples. For batch learning, a set of training examples is collected and subsequently an algorithm is run on the entire training set to adjust the parameters of the network. On the other hand, for many practical problems, examples arrive sequentially and an instantaneous action is required at each time step. In order to save memory and time, this action should not depend on the entire set of data that has arrived so far. This principle is realized in online algorithms, where usually only the last example is used for an update of the network's parameters. Obviously, some amount of information about the past examples is discarded in this approach. Surprisingly, recent studies have shown that online algorithms can achieve performance similar to that of batch algorithms when the number of data points grows large (Biehl and Riegler 1994; Barkai et al. 1995; Kim and Sompolinsky 1996).
In order to understand the abilities and limitations of online algorithms, the question of optimal online learning has been raised.
By
David Barber, Department of Medical Biophysics, University of Nijmegen, 6525 EZ Nijmegen, The Netherlands,
Peter Sollich, Department of Physics, University of Edinburgh, Edinburgh EH9 3JZ, U.K.
We analyse online gradient descent learning from finite training sets at non-infinitesimal learning rates η for both linear and non-linear networks. In the linear case, exact results are obtained for the time-dependent generalization error of networks with a large number of weights N, trained on p = αN examples. This allows us to study in detail the effects of finite training set size α on, for example, the optimal choice of learning rate η. We also compare online and offline learning, for respective optimal settings of η at given final learning time. Online learning turns out to be much more robust to input bias and actually outperforms offline learning when such bias is present; for unbiased inputs, online and offline learning perform almost equally well. Our analysis of online learning for non-linear networks (namely, soft-committee machines), advances the theory to more realistic learning scenarios. Dynamical equations are derived for an appropriate set of order parameters; these are exact in the limiting case of either linear networks or infinite training sets. Preliminary comparisons with simulations suggest that the theory captures some effects of finite training sets, but may not yet account correctly for the presence of local minima.
Introduction
The analysis of online (gradient descent) learning, which is one of the most common approaches to supervised learning found in the neural networks community, has recently been the focus of much attention. The characteristic feature of online learning is that the weights of a network (‘student’) are updated each time a new training example is presented, such that the error on this example is reduced.
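Concretely, for a linear 'student' trained on a fixed set of p examples, the update reads as in the sketch below (a hypothetical minimal setting, not the soft-committee machine analysed in the chapter); note that eta multiplies the full single-example gradient, i.e. it is the non-infinitesimal learning rate of the analysis.

```python
import numpy as np

def online_gd(X, y, eta, n_steps, seed=0):
    """On-line gradient descent over a fixed training set of p examples.

    Each step presents one randomly drawn example and reduces the
    squared error on that example alone.
    """
    rng = np.random.default_rng(seed)
    p, N = X.shape
    w = np.zeros(N)
    for _ in range(n_steps):
        i = rng.integers(p)          # the latest example presented
        err = X[i] @ w - y[i]        # error on this single example
        w -= eta * err * X[i]        # gradient step at learning rate eta
    return w
```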
By
Mauro Copelli, Limburgs Universitair Centrum, B-3590 Diepenbeek, Belgium,
Nestor Caticha, Instituto de Física, Universidade de São Paulo, Caixa Postal 66318, 05389-970 São Paulo, SP, Brazil
On-line supervised learning in the general K Tree Committee Machine (TCM) is studied for a uniform distribution of inputs. Examples are corrupted by multiplicative noise in the teacher output. From the differential equations which describe the learning dynamics, the modulation function which optimizes the generalization ability is obtained exactly for any finite K. The asymptotic behavior of the generalization error is shown to be independent of K. Robustness with respect to a misestimation of the noise level is also shown to be independent of K.
Introduction
When looking into the properties of different neural network architectures by studying their performance in different model situations, the main objective is to search for similarities rather than to delve into the many differences. It is from these similarities that intrinsic properties of learning that go beyond the particular characteristics of the simple models may be identified.
In order to develop a program of this nature, several studies within the community of Statistical Mechanics of Neural Networks (Watkin, Rau and Biehl, 1993) have been pursued. Among the most important contributions this approach brings to the study of machine learning are the possibility of dealing with networks of very large size, that is, in the thermodynamic limit (TL), and the introduction of efficient techniques to average over the randomness associated with the data. The model scenarios that have been analysed arise from combinations of the different factors conditioning learning. These include, among others, unsupervised versus supervised learning, realizable rules or not, learning in the presence of noise or in the more idealized noiseless case, and learning in a time-dependent or constant environment.
A great deal of hyperbole has been devoted to neural networks, both in their first wave around 1960 (Widrow & Hoff, 1960; Rosenblatt, 1962) and in their renaissance from about 1985 (chiefly inspired by Rumelhart & McClelland, 1986), but the ideas of biological relevance seem to us to have detracted from the essence of what is being discussed, and are certainly not relevant to practical applications in pattern recognition. Because ‘neural networks’ has become a popular subject, it has collected many techniques which are only loosely related and were not originally biologically motivated. In this chapter we will discuss the core area of feed-forward or ‘back-propagation’ neural networks, which can be seen as extensions of the ideas of the perceptron (Section 3.6). From this connection, these networks are also known as multi-layer perceptrons.
A formal definition of a feed-forward network is given in the glossary. Informally, they have units which have one-way connections to other units, and the units can be labelled from inputs (low numbers) to outputs (high numbers) so that each unit is only connected to units with higher numbers. The units can always be arranged in layers so that connections go from one layer to a later layer. This is best seen graphically; see Figure 5.1.
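A minimal sketch of the forward computation may help fix ideas (hypothetical code, with logistic hidden units and a linear output, one common choice among many): information flows one way, each layer feeding only later layers.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass of a feed-forward network (multi-layer perceptron)."""
    h = np.asarray(x, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = 1.0 / (1.0 + np.exp(-(W @ h + b)))   # hidden layers: logistic units
    return weights[-1] @ h + biases[-1]          # output layer: linear units
```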
Pattern recognition has a long and respectable history within engineering, especially for military applications, but the cost of the hardware both to acquire the data (signals and images) and to compute the answers made it for many years a rather specialist subject. Hardware advances have made the concerns of pattern recognition of much wider applicability. In essence it covers the following problem:
‘Given some examples of complex signals and the correct decisions for them, make decisions automatically for a stream of future examples.’
There are many examples from everyday life:
Name the species of a flowering plant.
Grade bacon rashers from a visual image.
Classify an X-ray image of a tumour as cancerous or benign.
Decide to buy or sell a stock option.
Give or refuse credit to a shopper.
Many of these are currently performed by human experts, but it is increasingly becoming feasible to design automated systems to replace the expert and either perform better (as in credit scoring) or ‘clone’ the expert (as in aids to medical diagnosis).
Neural networks have arisen from analogies with models of the way that humans might approach pattern recognition tasks, although they have developed a long way from the biological roots. Great claims have been made for these procedures, and although few of these claims have withstood careful scrutiny, neural network methods have had great impact on pattern recognition practice.
Tree-based methods for classification are relatively unfamiliar in both statistics and pattern recognition, yet they are widely used in some applications, such as botany (Figure 7.1) and medical diagnosis, because they are extremely easy to comprehend (and hence to have confidence in).
The automatic construction of decision trees dates from work in the social sciences by Morgan & Sonquist (1963) and Morgan & Messenger (1973). (Later work such as Doyle, 1973, and Doyle & Fenwick, 1975, commented on the pitfalls of such automated procedures.) In statistics Breiman et al. (1984) had a seminal influence both in bringing the work to the attention of statisticians and in proposing new algorithms for constructing trees. At around the same time decision tree induction was beginning to be used in the field of machine learning, which we review in Section 7.4, and in engineering (for example, Sethi & Sarvarayudu, 1982).
The terminology of trees is graphic, although conventionally trees such as Figure 7.2 are shown growing down the page. The root is the top node, and examples are passed down the tree, with decisions being made at each node until a terminal node or leaf is reached. Each non-terminal node contains a question on which a split is based. Each leaf contains the label of a classification. A subtree of T is a tree with root a node of T; it is a rooted subtree if its root is the root of T.
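The terminology translates directly into a data structure. The sketch below (hypothetical names) represents each non-terminal node as a question, a threshold on one feature, and passes an example down from the root until a leaf label is reached:

```python
class Node:
    """A tree node: either a question (feature, threshold) or a leaf (label)."""
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, label=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right   # rooted subtrees below the split
        self.label = label                    # set only at a terminal node

def classify(node, x):
    """Pass an example down the tree until a terminal node (leaf) is reached."""
    while node.label is None:                 # non-terminal: ask the question
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label
```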
The supervised methods considered so far have learnt both the structure of the probability distributions and the numerical values from the training set, or in the case of parametric methods, imposed a conventional structure for convenience. Other methods incorporate non-numerical ‘real-world’ knowledge about the subject domain into the structure of the probability distributions. Such knowledge is often about causal relationships, or perhaps the lack of causality as expressed by conditional independence.
These ideas have been most explored within the field of expert systems. This is a loosely defined area, and definitions vary:
‘The label “expert system” is, broadly speaking, a program intended to make reasoned judgements or to give assistance in a complex area in which human skills are fallible or scarce. …’
(Lauritzen & Spiegelhalter, 1988, p. 157)
‘A program designed to solve problems at a level comparable to that of a human expert in a given domain.’ (Cooper, 1989)
‘An expert system has two parts. The first one is the knowledge base. It usually makes up most of the system. In its simplest form it is a list of IF … THEN rules: each specifies what to do, or what conclusions to draw, under a set of well-defined circumstances’.
The second part of the expert system often goes under the name of 'shell'. As the name implies, it acts as a receptacle for the knowledge base and contains instruments for making efficient use of it.
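A toy version of this division of labour might look as follows (entirely hypothetical rules and names): the knowledge base is a list of IF … THEN rules, and the 'shell' is the loop that fires them against a set of established facts until no new conclusion can be drawn.

```python
def forward_chain(facts, rules):
    """Minimal 'shell': fire IF ... THEN rules until nothing new is derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)         # rule fires: add its conclusion
                changed = True
    return facts

# Hypothetical two-rule knowledge base.
rules = [({"has_fever", "has_rash"}, "suspect_measles"),
         ({"suspect_measles"}, "refer_to_specialist")]
print(forward_chain({"has_fever", "has_rash"}, rules))
```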