12 - On-line Learning with Time-Correlated Examples
- By Tom Heskes and Wim Wiegerinck, RWCP Theoretical Foundation, SNN, Department of Medical Physics and Biophysics, University of Nijmegen, Geert Grooteplein 21, 6525 EZ Nijmegen, The Netherlands
- Edited by David Saad, Aston University
- Book: On-Line Learning in Neural Networks
- Published online: 28 January 2010
- Print publication: 28 January 1999, pp. 251-278
- Chapter
Abstract
We study the dynamics of on-line learning with time-correlated patterns. In doing so, we make a distinction between “small” networks and “large” networks. “Small” networks have a finite number of input units and are usually studied with tools from stochastic approximation theory in the limit of small learning parameters. “Large” networks have an extensive number of input units; a description in terms of individual weights is no longer useful, and tools from statistical mechanics can be applied to compute the evolution of macroscopic order parameters. We give general derivations for both cases, but ultimately focus on the effect of correlations on plateaus: long time spans in which the performance of the network hardly changes. Learning in both “small” and “large” multi-layered perceptrons is often hampered by the presence of plateaus. The effect of correlations, however, appears to be quite different in the two regimes: correlations can have a huge beneficial effect in small networks, but seem to have only marginal effects in large networks.
Introduction
On-line learning with correlations
The ability to learn from examples is an essential feature in many neural network applications (Hertz et al., 1991; Haykin, 1994). Learning from examples enables the network to adapt its parameters, or weights, to its environment without the need for explicit knowledge of that environment. In on-line learning, examples from the environment are continually presented to the network at distinct time steps. At each time step, a small adjustment of the network's weights is made on the basis of the currently presented pattern. This procedure is iterated for as long as the network is learning.
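The per-step procedure described above can be sketched as on-line gradient descent for a simple linear network trained on a time-correlated input stream. The teacher/student setup, the learning rate `eta`, and the AR(1) correlation coefficient `lam` below are illustrative assumptions, not details taken from the chapter:

```python
import numpy as np

# Minimal sketch of on-line learning with time-correlated examples.
# The linear teacher/student setup, learning rate, and correlation
# coefficient are illustrative assumptions only.

rng = np.random.default_rng(0)

n_inputs = 5
w_teacher = rng.standard_normal(n_inputs)   # environment ("teacher") weights
w = np.zeros(n_inputs)                      # network ("student") weights
eta = 0.05                                  # small learning parameter
lam = 0.9                                   # correlation between successive patterns

x = rng.standard_normal(n_inputs)
for t in range(2000):
    # Each new pattern is a noisy copy of the previous one (time correlation);
    # the scaling keeps the stationary input variance equal to 1.
    x = lam * x + np.sqrt(1.0 - lam**2) * rng.standard_normal(n_inputs)
    target = w_teacher @ x
    output = w @ x
    # On-line update: a small step based only on the currently presented pattern.
    w += eta * (target - output) * x

err = float(np.linalg.norm(w - w_teacher))
```

Because each update uses only the current pattern, the weight trajectory is a stochastic process whose statistics depend on the correlation parameter; this is exactly the kind of dynamics the chapter analyses with stochastic approximation (small networks) and order parameters (large networks).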