Annealed On-line Learning in Multilayer Neural Networks
By Siegfried Bös and Shun-ichi Amari, Brain Science Institute, RIKEN, Wako-shi, Saitama 351-0198, Japan
Edited by David Saad, Aston University
Book: On-Line Learning in Neural Networks
Published online: 28 January 2010
Print publication: 28 January 1999, pp. 209-230
Abstract
In this article we examine on-line learning with an annealed learning rate. Annealing the learning rate is necessary if on-line learning is to reach its optimal solution: with a fixed learning rate, the system approximates the best solution only up to fluctuations whose size is proportional to the fixed learning rate. It has been shown that optimal annealing can make on-line learning asymptotically efficient, meaning that it asymptotically learns as fast as possible. So far, these results have been realized only in very simple networks, such as single-layer perceptrons (section 3). Even the simplest multilayer network, the soft committee machine, exhibits an additional difficulty that makes straightforward annealing ineffective: at the beginning of learning, the committee machine is attracted to a metastable, suboptimal solution (section 4). The system stays in this metastable solution for a long time and can leave it only if the learning rate is not too small, which delays the start of annealing considerably. Here we show that a non-local or matrix update can prevent the system from becoming trapped in the metastable phase, allowing annealing to start much earlier (section 5). Some remarks on the influence of the initial conditions, and a possible candidate for theoretical support, are discussed in section 6. The paper ends with a summary of future tasks and a conclusion.
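As a rough illustration of the fixed-rate versus annealed-rate behaviour described above, here is a minimal Python sketch on a toy mean-estimation problem. It is not the chapter's committee-machine algorithm; the schedule eta0 / (t0 + t) and all constants are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy problem: online estimation of a target vector w_star.
# This sketches only the annealing idea (eta_t ~ 1/t); eta0 and t0 are
# illustrative assumptions, not values from the chapter.
rng = np.random.default_rng(0)
w_star = np.array([1.0, -2.0])      # target parameters
w_fixed = np.zeros(2)               # learner with a fixed learning rate
w_anneal = np.zeros(2)              # learner with an annealed learning rate
eta_fixed, eta0, t0 = 0.1, 1.0, 10.0

for t in range(1, 10001):
    x = w_star + rng.normal(scale=1.0, size=2)   # one noisy example
    # online gradient step on 0.5*||x - w||^2; the example is then discarded
    w_fixed += eta_fixed * (x - w_fixed)
    eta_t = eta0 / (t0 + t)                      # 1/t annealing schedule
    w_anneal += eta_t * (x - w_anneal)

# With a fixed rate the error plateaus at a level proportional to eta;
# with annealing it keeps shrinking toward zero.
print(np.linalg.norm(w_fixed - w_star), np.linalg.norm(w_anneal - w_star))
```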
Introduction
One of the most attractive properties of artificial neural networks is their ability to learn from examples and to generalize the acquired knowledge to unknown data.
On-line Learning in Switching and Drifting Environments with Application to Blind Source Separation
By Klaus-Robert Müller and Andreas Ziehe, GMD-First, Rudower Chaussee 5, D-12489 Berlin, Germany, and Noboru Murata and Shun-ichi Amari, Brain Science Institute, RIKEN, Wako-shi, Saitama 351-0198, Japan
Edited by David Saad, Aston University
Book: On-Line Learning in Neural Networks
Published online: 28 January 2010
Print publication: 28 January 1999, pp. 93-110
Abstract
An adaptive on-line algorithm extending the 'learning of the learning' idea is proposed and theoretically motivated. Relying only on gradient flow information, it can be applied to learning continuous functions or distributions, even when no explicit loss function is given and the Hessian is not available. Its efficiency is demonstrated for drifting and switching non-stationary blind separation tasks on acoustic signals.
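To make the 'gradient flow only' idea concrete, the following hedged sketch adapts the learning rate from a leaky average of recent gradients while tracking a slowly drifting target. The constants alpha, beta, and delta and the toy quadratic loss are assumptions for illustration, not the paper's exact rule.

```python
import numpy as np

# Sketch of an adaptive learning rate driven only by the gradient flow:
# a leaky average r of recent gradients stays large while the target
# drifts (coherent gradients) and shrinks once the learner has converged
# (gradients become noise), so eta grows and shrinks accordingly.
rng = np.random.default_rng(1)
w = np.zeros(2)
eta = 0.05
r = np.zeros(2)                      # leaky average of recent gradients
alpha, beta, delta = 0.02, 1.0, 0.1  # illustrative constants (assumptions)

for t in range(5000):
    w_star = np.array([np.sin(t / 500.0), 1.0])   # slowly drifting target
    x = w_star + rng.normal(scale=0.5, size=2)
    grad = w - x                                  # gradient of 0.5*||w - x||^2
    r = (1.0 - delta) * r + delta * grad          # gradient flow estimate
    # large, coherent gradients -> grow eta; small, noisy ones -> shrink it
    eta += alpha * eta * (beta * np.linalg.norm(r) - eta)
    w -= eta * grad

print("final eta:", eta, "tracking error:", np.linalg.norm(w - w_star))
```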
Introduction
Neural networks are powerful tools that can capture the structure in data by learning. Often the batch learning paradigm is assumed, where the learner is given all training examples simultaneously and may use them as often as desired. In large practical applications, however, batch learning often proves infeasible, and on-line learning is employed instead.
In the on-line learning scenario only one example is given at a time, and it is discarded after learning. On-line learning therefore consumes less memory, and at the same time it fits a more natural mode of learning, in which the learner receives new information at every moment and must adapt to it without a large memory for storing old data. Apart from easier feasibility and data handling, the most important advantage of on-line learning is its ability to adapt to changing environments, a common scenario in industrial applications where the data distribution changes gradually over time (e.g. due to wear and tear of the machines). If the learning machine does not detect and follow the change, it cannot learn the data properly, and large generalization errors result.
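The contrast can be made concrete with a short sketch: batch learning stores the whole sample and solves it at once, while on-line learning touches each example once and discards it, so memory stays constant however long the stream runs. The stream() generator and the linear model below are illustrative assumptions.

```python
from typing import Iterator, Tuple
import numpy as np

# Hypothetical data stream: noisy observations of a linear target.
def stream(n: int) -> Iterator[Tuple[np.ndarray, float]]:
    rng = np.random.default_rng(2)
    w_true = np.array([2.0, -1.0])
    for _ in range(n):
        x = rng.normal(size=2)
        yield x, float(w_true @ x + rng.normal(scale=0.1))

# Batch learning: every example must be stored, then fit in one pass.
X, y = map(np.array, zip(*stream(1000)))
w_batch = np.linalg.lstsq(X, y, rcond=None)[0]

# On-line learning: one example at a time, discarded after the update.
w_online, eta = np.zeros(2), 0.05
for x, target in stream(1000):
    w_online += eta * (target - w_online @ x) * x   # LMS update

print(w_batch, w_online)
```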