
Deep neural networks – a developmental perspective

Published online by Cambridge University Press: 01 April 2016

Biing Hwang Juang*
Affiliation: School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA. Phone: +1 404 894 6618
*Corresponding author: B.H. Juang, juang@gatech.edu

Abstract

There has been a recent surge in research activity around “deep neural networks” (DNN). While the notion of neural networks has enjoyed cycles of enthusiasm, which may well continue to ebb and flow, concrete advances now abound. Significant performance improvements have been demonstrated in a number of pattern recognition tasks. As a technical topic, DNN is important in classes, and tutorial articles and related learning resources are available. Nonetheless, streams of questions from students and researchers never subside, and there appears to be a frustrating tendency among learners to treat DNN simply as a black box. This is an awkward and alarming situation in education. This paper is thus intended to help the reader properly understand DNN: not just its mechanism (what and how) but also its motivation and justification (why). It is written from a developmental perspective and with a comprehensive view, from the very basic but oft-forgotten principles of statistical pattern recognition and decision theory, through the problems that may be encountered at various stages of system design, to the key ideas that led to the new advances. It can serve as a learning guide with historical reviews and important references, helpful in reaching an insightful understanding of the subject.

Information

Type
Overview Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
Copyright © The Authors, 2016
Figures and tables

Fig. 1. A two-class toy problem of pattern recognition in two dimensions. (a) Contours of the pdfs of the two classes of data. (b) A scatter plot of the data, overlaid with the support vectors and the class boundary selected by a linear SVM. (c) A scatter plot of the data, overlaid with the support vectors and the class boundary selected by an RBF SVM.

Fig. 2. A conventional pattern recognition system versus an alternative. (a) A conventional pattern recognition system. (b) An alternative pattern recognition system; the boundaries between blocks may not be clearly defined.

Table 1. Comparison of performance for the toy two-class problem.

Fig. 3. An illustration of basic neural networks consisting of interconnected neurons. (a) Neural networks. (b) Computational model of a McCulloch–Pitts neuron.

Fig. 4. Two equivalent depictions of a three-node Hopfield network. (a) A three-node Hopfield net. (b) A three-node Hopfield net with explicit synchronous delay and recurrence.

Fig. 5. Boltzmann machines (BMs). (a) A BM. (b) A restricted BM (RBM).

Fig. 6. Multi-layer feedforward neural networks (FNNs). (a) A single-layer multi-class (M = 3) perceptron. (b) A multi-layer FNN.

Fig. 7. Convergence of a (2×8) RBM trained to memorize a single point. (a) Uniformly distributed random points used to test the RBM; (b) enlargement of (c), showing the output of the RBM after epoch 1 in response to (1) zero-mean, unit-variance noise (purple) and (2) the uniformly distributed random point set of (a); (c)–(e) RBM output at epochs 1, 3, and 5.

Fig. 8. Convergence of a (2×8) RBM trained to memorize two points, (2, 2) and (−2, −2). (a) Random points used to train the RBM; (b)–(e) RBM output at epochs 1, 3, 5, and 7, showing convergence in response to two sets of input; (f) trajectories of convergence for uniformly distributed input points.

Fig. 9. Convergence of a (2×8) RBM trained to memorize four points. (a) Random points from four bivariate Gaussian distributions used to train the RBM; (b)–(d) RBM output at epochs 1, 3, and 5, showing convergence in response to two sets of input (uniform and Gaussian mixtures).

Fig. 10. Examples from the MNIST dataset, digit 2.

Fig. 11. (a) Output of an RBM trained on digit 2, in response to all digits (0–9) as input; progressive epochs toward convergence are shown from left to right; (b) converged output of the same RBM in response to random patterns as input; (c) a random pattern as input and the outputs at six successive epochs.

Fig. 12. An RBM trained on digit 2 responding to noisy input from all digits: (a) low noise (high signal-to-noise ratio (SNR)); (b) high noise (low SNR).

Fig. 13. Fitting mixture densities to the data distribution.

Fig. 14. Unrolling of an RBM.

Fig. 15. Illustration of data in a manifold. (a) Two classes of data fitted by two Gaussian pdfs. (b) Scatter plot of the transformed representations.

Table 2. Performance evaluation for various modeling techniques and systems [2].

Fig. 16. An illustration of simultaneous processing of K frames of speech.

Table 3. (a) WER for three systems with single-frame modeling; (b) WER of DNN-6 as a function of the size (K) of the processing block, in number of frames [55].
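
The captions of Figs. 7–9 describe a small (2×8) RBM that is trained to memorize one, two, or four points and then probed with random inputs whose reconstructions are traced over successive epochs. The following is a minimal illustrative sketch of that kind of experiment, not the paper's code: it assumes a Gaussian–Bernoulli RBM trained with one-step contrastive divergence (CD-1), and the learning rate, target point, and number of training steps are assumptions made for illustration.

```python
# Illustrative sketch only: a (2 x 8) Gaussian-Bernoulli RBM trained by CD-1 to
# "memorize" a point, then probed with random inputs as in Figs. 7-9.
import numpy as np

rng = np.random.default_rng(0)

n_vis, n_hid = 2, 8                       # the (2 x 8) RBM of Figs. 7-9
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_v = np.zeros(n_vis)                     # biases of the Gaussian visible units (unit variance assumed)
b_h = np.zeros(n_hid)                     # biases of the Bernoulli hidden units

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v, lr=0.05):
    """One CD-1 step on a batch of real-valued visible vectors (rows of v)."""
    global W, b_v, b_h
    h_prob = sigmoid(v @ W + b_h)                     # up-pass: P(h = 1 | v)
    h_samp = (rng.random(h_prob.shape) < h_prob) * 1.0
    v_rec = h_samp @ W.T + b_v                        # down-pass: mean of the Gaussian visibles
    h_rec = sigmoid(v_rec @ W + b_h)                  # second up-pass
    W   += lr * (v.T @ h_prob - v_rec.T @ h_rec) / len(v)
    b_v += lr * (v - v_rec).mean(axis=0)
    b_h += lr * (h_prob - h_rec).mean(axis=0)

def reconstruct(v):
    """One deterministic up-down pass; iterating this traces the epoch-by-epoch outputs."""
    return sigmoid(v @ W + b_h) @ W.T + b_v

# Train the RBM to memorize a single point, as in Fig. 7 (the target (2, 2) is an assumption).
target = np.tile([2.0, 2.0], (100, 1))
for _ in range(2000):
    cd1_update(target + 0.1 * rng.standard_normal(target.shape))

# Probe with uniformly distributed random points and iterate reconstruction;
# successive outputs should drift toward the memorized point.
points = rng.uniform(-4.0, 4.0, size=(10, n_vis))
for epoch in range(5):
    points = reconstruct(points)
    print(f"epoch {epoch + 1}: mean output = {points.mean(axis=0).round(2)}")
```

Feeding random test points through repeated up–down passes, as in the last loop, is what the epoch-by-epoch panels of Figs. 7–9 visualize: the reconstructions collapse toward the memorized point(s) regardless of where the probe points start.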