A tutorial survey of architectures, algorithms, and applications for deep learning

Published online by Cambridge University Press: 22 January 2014

Li Deng*
Affiliation: Microsoft Research, Redmond, WA 98052, USA. Phone: 425-706-2719
*Corresponding author: L. Deng. Email: deng@microsoft.com

Abstract

In this invited paper, my overview material on the same topic as presented in the plenary overview session of APSIPA-2011, together with the tutorial material presented at the same conference [1], is expanded and updated to include more recent developments in deep learning. Both the previous and the updated materials cover theory and applications, and analyze the field's future directions. The goal of this tutorial survey is to introduce the emerging area of deep learning or hierarchical learning to the APSIPA community. Deep learning refers to a class of machine learning techniques, developed largely since 2006, in which many stages of non-linear information processing in hierarchical architectures are exploited for pattern classification and for feature learning. In the more recent literature, it is also connected to representation learning, which involves a hierarchy of features or concepts where higher-level concepts are defined from lower-level ones and the same lower-level concepts help to define the higher-level ones. In this tutorial survey, a brief history of deep learning research is discussed first. Then, a classificatory scheme is developed to analyze and summarize major work reported in the recent deep learning literature. Using this scheme, I provide a taxonomy-oriented survey of the existing deep architectures and algorithms in the literature, categorizing them into three classes: generative, discriminative, and hybrid. Three representative deep architectures, one from each class, are presented in more detail: deep autoencoders; deep stacking networks, with their generalization to the temporal domain (recurrent networks); and deep neural networks pretrained with deep belief networks. Next, selected applications of deep learning are reviewed in broad areas of signal and information processing, including audio/speech, image/vision, multimodality, language modeling, natural language processing, and information retrieval. Finally, future directions of deep learning are discussed and analyzed.

Information

Type
Overview Paper
Creative Commons
CC BY: The online version of this article is published within an Open Access environment subject to the conditions of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/).
Copyright
Copyright © The Authors, 2014
Figures and tables

Table 1. Some basic deep learning terminologies.

Fig. 1. The architecture of the deep autoencoder used in [30] for extracting “bottle-neck” speech features from high-resolution spectrograms.
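
Such a bottleneck autoencoder can be sketched in a few lines of numpy. The layer sizes, sigmoid activations, and random initialization below are illustrative assumptions, not the configuration used in [30]; the essential point is the narrow central coding layer whose activations serve as the learned speech features.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Illustrative layer sizes: 256-bin spectrogram frame -> 39-unit bottleneck.
sizes = [256, 512, 128, 39, 128, 512, 256]
weights = [rng.normal(0, 0.01, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Forward pass; returns the reconstruction and the bottleneck code."""
    h, code = x, None
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = sigmoid(h @ W + b)
        if i == 2:          # the narrow central layer (39 units here)
            code = h        # these activations are the "bottleneck" features
    return h, code

frame = rng.random(256)      # a stand-in spectrogram frame
recon, features = forward(frame)
print(features.shape)        # (39,) bottleneck feature vector
```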

Fig. 2. Top to bottom: Original spectrogram; reconstructions using input window sizes of N = 1, 3, 9, and 13 while forcing the coding units to be zero or one (i.e., a binary code). The y-axis values indicate FFT bin numbers (i.e., a 256-point FFT is used for constructing all spectrograms).

Fig. 3. Top to bottom: Original spectrogram from the test set; reconstruction from the 312-bit VQ coder; reconstruction from the 312-bit autoencoder; coding errors as a function of time for the VQ coder (blue) and autoencoder (red); spectrogram of the VQ coder residual; spectrogram of the deep autoencoder's residual.

Fig. 4. A pictorial view of sampling from an RBM (restricted Boltzmann machine) during the "negative" learning phase of RBM training (courtesy of G. Hinton).
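
The "negative" phase caricatured in Fig. 4 is commonly implemented with one step of block Gibbs sampling, i.e., contrastive divergence (CD-1). The numpy sketch below is a generic illustration assuming Bernoulli visible and hidden units; the dimensions and learning rate are arbitrary, and biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid, lr = 64, 32, 0.1
W = rng.normal(0, 0.01, (n_vis, n_hid))          # visible-to-hidden weights

v0 = (rng.random(n_vis) < 0.5).astype(float)     # a stand-in training vector

# Positive phase: sample hidden units given the data.
p_h0 = sigmoid(v0 @ W)
h0 = (rng.random(n_hid) < p_h0).astype(float)

# Negative phase (Fig. 4): reconstruct the visibles, then resample hiddens.
p_v1 = sigmoid(h0 @ W.T)
v1 = (rng.random(n_vis) < p_v1).astype(float)
p_h1 = sigmoid(v1 @ W)

# CD-1 update: positive statistics minus negative statistics.
W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
```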

Fig. 5. Illustration of a DBN/DNN architecture.
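
A DBN/DNN of this kind is built by greedy layer-wise pretraining: each layer is trained as an RBM on the (mean-field) activations of the layer below, and the stacked weights then initialize a feed-forward DNN. Below is a highly compressed numpy sketch of that stacking loop; the layer sizes, toy CD-1 trainer, and epoch count are illustrative assumptions, not settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hid, lr=0.1, epochs=5):
    """Toy CD-1 trainer; returns the learned weight matrix."""
    W = rng.normal(0, 0.01, (data.shape[1], n_hid))
    for _ in range(epochs):
        for v0 in data:
            p_h0 = sigmoid(v0 @ W)
            h0 = (rng.random(n_hid) < p_h0).astype(float)
            v1 = sigmoid(h0 @ W.T)        # mean-field reconstruction
            p_h1 = sigmoid(v1 @ W)
            W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
    return W

X = (rng.random((100, 64)) < 0.5).astype(float)   # stand-in training data
layer_sizes = [128, 128, 32]                      # illustrative hidden sizes

dnn_weights, h = [], X
for n_hid in layer_sizes:                 # greedy layer-wise stacking
    W = train_rbm(h, n_hid)
    dnn_weights.append(W)
    h = sigmoid(h @ W)                    # activations feed the next RBM

# dnn_weights now initializes a DNN, to be fine-tuned with backpropagation
# after a randomly initialized softmax output layer is added on top.
```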

Fig. 6. Interface between DBN–DNN and HMM to form a DNN–HMM. This architecture has been successfully used in speech recognition experiments reported in [25].
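
At the DNN-HMM interface, the DNN's softmax outputs are state posteriors p(s|x), while the HMM decoder expects (scaled) likelihoods p(x|s). The standard conversion divides the posteriors by the state priors, i.e., works with log p(s|x) - log p(s), dropping the constant log p(x) since it does not affect decoding. The snippet below illustrates this with made-up posterior and prior values.

```python
import numpy as np

# Assumed example values: DNN posteriors over 4 HMM states for one frame,
# and state priors estimated from state-level alignments of the training set.
posteriors = np.array([0.70, 0.15, 0.10, 0.05])   # p(s | x) from the softmax
priors     = np.array([0.40, 0.30, 0.20, 0.10])   # p(s)

# Scaled log-likelihoods passed to the HMM decoder in place of log p(x | s).
scaled_loglik = np.log(posteriors) - np.log(priors)
print(scaled_loglik)
```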

Fig. 7. A DSN architecture with input–output stacking. Only four modules are illustrated, each with a distinct color. Dashed lines denote copying layers.
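
Each DSN module is a single-hidden-layer network in which, given fixed lower-layer weights W and targets T, the upper-layer weights U have a closed-form least-squares solution; stacking then concatenates the module's prediction with the raw input to form the next module's input (the input-output stacking of Fig. 7). A minimal numpy sketch under these assumptions, with random lower-layer weights standing in for the gradient-tuned ones:

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X = rng.random((200, 30))          # stand-in inputs (rows are examples)
T = rng.random((200, 5))           # stand-in targets
n_hid, n_modules = 50, 4

inp = X
for _ in range(n_modules):
    W = rng.normal(0, 1.0, (inp.shape[1], n_hid))  # lower weights (random here;
                                                   # tuned by gradients in a DSN)
    H = sigmoid(inp @ W)                           # hidden activations
    U = np.linalg.pinv(H) @ T                      # closed-form upper weights
    Y = H @ U                                      # this module's prediction
    inp = np.concatenate([X, Y], axis=1)           # input-output stacking
```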

Fig. 8. Comparison of a single module of a DSN (left) and that of a tensorized DSN (TDSN). Two equivalent forms of a TDSN module are shown on the right.
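
A TDSN module replaces the DSN's single hidden layer with two hidden branches h1 and h2 combined bilinearly through a third-order weight tensor U, so each output is y_k = h1' U_k h2. The equivalent second form shown in the figure rewrites this as an ordinary linear map applied to the outer product of h1 and h2 (an implicit large hidden layer). The check below is an illustrative numpy construction, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2, n_out = 6, 7, 3
h1, h2 = rng.random(n1), rng.random(n2)
U = rng.random((n_out, n1, n2))           # third-order weight tensor

# Form 1: bilinear map, y_k = h1^T U_k h2.
y_bilinear = np.array([h1 @ U[k] @ h2 for k in range(n_out)])

# Form 2: linear map on the outer product of the two hidden branches.
outer = np.outer(h1, h2).ravel()          # implicit large hidden layer
y_linear = U.reshape(n_out, -1) @ outer

print(np.allclose(y_bilinear, y_linear))  # True: the two forms coincide
```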

Fig. 9. Stacking of TDSN modules by concatenating the prediction vector with the input vector.

Fig. 10. Stacking of TDSN modules by concatenating the two hidden layers' vectors with the input vector.
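
The difference between the stacking schemes of Figs 9 and 10 is simply which vectors are concatenated with the raw input to form the next module's input. A schematic numpy fragment, with illustrative names and sizes:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.random(30)                         # raw input to the stack
h1, h2 = rng.random(40), rng.random(40)    # a module's two hidden branches
y = rng.random(5)                          # that module's prediction vector

next_input_fig9  = np.concatenate([x, y])        # Fig. 9: prediction + input
next_input_fig10 = np.concatenate([x, h1, h2])   # Fig. 10: hidden layers + input
```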