A tutorial survey of architectures, algorithms, and applications for deep learning

  • Li Deng
Abstract

In this invited paper, my overview material on the same topic, presented in the plenary overview session of APSIPA-2011, and the tutorial material presented at the same conference [1] are expanded and updated to include more recent developments in deep learning. Both the previous and the updated materials cover theory and applications, and analyze the field's future directions. The goal of this tutorial survey is to introduce the emerging area of deep learning, or hierarchical learning, to the APSIPA community. Deep learning refers to a class of machine learning techniques, developed largely since 2006, in which many stages of non-linear information processing in hierarchical architectures are exploited for pattern classification and for feature learning. In the more recent literature, it is also connected to representation learning, which involves a hierarchy of features or concepts where higher-level concepts are defined from lower-level ones and where the same lower-level concepts help to define higher-level ones. In this tutorial survey, a brief history of deep learning research is discussed first. Then, a classificatory scheme is developed to analyze and summarize major work reported in the recent deep learning literature. Using this scheme, I provide a taxonomy-oriented survey of the existing deep architectures and algorithms in the literature, and categorize them into three classes: generative, discriminative, and hybrid. Three representative deep architectures – deep autoencoders, deep stacking networks with their generalization to the temporal domain (recurrent networks), and deep neural networks (pretrained with deep belief networks) – one from each of the three classes, are presented in more detail. Next, selected applications of deep learning are reviewed in broad areas of signal and information processing, including audio/speech, image/vision, multimodality, language modeling, natural language processing, and information retrieval. Finally, future directions of deep learning are discussed and analyzed.
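To make the first of the three representative architectures concrete, here is a minimal sketch, not taken from the paper, of a deep autoencoder trained by plain backpropagation; the layer sizes, learning rate, and synthetic data are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: a tiny deep autoencoder
# (encoder 8 -> 4 -> 2, decoder 2 -> 4 -> 8) trained by backpropagation.
# All sizes, the learning rate, and the synthetic data are assumptions.
rng = np.random.default_rng(0)

sizes = [8, 4, 2, 4, 8]                      # bottleneck of width 2
params = [(rng.normal(0, 0.1, (a, b)), np.zeros(b))
          for a, b in zip(sizes[:-1], sizes[1:])]

def forward(x):
    """Return activations of every layer; tanh hidden layers, linear output."""
    acts = [x]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(np.tanh(z) if i < len(params) - 1 else z)
    return acts

def train_step(x, lr=0.05):
    """One gradient-descent step on mean-squared reconstruction error."""
    acts = forward(x)
    grad = 2.0 * (acts[-1] - x) / x.shape[0]         # dLoss/d(output), linear top
    for i in reversed(range(len(params))):
        if i < len(params) - 1:
            grad = grad * (1.0 - acts[i + 1] ** 2)   # back through tanh
        W, b = params[i]
        gW, gb = acts[i].T @ grad, grad.sum(axis=0)
        grad = grad @ W.T                            # propagate to layer below
        params[i] = (W - lr * gW, b - lr * gb)
    return float(np.mean((acts[-1] - x) ** 2))

# Synthetic rank-2 data: exactly compressible through the width-2 bottleneck.
X = rng.normal(size=(64, 2)) @ rng.normal(size=(2, 8))
first = train_step(X)
for _ in range(500):
    last = train_step(X)
print(f"reconstruction MSE: {first:.3f} -> {last:.3f}")
```

In the survey's setting the encoder would typically be pretrained layer-by-layer (e.g. with restricted Boltzmann machines) before fine-tuning; this sketch skips pretraining and only shows the fine-tuning step.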

Copyright
The online version of this article is published within an Open Access environment subject to the conditions of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/).
Corresponding author
L. Deng, email: deng@microsoft.com
References
[1] Deng, L.: An overview of deep-structured learning for information processing, in Proc. Asian-Pacific Signal & Information Processing Annu. Summit and Conf. (APSIPA-ASC), October 2011.
[2] Deng, L.: Expanding the scope of signal processing. IEEE Signal Process. Mag., 25 (3) (2008), 2–4.
[3] Hinton, G.; Osindero, S.; Teh, Y.: A fast learning algorithm for deep belief nets. Neural Comput., 18 (2006), 1527–1554.
[4] Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn., 2 (1) (2009), 1–127.
[5] Bengio, Y.; Courville, A.; Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell., 35 (2013), 1798–1828.
[6] Hinton, G. et al.: Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag., 29 (6) (2012), 82–97.
[7] Yu, D.; Deng, L.: Deep learning and its applications to signal and information processing. IEEE Signal Process. Mag., 28 (2011), 145–154.
[8] Arel, I.; Rose, C.; Karnowski, T.: Deep machine learning – a new frontier in artificial intelligence. IEEE Comput. Intell. Mag., 5 (2010), 13–18.
[9] Markoff, J.: Scientists See Promise in Deep-Learning Programs. New York Times, November 24, 2012.
[10] Cho, Y.; Saul, L.: Kernel methods for deep learning, in Proc. NIPS, 2009, 342–350.
[11] Deng, L.; Tur, G.; He, X.; Hakkani-Tur, D.: Use of kernel deep convex networks and end-to-end learning for spoken language understanding, in Proc. IEEE Workshop on Spoken Language Technologies, December 2012.
[12] Vinyals, O.; Jia, Y.; Deng, L.; Darrell, T.: Learning with recursive perceptual representations, in Proc. NIPS, 2012.
[13] Baker, J. et al.: Research developments and directions in speech recognition and understanding. IEEE Signal Process. Mag., 26 (3) (2009), 75–80.
[14] Baker, J. et al.: Updated MINDS report on speech recognition and understanding. IEEE Signal Process. Mag., 26 (4) (2009), 78–85.
[15] Deng, L.: Computational models for speech production, in Computational Models of Speech Pattern Processing, 199–213, Springer-Verlag, Berlin, Heidelberg, 1999.
[16] Deng, L.: Switching dynamic system models for speech articulation and acoustics, in Mathematical Foundations of Speech and Language Processing, 115–134, Springer, New York, 2003.
[17] George, D.: How the Brain Might Work: A Hierarchical and Temporal Model for Learning and Recognition. Ph.D. thesis, Stanford University, 2008.
[18] Bouvrie, J.: Hierarchical Learning: Theory with Applications in Speech and Vision. Ph.D. thesis, MIT, 2009.
[19] Poggio, T.: How the brain might work: the role of information and learning in understanding and replicating intelligence, in Information: Science and Technology for the New Century (Jacovitt, G., Pettorossi, A., Consolo, R., Senni, V., eds), 45–61, Lateran University Press, Amsterdam, Netherlands, 2007.
[20] Glorot, X.; Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks, in Proc. AISTAT, 2010.
[21] Hinton, G.; Salakhutdinov, R.: Reducing the dimensionality of data with neural networks. Science, 313 (5786) (2006), 504–507.
[22] Dahl, G.; Yu, D.; Deng, L.; Acero, A.: Context-dependent DBN-HMMs in large vocabulary continuous speech recognition, in Proc. ICASSP, 2011.
[23] Mohamed, A.; Yu, D.; Deng, L.: Investigation of full-sequence training of deep belief networks for speech recognition, in Proc. Interspeech, September 2010.
[24] Mohamed, A.; Dahl, G.; Hinton, G.: Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process., 20 (1) (2012), 14–22.
[25] Dahl, G.; Yu, D.; Deng, L.; Acero, A.: Context-dependent DBN-HMMs in large vocabulary continuous speech recognition. IEEE Trans. Audio Speech Lang. Process., 20 (1) (2012), 30–42.
[26] Mohamed, A.; Hinton, G.; Penn, G.: Understanding how deep belief networks perform acoustic modelling, in Proc. ICASSP, 2012.
[27] Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res., 11 (2010), 3371–3408.
[28] Rifai, S.; Vincent, P.; Muller, X.; Glorot, X.; Bengio, Y.: Contractive autoencoders: explicit invariance during feature extraction, in Proc. ICML, 2011, 833–840.
[29] Ranzato, M.; Boureau, Y.; LeCun, Y.: Sparse feature learning for deep belief networks, in Proc. NIPS, 2007.
[30] Deng, L.; Seltzer, M.; Yu, D.; Acero, A.; Mohamed, A.; Hinton, G.: Binary coding of speech spectrograms using a deep auto-encoder, in Proc. Interspeech, 2010.
[31] Bengio, Y.; De Mori, R.; Flammia, G.; Kompe, F.: Global optimization of a neural network – hidden Markov model hybrid, in Proc. Eurospeech, 1991.
[32] Bourlard, H.; Morgan, N.: Connectionist Speech Recognition: A Hybrid Approach, Kluwer, Norwell, MA, 1993.
[33] Morgan, N.: Deep and wide: multiple layers in automatic speech recognition. IEEE Trans. Audio Speech Lang. Process., 20 (1) (2012), 7–13.
[34] Deng, L.; Li, X.: Machine learning paradigms in speech recognition: an overview. IEEE Trans. Audio Speech Lang. Process., 21 (2013), 1060–1089.
[35] LeCun, Y.; Chopra, S.; Ranzato, M.; Huang, F.: Energy-based models in document recognition and computer vision, in Proc. Int. Conf. Document Analysis and Recognition, (ICDAR), 2007.
[36] Ranzato, M.; Poultney, C.; Chopra, S.; LeCun, Y.: Efficient learning of sparse representations with an energy-based model, in Proc. NIPS, 2006.
[37] Ngiam, J.; Khosla, A.; Kim, M.; Nam, J.; Lee, H.; Ng, A.: Multimodal deep learning, in Proc. ICML, 2011.
[38] Ngiam, J.; Chen, Z.; Koh, P.; Ng, A.: Learning deep energy models, in Proc. ICML, 2011.
[39] Hinton, G.; Krizhevsky, A.; Wang, S.: Transforming auto-encoders, Proc. Int. Conf. Artificial Neural Networks, 2011.
[40] Salakhutdinov, R.; Hinton, G.: Deep Boltzmann machines, in Proc. AISTATS, 2009.
[41] Salakhutdinov, R.; Hinton, G.: A better way to pretrain deep Boltzmann machines, in Proc. NIPS, 2012.
[42] Srivastava, N.; Salakhutdinov, R.: Multimodal learning with deep Boltzmann machines, in Proc. NIPS, 2012.
[43] Dahl, G.; Ranzato, M.; Mohamed, A.; Hinton, G.: Phone recognition with the mean-covariance restricted Boltzmann machine, in Proc. NIPS, 2010, 469–477.
[44] Poon, H.; Domingos, P.: Sum-product networks: a new deep architecture, in Proc. Twenty-Seventh Conf. Uncertainty in Artificial Intelligence, Barcelona, Spain, 2011.
[45] Gens, R.; Domingos, P.: Discriminative learning of sum-product networks, in Proc. NIPS, 2012.
[46] Sutskever, I.; Martens, J.; Hinton, G.: Generating text with recurrent neural networks, in Proc. ICML, 2011.
[47] Martens, J.: Deep learning with Hessian-free optimization, in Proc. ICML, 2010.
[48] Martens, J.; Sutskever, I.: Learning recurrent neural networks with Hessian-free optimization, in Proc. ICML, 2011.
[49] Bengio, Y.; Boulanger, N.; Pascanu, R.: Advances in optimizing recurrent networks, in Proc. ICASSP, 2013.
[50] Sutskever, I.: Training Recurrent Neural Networks. Ph.D. thesis, University of Toronto, 2013.
[51] Mikolov, T.; Karafiat, M.; Burget, L.; Cernocky, J.; Khudanpur, S.: Recurrent neural network based language model, in Proc. ICASSP, 2010, 1045–1048.
[52] Mesnil, G.; He, X.; Deng, L.; Bengio, Y.: Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding, in Proc. Interspeech, 2013.
[53] Deng, L.: Dynamic Speech Models – Theory, Algorithm, and Application, Morgan & Claypool, December 2006.
[54] Deng, L.: A generalized hidden Markov model with state-conditioned trend functions of time for the speech signal. Signal Process., 27 (1) (1992), 65–78.
[55] Deng, L.: A stochastic model of speech incorporating hierarchical nonstationarity. IEEE Trans. Speech Audio Process., 1 (4) (1993), 471–475.
[56] Deng, L.; Aksmanovic, M.; Sun, D.; Wu, J.: Speech recognition using hidden Markov models with polynomial regression functions as nonstationary states. IEEE Trans. Speech Audio Process., 2 (4) (1994), 507–520.
[57] Ostendorf, M.; Digalakis, V.; Kimball, O.: From HMM's to segment models: a unified view of stochastic modeling for speech recognition. IEEE Trans. Speech Audio Process., 4 (5) (1996), 360–378.
[58] Deng, L.; Sameti, H.: Transitional speech units and their representation by regressive Markov states: applications to speech recognition. IEEE Trans. Speech Audio Process., 4 (4) (1996), 301–306.
[59] Deng, L.; Aksmanovic, M.: Speaker-independent phonetic classification using hidden Markov models with state-conditioned mixtures of trend functions. IEEE Trans. Speech Audio Process., 5 (1997), 319–324.
[60] Yu, D.; Deng, L.: Solving nonlinear estimation problems using splines. IEEE Signal Process. Mag., 26 (4) (2009), 86–90.
[61] Yu, D.; Deng, L.; Gong, Y.; Acero, A.: A novel framework and training algorithm for variable-parameter hidden Markov models. IEEE Trans. Audio Speech Lang. Process., 17 (7) (2009), 1348–1360.
[62] Zen, H.; Nankaku, Y.; Tokuda, K.: Continuous stochastic feature mapping based on trajectory HMMs. IEEE Trans. Audio Speech Lang. Process., 19 (2) (2011), 417–430.
[63] Zen, H.; Gales, M. J. F.; Nankaku, Y.; Tokuda, K.: Product of experts for statistical parametric speech synthesis. IEEE Trans. Audio Speech Lang. Process., 20 (3) (2012), 794–805.
[64] Ling, Z.; Richmond, K.; Yamagishi, J.: Articulatory control of HMM-based parametric speech synthesis using feature-space-switched multiple regression. IEEE Trans. Audio Speech Lang. Process., 21 (2013), 207–219.
[65] Ling, Z.; Deng, L.; Yu, D.: Modeling spectral envelopes using restricted Boltzmann machines for statistical parametric speech synthesis, in Proc. ICASSP, 2013, 7825–7829.
[66] Shannon, M.; Zen, H.; Byrne, W.: Autoregressive models for statistical parametric speech synthesis. IEEE Trans. Audio Speech Lang. Process., 21 (3) (2013), 587–597.
[67] Deng, L.; Ramsay, G.; Sun, D.: Production models as a structural basis for automatic speech recognition. Speech Commun., 33 (2–3) (1997), 93–111.
[68] Bridle, J. et al. : An investigation of segmental hidden dynamic models of speech coarticulation for automatic speech recognition. Final Report for 1998 Workshop on Language Engineering, CLSP, Johns Hopkins, 1998.
[69] Picone, P. et al. : Initial evaluation of hidden dynamic models on conversational speech, in Proc. ICASSP, 1999.
[70] Minami, Y.; McDermott, E.; Nakamura, A.; Katagiri, S.: A recognition method with parametric trajectory synthesized using direct relations between static and dynamic feature vector time series, in Proc. ICASSP, 2002, 957–960.
[71] Deng, L.; Huang, X.D.: Challenges in adopting speech recognition. Commun. ACM, 47 (1) (2004), 11–13.
[72] Ma, J.; Deng, L.: Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model. IEEE Trans. Speech Audio Process., 11 (6) (2003), 590–602.
[73] Ma, J.; Deng, L.: Target-directed mixture dynamic models for spontaneous speech recognition. IEEE Trans. Speech Audio Process., 12 (1) (2004), 47–58.
[74] Deng, L.; Yu, D.; Acero, A.: Structured speech modeling. IEEE Trans. Audio Speech Lang. Process., 14 (5) (2006), 1492–1504.
[75] Deng, L.; Yu, D.; Acero, A.: A bidirectional target filtering model of speech coarticulation: two-stage implementation for phonetic recognition. IEEE Trans. Audio Speech Lang. Process., 14 (1) (2006a), 256–265.
[76] Deng, L.; Yu, D.: Use of differential cepstra as acoustic features in hidden trajectory modeling for phonetic recognition, in Proc. ICASSP, April 2007.
[77] Bilmes, J.; Bartels, C.: Graphical model architectures for speech recognition. IEEE Signal Process. Mag., 22 (2005), 89–100.
[78] Bilmes, J.: Dynamic graphical models. IEEE Signal Process. Mag., 33 (2010), 29–42.
[79] Rennie, S.; Hershey, H.; Olsen, P.: Single-channel multitalker speech recognition – graphical modeling approaches. IEEE Signal Process. Mag., 33 (2010), 66–80.
[80] Wohlmayr, M.; Stark, M.; Pernkopf, F.: A probabilistic interaction model for multipitch tracking with factorial hidden Markov model. IEEE Trans. Audio Speech Lang. Process., 19 (4) (2011).
[81] Stoyanov, V.; Ropson, A.; Eisner, J.: Empirical risk minimization of graphical model parameters given approximate inference, decoding, and model structure, in Proc. AISTAT, 2011.
[82] Kurzweil, R.: How to Create a Mind. Viking Books, December, 2012.
[83] Fine, S.; Singer, Y.; Tishby, N.: The Hierarchical Hidden Markov Model: analysis and applications. Mach. Learn., 32 (1998), 41–62.
[84] Oliver, N.; Garg, A.; Horvitz, E.: Layered representations for learning and inferring office activity from multiple sensory channels. Comput. Vis. Image Understand., 96 (2004), 163–180.
[85] Taylor, G.; Hinton, G.E.; Roweis, S.: Modeling human motion using binary latent variables, in Proc. NIPS, 2007.
[86] Socher, R.; Lin, C.; Ng, A.; Manning, C.: Learning continuous phrase representations and syntactic parsing with recursive neural networks, in Proc. ICML, 2011.
[87] Juang, B.-H.; Chou, W.; Lee, C.-H.: Minimum classification error rate methods for speech recognition. IEEE Trans. Speech Audio Process., 5 (1997), 257–265.
[88] Chengalvarayan, R.; Deng, L.: Speech trajectory discrimination using the minimum classification error learning. IEEE Trans. Speech Audio Process., 6 (6) (1998), 505–515.
[89] Povey, D.; Woodland, P.: Minimum phone error and i-smoothing for improved discriminative training, in Proc. ICASSP, 2002, 105–108.
[90] He, X.; Deng, L.; Chou, W.: Discriminative learning in sequential pattern recognition – a unifying review for optimization-oriented speech recognition. IEEE Signal Process. Mag., 25 (2008), 14–36.
[91] Jiang, H.; Li, X.: Parameter estimation of statistical models using convex optimization: an advanced method of discriminative training for speech and language processing. IEEE Signal Process. Mag., 27 (3) (2010), 115–127.
[92] Yu, D.; Deng, L.; He, X.; Acero, A.: Large-margin minimum classification error training for large-scale speech recognition tasks, in Proc. ICASSP, 2007.
[93] Xiao, L.; Deng, L.: A geometric perspective of large-margin training of Gaussian models. IEEE Signal Process. Mag., 27 (6) (2010), 118–123.
[94] Gibson, M.; Hain, T.: Error approximation and minimum phone error acoustic model estimation. IEEE Trans. Audio Speech Lang. Process., 18 (6) (2010), 1269–1279.
[95] Yang, D.; Furui, S.: Combining a two-step CRF model and a joint source channel model for machine transliteration, in Proc. ACL, Uppsala, Sweden, 2010, 275–280.
[96] Yu, D.; Wang, S.; Deng, L.: Sequential labeling using deep-structured conditional random fields. IEEE J. Sel. Top. Signal Process., 4 (2010), 965–973.
[97] Hifny, Y.; Renals, S.: Speech recognition using augmented conditional random fields. IEEE Trans. Audio Speech Lang. Process., 17 (2) (2009), 354–365.
[98] Heintz, I.; Fosler-Lussier, E.; Brew, C.: Discriminative input stream combination for conditional random field phone recognition. IEEE Trans. Audio Speech Lang. Process., 17 (8) (2009), 1533–1546.
[99] Zweig, G.; Nguyen, P.: A segmental CRF approach to large vocabulary continuous speech recognition, in Proc. ASRU, 2009.
[100] Peng, J.; Bo, L.; Xu, J.: Conditional neural fields, in Proc. NIPS, 2009.
[101] Heigold, G.; Ney, H.; Lehnen, P.; Gass, T.; Schluter, R.: Equivalence of generative and log-linear models. IEEE Trans. Audio Speech Lang. Process., 19 (5) (2011), 1138–1148.
[102] Yu, D.; Deng, L.: Deep-structured hidden conditional random fields for phonetic recognition, in Proc. Interspeech, September 2010.
[103] Yu, D.; Wang, S.; Karam, Z.; Deng, L.: Language recognition using deep-structured conditional random fields, in Proc. ICASSP, 2010, 5030–5033.
[104] Pinto, J.; Garimella, S.; Magimai-Doss, M.; Hermansky, H.; Bourlard, H.: Analysis of MLP-based hierarchical phone posterior probability estimators. IEEE Trans. Audio Speech Lang. Process., 19 (2) (2011), 225–241.
[105] Ketabdar, H.; Bourlard, H.: Enhanced phone posteriors for improving speech recognition systems. IEEE Trans. Audio Speech Lang. Process., 18 (6) (2010), 1094–1106.
[106] Morgan, N. et al.: Pushing the envelope – aside [speech recognition]. IEEE Signal Process. Mag., 22 (5) (2005), 81–88.
[107] Deng, L.; Yu, D.: Deep Convex Network: a scalable architecture for speech pattern classification, in Proc. Interspeech, 2011.
[108] Deng, L.; Yu, D.; Platt, J.: Scalable stacking and learning for building deep architectures, in Proc. ICASSP, 2012.
[109] Tur, G.; Deng, L.; Hakkani-Tür, D.; He, X.: Towards deep understanding: deep convex networks for semantic utterance classification, in Proc. ICASSP, 2012.
[110] Lena, P.; Nagata, K.; Baldi, P.: Deep spatiotemporal architectures and learning for protein structure prediction, in Proc. NIPS, 2012.
[111] Hutchinson, B.; Deng, L.; Yu, D.: A deep architecture with bilinear modeling of hidden representations: applications to phonetic recognition, in Proc. ICASSP, 2012.
[112] Hutchinson, B.; Deng, L.; Yu, D.: Tensor deep stacking networks. IEEE Trans. Pattern Anal. Mach. Intell., 35 (2013), 1944–1957.
[113] Deng, L.; Hassanein, K.; Elmasry, M.: Analysis of correlation structure for a neural predictive model with application to speech recognition. Neural Netw., 7 (2) (1994a), 331–339.
[114] Robinson, A.: An application of recurrent nets to phone probability estimation. IEEE Trans. Neural Netw., 5 (1994), 298–305.
[115] Graves, A.; Fernandez, S.; Gomez, F.; Schmidhuber, J.: Connectionist temporal classification: labeling unsegmented sequence data with recurrent neural networks, in Proc. ICML, 2006.
[116] Graves, A.; Mahamed, A.; Hinton, G.: Speech recognition with deep recurrent neural networks, in Proc. ICASSP, 2013.
[117] Graves, A.: Sequence transduction with recurrent neural networks, in Representation Learning Workshop, ICML, 2012.
[118] LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE, 86 (1998), 2278–2324.
[119] Ciresan, D.; Giusti, A.; Gambardella, L.; Schmidhuber, J.: Deep neural networks segment neuronal membranes in electron microscopy images, in Proc. NIPS, 2012.
[120] Dean, J. et al. : Large scale distributed deep networks, in Proc. NIPS, 2012.
[121] Krizhevsky, A.; Sutskever, I.; Hinton, G.: ImageNet classification with deep convolutional neural networks, in Proc. NIPS, 2012.
[122] Abdel-Hamid, O.; Mohamed, A.; Jiang, H.; Penn, G.: Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition, in Proc. ICASSP, 2012.
[123] Abdel-Hamid, O.; Deng, L.; Yu, D.: Exploring convolutional neural network structures and optimization for speech recognition, in Proc. Interspeech, 2013.
[124] Abdel-Hamid, O.; Deng, L.; Yu, D.; Jiang, H.: Deep segmental neural networks for speech recognition, in Proc. Interspeech, 2013a.
[125] Sainath, T.; Mohamed, A.; Kingsbury, B.; Ramabhadran, B.: Convolutional neural networks for LVCSR, in Proc. ICASSP, 2013.
[126] Deng, L.; Abdel-Hamid, O.; Yu, D.: A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion, in Proc. ICASSP, 2013.
[127] Lang, K.; Waibel, A.; Hinton, G.: A time-delay neural network architecture for isolated word recognition. Neural Netw., 3 (1) (1990), 23–43.
[128] Hawkins, J.; Blakeslee, S.: On Intelligence: How a New Understanding of the Brain Will Lead to the Creation of Truly Intelligent Machines, Times Books, New York, 2004.
[129] Waibel, A.; Hanazawa, T.; Hinton, G.; Shikano, K.; Lang, K.: Phoneme recognition using time-delay neural networks. IEEE Trans. ASSP, 37 (3) (1989), 328–339.
[130] Hawkins, J.; Ahmad, S.; Dubinsky, D.: Hierarchical Temporal Memory including HTM Cortical Learning Algorithms. Numenta Technical Report, December 10, 2010.
[131] Lee, C.-H.: From knowledge-ignorant to knowledge-rich modeling: a new speech research paradigm for next-generation automatic speech recognition, in Proc. ICSLP, 2004, 109–111.
[132] Yu, D.; Siniscalchi, S.; Deng, L.; Lee, C.: Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition, in Proc. ICASSP, 2012.
[133] Siniscalchi, M.; Yu, D.; Deng, L.; Lee, C.-H.: Exploiting deep neural networks for detection-based speech recognition. Neurocomputing, 106 (2013), 148–157.
[134] Siniscalchi, M.; Svendsen, T.; Lee, C.-H.: A bottom-up modular search approach to large vocabulary continuous speech recognition. IEEE Trans. Audio Speech Lang. Process., 21 (2013), 786–797.
[135] Yu, D.; Seide, F.; Li, G.; Deng, L.: Exploiting sparseness in deep neural networks for large vocabulary speech recognition, in Proc. ICASSP, 2012.
[136] Deng, L.; Sun, D.: A statistical approach to automatic speech recognition using the atomic speech units constructed from overlapping articulatory features. J. Acoust. Soc. Am., 95 (5) (1994), 2702–2719.
[137] Sun, J.; Deng, L.: An overlapping-feature based phonological model incorporating linguistic constraints: applications to speech recognition. J. Acoust. Soc. Am., 111 (2) (2002), 1086–1101.
[138] Sainath, T.; Kingsbury, B.; Ramabhadran, B.: Improving training time of deep belief networks through hybrid pre-training and larger batch sizes, in Proc. NIPS Workshop on Log-linear Models, December 2012.
[139] Erhan, D.; Bengio, Y.; Courville, A.; Manzagol, P.; Vincent, P.; Bengio, S.: Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res., 11 (2010), 625–660.
[140] Kingsbury, B.: Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling, in Proc. ICASSP, 2009.
[141] Kingsbury, B.; Sainath, T.; Soltau, H.: Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization, in Proc. Interspeech, 2012.
[142] Larochelle, H.; Bengio, Y.: Classification using discriminative restricted Boltzmann machines, in Proc. ICML, 2008.
[143] Lee, H.; Grosse, R.; Ranganath, R.; Ng, A.: Unsupervised learning of hierarchical representations with convolutional deep belief networks. Commun. ACM, 54 (10) (2011), 95–103.
[144] Lee, H.; Grosse, R.; Ranganath, R.; Ng, A.: Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, in Proc. ICML, 2009.
[145] Lee, H.; Largman, Y.; Pham, P.; Ng, A.: Unsupervised feature learning for audio classification using convolutional deep belief networks, in Proc. NIPS, 2010.
[146] Ranzato, M.; Susskind, J.; Mnih, V.; Hinton, G.: On deep generative models with applications to recognition, in Proc. CVPR, 2011.
[147] Ney, H.: Speech translation: coupling of recognition and translation, in Proc. ICASSP, 1999.
[148] He, X.; Deng, L.: Speech recognition, machine translation, and speech translation – a unifying discriminative framework. IEEE Signal Process. Mag., 28 (2011), 126–133.
[149] Yamin, S.; Deng, L.; Wang, Y.; Acero, A.: An integrative and discriminative technique for spoken utterance classification. IEEE Trans. Audio Speech Lang. Process., 16 (2008), 1207–1214.
[150] He, X.; Deng, L.: Optimization in speech-centric information processing: criteria and techniques, in Proc. ICASSP, 2012.
[151] He, X.; Deng, L.: Speech-centric information processing: an optimization-oriented approach. Proc. IEEE, 2013.
[152] Deng, L.; He, X.; Gao, J.: Deep stacking networks for information retrieval, in Proc. ICASSP, 2013a.
[153] He, X.; Deng, L.; Tur, G.; Hakkani-Tur, D.: Multi-style adaptive training for robust cross-lingual spoken language understanding, in Proc. ICASSP, 2013.
[154] Le, Q.; Ranzato, M.; Monga, R.; Devin, M.; Corrado, G.; Chen, K.; Dean, J.; Ng, A.: Building high-level features using large scale unsupervised learning, in Proc. ICML, 2012.
[155] Seide, F.; Li, G.; Yu, D.: Conversational speech transcription using context-dependent deep neural networks, in Proc. Interspeech, 2011, 437–440.
[156] Yan, Z.; Huo, Q.; Xu, J.: A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR, in Proc. Interspeech, 2013.
[157] Welling, M.; Rosen-Zvi, M.; Hinton, G.: Exponential family harmoniums with an application to information retrieval. Proc. NIPS, vol. 20 (2005).
[158] Hinton, G.: A practical guide to training restricted Boltzmann machines. UTML Technical Report 2010-003, University of Toronto, August 2010.
[159] Wolpert, D.: Stacked generalization. Neural Netw., 5 (2) (1992), 241–259.
[160] Cohen, W.; de Carvalho, R.V.: Stacked sequential learning, in Proc. IJCAI, 2005, 671–676.
[161] Jarrett, K.; Kavukcuoglu, K.; LeCun, Y.: What is the best multistage architecture for object recognition?, in Proc. Int. Conf. Computer Vision, 2009, 2146–2153.
[162] Pascanu, R.; Mikolov, T.; Bengio, Y.: On the difficulty of training recurrent neural networks, in Proc. ICML, 2013.
[163] Deng, L.; Ma, J.: Spontaneous speech recognition using a statistical coarticulatory model for the vocal tract resonance dynamics. J. Acoust. Soc. Am., 108 (2000), 3036–3048.
[164] Togneri, R.; Deng, L.: Joint state and parameter estimation for a target-directed nonlinear dynamic system model. IEEE Trans. Signal Process., 51 (12) (2003), 3061–3070.
[165] Mohamed, A.; Dahl, G.; Hinton, G.: Deep belief networks for phone recognition, in Proc. NIPS Workshop Deep Learning for Speech Recognition and Related Applications, 2009.
[166] Sivaram, G.; Hermansky, H.: Sparse multilayer perceptron for phoneme recognition. IEEE Trans. Audio Speech Lang. Process., 20 (1) (2012), 23–29.
[167] Kubo, Y.; Hori, T.; Nakamura, A.: Integrating deep neural networks into structural classification approach based on weighted finite-state transducers, in Proc. Interspeech, 2012.
[168] Deng, L. et al. : Recent advances in deep learning for speech research at Microsoft, in Proc. ICASSP, 2013.
[169] Deng, L.; Hinton, G.; Kingsbury, B.: New types of deep neural network learning for speech recognition and related applications: an overview, in Proc. ICASSP, 2013.
[170] Juang, B.; Levinson, S.; Sondhi, M.: Maximum likelihood estimation for multivariate mixture observations of Markov chains. IEEE Trans. Inf. Theory, 32 (1986), 307–309.
[171] Deng, L.; Lennig, M.; Seitz, F.; Mermelstein, P.: Large vocabulary word recognition using context-dependent allophonic hidden Markov models. Comput. Speech Lang., 4 (4) (1990), 345–357.
[172] Deng, L.; Kenny, P.; Lennig, M.; Gupta, V.; Seitz, F.; Mermelstein, P.: Phonemic hidden Markov models with continuous mixture output densities for large vocabulary word recognition. IEEE Trans. Signal Process., 39 (7) (1991), 1677–1681.
[173] Sheikhzadeh, H.; Deng, L.: Waveform-based speech recognition using hidden filter models: parameter selection and sensitivity to power normalization. IEEE Trans. Speech Audio Process., 2 (1994), 80–91.
[174] Jaitly, N.; Hinton, G.: Learning a better representation of speech sound waves using restricted Boltzmann machines, in Proc. ICASSP, 2011.
[175] Sainath, T.; Kingsbury, B.; Ramabhadran, B.; Novak, P.; Mohamed, A.: Making deep belief networks effective for large vocabulary continuous speech recognition, in Proc. IEEE ASRU, 2011.
[176] Jaitly, N.; Nguyen, P.; Vanhoucke, V.: Application of pre-trained deep neural networks to large vocabulary speech recognition, in Proc. Interspeech, 2012.
[177] Hinton, G.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R.: Improving neural networks by preventing co-adaptation of feature detectors. arXiv: 1207.0580v1, 2012.
[178] Yu, D.; Deng, L.; Dahl, G.: Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition, in Proc. NIPS Workshop, 2010.
[179] Yu, D.; Li, J.-Y.; Deng, L.: Calibration of confidence measures in speech recognition. IEEE Trans. Audio Speech Lang., 19 (2010), 2461–2473.
[180] Maas, A.; Le, Q.; O'Neil, T.; Vinyals, O.; Nguyen, P.; Ng, A.: Recurrent neural networks for noise reduction in robust ASR, in Proc. Interspeech, 2012.
[181] Ling, Z.; Deng, L.; Yu, D.: Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis. IEEE Trans. Audio Speech Lang. Process., 21 (10) (2013), 2129–2139.
[182] Kang, S.; Qian, X.; Meng, H.: Multi-distribution deep belief network for speech synthesis, in Proc. ICASSP, 2013, 8012–8016.
[183] Zen, H.; Senior, A.; Schuster, M.: Statistical parametric speech synthesis using deep neural networks, in Proc. ICASSP, 2013, 7962–7966.
[184] Fernandez, R.; Rendel, A.; Ramabhadran, B.; Hoory, R.: F0 contour prediction with a deep belief network-Gaussian process hybrid model, in Proc. ICASSP, 2013, 6885–6889.
[185] Humphrey, E.; Bello, J.; LeCun, Y.: Moving beyond feature design: deep architectures and automatic feature learning in music informatics, in Proc. ISMIR, 2012.
[186] Battenberg, E.; Wessel, D.: Analyzing drum patterns using conditional deep belief networks, in Proc. ISMIR, 2012.
[187] Schmidt, E.; Kim, Y.: Learning emotion-based acoustic features with deep belief networks, in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2011.
[188] Nair, V.; Hinton, G.: 3-d object recognition with deep belief nets, in Proc. NIPS, 2009.
[189] LeCun, Y.; Bengio, Y.: Convolutional networks for images, speech, and time series, in The Handbook of Brain Theory and Neural Networks (Arbib, M. A., ed.), 255–258, MIT Press, Cambridge, Massachusetts, 1995.
[190] Kavukcuoglu, K.; Sermanet, P.; Boureau, Y.; Gregor, K.; Mathieu, M.; LeCun, Y.: Learning convolutional feature hierarchies for visual recognition, in Proc. NIPS, 2010.
[191] Zeiler, M.; Fergus, R.: Stochastic pooling for regularization of deep convolutional neural networks, in Proc. ICLR, 2013.
[192] LeCun, Y.: Learning invariant feature hierarchies, in Proc. ECCV, 2012.
[193] Coates, A.; Huval, B.; Wang, T.; Wu, D.; Ng, A.; Catanzaro, B.: Deep learning with COTS HPC, in Proc. ICML, 2013.
[194] Papandreou, G.; Katsamanis, A.; Pitsikalis, V.; Maragos, P.: Adaptive multimodal fusion by uncertainty compensation with application to audiovisual speech recognition. IEEE Trans. Audio Speech Lang. Process., 17 (3) (2009), 423–435.
[195] Deng, L.; Wu, J.; Droppo, J.; Acero, A.: Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion. IEEE Trans. Speech Audio Process., 13 (3) (2005), 412–421.
[196] Bengio, Y.; Ducharme, R.; Vincent, P.; Jauvin, C.: A neural probabilistic language model, in Proc. NIPS, 2000, 933–938.
[197] Zamora-Martínez, F.; Castro-Bleda, M.; España-Boquera, S.: Fast evaluation of connectionist language models, in Int. Conf. Artificial Neural Networks, 2009, 144–151.
[198] Mnih, A.; Hinton, G.: Three new graphical models for statistical language modeling, in Proc. ICML, 2007, 641–648.
[199] Mnih, A.; Hinton, G.: A scalable hierarchical distributed language model, in Proc. NIPS, 2008, 1081–1088.
[200] Le, H.; Allauzen, A.; Wisniewski, G.; Yvon, F.: Training continuous space language models: some practical issues, in Proc. EMNLP, 2010, 778–788.
[201] Le, H.; Oparin, I.; Allauzen, A.; Gauvain, J.; Yvon, F.: Structured output layer neural network language model, in Proc. ICASSP, 2011.
[202] Mikolov, T.; Deoras, A.; Povey, D.; Burget, L.; Cernocky, J.: Strategies for training large scale neural network language models, in Proc. IEEE ASRU, 2011.
[203] Mikolov, T.: Statistical Language Models based on Neural Networks. Ph.D. thesis, Brno University of Technology, 2012.
[204] Huang, S.; Renals, S.: Hierarchical Bayesian language models for conversational speech recognition. IEEE Trans. Audio Speech Lang. Process., 18 (8) (2010), 1941–1954.
[205] Collobert, R.; Weston, J.: A unified architecture for natural language processing: deep neural networks with multitask learning, in Proc. ICML, 2008.
[206] Collobert, R.: Deep learning for efficient discriminative parsing, in Proc. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2010.
[207] Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res., 12 (2011), 2493–2537.
[208] Socher, R.; Bengio, Y.; Manning, C.: Deep learning for NLP. Tutorial at ACL, 2012, http://www.socher.org/index.php/DeepLearningTutorial/DeepLearningTutorial.
[209] Huang, E.; Socher, R.; Manning, C.; Ng, A.: Improving word representations via global context and multiple word prototypes, in Proc. ACL, 2012.
[210] Zou, W.; Socher, R.; Cer, D.; Manning, C.: Bilingual word embeddings for phrase-based machine translation, in Proc. EMNLP, 2013.
[211] Gao, J.; He, X.; Yih, W.; Deng, L.: Learning semantic representations for the phrase translation model. Microsoft Research Technical Report MSR-TR-2013-88, September 2013.
[212] Socher, R.; Pennington, J.; Huang, E.; Ng, A.; Manning, C.: Semi-supervised recursive autoencoders for predicting sentiment distributions, in Proc. EMNLP, 2011.
[213] Socher, R.; Pennington, J.; Huang, E.; Ng, A.; Manning, C.: Dynamic pooling and unfolding recursive autoencoders for paraphrase detection, in Proc. NIPS, 2011.
[214] Socher, R.; Perelygin, A.; Wu, J.; Chuang, J.; Manning, C.; Ng, A.; Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank, in Proc. EMNLP, 2013.
[215] Yu, D.; Deng, L.; Seide, F.: The deep tensor neural network with applications to large vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process., 21 (2013), 388–396.
[216] Salakhutdinov, R.; Hinton, G.: Semantic hashing, in Proc. SIGIR Workshop on Information Retrieval and Applications of Graphical Models, 2007.
[217] Hinton, G.; Salakhutdinov, R.: Discovering binary codes for documents by learning deep generative models. Top. Cognit. Sci., (2010), 1–18.
[218] Huang, P.; He, X.; Gao, J.; Deng, L.; Acero, A.; Heck, L.: Learning deep structured semantic models for web search using clickthrough data, in ACM Int. Conf. Information and Knowledge Management (CIKM), 2013.
[219] Le, Q.; Ngiam, J.; Coates, A.; Lahiri, A.; Prochnow, B.; Ng, A.: On optimization methods for deep learning, in Proc. ICML, 2011.
[220] Bottou, L.; LeCun, Y.: Large scale online learning, in Proc. NIPS, 2004.
[221] Bergstra, J.; Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res., 13 (2012), 281–305.
[222] Snoek, J.; Larochelle, H.; Adams, R.: Practical Bayesian optimization of machine learning algorithms, in Proc. NIPS, 2012.
APSIPA Transactions on Signal and Information Processing (ISSN: 2048-7703)