
The artificial intelligence renaissance: deep learning and the road to human-level machine intelligence

Kar-Han Tan and Boon Pang Lim


In this paper, we look at recent advances in artificial intelligence. In the past few years, a confluence of several factors, decades in the making, has culminated in a string of breakthroughs on many longstanding research challenges. A number of problems that were considered too challenging just a few years ago can now be solved convincingly by deep neural networks. Although deep learning appears to reduce algorithmic problem solving to a matter of data collection and labeling, we believe that many insights learned from ‘pre-deep-learning’ work still apply and will be more valuable than ever in guiding the design of novel neural network architectures.



This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence, which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.


Corresponding author: Kar-Han Tan



