
The artificial intelligence renaissance: deep learning and the road to human-level machine intelligence

Published online by Cambridge University Press:  23 July 2018

Kar-Han Tan*
Affiliation:
NCS, 5 Ang Mo Kio Street 62, Singapore 569141
Boon Pang Lim
Affiliation:
NovuMind, 5201 Great America Parkway, Suite 138, Santa Clara, California 95054, USA
*Corresponding author: Kar-Han Tan. Email: karhan.tan@ieee.org

Abstract

In this paper we look at recent advances in artificial intelligence. Decades in the making, a confluence of factors over the past few years has culminated in a string of breakthroughs on longstanding research challenges, and a number of problems that were considered too challenging just a few years ago can now be solved convincingly by deep neural networks. Although deep learning appears to reduce algorithmic problem solving to a matter of data collection and labeling, we believe that many insights from ‘pre-deep-learning’ work still apply and will be more valuable than ever in guiding the design of novel neural network architectures.

Information

Type
Industrial Technology Advances
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Authors, 2018

Fig. 1. The ability to create intelligent beings has long been a subject of endless fascination. With deep learning and the latest computers, we are coming close to achieving the dream of AI. (Image: ‘Homunculus in the vial’ by Franz Xaver Simm, 1899).


Fig. 2. A convolution layer uses a shared set of weights for a set of neurons, effectively applying the same neuron across the input in a sliding-window fashion.
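
As a rough illustration of the weight sharing described in Fig. 2 (a minimal sketch in NumPy; the array shapes and names are our own, not from the paper), the same kernel is applied at every spatial position of the input:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Apply one shared kernel at every spatial position (no padding, stride 1)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same weights are reused at every position: this is the
            # "shared set of weights" that makes the layer convolutional.
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

# Example: a 3x3 kernel slid over a 6x6 input yields a 4x4 feature map.
feature_map = conv2d_valid(np.random.rand(6, 6), np.random.rand(3, 3))
```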


Fig. 3. The VGG-19 network [17], which uses only five types of layers. All convolution layers use 3×3 kernels and all max-pool layers use 2×2 kernels; the ‘19’ counts only the convolution and fully connected layers. Because of the fully connected layers, VGG-19 is not fully convolutional and can therefore only accept 224×224×3 inputs without retraining.
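
A rough back-of-the-envelope sketch of why the fully connected layers pin the input size (our own illustration, assuming the standard VGG-19 configuration with 512 channels after the last convolutional block): the 3×3 ‘same’ convolutions preserve the spatial resolution while each of the five 2×2 max-pool stages halves it, so only a 224×224 input yields the 7×7×512 feature map whose flattened size the first fully connected layer is hard-wired to expect.

```python
# Trace the spatial resolution through VGG-19's five max-pool stages.
h, w, channels = 224, 224, 512   # channels after the final convolutional block
for _ in range(5):               # 3x3 "same" convs keep h, w; each 2x2 max pool halves them
    h, w = h // 2, w // 2
print(h, w, h * w * channels)    # 7 7 25088 -> fixed fan-in of the first fully connected layer
```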


Fig. 4. How to train a DNN with SGD and backpropagation.
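
A minimal sketch of the training recipe named in Fig. 4 (a toy one-hidden-layer network with hand-derived gradients; the data, sizes, and hyperparameters are illustrative assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))                            # toy inputs
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)     # toy targets

W1 = rng.normal(scale=0.1, size=(3, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 1)); b2 = np.zeros(1)
lr, batch = 0.1, 32

for step in range(2000):
    idx = rng.choice(len(X), size=batch, replace=False)  # stochastic mini-batch
    xb, yb = X[idx], y[idx]

    # Forward pass
    h = np.maximum(0.0, xb @ W1 + b1)                    # ReLU hidden layer
    pred = h @ W2 + b2

    # Backward pass: apply the chain rule layer by layer (backpropagation)
    d_pred = 2.0 * (pred - yb) / batch                   # gradient of mean squared error
    dW2, db2 = h.T @ d_pred, d_pred.sum(axis=0)
    d_h = (d_pred @ W2.T) * (h > 0)                      # ReLU gradient
    dW1, db1 = xb.T @ d_h, d_h.sum(axis=0)

    # SGD update: step each parameter against its gradient
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```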


Fig. 5. Direct connections enable the training of deeper models.


Fig. 6. Recurrent connections in a single-neuron recurrent neural network (from [25]).


Fig. 7. Unrolling recurrent links through time with the BPTT algorithm (from [25]).
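
To make the unrolling in Figs 6 and 7 concrete (a small sketch in our own notation, not taken from [25]): once the recurrent loop is unrolled, every time step becomes an ordinary feed-forward layer that shares its weights with all the others, and BPTT simply backpropagates through that chain.

```python
import numpy as np

def unrolled_forward(xs, w_in, w_rec, b):
    """Forward pass of a single recurrent neuron unrolled over a sequence xs."""
    h, states = 0.0, []
    for x in xs:
        # The same three parameters are reused at every time step;
        # unrolling turns the recurrent loop into a deep feed-forward chain.
        h = np.tanh(w_in * x + w_rec * h + b)
        states.append(h)
    return states

# Example: a length-5 sequence unrolls into a 5-layer chain of identical steps.
print(unrolled_forward([0.1, 0.5, -0.2, 0.3, 0.0], w_in=1.0, w_rec=0.8, b=0.0))
```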


Fig. 8. An LSTM cell helps to enforce constant error flow in BPTT (from [30]).


Fig. 9. A taxonomy of hardware platforms for AI.


Fig. 10. Object recognition: a VGG network running on NovuMind's AI processor FPGA prototype performs real-time object recognition. The upper left-hand corner lists the top-five recognition results.


Fig. 11. Face recognition on NovuMind's AI processor FPGA prototype running in real time.


Fig. 12. NovuMind's unconstrained face recognition system identifies employees and visitors in real time.


Fig. 13. Object detection with convolutional neural networks. A network trained to detect and recognize traffic signs may have applications in autonomous driving.


Fig. 14. Automatic colorization with a DNN [52]. (a) Input monochrome image. (b) Output colorized image.


Fig. 15. Monocular depth estimation with a DNN. (Top) Input RGB image. (Bottom) Output depth map in grayscale and pseudo color.


Fig. 16. Comparison of n-gram and RNN-based language models [105].


Fig. 17. Block diagram of text-to-speech synthesis systems (from [111]).


Fig. 18. Causal dilated CNNs used in WaveNet [116].


Fig. 19. Neural network equivalents for the acoustic, duration, and pitch models in the DeepVoice systems (from [120]).