
19 - Large-Scale FPGA-Based Convolutional Networks

from Part Four - Applications

Published online by Cambridge University Press: 05 February 2012

Clément Farabet, New York University
Yann LeCun, New York University
Koray Kavukcuoglu, NEC Labs America, Princeton, NJ, USA
Berin Martini, Yale University
Polina Akselrod, Yale University
Selcuk Talay, Yale University
Eugenio Culurciello, Yale University

Edited by Ron Bekkerman, LinkedIn Corporation, Mountain View, California; Mikhail Bilenko, Microsoft Research, Redmond, Washington; John Langford, Yahoo! Research, New York

Summary

Micro-robots, unmanned aerial vehicles, imaging sensor networks, wireless phones, and other embedded vision systems all require low-cost, high-speed implementations of synthetic vision systems capable of recognizing and categorizing objects in a scene.

Many successful object recognition systems use dense features extracted on regularly spaced patches over the input image. The majority of the feature extraction systems have a common structure composed of a filter bank (generally based on oriented edge detectors or 2D Gabor functions), a nonlinear operation (quantization, winner-take-all, sparsification, normalization, and/or pointwise saturation), and finally a pooling operation (max, average, or histogramming). For example, the scale-invariant feature transform (SIFT) (Lowe, 2004) operator applies oriented edge filters to a small patch and determines the dominant orientation through a winner-take-all operation. Finally, the resulting sparse vectors are added (pooled) over a larger patch to form a local orientation histogram. Some recognition systems use a single stage of feature extractors (Lazebnik, Schmid, and Ponce, 2006; Dalal and Triggs, 2005; Berg, Berg, and Malik, 2005; Pinto, Cox, and DiCarlo, 2008).
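
The pipeline just described (filter bank, pointwise nonlinearity, pooling) can be made concrete with a short sketch. The following Python/NumPy code is illustrative only and is not the implementation discussed in this chapter; the 3x3 oriented-edge filters, the tanh saturation, and the 4x4 average pooling are assumptions made for the example.

import numpy as np
from scipy.signal import convolve2d

def feature_extract(image, filters, pool=4):
    """One generic feature-extraction stage: filter bank -> nonlinearity -> pooling."""
    maps = []
    for f in filters:
        r = convolve2d(image, f, mode="valid")      # filter bank (e.g., oriented edge detectors)
        r = np.tanh(r)                              # pointwise saturating nonlinearity
        h, w = r.shape
        r = r[: h - h % pool, : w - w % pool]       # crop so the map tiles evenly
        r = r.reshape(h // pool, pool, w // pool, pool).mean(axis=(1, 3))  # average pooling
        maps.append(r)
    return np.stack(maps)                           # one pooled feature map per filter

# Example: a tiny bank of two oriented-edge filters applied to a random 32x32 image.
filters = [np.array([[1.0, 0.0, -1.0]] * 3),        # responds to vertical edges
           np.array([[1.0, 0.0, -1.0]] * 3).T]      # responds to horizontal edges
features = feature_extract(np.random.rand(32, 32), filters)
print(features.shape)                               # (2, 7, 7): two filters, 28x28 maps pooled 4x4

Swapping the average pooling for a max, or the tanh for a winner-take-all or normalization step, recovers the other variants mentioned above (including the SIFT-style orientation histogram) without changing the overall three-step structure.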

Other models, such as HMAX-type models (Serre, Wolf, and Poggio, 2005; Mutch and Lowe, 2006) and convolutional networks, use two or more layers of successive feature extractors. Different training algorithms have been used for learning the parameters of convolutional networks. In LeCun et al. (1998b) and Huang and LeCun (2006), pure supervised learning is used to update the parameters. However, recent works have focused on training with an auxiliary task (Ahmed et al., 2008) or using unsupervised objectives (Ranzato et al., 2007b; Kavukcuoglu et al., 2009; Jarrett et al., 2009; Lee et al., 2009).
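
As a companion to the sketch above, the following hedged example shows how two such stages can be stacked into a multi-stage feature extractor of the kind this paragraph refers to. The random filter banks and the summing of first-stage maps (in place of a learned connection table) are simplifying assumptions; in a convolutional network the filters would be learned, either by supervised gradient descent or with an unsupervised objective.

# Reuses feature_extract() from the previous sketch; filters here are random placeholders.
rng = np.random.default_rng(0)
stage1_filters = [rng.standard_normal((3, 3)) for _ in range(4)]
stage2_filters = [rng.standard_normal((3, 3)) for _ in range(8)]

x = rng.random((64, 64))                            # stand-in for an input image
s1 = feature_extract(x, stage1_filters, pool=2)     # stage 1: 4 pooled feature maps
s2 = feature_extract(s1.sum(axis=0), stage2_filters, pool=2)  # stage 2 on the combined maps
print(s1.shape, s2.shape)                           # (4, 31, 31) (8, 14, 14)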

Type: Chapter
Book: Scaling up Machine Learning: Parallel and Distributed Approaches, pp. 399–419
Publisher: Cambridge University Press
Print publication year: 2011


References

Adams, D. A. 1969. A Computation Model with Data Flow Sequencing. Ph.D. thesis, Stanford University.
Ahmed, A., Yu, K., Xu, W., Gong, Y., and Xing, E. 2008. Training Hierarchical Feed-Forward Visual Recognition Models Using Transfer Learning from Pseudo-Tasks. In: ECCV. New York: Springer.
Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. 2007. Greedy Layer-Wise Training of Deep Networks. In: NIPS.
Berg, A. C., Berg, T. L., and Malik, J. 2005. Shape Matching and Object Recognition Using Low Distortion Correspondences. In: CVPR.
Chellapilla, K., Shilman, M., and Simard, P. 2006. Optimally Combining a Cascade of Classifiers. In: Proceedings of Document Recognition and Retrieval 13, Electronic Imaging, 6067.
Cho, M. H., Cheng, C.-C., Kinsy, M., Suh, G. E., and Devadas, S. 2008. Diastolic Arrays: Throughput-Driven Reconfigurable Computing.
Coates, A., Baumstarck, P., Le, Q., and Ng, A. Y. 2009. Scalable Learning for Object Detection with GPU Hardware. Pages 4287–4293 of: Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems.
Collobert, R. 2008. Torch. Presented at the Workshop on Machine Learning Open Source Software, NIPS.
Dalal, N., and Triggs, B. 2005. Histograms of Oriented Gradients for Human Detection. In: CVPR.
Delakis, M., and Garcia, C. 2008. Text Detection with Convolutional Neural Networks. In: International Conference on Computer Vision Theory and Applications (VISAPP 2008).
Dennis, J. B., and Misunas, D. P. 1974. A Preliminary Architecture for a Basic Data-Flow Processor. SIGARCH Computer Architecture News, 3(4), 126–132.
Farabet, C., Poulet, C., Han, J. Y., and LeCun, Y. 2009. CNP: An FPGA-Based Processor for Convolutional Networks. In: International Conference on Field Programmable Logic and Applications (FPL'09). Prague: IEEE.
Farabet, C., Martini, B., Akselrod, P., Talay, S., LeCun, Y., and Culurciello, E. 2010. Hardware Accelerated Convolutional Neural Networks for Synthetic Vision Systems. In: International Symposium on Circuits and Systems (ISCAS'10). Paris: IEEE.
Frome, A., Cheung, G., Abdulkader, A., Zennaro, M., Wu, B., Bissacco, A., Adam, H., Neven, H., and Vincent, L. 2009. Large-Scale Privacy Protection in Street-Level Imagery. In: ICCV'09.
Fukushima, K., and Miyake, S. 1982. Neocognitron: A New Algorithm for Pattern Recognition Tolerant of Deformations and Shifts in Position. Pattern Recognition, 15(6), 455–469.
Garcia, C., and Delakis, M. 2004. Convolutional Face Finder: A Neural Architecture for Fast and Robust Face Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Hadsell, R., Sermanet, P., Scoffier, M., Erkan, A., Kavukcuoglu, K., Muller, U., and LeCun, Y. 2009. Learning Long-Range Vision for Autonomous Off-Road Driving. Journal of Field Robotics, 26(2), 120–144.
Hicks, J., Chiou, D., Ang, B. S., and Arvind. 1993. Performance Studies of Id on the Monsoon Dataflow System.
Hinton, G. E., and Salakhutdinov, R. R. 2006. Reducing the Dimensionality of Data with Neural Networks. Science.
Huang, F.-J., and LeCun, Y. 2006. Large-Scale Learning with SVM and Convolutional Nets for Generic Object Categorization. In: Proceedings of Computer Vision and Pattern Recognition Conference (CVPR'06). IEEE.
Jain, V., and Seung, H. S. 2008. Natural Image Denoising with Convolutional Networks. In: Advances in Neural Information Processing Systems 21 (NIPS 2008). Cambridge, MA: MIT Press.
Jarrett, K., Kavukcuoglu, K., Ranzato, M. A., and LeCun, Y. 2009. What Is the Best Multi-Stage Architecture for Object Recognition? In: Proceedings of International Conference on Computer Vision (ICCV'09). IEEE.
Kavukcuoglu, K., Ranzato, M. A., and LeCun, Y. 2008. Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition. Technical Report CBLL-TR-2008-12-01.
Kavukcuoglu, K., Ranzato, M. A., Fergus, R., and LeCun, Y. 2009. Learning Invariant Features through Topographic Filter Maps. In: Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR'09). IEEE.
Kung, H. T. 1986. Why Systolic Architectures? 300–309.
Gaudiot, J. L., Bic, L., Dennis, J., and Dennis, J. B. 1994. Stream Data Types for Signal Processing. In: Advances in Dataflow Architecture and Multithreading. IEEE.
Lazebnik, S., Schmid, C., and Ponce, J. 2006. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. Pages 2169–2178 of: Proceedings of Computer Vision and Pattern Recognition. IEEE.
LeCun, Y., and Bottou, L. 2002. Lush Reference Manual. Technical report; code available at http://lush.sourceforge.net.
LeCun, Y., and Cortes, C. 1998. MNIST Dataset. http://yann.lecun.com/exdb/mnist/.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. 1989. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. 1990. Handwritten Digit Recognition with a Back-Propagation Network. In: NIPS'89.
LeCun, Y., Bottou, L., Orr, G., and Muller, K. 1998a. Efficient BackProp. In: Orr, G., and Muller, K. (eds), Neural Networks: Tricks of the Trade. New York: Springer.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. 1998b. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11), 2278–2324.
LeCun, Y., Huang, F.-J., and Bottou, L. 2004. Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting. In: Proceedings of CVPR'04. IEEE.
Lee, E. A., and Messerschmitt, D. G. 1987. Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing. IEEE Transactions on Computers, 36, 24–35.
Lee, H., Grosse, R., Ranganath, R., and Ng, A. Y. 2009. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. In: Proceedings of the 26th International Conference on Machine Learning (ICML'09).
Lowe, D. G. 2004. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision.
Lyu, S., and Simoncelli, E. P. 2008. Nonlinear Image Representation Using Divisive Normalization. In: CVPR.
Mozer, M. C. 1991. The Perception of Multiple Objects: A Connectionist Approach. Cambridge, MA: MIT Press.
Mutch, J., and Lowe, D. G. 2006. Multiclass Object Recognition with Sparse, Localized Features. In: CVPR.
Nasse, F., Thurau, C., and Fink, G. A. 2009. Face Detection Using GPU-Based Convolutional Neural Networks.
Ning, F., Delhomme, D., LeCun, Y., Piano, F., Bottou, L., and Barbano, P. 2005. Toward Automatic Phenotyping of Developing Embryos from Videos. IEEE Transactions on Image Processing, Special Issue on Molecular and Cellular Bioimaging.
Nowlan, S., and Platt, J. 1995. A Convolutional Neural Network Hand Tracker. Pages 901–908 of: Neural Information Processing Systems. San Mateo, CA: Morgan Kaufmann.
Olshausen, B. A., and Field, D. J. 1997. Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1? Vision Research.
Osadchy, M., LeCun, Y., and Miller, M. 2007. Synergistic Face Detection and Pose Estimation with Energy-Based Models. Journal of Machine Learning Research, 8(May), 1197–1215.
Pinto, N., Cox, D. D., and DiCarlo, J. J. 2008. Why Is Real-World Visual Object Recognition Hard? PLoS Computational Biology, 4(1), e27.
Ranzato, M. A., Boureau, Y.-L., and LeCun, Y. 2007a. Sparse Feature Learning for Deep Belief Networks. In: NIPS'07.
Ranzato, M. A., Huang, F.-J., Boureau, Y.-L., and LeCun, Y. 2007b. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition. In: Proceedings of Computer Vision and Pattern Recognition Conference (CVPR'07). IEEE.
Serre, T., Wolf, L., and Poggio, T. 2005. Object Recognition with Features Inspired by Visual Cortex. In: CVPR.
Simard, P. Y., Steinkraus, D., and Platt, J. C. 2003. Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis. In: ICDAR.
Vaillant, R., Monrocq, C., and LeCun, Y. 1994. Original Approach for the Localisation of Objects in Images. IEEE Proceedings on Vision, Image, and Signal Processing, 141(4), 245–250.
Weston, J., Ratle, F., and Collobert, R. 2008. Deep Learning via Semi-Supervised Embedding. In: ICML.
