
18 - Large-Scale Learning for Vision with GPUs

from Part Four - Applications

Published online by Cambridge University Press: 05 February 2012

Adam Coates
Affiliation:
Stanford University
Rajat Raina
Affiliation:
Facebook Inc., Palo Alto, CA, USA
Andrew Y. Ng
Affiliation:
Stanford University
Ron Bekkerman
Affiliation:
LinkedIn Corporation, Mountain View, California
Mikhail Bilenko
Affiliation:
Microsoft Research, Redmond, Washington
John Langford
Affiliation:
Yahoo! Research, New York

Summary

Computer vision is a challenging application area for learning algorithms. For instance, object detection is a critical problem for many systems, such as mobile robots, yet it remains largely unsolved. To interact with the world, robots must be able to locate and recognize large numbers of objects accurately and at reasonable speeds. Unfortunately, off-the-shelf computer vision algorithms do not yet achieve sufficiently high detection performance for these applications. A key difficulty with many existing algorithms is that they are unable to take advantage of large numbers of examples. As a result, they must rely heavily on prior knowledge and hand-engineered features that account for the many kinds of errors that can occur. In this chapter, we present two methods for improving performance by scaling up learning algorithms to large datasets: (1) using graphics processing units (GPUs) and distributed systems to scale up the standard components of computer vision algorithms and (2) using GPUs to automatically learn high-quality feature representations with deep belief networks (DBNs). These methods not only achieve high performance but also remove much of the need for the hand-engineering common in computer vision algorithms.

The fragility of many vision algorithms comes from their lack of knowledge about the multitude of visual phenomena that occur in the real world. Whereas humans can intuit information about depth, occlusion, lighting, and even motion from still images, computer vision algorithms generally lack the ability to deal with these phenomena without being engineered to account for them in advance.

Type: Chapter
Information: Scaling up Machine Learning: Parallel and Distributed Approaches, pp. 373-398
Publisher: Cambridge University Press
Print publication year: 2011


